أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Murad Tukan

Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation

114 - Murad Tukan , Alaa Maalouf , Matan Weksler 2020

A common technique for compressing a neural network is to compute the $k$-rank $ell_2$ approximation $A_{k,2}$ of the matrix $Ainmathbb{R}^{ntimes d}$ that corresponds to a fully connected layer (or embedding layer). Here, $d$ is the number of the ne urons in the layer, $n$ is the number in the next one, and $A_{k,2}$ can be stored in $O((n+d)k)$ memory instead of $O(nd)$. This $ell_2$-approximation minimizes the sum over every entry to the power of $p=2$ in the matrix $A - A_{k,2}$, among every matrix $A_{k,2}inmathbb{R}^{ntimes d}$ whose rank is $k$. While it can be computed efficiently via SVD, the $ell_2$-approximation is known to be very sensitive to outliers (far-away rows). Hence, machine learning uses e.g. Lasso Regression, $ell_1$-regularization, and $ell_1$-SVM that use the $ell_1$-norm. This paper suggests to replace the $k$-rank $ell_2$ approximation by $ell_p$, for $pin [1,2]$. We then provide practical and provable approximation algorithms to compute it for any $pgeq1$, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage. For example, our approach achieves $28%$ compression of RoBERTas embedding layer with only $0.63%$ additive drop in the accuracy (without fine-tuning) in average over all tasks in GLUE, compared to $11%$ drop using the existing $ell_2$-approximation. Open code is provided for reproducing and extending our results.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Coresets for Near-Convex Functions

122 - Murad Tukan , Alaa Maalouf , Dan Feldman 2020

Coreset is usually a small weighted subset of $n$ input points in $mathbb{R}^d$, that provably approximates their loss function for a given set of queries (models, classifiers, etc.). Coresets become increasingly common in machine learning since exis ting heuristics or inefficient algorithms may be improved by running them possibly many times on the small coreset that can be maintained for streaming distributed data. Coresets can be obtained by sensitivity (importance) sampling, where its size is proportional to the total sum of sensitivities. Unfortunately, computing the sensitivity of each point is problem dependent and may be harder to compute than the original optimization problem at hand. We suggest a generic framework for computing sensitivities (and thus coresets) for wide family of loss functions which we call near-convex functions. This is by suggesting the $f$-SVD factorization that generalizes the SVD factorization of matrices to functions. Example applications include coresets that are either new or significantly improves previous results, such as SVM, Logistic regression, M-estimators, and $ell_z$-regression. Experimental results and open source are also provided.

التعلم الآلي التعلم الالي

Faster PAC Learning and Smaller Coresets via Smoothed Analysis

133 - Alaa Maalouf , Ibrahim Jubran , Murad Tukan 2020

PAC-learning usually aims to compute a small subset ($varepsilon$-sample/net) from $n$ items, that provably approximates a given loss function for every query (model, classifier, hypothesis) from a given set of queries, up to an additive error $varep silonin(0,1)$. Coresets generalize this idea to support multiplicative error $1pmvarepsilon$. Inspired by smoothed analysis, we suggest a natural generalization: approximate the emph{average} (instead of the worst-case) error over the queries, in the hope of getting smaller subsets. The dependency between errors of different queries implies that we may no longer apply the Chernoff-Hoeffding inequality for a fixed query, and then use the VC-dimension or union bound. This paper provides deterministic and randomized algorithms for computing such coresets and $varepsilon$-samples of size independent of $n$, for any finite set of queries and loss function. Example applications include new and improved coreset constructions for e.g. streaming vector summarization [ICML17] and $k$-PCA [NIPS16]. Experimental results with open source code are provided.

التعلم الآلي التعلم الالي

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد