ترغب بنشر مسار تعليمي؟ اضغط هنا

Learning-Based Low-Rank Approximations

315   0   0.0 ( 0 )
 نشر من قبل Ali Vakilian
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We introduce a learning-based algorithm for the low-rank decomposition problem: given an $n times d$ matrix $A$, and a parameter $k$, compute a rank-$k$ matrix $A$ that minimizes the approximation loss $|A-A|_F$. The algorithm uses a training set of input matrices in order to optimize its performance. Specifically, some of the most efficient approximate algorithms for computing low-rank approximations proceed by computing a projection $SA$, where $S$ is a sparse random $m times n$ sketching matrix, and then performing the singular value decomposition of $SA$. We show how to replace the random matrix $S$ with a learned matrix of the same sparsity to reduce the error. Our experiments show that, for multiple types of data sets, a learned sketch matrix can substantially reduce the approximation loss compared to a random matrix $S$, sometimes by one order of magnitude. We also study mixed matrices where only some of the rows are trained and the remaining ones are random, and show that matrices still offer improved performance while retaining worst-case guarantees. Finally, to understand the theoretical aspects of our approach, we study the special case of $m=1$. In particular, we give an approximation algorithm for minimizing the empirical loss, with approximation factor depending on the stable rank of matrices in the training set. We also show generalization bounds for the sketch matrix learning problem.



قيم البحث

اقرأ أيضاً

The success of deep reinforcement learning (DRL) is due to the power of learning a representation that is suitable for the underlying exploration and exploitation task. However, existing provable reinforcement learning algorithms with linear function approximation often assume the feature representation is known and fixed. In order to understand how representation learning can improve the efficiency of RL, we study representation learning for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel can be represented in a bilinear form. We propose a provably efficient algorithm called ReLEX that can simultaneously learn the representation and perform exploration. We show that ReLEX always performs no worse than a state-of-the-art algorithm without representation learning, and will be strictly better in terms of sample efficiency if the function class of representations enjoys a certain mild coverage property over the whole state-action space.
We consider adaptive approximations of the parameter-to-solution map for elliptic operator equations depending on a large or infinite number of parameters, comparing approximation strategies of different degrees of nonlinearity: sparse polynomial exp ansions, general low-rank approximations separating spatial and parametric variables, and hierarchical tensor decompositions separating all variables. We describe corresponding adaptive algorithms based on a common generic template and show their near-optimality with respect to natural approximability assumptions for each type of approximation. A central ingredient in the resulting bounds for the total computational complexity are new operator compression results for the case of infinitely many parameters. We conclude with a comparison of the complexity estimates based on the actual approximability properties of classes of parametric model problems, which shows that the computational costs of optimized low-rank expansions can be significantly lower or higher than those of sparse polynomial expansions, depending on the particular type of parametric problem.
In continual learning (CL), a learner is faced with a sequence of tasks, arriving one after the other, and the goal is to remember all the tasks once the continual learning experience is finished. The prior art in CL uses episodic memory, parameter r egularization or extensible network structures to reduce interference among tasks, but in the end, all the approaches learn different tasks in a joint vector space. We believe this invariably leads to interference among different tasks. We propose to learn tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference. Further, to keep the gradients of different tasks coming from these subspaces orthogonal to each other, we learn isometric mappings by posing network training as an optimization problem over the Stiefel manifold. To the best of our understanding, we report, for the first time, strong results over experience-replay baseline with and without memory on standard classification benchmarks in continual learning. The code is made publicly available.
365 - Da Yu , Huishuai Zhang , Wei Chen 2021
We propose a reparametrization scheme to address the challenges of applying differentially private SGD on large neural networks, which are 1) the huge memory cost of storing individual gradients, 2) the added noise suffering notorious dimensional dep endence. Specifically, we reparametrize each weight matrix with two emph{gradient-carrier} matrices of small dimension and a emph{residual weight} matrix. We argue that such reparametrization keeps the forward/backward process unchanged while enabling us to compute the projected gradient without computing the gradient itself. To learn with differential privacy, we design emph{reparametrized gradient perturbation (RGP)} that perturbs the gradients on gradient-carrier matrices and reconstructs an update for the original weight from the noisy gradients. Importantly, we use historical updates to find the gradient-carrier matrices, whose optimality is rigorously justified under linear regression and empirically verified with deep learning tasks. RGP significantly reduces the memory cost and improves the utility. For example, we are the first able to apply differential privacy on the BERT model and achieve an average accuracy of $83.9%$ on four downstream tasks with $epsilon=8$, which is within $5%$ loss compared to the non-private baseline but enjoys much lower privacy leakage risk.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا