A general method for regularizing tensor decomposition methods via pseudo-data

171 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Omer Gottesman

تاريخ النشر 2019

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Omer Gottesman - Weiwei Pan - Finale Doshi-Velez

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Tensor decomposition methods allow us to learn the parameters of latent variable models through decomposition of low-order moments of data. A significant limitation of these algorithms is that there exists no general method to regularize them, and in the past regularization has mostly been performed using bespoke modifications to the algorithms, tailored for the particular form of the desired regularizer. We present a general method of regularizing tensor decomposition methods which can be used for any likelihood model that is learnable using tensor decomposition methods and any differentiable regularization function by supplementing the training data with pseudo-data. The pseudo-data is optimized to balance two terms: being as close as possible to the true data and enforcing the desired regularization. On synthetic, semi-synthetic and real data, we demonstrate that our method can improve inference accuracy and regularize for a broad range of goals including transfer learning, sparsity, interpretability, and orthogonality of the learned parameters.

قيم البحث

888 - Ryota Tomioka , Taiji Suzuki 2013

We discuss structured Schatten norms for tensor decomposition that includes two recently proposed norms (overlapped and latent) for convex-optimization-based tensor decomposition, and connect tensor decomposition with wider literature on structured s parsity. Based on the properties of the structured Schatten norms, we mathematically analyze the performance of latent approach for tensor decomposition, which was empirically found to perform better than the overlapped approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific mode, this approach performs as good as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structures Schatten norms, establish the consistency, and discuss the identifiability of this approach. We confirm through numerical simulations that our theoretical prediction can precisely predict the scaling behavior of the mean squared error.

التعلم الالي التعلم الآلي التحليل العددي

Learning Good State and Action Representations via Tensor Decomposition

147 - Chengzhuo Ni , Anru Zhang , Yaqi Duan 2021

The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure. This paper proposes a tensor-inspired unsupervised learning method to identify meaningful low-dimensional state and action representat ions from empirical trajectories. The method exploits the MDPs tensor structure by kernelization, importance sampling and low-Tucker-rank approximation. This method can be further used to cluster states and actions respectively and find the best discrete MDP abstraction. We provide sharp statistical error bounds for tensor concentration and the preservation of diffusion distance after embedding.

التعلم الالي التعلم الآلي

Tensor Methods for Nonlinear Matrix Completion

128 - Greg Ongie , Daniel Pimentel-Alarcon , Laura Balzano 2018

In the low-rank matrix completion (LRMC) problem, the low-rank assumption means that the columns (or rows) of the matrix to be completed are points on a low-dimensional linear algebraic variety. This paper extends this thinking to cases where the col umns are points on a low-dimensional nonlinear algebraic variety, a problem we call Low Algebraic Dimension Matrix Completion (LADMC). Matrices whose columns belong to a union of subspaces are an important special case. We propose a LADMC algorithm that leverages existing LRMC methods on a tensorized representation of the data. For example, a second-order tensorized representation is formed by taking the Kronecker product of each column with itself, and we consider higher order tensorizations as well. This approach will succeed in many cases where traditional LRMC is guaranteed to fail because the data are low-rank in the tensorized representation but not in the original representation. We provide a formal mathematical justification for the success of our method. In particular, we give bounds of the rank of these data in the tensorized representation, and we prove sampling requirements to guarantee uniqueness of the solution. We also provide experimental results showing that the new approach outperforms existing state-of-the-art methods for matrix completion under a union of subspaces model.

التعلم الالي التعلم الآلي

Beyond Lazy Training for Over-parameterized Tensor Decomposition

121 - Xiang Wang , Chenwei Wu , Jason D. Lee 2020

Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(R^d)^{otimes l}$ of rank $r$ (where $rll d$), can variants of gradient descent find a rank $m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}log d)$. Our results show that gradient descent on over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.

التعلم الالي التعلم الآلي

A Kernel Perspective for Regularizing Deep Neural Networks

94 - Alberto Bietti , Gregoire Mialon , Dexiong Chen 2018

We propose a new point of view for regularizing deep neural networks by using the norm of a reproducing kernel Hilbert space (RKHS). Even though this norm cannot be computed, it admits upper and lower approximations leading to various practical strat egies. Specifically, this perspective (i) provides a common umbrella for many existing regularization principles, including spectral norm and gradient penalties, or adversarial training, (ii) leads to new effective regularization penalties, and (iii) suggests hybrid strategies combining lower and upper bounds to get better approximations of the RKHS norm. We experimentally show this approach to be effective when learning on small datasets, or to obtain adversarially robust models.

التعلم الالي التعلم الآلي