Understanding Deflation Process in Over-parametrized Tensor Decomposition

90 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xiang Wang

تاريخ النشر 2021

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Rong Ge - Yunwei Ren - Xiang Wang

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper we study the training dynamics for gradient flow on over-parametrized tensor decomposition problems. Empirically, such training process often first fits larger components and then discovers smaller components, which is similar to a tensor deflation process that is commonly used in tensor decomposition algorithms. We prove that for orthogonally decomposable tensor, a slightly modified version of gradient flow would follow a tensor deflation process and recover all the tensor components. Our proof suggests that for orthogonal tensors, gradient flow dynamics works similarly as greedy low-rank learning in the matrix setting, which is a first step towards understanding the implicit regularization effect of over-parametrized models for low-rank tensors.

قيم البحث

121 - Xiang Wang , Chenwei Wu , Jason D. Lee 2020

Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(R^d)^{otimes l}$ of rank $r$ (where $rll d$), can variants of gradient descent find a rank $m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}log d)$. Our results show that gradient descent on over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.

التعلم الالي التعلم الآلي

Convex Tensor Decomposition via Structured Schatten Norm Regularization

581 - Ryota Tomioka , Taiji Suzuki 2013

We discuss structured Schatten norms for tensor decomposition that includes two recently proposed norms (overlapped and latent) for convex-optimization-based tensor decomposition, and connect tensor decomposition with wider literature on structured s parsity. Based on the properties of the structured Schatten norms, we mathematically analyze the performance of latent approach for tensor decomposition, which was empirically found to perform better than the overlapped approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific mode, this approach performs as good as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structures Schatten norms, establish the consistency, and discuss the identifiability of this approach. We confirm through numerical simulations that our theoretical prediction can precisely predict the scaling behavior of the mean squared error.

التعلم الالي التعلم الآلي التحليل العددي

Learning Good State and Action Representations via Tensor Decomposition

147 - Chengzhuo Ni , Anru Zhang , Yaqi Duan 2021

The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure. This paper proposes a tensor-inspired unsupervised learning method to identify meaningful low-dimensional state and action representat ions from empirical trajectories. The method exploits the MDPs tensor structure by kernelization, importance sampling and low-Tucker-rank approximation. This method can be further used to cluster states and actions respectively and find the best discrete MDP abstraction. We provide sharp statistical error bounds for tensor concentration and the preservation of diffusion distance after embedding.

التعلم الالي التعلم الآلي

Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition

94 - Ben Adlam , Jeffrey Pennington 2020

Classical learning theory suggests that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, with simpler models exhibiting high bias and more complex models exhibiting high variance of the predictive function. However, such a simple trade-off does not adequately describe deep learning models that simultaneously attain low bias and variance in the heavily overparameterized regime. A primary obstacle in explaining this behavior is that deep learning algorithms typically involve multiple sources of randomness whose individual contributions are not visible in the total variance. To enable fine-grained analysis, we describe an interpretable, symmetric decomposition of the variance into terms associated with the randomness from sampling, initialization, and the labels. Moreover, we compute the high-dimensional asymptotic behavior of this decomposition for random feature kernel regression, and analyze the strikingly rich phenomenology that arises. We find that the bias decreases monotonically with the network width, but the variance terms exhibit non-monotonic behavior and can diverge at the interpolation boundary, even in the absence of label noise. The divergence is caused by the emph{interaction} between sampling and initialization and can therefore be eliminated by marginalizing over samples (i.e. bagging) emph{or} over the initial parameters (i.e. ensemble learning).

التعلم الالي التعلم الآلي

A general method for regularizing tensor decomposition methods via pseudo-data

170 - Omer Gottesman , Weiwei Pan , Finale Doshi-Velez 2019

Tensor decomposition methods allow us to learn the parameters of latent variable models through decomposition of low-order moments of data. A significant limitation of these algorithms is that there exists no general method to regularize them, and in the past regularization has mostly been performed using bespoke modifications to the algorithms, tailored for the particular form of the desired regularizer. We present a general method of regularizing tensor decomposition methods which can be used for any likelihood model that is learnable using tensor decomposition methods and any differentiable regularization function by supplementing the training data with pseudo-data. The pseudo-data is optimized to balance two terms: being as close as possible to the true data and enforcing the desired regularization. On synthetic, semi-synthetic and real data, we demonstrate that our method can improve inference accuracy and regularize for a broad range of goals including transfer learning, sparsity, interpretability, and orthogonality of the learned parameters.

التعلم الالي التعلم الآلي