Scaling Limit: Exact and Tractable Analysis of Online Learning Algorithms with Applications to Regularized Regression and PCA

82 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yue Lu

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Chuang Wang - Jonathan Mattingly -

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present a framework for analyzing the exact dynamics of a class of online learning algorithms in the high-dimensional scaling limit. Our results are applied to two concrete examples: online regularized linear regression and principal component analysis. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measures of the target feature vector and its estimates provided by the algorithms will converge weakly to a deterministic measured-valued process that can be characterized as the unique solution of a nonlinear PDE. Numerical solutions of this PDE can be efficiently obtained. These solutions lead to precise predictions of the performance of the algorithms, as many practical performance metrics are linear functionals of the joint empirical measures. In addition to characterizing the dynamic performance of online learning algorithms, our asymptotic analysis also provides useful insights. In particular, in the high-dimensional limit, and due to exchangeability, the original coupled dynamics associated with the algorithms will be asymptotically decoupled, with each coordinate independently solving a 1-D effective minimization problem via stochastic gradient descent. Exploiting this insight for nonconvex optimization problems may prove an interesting line of future research.

قيم البحث

190 - Mohsen Ghassemi , Zahra Shakeri , Anand D. Sarwate 2019

This work addresses the problem of learning sparse representations of tensor data using structured dictionary learning. It proposes learning a mixture of separable dictionaries to better capture the structure of tensor data by generalizing the separa ble dictionary learning model. Two different approaches for learning mixture of separable dictionaries are explored and sufficient conditions for local identifiability of the underlying dictionary are derived in each case. Moreover, computational algorithms are developed to solve the problem of learning mixture of separable dictionaries in both batch and online settings. Numerical experiments are used to show the usefulness of the proposed model and the efficacy of the developed algorithms.

التعلم الآلي نظرية المعلومات معالجة الإشارات

On Tilted Losses in Machine Learning: Theory and Applications

76 - Tian Li , Ahmad Beirami , Maziar Sanjabi 2021

Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use i n machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM -- tilted empirical risk minimization (TERM) -- which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: We show that TERM can increase or decrease the influence of outliers, respectively, to enable fairness or robustness; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to a superquantile method. Our work makes rigorous connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.

التعلم الآلي نظرية المعلومات نظرية المعلومات

Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization

110 - Rad Niazadeh 2021

Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor ap proximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability. We show that the resulting online algorithms have $O(sqrt{T})$ (approximate) regret under the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform greedy robust offline algorithms into a $O(T^{2/3})$ (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the current known bounds.

التعلم الآلي بنى وهياكل البيانات والخوارزميات علوم الكمبيوتر ونظرية الألعاب

Connecting Graph Convolutional Networks and Graph-Regularized PCA

209 - Lingxiao Zhao , Leman Akoglu 2020

Graph convolution operator of the GCN model is originally motivated from a localized first-order approximation of spectral graph convolutions. This work stands on a different view; establishing a textit{mathematical connection between graph convoluti on and graph-regularized PCA} (GPCA). Based on this connection, GCN architecture, shaped by stacking graph convolution layers, shares a close relationship with stacking GPCA. We empirically demonstrate that the textit{unsupervised} embeddings by GPCA paired with a 1- or 2-layer MLP achieves similar or even better performance than GCN on semi-supervised node classification tasks across five datasets including Open Graph Benchmark footnote{url{https://ogb.stanford.edu/}}. This suggests that the prowess of GCN is driven by graph based regularization. In addition, we extend GPCA to the (semi-)supervised setting and show that it is equivalent to GPCA on a graph extended with ghost edges between nodes of the same label. Finally, we capitalize on the discovered relationship to design an effective initialization strategy based on stacking GPCA, enabling GCN to converge faster and achieve robust performance at large number of layers. Notably, the proposed initialization is general-purpose and applies to other GNNs.

التعلم الآلي التعلم الالي

An Online Riemannian PCA for Stochastic Canonical Correlation Analysis

111 - Zihang Meng , Rudrasis Chakraborty , Vikas Singh 2021

We present an efficient stochastic algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices. We show how this reparametrization (into structured matrices), simple in hindsight, directly presents a n opportunity to repurpose/adjust mature techniques for numerical optimization on Riemannian manifolds. Our developments nicely complement existing methods for this problem which either require $O(d^3)$ time complexity per iteration with $O(frac{1}{sqrt{t}})$ convergence rate (where $d$ is the dimensionality) or only extract the top $1$ component with $O(frac{1}{t})$ convergence rate. In contrast, our algorithm offers a strict improvement for this classical problem: it achieves $O(d^2k)$ runtime complexity per iteration for extracting the top $k$ canonical components with $O(frac{1}{t})$ convergence rate. While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising. We also explore a potential application in training fair models where the label of protected attribute is missing or otherwise unavailable.

التعلم الآلي الذكاء الاصطناعي التعلم الالي