Online Robust and Adaptive Learning from Data Streams

165 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shintaro Fukushima

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Shintaro Fukushima - Atsushi Nitanda - Kenji Yamanishi

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In online learning from non-stationary data streams, it is both necessary to learn robustly to outliers and to adapt to changes of underlying data generating mechanism quickly. In this paper, we refer to the former nature of online learning algorithms as robustness and the latter as adaptivity. There is an obvious tradeoff between them. It is a fundamental issue to quantify and evaluate the tradeoff because it provides important information on the data generating mechanism. However, no previous work has considered the tradeoff quantitatively. We propose a novel algorithm called the Stochastic approximation-based Robustness-Adaptivity algorithm (SRA) to evaluate the tradeoff. The key idea of SRA is to update parameters of distribution or sufficient statistics with the biased stochastic approximation scheme, while dropping data points with large values of the stochastic update. We address the relation between two parameters, one of which is the step size of the stochastic approximation, and the other is the threshold parameter of the norm of the stochastic update. The former controls the adaptivity and the latter does the robustness. We give a theoretical analysis for the non-asymptotic convergence of SRA in the presence of outliers, which depends on both the step size and the threshold parameter. Since SRA is formulated on the majorization-minimization principle, it is a general algorithm including many algorithms, such as the online EM algorithm and stochastic gradient descent. Empirical experiments for both synthetic and real datasets demonstrated that SRA was superior to previous methods.

قيم البحث

102 - Christopher Strohmeier , Hanbaek Lyu , Deanna Needell 2020

Online Tensor Factorization (OTF) is a fundamental tool in learning low-dimensional interpretable features from streaming multi-modal data. While various algorithmic and theoretical aspects of OTF have been investigated recently, general convergence guarantee to stationary points of the objective function without any incoherence or sparsity assumptions is still lacking even for the i.i.d. case. In this work, we introduce a novel OTF algorithm that learns a CANDECOMP/PARAFAC (CP) basis from a given stream of tensor-valued data under general constraints, including nonnegativity constraints that induce interpretability of learned CP basis. We prove that our algorithm converges almost surely to the set of stationary points of the objective function under the hypothesis that the sequence of data tensors is generated by some underlying Markov chain. Our setting covers the classical i.i.d. case as well as a wide range of application contexts including data streams generated by independent or MCMC sampling. Our result closes a gap between OTF and Online Matrix Factorization in global convergence analysis. Experimentally, we show that our OTF algorithm converges much faster than standard algorithms for nonnegative tensor factorization tasks on both synthetic and real-world data. Also, we demonstrate the utility of our algorithm on a diverse set of examples from image, video, and time-series data, illustrating how one may learn qualitatively different CP-dictionaries from the same tensor data by exploiting the tensor structure in multiple ways.

التعلم الالي التعلم الآلي التحسين والتحكم

Rank aggregation for non-stationary data streams

68 - Ekhine Irurozki , Jesus Lobo , Aritz Perez 2019

We consider the problem of learning over non-stationary ranking streams. The rankings can be interpreted as the preferences of a population and the non-stationarity means that the distribution of preferences changes over time. Our goal is to learn, i n an online manner, the current distribution of rankings. The bottleneck of this process is a rank aggregation problem. We propose a generalization of the Borda algorithm for non-stationary ranking streams. Moreover, we give bounds on the minimum number of samples required to output the ground truth with high probability. Besides, we show how the optimal parameters are set. Then, we generalize the whole family of weighted voting rules (the family to which Borda belongs) to situations in which some rankings are more textit{reliable} than others and show that this generalization can solve the problem of rank aggregation over non-stationary data streams.

التعلم الالي التعلم الآلي

Online Graph Topology Learning from Matrix-valued Time Series

375 - Yiye Jiang , Jeremie Bigot , Sofian Maabout 2021

This paper is concerned with the statistical analysis of matrix-valued time series. These are data collected over a network of sensors (typically a set of spatial locations), recording, over time, observations of multiple measurements. From such data , we propose to learn, in an online fashion, a graph that captures two aspects of dependency: one describing the sparse spatial relationship between sensors, and the other characterizing the measurement relationship. To this purpose, we introduce a novel multivariate autoregressive model to infer the graph topology encoded in the coefficient matrix which captures the sparse Granger causality dependency structure present in such matrix-valued time series. We decompose the graph by imposing a Kronecker sum structure on the coefficient matrix. We develop two online approaches to learn the graph in a recursive way. The first one uses Wald test for the projected OLS estimation, where we derive the asymptotic distribution for the estimator. For the second one, we formalize a Lasso-type optimization problem. We rely on homotopy algorithms to derive updating rules for estimating the coefficient matrix. Furthermore, we provide an adaptive tuning procedure for the regularization parameter. Numerical experiments using both synthetic and real data, are performed to support the effectiveness of the proposed learning approaches.

التعلم الالي التعلم الآلي

Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits

158 - Xialiang Dou , Tengyuan Liang 2019

Consider the problem: given the data pair $(mathbf{x}, mathbf{y})$ drawn from a population with $f_*(x) = mathbf{E}[mathbf{y} | mathbf{x} = x]$, specify a neural network model and run gradient flow on the weights over time until reaching any stationa rity. How does $f_t$, the function computed by the neural network at time $t$, relate to $f_*$, in terms of approximation and representation? What are the provable benefits of the adaptive representation by neural networks compared to the pre-specified fixed basis representation in the classical nonparametric literature? We answer the above questions via a dynamic reproducing kernel Hilbert space (RKHS) approach indexed by the training process of neural networks. Firstly, we show that when reaching any local stationarity, gradient flow learns an adaptive RKHS representation and performs the global least-squares projection onto the adaptive RKHS, simultaneously. Secondly, we prove that as the RKHS is data-adaptive and task-specific, the residual for $f_*$ lies in a subspace that is potentially much smaller than the orthogonal complement of the RKHS. The result formalizes the representation and approximation benefits of neural networks. Lastly, we show that the neural network function computed by gradient flow converges to the kernel ridgeless regression with an adaptive kernel, in the limit of vanishing regularization. The adaptive kernel viewpoint provides new angles of studying the approximation, representation, generalization, and optimization advantages of neural networks.

التعلم الالي التعلم الآلي نظرية الإحصاء

Learning from Similarity-Confidence Data

113 - Yuzhou Cao , Lei Feng , Yitian Xu 2021

Weakly supervised learning has drawn considerable attention recently to reduce the expensive time and labor consumption of labeling massive data. In this paper, we investigate a novel weakly supervised learning problem of learning from similarity-con fidence (Sconf) data, where we aim to learn an effective binary classifier from only unlabeled data pairs equipped with confidence that illustrates their degree of similarity (two examples are similar if they belong to the same class). To solve this problem, we propose an unbiased estimator of the classification risk that can be calculated from only Sconf data and show that the estimation error bound achieves the optimal convergence rate. To alleviate potential overfitting when flexible models are used, we further employ a risk correction scheme on the proposed risk estimator. Experimental results demonstrate the effectiveness of the proposed methods.

التعلم الالي التعلم الآلي