Statistical Analysis of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization

143 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhengling Qi

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Zhengling Qi - Ying Cui - Yufeng Liu

نظرية الإحصاء التعلم الآلي التحسين والتحكم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper has two main goals: (a) establish several statistical properties---consistency, asymptotic distributions, and convergence rates---of stationary solutions and values of a class of coupled nonconvex and nonsmoothempirical risk minimization problems, and (b) validate these properties by a noisy amplitude-based phase retrieval problem, the latter being of much topical interest.Derived from available data via sampling, these empirical risk minimization problems are the computational workhorse of a population risk model which involves the minimization of an expected value of a random functional. When these minimization problems are nonconvex, the computation of their globally optimal solutions is elusive. Together with the fact that the expectation operator cannot be evaluated for general probability distributions, it becomes necessary to justify whether the stationary solutions of the empirical problems are practical approximations of the stationary solution of the population problem. When these two features, general distribution and nonconvexity, are coupled with nondifferentiability that often renders the problems non-Clarke regular, the task of the justification becomes challenging. Our work aims to address such a challenge within an algorithm-free setting. The resulting analysis is therefore different from the much of the analysis in the recent literature that is based on local search algorithms. Furthermore, supplementing the classical minimizer-centric analysis, our results offer a first step to close the gap between computational optimization and asymptotic analysis of coupled nonconvex nonsmooth statistical estimation problems, expanding the former with statistical properties of the practically obtained solution and providing the latter with a more practical focus pertaining to computational tractability.

قيم البحث

80 - Jingzhao Zhang , Hongzhou Lin , Stefanie Jegelka 2020

We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an $epsilon$-stationary point with first-order methods is impossible in finite time. We then introduce the notion of $(delta, epsilon)$-stationarity, which allows for an $epsilon$-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance $delta$ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a $(delta, epsilon)$-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on $delta$. Empirically, our methods perform well for training ReLU neural networks.

التحسين والتحكم التعلم الآلي

Solving Nonsmooth Nonconvex Compound Stochastic Programs with Applications to Risk Measure Minimization

68 - Junyi Liu , Ying Cui , Jong-Shi Pang 2020

This paper studies a structured compound stochastic program (SP) involving multiple expectations coupled by nonconvex and nonsmooth functions. We present a successive convex-programming based sampling algorithm and establish its subsequential converg ence. We describe stationarity properties of the limit points for several classes of the compound SP. We further discuss probabilistic stopping rules based on the computable error-bound for the algorithm. We present several risk measure minimization problems that can be formulated as such a compound stochastic program; these include generalized deviation optimization problems based on optimized certainty equivalent and buffered probability of exceedance (bPOE), a distributionally robust bPOE optimization problem, and a multiclass classification problem employing the cost-sensitive error criteria with bPOE risk measure.

التحسين والتحكم

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

133 - Kenji Kawaguchi , Haihao Lu 2019

We propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning. The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an unbiased gradient estimator of the empirical average loss. In contrast, we develop a computationally efficient method to construct a gradient estimator that is purposely biased toward those observations with higher current losses. On the theory side, we show that the proposed method minimizes a new ordered modification of the empirical average loss, and is guaranteed to converge at a sublinear rate to a global optimum for convex loss and to a critical point for weakly convex (non-convex) loss. Furthermore, we prove a new generalization bound for the proposed algorithm. On the empirical side, the numerical experiments show that our proposed method consistently improves the test errors compared with the standard mini-batch SGD in various models including SVM, logistic regression, and deep learning problems.

التعلم الالي التعلم الآلي التحسين والتحكم

Efficiently Bounding Optimal Solutions after Small Data Modification in Large-Scale Empirical Risk Minimization

97 - Hiroyuki Hanada , Atsushi Shibagaki , Jun Sakuma 2016

We study large-scale classification problems in changing environments where a small part of the dataset is modified, and the effect of the data modification must be quickly incorporated into the classifier. When the entire dataset is large, even if t he amount of the data modification is fairly small, the computational cost of re-training the classifier would be prohibitively large. In this paper, we propose a novel method for efficiently incorporating such a data modification effect into the classifier without actually re-training it. The proposed method provides bounds on the unknown optimal classifier with the cost only proportional to the size of the data modification. We demonstrate through numerical experiments that the proposed method provides sufficiently tight bounds with negligible computational costs, especially when a small part of the dataset is modified in a large-scale classification problem.

التعلم الالي التعلم الآلي

Differentially Private Empirical Risk Minimization

417 - Kamalika Chaudhuri , Claire Monteleoni , Anand D. Sarwate 2009

Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of c lassifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the $epsilon$-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006), to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن