Restricted Strong Convexity Implies Weak Submodularity

64 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ethan R. Elenberg

تاريخ النشر 2016

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Ethan R. Elenberg - Rajiv Khanna - Alexandros G. Dimakis

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We connect high-dimensional subset selection and submodular maximization. Our results extend the work of Das and Kempe (2011) from the setting of linear regression to arbitrary objective functions. For greedy feature selection, this connection allows us to obtain strong multiplicative performance bounds on several methods without statistical modeling assumptions. We also derive recovery guarantees of this form under standard assumptions. Our work shows that greedy algorithms perform within a constant factor from the best possible subset-selection solution for a broad class of general objective functions. Our methods allow a direct control over the number of obtained features as opposed to regularization parameters that only implicitly control sparsity. Our proof technique uses the concept of weak submodularity initially defined by Das and Kempe. We draw a connection between convex analysis and submodular set function theory which may be of independent interest for other statistical learning applications that have combinatorial structure.

قيم البحث

120 - Rajiv Khanna , Ethan Elenberg , Alexandros G. Dimakis 2017

Greedy algorithms are widely used for problems in machine learning such as feature selection and set function optimization. Unfortunately, for large datasets, the running time of even greedy algorithms can be quite high. This is because for each gree dy step we need to refit a model or calculate a function using the previously selected choices and the new candidate. Two algorithms that are faster approximations to the greedy forward selection were introduced recently ([Mirzasoleiman et al. 2013, 2015]). They achieve better performance by exploiting distributed computation and stochastic evaluation respectively. Both algorithms have provable performance guarantees for submodular functions. In this paper we show that divergent from previously held opinion, submodularity is not required to obtain approximation guarantees for these two algorithms. Specifically, we show that a generalized concept of weak submodularity suffices to give multiplicative approximation guarantees. Our result extends the applicability of these algorithms to a larger class of functions. Furthermore, we show that a bounded submodularity ratio can be used to provide data dependent bounds that can sometimes be tighter also for submodular functions. We empirically validate our work by showing superior performance of fast greedy approximations versus several established baselines on artificial and real datasets.

التعلم الالي نظرية المعلومات التعلم الآلي

Convergence rate of Bayesian tensor estimator: Optimal rate without restricted strong convexity

452 - Taiji Suzuki 2014

In this paper, we investigate the statistical convergence rate of a Bayesian low-rank tensor estimator. Our problem setting is the regression problem where a tensor structure underlying the data is estimated. This problem setting occurs in many pract ical applications, such as collaborative filtering, multi-task learning, and spatio-temporal data analysis. The convergence rate is analyzed in terms of both in-sample and out-of-sample predictive accuracies. It is shown that a near optimal rate is achieved without any strong convexity of the observation. Moreover, we show that the method has adaptivity to the unknown rank of the true tensor, that is, the near optimal rate depending on the true rank is achieved even if it is not known a priori.

التعلم الالي التعلم الآلي

Ensemble Defense with Data Diversity: Weak Correlation Implies Strong Robustness

92 - Renjue Li , Hanwei Zhang , Pengfei Yang 2021

In this paper, we propose a framework of filter-based ensemble of deep neuralnetworks (DNNs) to defend against adversarial attacks. The framework builds an ensemble of sub-models -- DNNs with differentiated preprocessing filters. From the theoretical perspective of DNN robustness, we argue that under the assumption of high quality of the filters, the weaker the correlations of the sensitivity of the filters are, the more robust the ensemble model tends to be, and this is corroborated by the experiments of transfer-based attacks. Correspondingly, we propose a principle that chooses the specific filters with smaller Pearson correlation coefficients, which ensures the diversity of the inputs received by DNNs, as well as the effectiveness of the entire framework against attacks. Our ensemble models are more robust than those constructed by previous defense methods like adversarial training, and even competitive with the classical ensemble of adversarial trained DNNs under adversarial attacks when the attacking radius is large.

التعلم الآلي

Fundamental Limits of Weak Recovery with Applications to Phase Retrieval

77 - Marco Mondelli , Andrea Montanari 2017

In phase retrieval we want to recover an unknown signal $boldsymbol xinmathbb C^d$ from $n$ quadratic measurements of the form $y_i = |langle{boldsymbol a}_i,{boldsymbol x}rangle|^2+w_i$ where $boldsymbol a_iin mathbb C^d$ are known sensing vectors a nd $w_i$ is measurement noise. We ask the following weak recovery question: what is the minimum number of measurements $n$ needed to produce an estimator $hat{boldsymbol x}(boldsymbol y)$ that is positively correlated with the signal $boldsymbol x$? We consider the case of Gaussian vectors $boldsymbol a_i$. We prove that - in the high-dimensional limit - a sharp phase transition takes place, and we locate the threshold in the regime of vanishingly small noise. For $nle d-o(d)$ no estimator can do significantly better than random and achieve a strictly positive correlation. For $nge d+o(d)$ a simple spectral estimator achieves a positive correlation. Surprisingly, numerical simulations with the same spectral estimator demonstrate promising performance with realistic sensing matrices. Spectral methods are used to initialize non-convex optimization algorithms in phase retrieval, and our approach can boost the performance in this setting as well. Our impossibility result is based on classical information-theory arguments. The spectral algorithm computes the leading eigenvector of a weighted empirical covariance matrix. We obtain a sharp characterization of the spectral properties of this random matrix using tools from free probability and generalizing a recent result by Lu and Li. Both the upper and lower bound generalize beyond phase retrieval to measurements $y_i$ produced according to a generalized linear model. As a byproduct of our analysis, we compare the threshold of the proposed spectral method with that of a message passing algorithm.

التعلم الالي نظرية المعلومات نظرية المعلومات

Convexity Conditions for 802.11 WLANs

388 - Vijay G. Subramanian , Douglas J. Leith 2012

In this paper we characterise the maximal convex subsets of the (non-convex) rate region in 802.11 WLANs. In addition to being of intrinsic interest as a fundamental property of 802.11 WLANs, this characterisation can be exploited to allow the wealth of convex optimisation approaches to be applied to 802.11 WLANs.

بنية الشبكات والإنترنت نظرية المعلومات نظرية المعلومات