ترغب بنشر مسار تعليمي؟ اضغط هنا

Online Asynchronous Distributed Regression

98   0   0.0 ( 0 )
 نشر من قبل Gerard Biau
 تاريخ النشر 2014
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

Distributed computing offers a high degree of flexibility to accommodate modern learning constraints and the ever increasing size of datasets involved in massive data issues. Drawing inspiration from the theory of distributed computation models developed in the context of gradient-type optimization algorithms, we present a consensus-based asynchronous distributed approach for nonparametric online regression and analyze some of its asymptotic properties. Substantial numerical evidence involving up to 28 parallel processors is provided on synthetic datasets to assess the excellent performance of our method, both in terms of computation time and prediction accuracy.

قيم البحث

اقرأ أيضاً

We consider a sparse multi-task regression framework for fitting a collection of related sparse models. Representing models as nodes in a graph with edges between related models, a framework that fuses lasso regressions with the total variation penal ty is investigated. Under a form of restricted eigenvalue assumption, bounds on prediction and squared error are given that depend upon the sparsity of each model and the differences between related models. This assumption relates to the smallest eigenvalue restricted to the intersection of two cone sets of the covariance matrix constructed from each of the agents covariances. We show that this assumption can be satisfied if the constructed covariance matrix satisfies a restricted isometry property. In the case of a grid topology high-probability bounds are given that match, up to log factors, the no-communication setting of fitting a lasso on each model, divided by the number of agents. A decentralised dual method that exploits a convex-concave formulation of the penalised problem is proposed to fit the models and its effectiveness demonstrated on simulations against the group lasso and variants.
Multiple hypothesis testing, a situation when we wish to consider many hypotheses, is a core problem in statistical inference that arises in almost every scientific field. In this setting, controlling the false discovery rate (FDR), which is the expe cted proportion of type I error, is an important challenge for making meaningful inferences. In this paper, we consider the problem of controlling FDR in an online manner. Concretely, we consider an ordered, possibly infinite, sequence of hypotheses, arriving one at each timestep, and for each hypothesis we observe a p-value along with a set of features specific to that hypothesis. The decision whether or not to reject the current hypothesis must be made immediately at each timestep, before the next hypothesis is observed. The model of multi-dimensional feature set provides a very general way of leveraging the auxiliary information in the data which helps in maximizing the number of discoveries. We propose a new class of powerful online testing procedures, where the rejections thresholds (significance levels) are learnt sequentially by incorporating contextual information and previous results. We prove that any rule in this class controls online FDR under some standard assumptions. We then focus on a subclass of these procedures, based on weighting significance levels, to derive a practical algorithm that learns a parametric weight function in an online fashion to gain more discoveries. We also theoretically prove, in a stylized setting, that our proposed procedures would lead to an increase in the achieved statistical power over a popular online testing procedure proposed by Javanmard & Montanari (2018). Finally, we demonstrate the favorable performance of our procedure, by comparing it to state-of-the-art online multiple testing procedures, on both synthetic data and real data generated from different applications.
This article is concerned with the Bridge Regression, which is a special family in penalized regression with penalty function $sum_{j=1}^{p}|beta_j|^q$ with $q>0$, in a linear model with linear restrictions. The proposed restricted bridge (RBRIDGE) e stimator simultaneously estimates parameters and selects important variables when a prior information about parameters are available in either low dimensional or high dimensional case. Using local quadratic approximation, the penalty term can be approximated around a local initial values vector and the RBRIDGE estimator enjoys a closed-form expression which can be solved when $q>0$. Special cases of our proposal are the restricted LASSO ($q=1$), restricted RIDGE ($q=2$), and restricted Elastic Net ($1< q < 2$) estimators. We provide some theoretical properties of the RBRIDGE estimator under for the low dimensional case, whereas the computational aspects are given for both low and high dimensional cases. An extensive Monte Carlo simulation study is conducted based on different prior pieces of information and the performance of the RBRIDGE estiamtor is compared with some competitive penalty estimators as well as the ORACLE. We also consider four real data examples analysis for comparison sake. The numerical results show that the suggested RBRIDGE estimator outperforms outstandingly when the prior is true or near exact
We study distributed estimation methods under communication constraints in a distributed version of the nonparametric random design regression model. We derive minimax lower bounds and exhibit methods that attain those bounds. Moreover, we show that adaptive estimation is possible in this setting.
We investigate whether in a distributed setting, adaptive estimation of a smooth function at the optimal rate is possible under minimal communication. It turns out that the answer depends on the risk considered and on the number of servers over which the procedure is distributed. We show that for the $L_infty$-risk, adaptively obtaining optimal rates under minimal communication is not possible. For the $L_2$-risk, it is possible over a range of regularities that depends on the relation between the number of local servers and the total sample size.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا