Statistical Model Aggregation via Parameter Matching



Published by Mikhail Yurochkin

Publication date: 2019

Research field: Mathematical statistics, Informatics engineering

Paper language: English

Authors: Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh

Machine learning


Abstract

We consider the problem of aggregating models learned from sequestered, possibly heterogeneous datasets. Exploiting tools from Bayesian nonparametrics, we develop a general meta-modeling framework that learns shared global latent structures by identifying correspondences among local model parameterizations. Our proposed framework is model-independent and is applicable to a wide range of model types. After verifying our approach on simulated data, we demonstrate its utility in aggregating Gaussian topic models, hierarchical Dirichlet process based hidden Markov models, and sparse Gaussian processes with applications spanning text summarization, motion capture analysis, and temperature forecasting.
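The idea of identifying correspondences among local parameterizations can be illustrated with a toy sketch. It is not the paper's Bayesian nonparametric procedure: it swaps in a brute-force minimum-cost matching, assumes every local model has the same number of components, and uses the hypothetical helper names `match_to_reference` and `aggregate`. It only shows why components should be aligned before they are combined.

```python
from itertools import permutations


def match_to_reference(reference, local):
    """Reorder `local` components so they best align with `reference`.

    Each model is a list of parameter vectors (lists of floats).
    Brute force over permutations, so only suitable for a few components.
    """
    def cost(perm):
        # Total squared distance when reference[k] is paired with local[perm[k]].
        return sum(
            sum((a - b) ** 2 for a, b in zip(reference[k], local[j]))
            for k, j in enumerate(perm)
        )

    best = min(permutations(range(len(local))), key=cost)
    return [local[j] for j in best]


def aggregate(local_models):
    """Align every local model to the first one, then average matched components."""
    reference = local_models[0]
    aligned = [match_to_reference(reference, m) for m in local_models]
    n = len(aligned)
    return [
        [sum(m[k][d] for m in aligned) / n for d in range(len(reference[k]))]
        for k in range(len(reference))
    ]


# Two local models whose components are stored in different orders.
models = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[10.2, 9.8], [0.2, -0.2]],
]
global_model = aggregate(models)
```

Averaging the two models index by index would place both global components near (5, 5); matching first keeps one component near the origin and one near (10, 10).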


Jun Han, Qiang Liu, 2016

In distributed or privacy-preserving learning, we are often given a set of probabilistic models estimated from different local repositories, and asked to combine them into a single model that gives efficient statistical estimation. A simple method is to linearly average the parameters of the local models, which, however, tends to be degenerate or not applicable on non-convex models, or models with different parameter dimensions. A more practical strategy is to generate bootstrap samples from the local models, and then learn a joint model based on the combined bootstrap set. Unfortunately, the bootstrap procedure introduces additional noise and can significantly deteriorate the performance. In this work, we propose two variance reduction methods to correct the bootstrap noise, including a weighted M-estimator that is both statistically efficient and practically powerful. Both theoretical and empirical analyses are provided to demonstrate our methods.

Machine learning, Artificial intelligence

Active Model Aggregation via Stochastic Mirror Descent

Ravi Ganti, 2015

We consider the problem of learning a convex aggregation of models that is as good as the best convex aggregation, for the binary classification problem. Working in the stream-based active learning setting, where the active learner has to decide on the fly whether to query for the label of the point currently seen in the stream, we propose a stochastic mirror descent algorithm, called SMD-AMA, with entropy regularization. We establish an excess risk bound for the loss of the convex aggregate returned by SMD-AMA of order $O\left(\sqrt{\frac{\log(M)}{T^{1-\mu}}}\right)$, where $\mu \in [0,1)$ is an algorithm-dependent parameter that trades off the number of labels queried against the excess risk.

Machine learning, Artificial intelligence

Sequential model aggregation for production forecasting

Raphael Deswarte, 2018

Production forecasting is a key step in designing the future development of a reservoir. A classical way to generate such forecasts consists in simulating future production for numerical models representative of the reservoir. However, identifying such models can be very challenging as they need to be constrained to all available data. In particular, they should reproduce past production data, which requires solving a complex non-linear inverse problem. In this paper, we thus propose to investigate the potential of machine learning algorithms to predict the future production of a reservoir based on past production data without model calibration. We focus more specifically on robust online aggregation, a deterministic approach that provides a robust framework to make forecasts on a regular basis. This method does not rely on any specific assumption or need for stochastic modeling. Forecasts are first simulated for a set of base reservoir models representing the prior uncertainty, and then combined to predict production at the next time step. The weight associated with each forecast is related to its past performance. Three different algorithms are considered for weight computations: the exponentially weighted average algorithm, ridge regression, and Lasso regression. They are applied to a synthetic reservoir case study, the Brugge case, for sequential predictions. To estimate the potential of development scenarios, production forecasts are needed over long periods of time without intermediary data acquisition. An extension of the deterministic aggregation approach is thus proposed in this paper to provide such multi-step-ahead forecasts.
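As a rough illustration of weighting each forecast by its past performance, the following is a minimal sketch of a generic exponentially weighted average forecaster with squared loss. The function name `ewa_forecasts`, the learning rate `eta`, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import math


def _weights(cum_loss, eta):
    """Exponential weights from cumulative losses (shifted for numerical stability)."""
    min_loss = min(cum_loss)
    raw = [math.exp(-eta * (loss - min_loss)) for loss in cum_loss]
    total = sum(raw)
    return [r / total for r in raw]


def ewa_forecasts(base_preds, outcomes, eta=0.5):
    """Sequentially combine base-model forecasts.

    base_preds[t][k] is the forecast of base model k at step t;
    outcomes[t] is observed only after the step-t forecast is issued.
    Returns the aggregated forecasts and the final weights.
    """
    cum_loss = [0.0] * len(base_preds[0])
    aggregated = []
    for preds, y in zip(base_preds, outcomes):
        # Weights use only losses accumulated on earlier steps.
        w = _weights(cum_loss, eta)
        aggregated.append(sum(wk * pk for wk, pk in zip(w, preds)))
        # Once the outcome is revealed, update each model's squared loss.
        for k, pk in enumerate(preds):
            cum_loss[k] += (pk - y) ** 2
    return aggregated, _weights(cum_loss, eta)


# Toy data: model 0 always forecasts the truth, model 1 is always off by 2.
base_preds = [[1.0, 3.0]] * 5
outcomes = [1.0] * 5
aggregated, final_weights = ewa_forecasts(base_preds, outcomes)
```

The first aggregated forecast is the plain average of the base forecasts because no losses have been observed yet; later forecasts shift toward the model with the smaller cumulative loss.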

Machine learning, Statistical theory

Statistical Estimation and Inference via Local SGD in Federated Learning

Xiang Li, Jiadong Liang, Xiangyu Chang, 2021

Federated Learning (FL) lets a large number of edge computing devices (e.g., mobile phones) jointly learn a global model without data sharing. In FL, data are generated in a decentralized manner with high heterogeneity. This paper studies how to perform statistical estimation and inference in the federated setting. We analyze the so-called Local SGD, a multi-round estimation procedure that uses intermittent communication to improve communication efficiency. We first establish a functional central limit theorem that shows the averaged iterates of Local SGD weakly converge to a rescaled Brownian motion. We next provide two iterative inference methods: the plug-in and the random scaling. Random scaling constructs an asymptotically pivotal statistic for inference by using the information along the whole Local SGD path. Both methods are communication efficient and applicable to online data. Our theoretical and empirical results show that Local SGD simultaneously achieves both statistical efficiency and communication efficiency.
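The multi-round structure of Local SGD, with several local steps between communications, can be sketched on a toy problem. This example estimates a scalar mean under squared loss; the function name `local_sgd_mean`, the step size, and the data are illustrative assumptions, and the sketch does not implement the plug-in or random scaling inference methods.

```python
import random


def local_sgd_mean(datasets, rounds=50, local_steps=5, lr=0.1, seed=0):
    """Toy Local SGD for a scalar mean.

    Each worker holds one dataset and minimizes 0.5 * (theta - x)^2 with
    stochastic gradients. Workers communicate only once per round, when
    their local iterates are averaged into the shared parameter.
    Returns the final parameter and the per-round path of averages.
    """
    rng = random.Random(seed)
    theta = 0.0
    path = []
    for _ in range(rounds):
        local_iterates = []
        for data in datasets:
            th = theta                      # start from the shared parameter
            for _ in range(local_steps):    # local updates, no communication
                x = rng.choice(data)
                th -= lr * (th - x)         # stochastic gradient step
            local_iterates.append(th)
        theta = sum(local_iterates) / len(local_iterates)  # communication step
        path.append(theta)
    return theta, path


# Two heterogeneous workers whose local means are 1 and 3.
datasets = [[1.0] * 10, [3.0] * 10]
theta_hat, path = local_sgd_mean(datasets)
```

With equally weighted workers, the averaged parameter approaches 2, the mean of the two local means. Increasing `local_steps` reduces how often workers communicate for the same number of gradient steps.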

Machine learning

Parameter Estimation for the SEIR Model Using Recurrent Nets

Chun Fan, Yuxian Meng, Xiaofei Sun, 2021

The standard way to estimate the parameters $\Theta_{\text{SEIR}}$ (e.g., the transmission rate $\beta$) of an SEIR model is to use grid search, where simulations are performed on each set of parameters, and the parameter set leading to the least $L_2$ distance between the predicted number of infections and observed infections is selected. This brute-force strategy is not only time consuming, as simulations are slow when the population is large, but also inaccurate, since it is impossible to enumerate all parameter combinations. To address these issues, in this paper, we propose to transform the non-differentiable problem of finding optimal $\Theta_{\text{SEIR}}$ to a differentiable one, where we first train a recurrent net to fit a small number of simulation data. Next, based on this recurrent net that is able to generalize SEIR simulations, we are able to transform the objective to a differentiable one with respect to $\Theta_{\text{SEIR}}$, and straightforwardly obtain its optimal value. The proposed strategy is both time efficient, as it only relies on a small number of SEIR simulations, and accurate, as we are able to find the optimal $\Theta_{\text{SEIR}}$ based on the differentiable objective. On two COVID-19 datasets, we observe that the proposed strategy leads to significantly better parameter estimations with a smaller number of simulations.
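The grid-search baseline described above can be sketched as follows. This is a minimal illustration assuming a deterministic discrete-time SEIR model with daily steps; the function names, the default rates `sigma` and `gamma`, the population size, and the grid are hypothetical choices, and the paper's recurrent-net approach is not shown.

```python
def simulate_seir(beta, sigma=0.2, gamma=0.1, n=1000.0, i0=1.0, days=60):
    """Deterministic discrete-time SEIR simulation with one step per day.

    beta: transmission rate, sigma: incubation rate, gamma: recovery rate.
    Returns the number of infectious individuals on each day.
    """
    s, e, i, r = n - i0, 0.0, i0, 0.0
    infections = []
    for _ in range(days):
        new_exposed = beta * s * i / n
        new_infectious = sigma * e
        new_recovered = gamma * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        infections.append(i)
    return infections


def grid_search_beta(observed, grid):
    """Pick the beta in `grid` whose simulation is closest to `observed` in L2 distance."""
    def l2_distance(beta):
        simulated = simulate_seir(beta, days=len(observed))
        return sum((a - b) ** 2 for a, b in zip(simulated, observed)) ** 0.5

    return min(grid, key=l2_distance)


# Synthetic "observations" generated with a known transmission rate.
observed = simulate_seir(0.4)
best_beta = grid_search_beta(observed, [0.2, 0.3, 0.4, 0.5])
```

Each grid point costs one full simulation, and the estimate can only be as fine as the grid spacing, which is the limitation the abstract points to.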

Machine learning