ترغب بنشر مسار تعليمي؟ اضغط هنا

The Perturbed Prox-Preconditioned SPIDER algorithm for EM-based large scale learning

59   0   0.0 ( 0 )
 نشر من قبل Gersende Fort
 تاريخ النشر 2021
والبحث باللغة English
 تأليف Gersende Fort




اسأل ChatGPT حول البحث

Incremental Expectation Maximization (EM) algorithms were introduced to design EM for the large scale learning framework by avoiding the full data set to be processed at each iteration. Nevertheless, these algorithms all assume that the conditional expectations of the sufficient statistics are explicit. In this paper, we propose a novel algorithm named Perturbed Prox-Preconditioned SPIDER (3P-SPIDER), which builds on the Stochastic Path Integral Differential EstimatoR EM (SPIDER-EM) algorithm. The 3P-SPIDER algorithm addresses many intractabilities of the E-step of EM; it also deals with non-smooth regularization and convex constraint set. Numerical experiments show that 3P-SPIDER outperforms other incremental EM methods and discuss the role of some design parameters.

قيم البحث

اقرأ أيضاً

72 - Gersende Fort 2021
A novel algorithm named Perturbed Prox-Preconditioned SPIDER (3P-SPIDER) is introduced. It is a stochastic variancereduced proximal-gradient type algorithm built on Stochastic Path Integral Differential EstimatoR (SPIDER), an algorithm known to achie ve near-optimal first-order oracle inequality for nonconvex and nonsmooth optimization. Compared to the vanilla prox-SPIDER, 3P-SPIDER uses preconditioned gradient estimators. Preconditioning can either be applied explicitly to a gradient estimator or be introduced implicitly as in applications to the EM algorithm. 3P-SPIDER also assumes that the preconditioned gradients may (possibly) be not known in closed analytical form and therefore must be approximated which adds an additional degree of perturbation. Studying the convergence in expectation, we show that 3P-SPIDER achieves a near-optimal oracle inequality O(n^(1/2) /epsilon) where n is the number of observations and epsilon the target precision even when the gradient is estimated by Monte Carlo methods. We illustrate the algorithm on an application to the minimization of a penalized empirical loss.
We address the problem of sequentially selecting and observing processes from a given set to find the anomalies among them. The decision-maker observes one process at a time and obtains a noisy binary indicator of whether or not the corresponding pro cess is anomalous. In this setting, we develop an anomaly detection algorithm that chooses the process to be observed at a given time instant, decides when to stop taking observations, and makes a decision regarding the anomalous processes. The objective of the detection algorithm is to arrive at a decision with an accuracy exceeding a desired value while minimizing the delay in decision making. Our algorithm relies on a Markov decision process defined using the marginal probability of each process being normal or anomalous, conditioned on the observations. We implement the detection algorithm using the deep actor-critic reinforcement learning framework. Unlike prior work on this topic that has exponential complexity in the number of processes, our algorithm has computational and memory requirements that are both polynomial in the number of processes. We demonstrate the efficacy of our algorithm using numerical experiments by comparing it with the state-of-the-art methods.
Radio signal classification has a very wide range of applications in the field of wireless communications and electromagnetic spectrum management. In recent years, deep learning has been used to solve the problem of radio signal classification and ha s achieved good results. However, the radio signal data currently used is very limited in scale. In order to verify the performance of the deep learning-based radio signal classification on real-world radio signal data, in this paper we conduct experiments on large-scale real-world ACARS and ADS-B signal data with sample sizes of 900,000 and 13,000,000, respectively, and with categories of 3,143 and 5,157 respectively. We use the same Inception-Residual neural network model structure for ACARS signal classification and ADS-B signal classification to verify the ability of a single basic deep neural network model structure to process different types of radio signals, i.e., communication bursts in ACARS and pulse bursts in ADS-B. We build an experimental system for radio signal deep learning experiments. Experimental results show that the signal classification accuracy of ACARS and ADS-B is 98.1% and 96.3%, respectively. When the signal-to-noise ratio (with injected additive white Gaussian noise) is greater than 9 dB, the classification accuracy is greater than 92%. These experimental results validate the ability of deep learning to classify large-scale real-world radio signals. The results of the transfer learning experiment show that the model trained on large-scale ADS-B datasets is more conducive to the learning and training of new tasks than the model trained on small-scale datasets.
With the depletion of spectrum, wireless communication systems turn to exploit large antenna arrays to achieve the degree of freedom in space domain, such as millimeter wave massive multi-input multioutput (MIMO), reconfigurable intelligent surface a ssisted communications and cell-free massive MIMO. In these systems, how to acquire accurate channel state information (CSI) is difficult and becomes a bottleneck of the communication links. In this article, we introduce the concept of channel extrapolation that relies on a small portion of channel parameters to infer the remaining channel parameters. Since the substance of channel extrapolation is a mapping from one parameter subspace to another, we can resort to deep learning (DL), a powerful learning architecture, to approximate such mapping function. Specifically, we first analyze the requirements, conditions and challenges for channel extrapolation. Then, we present three typical extrapolations over the antenna dimension, the frequency dimension, and the physical terminal, respectively. We also illustrate their respective principles, design challenges and DL strategies. It will be seen that channel extrapolation could greatly reduce the transmission overhead and subsequently enhance the performance gains compared with the traditional strategies. In the end, we provide several potential research directions on channel extrapolation for future intelligent communications systems.
138 - Yao HengShuai 2012
This paper has been withdrawn by the author. This draft is withdrawn for its poor quality in english, unfortunately produced by the author when he was just starting his science route. Look at the ICML version instead: http://icml2008.cs.helsinki.fi/papers/111.pdf

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا