ترغب بنشر مسار تعليمي؟ اضغط هنا

Geometric median (textsc{Gm}) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeas ible for robustifying stochastic gradient descent (SGD) for high-dimensional optimization problems. In this paper, we show that by applying textsc{Gm} to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 0.5 for smooth non-convex problems, with non-asymptotic convergence rates comparable to the SGD with textsc{Gm}.
Distributed source coding is the task of encoding an input in the absence of correlated side information that is only available to the decoder. Remarkably, Slepian and Wolf showed in 1973 that an encoder that has no access to the correlated side info rmation can asymptotically achieve the same compression rate as when the side information is available at both the encoder and the decoder. While there is significant prior work on this topic in information theory, practical distributed source coding has been limited to synthetic datasets and specific correlation structures. Here we present a general framework for lossy distributed source coding that is agnostic to the correlation structure and can scale to high dimensions. Rather than relying on hand-crafted source-modeling, our method utilizes a powerful conditional deep generative model to learn the distributed encoder and decoder. We evaluate our method on realistic high-dimensional datasets and show substantial improvements in distributed compression performance.
Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning and response generation. Training each component requires annotations which are hard to obtain for every new domain, limiting scalability of such systems. Similarly, rule-based dialogue systems require extensive writing and maintenance of rules and do not scale either. End-to-End dialogue systems, on the other hand, do not require module-specific annotations but need a large amount of data for training. To overcome these problems, in this demo, we present Alexa Conversations, a new approach for building goal-oriented dialogue systems that is scalable, extensible as well as data efficient. The components of this system are trained in a data-driven manner, but instead of collecting annotated conversations for training, we generate them using a novel dialogue simulator based on a few seed dialogues and specifications of APIs and entities provided by the developer. Our approach provides out-of-the-box support for natural conversational phenomena like entity sharing across turns or users changing their mind during conversation without requiring developers to provide any such dialogue flows. We exemplify our approach using a simple pizza ordering task and showcase its value in reducing the developer burden for creating a robust experience. Finally, we evaluate our system using a typical movie ticket booking task and show that the dialogue simulator is an essential component of the system that leads to over $50%$ improvement in turn-level action signature prediction accuracy.
In this paper, we propose texttt{FedGLOMO}, the first (first-order) FL algorithm that achieves the optimal iteration complexity (i.e matching the known lower bound) on smooth non-convex objectives -- without using clients full gradient in each round. Our key algorithmic idea that enables attaining this optimal complexity is applying judicious momentum terms that promote variance reduction in both the local updates at the clients, and the global update at the server. Our algorithm is also provably optimal even with compressed communication between the clients and the server, which is an important consideration in the practical deployment of FL algorithms. Our experiments illustrate the intrinsic variance reduction effect of texttt{FedGLOMO} which implicitly suppresses client-drift in heterogeneous data distribution settings and promotes communication-efficiency. As a prequel to texttt{FedGLOMO}, we propose texttt{FedLOMO} which applies momentum only in the local client updates. We establish that texttt{FedLOMO} enjoys improved convergence rates under common non-convex settings compared to prior work, and with fewer assumptions.
In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e. averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be {em lossy compresse
32 - Anish Acharya 2018
This work proposes a novel feature selection algorithm to classify Songs into different groups. Classification of musical content is often a non-trivial job and still relatively less explored area. The main idea conveyed in this article is to come up with a new feature selection scheme that does the classification job elegantly and with high accuracy but with simpler but wisely chosen small number of features thus being less prone to over-fitting. This uses a very basic general idea about the structure of the audio signal which is generally in the shape of a trapezium. So, using this general idea of the Musical Community we propose three frames to be considered and analyzed for feature extraction for each of the audio signal -- opening, stanzas and closing -- and it has been established with the help of a lot of experiments that this scheme leads to much efficient classification with less complex features in a low dimensional feature space thus is also a computationally less expensive method. Step by step analysis of feature extraction, feature ranking, dimensionality reduction using PCA has been carried in this article. Sequential Forward selection (SFS) algorithm is used to explore the most significant features both with the raw Fisher Discriminant Ratio (FDR) and also with the significant eigen-values after PCA. Also during classification extensive validation and cross validation has been done in a monte-carlo manner to ensure validity of the claims.
Deep learning models have become state of the art for natural language processing (NLP) tasks, however deploying these models in production system poses significant memory constraints. Existing compression methods are either lossy or introduce signif icant latency. We propose a compression method that leverages low rank matrix factorization during training,to compress the word embedding layer which represents the size bottleneck for most NLP models. Our models are trained, compressed and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size. Empirically, we show that the proposed method can achieve 90% compression with minimal impact in accuracy for sentence classification tasks, and outperforms alternative methods like fixed-point quantization or offline word embedding compression. We also analyze the inference time and storage space for our method through FLOP calculations, showing that we can compress DNN models by a configurable ratio and regain accuracy loss without introducing additional latency compared to fixed point quantization. Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence classification benchmark.
Multi-wing chaotic attractors are highly complex nonlinear dynamical systems with higher number of index-2 equilibrium points. Due to the presence of several equilibrium points, randomness and hence the complexity of the state time series for these m ulti-wing chaotic systems is much higher than that of the conventional double-wing chaotic attractors. A real-coded Genetic Algorithm (GA) based global optimization framework has been adopted in this paper as a common template for designing optimum Proportional-Integral-Derivative (PID) controllers in order to control the state trajectories of four different multi-wing chaotic systems among the Lorenz family viz. Lu system, Chen system, Rucklidge (or Shimizu Morioka) system and Sprott-1 system. Robustness of the control scheme for different initial conditions of the multi-wing chaotic systems has also been shown.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا