
Continual Distributed Learning for Crisis Management


Added by Aman Priyanshu

Publication date 2021

Field: Informatics Engineering

Research language: English

Authors Aman Priyanshu - Mudit Sinha - Shreyans Mehta

Machine Learning; Distributed, Parallel, and Cluster Computing


Social media platforms such as Twitter and Facebook can be an important source of information during disaster events. This information can support disaster response and crisis management if it is processed accurately and quickly. However, the data in such situations changes constantly, and committing considerable resources during a crisis is not feasible. We therefore need a low-resource, continually learning system built on text classification models that are robust to noisy and unordered data. We used distributed learning, which let us train on resource-constrained devices, and regularization to alleviate catastrophic forgetting in our target neural networks. We then applied federated averaging, both for distributed learning and to aggregate the central model for continual learning.
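The aggregation step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which regularizer the authors used, so the L2 penalty toward the previous global weights below is an assumption, and the function names `federated_average` and `proximal_penalty` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share one parameter shape")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


def proximal_penalty(weights: Sequence[float],
                     anchor: Sequence[float],
                     strength: float) -> float:
    """Assumed regularizer: (strength / 2) * ||weights - anchor||^2.

    Added to a client's task loss, it discourages drifting far from the
    previous global model, one common way to limit forgetting.
    """
    return 0.5 * strength * sum((w - a) ** 2 for w, a in zip(weights, anchor))


if __name__ == "__main__":
    # Two clients; the second holds three times as much data as the first.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Weighting by sample count means a device that saw more messages contributes more to the central model; equal weighting would be the alternative when counts are unknown.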


On Biased Compression for Distributed Learning

Aleksandr Beznosikov, Samuel Horvath, Peter Richtarik, and Mher Safaryan (2020)

In the last few years, various communication compression techniques have emerged as an indispensable tool for alleviating the communication bottleneck in distributed learning. However, although biased compressors often show superior performance in practice compared to the much more studied and better understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single-node and distributed settings. Our distributed SGD method enjoys the ergodic rate $\mathcal{O}\left(\frac{\delta L \exp(-K)}{\mu} + \frac{(C + D)}{K\mu}\right)$, where $\delta$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose a new high-performing biased compressor, a combination of Top-$k$ and natural dithering, which in our experiments outperforms all other compression techniques.
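As an illustration of what a biased compressor looks like, here is a minimal sketch of plain Top-$k$ sparsification, a standard operator that keeps the $k$ largest-magnitude gradient entries and zeroes the rest. It is deterministic, so its output does not equal the input in expectation, which is what makes it biased. The sketch does not include natural dithering or the combined compressor proposed in the abstract, and the name `top_k` is a hypothetical choice.

```python
from typing import List, Sequence


def top_k(gradient: Sequence[float], k: int) -> List[float]:
    """Keep the k entries with the largest absolute value; set the others to 0."""
    if not 0 <= k <= len(gradient):
        raise ValueError("k must be between 0 and len(gradient)")
    # Indices ordered by decreasing magnitude; ties keep their original order.
    by_magnitude = sorted(range(len(gradient)),
                          key=lambda i: abs(gradient[i]),
                          reverse=True)
    keep = set(by_magnitude[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(gradient)]


if __name__ == "__main__":
    print(top_k([0.1, -3.0, 2.0, 0.5], k=2))  # [0.0, -3.0, 2.0, 0.0]
```

Only the $k$ surviving values and their indices need to be communicated, which is where the bandwidth saving comes from.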

Machine Learning; Distributed, Parallel, and Cluster Computing; Optimization and Control

Resource Management for Blockchain-enabled Federated Learning: A Deep Reinforcement Learning Approach

Nguyen Quang Hieu, Tran The Anh, Nguyen Cong Luong (2020)

Blockchain-enabled Federated Learning (BFL) enables mobile devices to collaboratively train neural network models required by a Machine Learning Model Owner (MLMO) while keeping data on the mobile devices. The model updates are then stored in the blockchain in a decentralized and reliable manner. However, one issue with BFL is that the mobile devices have energy and CPU constraints that may reduce the system lifetime and training efficiency. Another is that the training latency may increase due to the blockchain mining process. To address these issues, the MLMO needs to (i) decide how much data and energy the mobile devices use for training and (ii) determine the block generation rate, so as to minimize the system latency, energy consumption, and incentive cost while achieving the target accuracy for the model. Under the uncertainty of the BFL environment, it is challenging for the MLMO to determine the optimal decisions. We propose to use Deep Reinforcement Learning (DRL) to derive the optimal decisions for the MLMO.

Machine Learning; Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture

Divide-and-Shuffle Synchronization for Distributed Machine Learning

Weiyan Wang, Cengguang Zhang, Liu Yang (2020)

Distributed Machine Learning suffers from the synchronization bottleneck of all-reducing workers' updates. Previous works mainly consider better network topology, gradient compression, or stale updates to speed up communication and relieve the bottleneck. However, all these works ignore the importance of reducing the scale of synchronized elements and the inevitable serially executed operators. To address the problem, our work proposes Divide-and-Shuffle Synchronization (DS-Sync), which divides workers into several parallel groups and shuffles group members. DS-Sync only synchronizes the workers in the same group, so the scale of each synchronization is much smaller. The shuffling of workers maintains the algorithm's convergence speed, which is explained theoretically. Comprehensive experiments also show significant improvements on recent and popular models such as Bert, WideResnet, and DeepFM on challenging datasets.
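The divide-and-shuffle idea can be illustrated with a toy sketch in which each worker holds a single number and each round averages only within randomly formed groups. This is an assumption-laden illustration, not the DS-Sync implementation: the abstract does not specify how groups are formed or shuffled, so the random shuffle, the scalar state, and the name `group_sync_round` are all hypothetical.

```python
import random
from typing import List, Sequence


def group_sync_round(values: Sequence[float],
                     group_size: int,
                     rng: random.Random) -> List[float]:
    """One round: shuffle workers, split into groups, average within each group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    order = list(range(len(values)))
    rng.shuffle(order)  # assumed shuffling rule: uniform random regrouping
    synced = list(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            synced[i] = mean  # only members of the same group synchronize
    return synced


if __name__ == "__main__":
    rng = random.Random(0)
    state = [float(i) for i in range(8)]
    for _ in range(5):
        state = group_sync_round(state, group_size=2, rng=rng)
    # The spread across workers shrinks as information mixes between groups.
    print(max(state) - min(state))
```

Each round leaves the global mean unchanged while touching only small groups; reshuffling between rounds is what lets information spread across all workers over time.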

Machine Learning; Distributed, Parallel, and Cluster Computing; Machine Learning

Distributed Learning and its Application for Time-Series Prediction

Nhuong V. Nguyen, Sybille Legitime (2021)

Extreme events are occurrences whose magnitude has the potential to cause extensive damage to people, infrastructure, and the environment. Motivated by the extreme nature of the current global health landscape, which is plagued by the coronavirus pandemic, we seek to better understand and model extreme events. Modeling extreme events is common in practice and plays an important role in time-series prediction applications. Our goal is to (i) compare and investigate the effect of some common extreme-event modeling methods to explore which method can be practical in reality and (ii) accelerate the deep learning training process, which commonly uses deep recurrent neural networks (RNNs), by implementing the asynchronous local Stochastic Gradient Descent (SGD) framework among multiple compute nodes. To verify our distributed extreme-event modeling, we evaluate our proposed framework on the S&P500 stock data set with a standard recurrent neural network. Our intuition is to explore the (best) extreme-event modeling method that could work well under the distributed deep learning setting. Moreover, by using asynchronous distributed learning, we aim to significantly reduce the communication cost among the compute nodes and the central server, which is the main bottleneck of almost all distributed learning frameworks. We implement our proposed work and evaluate its performance on representative data sets, such as S&P500 stock data over a 5-year period. The experimental results validate the correctness of the design principle and show a significant reduction in training duration of up to 8x compared to the baseline single compute node. Our results also show that our proposed work can achieve the same level of test accuracy as the baseline setting.

Machine Learning; Distributed, Parallel, and Cluster Computing; Machine Learning

Machine Learning Systems for Highly-Distributed and Rapidly-Growing Data

Kevin Hsieh (2019)

The usability and practicality of any machine learning (ML) application are largely influenced by two critical but hard-to-attain factors: low latency and low cost. Unfortunately, achieving low latency and low cost is very challenging when ML depends on real-world data that are highly distributed and rapidly growing (e.g., data collected by mobile phones and video cameras all over the world). Such real-world data pose many challenges in communication and computation. For example, when training data are distributed across data centers that span multiple continents, communication among data centers can easily overwhelm the limited wide-area network bandwidth, leading to prohibitively high latency and high cost. In this dissertation, we demonstrate that the latency and cost of ML on highly-distributed and rapidly-growing data can be improved by one to two orders of magnitude by designing ML systems that exploit the characteristics of ML algorithms, ML model structures, and ML training/serving data. We support this thesis statement with three contributions. First, we design a system that provides both low-latency and low-cost ML serving (inferencing) over large-scale and continuously-growing datasets, such as videos. Second, we build a system that makes ML training over geo-distributed datasets as fast as training within a single data center. Third, we present a first detailed study and a system-level solution on a fundamental and largely overlooked problem: ML training over non-IID (i.e., not independent and identically distributed) data partitions (e.g., facial images collected by cameras vary according to the demographics of each camera's location).
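To make the non-IID notion concrete, the sketch below builds a label-skewed split in which every sample of a given label lands in the same partition, so each partition sees only a subset of the classes. Label skew is only one form of non-IID data and is an assumption chosen for illustration; the helper name `partition_by_label` and the toy `(sample_id, label)` tuples are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

Sample = Tuple[int, Hashable]  # (sample id, class label)


def partition_by_label(samples: Sequence[Sample],
                       num_partitions: int) -> List[List[Sample]]:
    """Label-skewed split: each label is assigned to exactly one partition."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    labels = sorted({label for _, label in samples})
    # Round-robin assignment of labels (not samples) to partitions.
    slot = {label: idx % num_partitions for idx, label in enumerate(labels)}
    partitions: List[List[Sample]] = [[] for _ in range(num_partitions)]
    for sample in samples:
        partitions[slot[sample[1]]].append(sample)
    return partitions


if __name__ == "__main__":
    data = [(i, i % 4) for i in range(8)]  # four classes, two samples each
    for part in partition_by_label(data, num_partitions=2):
        print(sorted({label for _, label in part}))
```

A model trained on one such partition never observes the classes held elsewhere, which is the kind of mismatch between local and global distributions that the abstract refers to.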

Machine Learning; Distributed, Parallel, and Cluster Computing; Machine Learning
