
Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent

Added by Anis Elgabli
Publication date: 2021
Language: English





In this paper, we propose an energy-efficient federated meta-learning framework. The objective is to learn a meta-model that can be fine-tuned to a new task with a small number of samples in a distributed setting, at low computation and communication energy cost. We assume that each task is owned by a separate agent, so only a limited number of tasks is available to train the meta-model. Assuming each task's model has been trained offline on the agent's local data, we propose a lightweight algorithm that starts from the local models of all agents and, working backward using projected stochastic gradient ascent (P-SGA), finds a meta-model. The proposed method avoids complex computations such as Hessian evaluation, double loops, and matrix inversion, while achieving high performance at significantly lower energy consumption than state-of-the-art methods such as MAML and iMAML in experiments on sinusoid regression and image classification tasks.
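The abstract only sketches the algorithm, so the snippet below is a minimal, illustrative sketch of one way the described procedure could look: each agent starts from its locally trained model, takes projected stochastic gradient ascent steps on its local loss while staying within a ball around that local model, and the resulting points are averaged into a meta-model. The function names, the projection radius, the ascent direction, and the final averaging step are assumptions for illustration, not the authors' exact method.

```python
# Hedged sketch of a P-SGA-style meta-model construction, based only on the
# abstract above. local_grad, the projection radius, and the averaging step
# are illustrative assumptions, not the paper's exact algorithm.
import numpy as np

def project_to_ball(w, center, radius):
    """Project w onto an L2 ball of the given radius around `center`."""
    d = w - center
    n = np.linalg.norm(d)
    return w if n <= radius else center + radius * d / n

def psga_meta_model(local_models, local_grad, steps=50, lr=0.01, radius=1.0):
    """Start from each agent's locally trained model, walk 'backward'
    (gradient ascent on the local loss) with a projection, then aggregate."""
    meta_points = []
    for i, w_local in enumerate(local_models):
        w = w_local.copy()
        for _ in range(steps):
            g = local_grad(i, w)                     # stochastic gradient of task i's loss at w
            w = w + lr * g                           # ascent: move away from the task optimum
            w = project_to_ball(w, w_local, radius)  # stay close to the local model
        meta_points.append(w)
    # one possible aggregation: average the per-agent points into a meta-model
    return np.mean(meta_points, axis=0)
```

Note that every step in this sketch is first-order and single-loop, which is the property the abstract emphasizes: no Hessian, double loop, or matrix inversion is required.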



Related research

Most bandit policies are designed to either minimize regret in any problem instance, making very few assumptions about the underlying environment, or in a Bayesian sense, assuming a prior distribution over environment parameters. The former are often too conservative in practical settings, while the latter require assumptions that are hard to verify in practice. We study bandit problems that fall between these two extremes, where the learning agent has access to sampled bandit instances from an unknown prior distribution $\mathcal{P}$ and aims to achieve high reward on average over the bandit instances drawn from $\mathcal{P}$. This setting is of particular importance because it lays foundations for meta-learning of bandit policies and reflects more realistic assumptions in many practical domains. We propose the use of parameterized bandit policies that are differentiable and can be optimized using policy gradients. This provides a broadly applicable framework that is easy to implement. We derive reward gradients that reflect the structure of bandit problems and policies, for both non-contextual and contextual settings, and propose a number of interesting policies that are both differentiable and have low regret. Our algorithmic and theoretical contributions are supported by extensive experiments that show the importance of baseline subtraction, learned biases, and the practicality of our approach on a range of problems.
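As a rough illustration of the general recipe in that abstract (a differentiable bandit policy tuned by policy gradients with baseline subtraction), the sketch below meta-trains the inverse temperature of a softmax-over-empirical-means policy with a REINFORCE-style gradient. The policy class, the assumed prior over instances, and all hyperparameters are illustrative, not the paper's exact proposals.

```python
# Minimal sketch: tune a differentiable bandit policy (softmax over empirical
# means with learned inverse temperature `beta`) by REINFORCE with a running
# baseline, over bandit instances sampled from an assumed prior.
import numpy as np

rng = np.random.default_rng(0)

def run_instance(beta, arm_means, horizon=200):
    """Play one Bernoulli bandit instance; return total reward and the summed
    score function d log pi(a_t) / d beta."""
    k = len(arm_means)
    counts, sums = np.ones(k), np.full(k, 0.5)     # crude initial estimates
    total, score_sum = 0.0, 0.0
    for _ in range(horizon):
        mu_hat = sums / counts
        logits = beta * mu_hat
        p = np.exp(logits - logits.max()); p /= p.sum()
        a = rng.choice(k, p=p)
        r = float(rng.random() < arm_means[a])
        score_sum += mu_hat[a] - p @ mu_hat        # d log pi(a) / d beta
        counts[a] += 1.0
        sums[a] += r
        total += r
    return total, score_sum

beta, lr, baseline = 1.0, 0.05, 0.0
for _ in range(300):                               # meta-training loop
    arm_means = rng.uniform(size=5)                # instance drawn from an assumed prior
    total, score = run_instance(beta, arm_means)
    baseline = 0.9 * baseline + 0.1 * total        # running baseline (variance reduction)
    beta += lr * (total - baseline) * score / 200  # policy-gradient ascent on average reward
```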
Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed averaging (e.g., via AllReduce) are sensitive to stragglers and communication delays. The PushSum gossip algorithm is robust to these issues, but only performs approximate distributed averaging. This paper studies Stochastic Gradient Push (SGP), which combines PushSum with stochastic gradient updates. We prove that SGP converges to a stationary point of smooth, non-convex objectives at the same sub-linear rate as SGD, and that all nodes achieve consensus. We empirically validate the performance of SGP on image classification (ResNet-50, ImageNet) and machine translation (Transformer, WMT16 En-De) workloads. Our code will be made publicly available.
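As a single-process, toy illustration of the SGP mechanics described above (each node keeps a PushSum numerator and weight, takes an SGD step on its de-biased copy, then mixes via a column-stochastic matrix), the sketch below uses a directed ring and local quadratic objectives; the topology, objective, and step size are arbitrary choices for the sketch, not the paper's experimental setup.

```python
# Toy simulation of Stochastic Gradient Push (SGP) on a directed ring.
import numpy as np

rng = np.random.default_rng(0)
n, d, lr = 8, 10, 0.05
targets = rng.normal(size=(n, d))       # node i's local objective: 0.5*||z - targets[i]||^2

# Column-stochastic mixing matrix: each node keeps half its mass and pushes
# half to its successor on the ring.
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[(i + 1) % n, i] = 0.5

x = np.zeros((n, d))                    # PushSum numerators
w = np.ones(n)                          # PushSum weights

for step in range(500):
    z = x / w[:, None]                  # de-biased parameters used for gradients
    grads = (z - targets) + 0.1 * rng.normal(size=(n, d))   # stochastic gradients
    x = x - lr * grads                  # local SGD step on the numerator
    x = P @ x                           # push-sum mixing (column-stochastic)
    w = P @ w

consensus = (x / w[:, None]).mean(axis=0)   # nodes approach the average minimizer
```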
Although many achievements have been made since Google introduced the paradigm of federated learning (FL), there is still much room to optimize its efficiency. In this paper, we propose a highly efficient FL method equipped with a double-head design aimed at personalization over non-IID datasets, and a gradual model-sharing design for communication savings. Experimental results show that our method achieves more stable accuracy and better communication efficiency across various data distributions than other state-of-the-art methods (SOTAs), making it more industry-friendly.
Representations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used to represent the data. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend has been to learn these representations through stochastic gradient descent in multi-layer neural networks, which is called backprop. Learning the representations directly from the incoming data stream reduces the human labour involved in designing a learning system. More importantly, it allows a learning system to scale to difficult tasks. In this paper, we introduce a new incremental learning algorithm called crossprop, which learns the incoming weights of hidden units based on the meta-gradient descent approach that was previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes. The final update equation introduces an additional memory parameter for each of these weights and generalizes the backprop update equation. In our experiments, we show that crossprop learns and reuses its feature representation while tackling new and unseen tasks, whereas backprop relearns a new feature representation.
Federated learning aims to protect users' privacy while performing data analysis across different participants. However, it is challenging to guarantee training efficiency on heterogeneous systems due to varying computational capabilities and communication bottlenecks. In this work, we propose FedSkel to enable computation-efficient and communication-efficient federated learning on edge devices by updating only the model's essential parts, named skeleton networks. FedSkel is evaluated on real edge devices with imbalanced datasets. Experimental results show that it achieves up to 5.52$\times$ speedups for CONV-layer back-propagation and 1.82$\times$ speedups for the whole training process, and reduces communication cost by 64.8%, with negligible accuracy loss.
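As a loose illustration of the skeleton-update idea described above, the sketch below applies an SGD step only to a top-k subset of parameters, selected here by gradient magnitude purely as an assumed stand-in for FedSkel's skeleton-network selection, and communicates only that sparse update.

```python
# Hedged sketch of updating and communicating only a "skeleton" subset of
# parameters. The top-k-by-gradient selection rule and the flat-parameter view
# are assumptions for illustration, not the paper's exact procedure.
import numpy as np

def skeleton_update(weights, grad, lr=0.01, keep_ratio=0.2):
    """Apply an SGD step only to the top `keep_ratio` fraction of parameters
    (by gradient magnitude); return the new weights and the sparse update."""
    flat = grad.ravel()
    k = max(1, int(keep_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the skeleton entries
    update = np.zeros_like(flat)
    update[idx] = -lr * flat[idx]                  # update skeleton entries only
    new_weights = weights + update.reshape(weights.shape)
    return new_weights, idx, update[idx]           # only (idx, values) need to be sent

def apply_sparse_updates(global_weights, client_payloads):
    """Server side: average the clients' sparse updates into the global model."""
    acc = np.zeros_like(global_weights).ravel()
    for idx, values in client_payloads:
        acc[idx] += values / len(client_payloads)
    return global_weights + acc.reshape(global_weights.shape)
```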
