K-Beam Minimax: Efficient Optimization for Deep Adversarial Learning

Added by Jihun Hamm

Publication date 2018

Fields: Informatics Engineering, Mathematical Statistics

Research language: English

Authors Jihun Hamm - Yung-Kyun Noh

Machine Learning

Minimax optimization plays a key role in adversarial training of machine learning algorithms, such as learning generative models, domain adaptation, privacy preservation, and robust learning. In this paper, we demonstrate the failure of alternating gradient descent in minimax optimization problems due to the discontinuity of solutions of the inner maximization. To address this, we propose a new epsilon-subgradient descent algorithm that simultaneously tracks K candidate solutions. In practice, the algorithm can find solutions that previous saddle-point algorithms cannot find, with only a sublinear increase of complexity in K. We analyze in detail the conditions under which the algorithm converges to the true solution. A significant improvement in stability and convergence speed is observed in simple representative problems, GAN training, and domain-adaptation problems.
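The idea of tracking several candidate maximizers can be illustrated on a toy problem. The following is a minimal sketch, not the authors' algorithm or code: the objective `f`, the step sizes, and the function name `k_beam_minimax` are all assumptions chosen for illustration. The inner problem has two local maxima in `v` (near -1 and +1), so the maximizing `v` jumps from one to the other as `u` crosses zero.

```python
import math


def f(u, v):
    # Hypothetical toy objective: minimize over u, maximize over v.
    # For u near 0 it has two local maxima in v, near -1 and +1,
    # and the minimax solution is u = 0.
    return u * v - (v * v - 1.0) ** 2


def grad_u(u, v):
    # Partial derivative of f with respect to u.
    return v


def grad_v(u, v):
    # Partial derivative of f with respect to v.
    return u - 4.0 * v * (v * v - 1.0)


def k_beam_minimax(u0, v_candidates, steps=500, lr_u=0.1, lr_v=0.05):
    """Descend on u while running gradient ascent on K candidate v's."""
    u = u0
    vs = list(v_candidates)
    for t in range(1, steps + 1):
        # Ascent step on every candidate for the inner maximization.
        vs = [v + lr_v * grad_v(u, v) for v in vs]
        # Use the candidate that currently attains the largest objective.
        best = max(vs, key=lambda v: f(u, v))
        # Descent step on u with a diminishing step size.
        u = u - (lr_u / math.sqrt(t)) * grad_u(u, best)
    return u, vs
```

Calling `k_beam_minimax(0.5, [-1.0, 1.0])` tracks one candidate in each basin. Passing a single candidate instead reduces the loop to plain alternating gradient descent-ascent, the baseline that the abstract says can fail when the inner solution is discontinuous.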

Efficient Multi-Objective Optimization for Deep Learning

Michael Ruchte, Josif Grabocka (2021)

Multi-objective optimization (MOO) is a prevalent challenge for Deep Learning; however, there exists no scalable MOO solution for truly deep neural networks. Prior work either demands optimizing a new network for every point on the Pareto front, or induces a large overhead in the number of trainable parameters by using hyper-networks conditioned on modifiable preferences. In this paper, we propose to condition the network directly on these preferences by appending them to the feature space. Furthermore, we ensure a well-spread Pareto front by penalizing the solutions to maintain a small angle to the preference vector. In a series of experiments, we demonstrate that our Pareto fronts achieve state-of-the-art quality despite being computed significantly faster. Furthermore, we showcase the scalability of our method, which approximates the full Pareto front on the CelebA dataset with an EfficientNet network at a tiny training time overhead of 7% compared to a simple single-objective optimization. We make our code publicly available at https://github.com/ruchtem/cosmos.
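The two ingredients described in this abstract, conditioning on a preference vector and penalizing the angle to it, can be sketched in a few lines. This is a rough illustration under stated assumptions, not the code from the linked repository: the function names, the use of cosine similarity as the angle measure, and the `penalty_weight` parameter are all hypothetical.

```python
import math


def augment_with_preference(features, preference):
    # Condition the model on the preference by appending it to the input.
    return list(features) + list(preference)


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors (1.0 means perfectly aligned).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def preference_loss(losses, preference, penalty_weight=1.0):
    # Preference-weighted sum of the objectives, minus a reward for keeping
    # the loss vector at a small angle to the preference vector.
    scalarized = sum(r * l for r, l in zip(preference, losses))
    return scalarized - penalty_weight * cosine_similarity(losses, preference)
```

With an equal preference `[0.5, 0.5]`, the loss vectors `[1.0, 1.0]` and `[2.0, 0.0]` have the same weighted sum, but the first is aligned with the preference and therefore receives the lower total loss.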

Machine Learning

K-TanH: Efficient TanH For Deep Learning

Abhisek Kundu, Alex Heinecke, Dhiraj Kalamkar (2019)

We propose K-TanH, a novel, highly accurate, hardware-efficient approximation of the popular TanH activation function for Deep Learning. K-TanH consists of parameterized low-precision integer operations, such as shift and add/subtract (no floating-point operation needed), where parameters are stored in very small look-up tables that can fit in CPU registers. K-TanH can work on various numerical formats, such as Float32 and BFloat16. High-quality approximations to other activation functions, e.g., Sigmoid, Swish and GELU, can be derived from K-TanH. Our AVX512 implementation of K-TanH demonstrates $>5\times$ speedup over Intel SVML, and it is consistently superior in efficiency to other approximations that use floating-point arithmetic. Finally, we achieve state-of-the-art BLEU score and convergence results for training the language translation model GNMT on WMT16 data sets with approximate TanH obtained via K-TanH on BFloat16 inputs.
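The abstract does not say how the other activations are derived, but the standard identities make the claim plausible: any tanh routine yields Sigmoid, Swish, and the common tanh-based GELU approximation. The sketch below uses `math.tanh` as a stand-in; the function names and the `tanh_fn` parameter are assumptions for illustration, and K-TanH itself is not implemented here.

```python
import math


def sigmoid_from_tanh(x, tanh_fn=math.tanh):
    # Identity: sigmoid(x) = (1 + tanh(x / 2)) / 2.
    return 0.5 * (1.0 + tanh_fn(0.5 * x))


def swish_from_tanh(x, tanh_fn=math.tanh):
    # Swish(x) = x * sigmoid(x).
    return x * sigmoid_from_tanh(x, tanh_fn)


def gelu_from_tanh(x, tanh_fn=math.tanh):
    # Widely used tanh-based approximation of GELU (not the exact erf form).
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + tanh_fn(c * (x + 0.044715 * x ** 3)))
```

Swapping a faster approximate routine in for `tanh_fn` would carry its accuracy and speed characteristics over to all three derived activations.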

Machine Learning, Neural and Evolutionary Computing

Deep Minimax Probability Machine

Lirong He, Ziyi Guo, Kaizhu Huang (2019)

Deep neural networks enjoy a powerful representation and have proven effective in a number of applications. However, recent advances show that deep neural networks are vulnerable to adversarial attacks incurred by the so-called adversarial examples. Although the adversarial example is only slightly different from the input sample, the neural network classifies it as the wrong class. In order to alleviate this problem, we propose the Deep Minimax Probability Machine (DeepMPM), which applies MPM to deep neural networks in an end-to-end fashion. In a worst-case scenario, MPM tries to minimize an upper bound of misclassification probabilities, considering the global information (i.e., mean and covariance information of each class). DeepMPM can be more robust since it learns the worst-case bound on the probability of misclassification of future data. In experiments on two real-world datasets, DeepMPM achieves classification performance comparable to a CNN while being more robust to adversarial attacks.

Machine Learning

Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation

Tiancheng Yu, Suvrit Sra (2019)

A Markov Decision Process (MDP) is a popular model for reinforcement learning. However, its commonly used assumption of stationary dynamics and rewards is too stringent and fails to hold in adversarial, nonstationary, or multi-agent problems. We study an episodic setting where the parameters of an MDP can differ across episodes. We learn a reliable policy of this potentially adversarial MDP by developing an Adversarial Reinforcement Learning (ARL) algorithm that reduces our MDP to a sequence of adversarial bandit problems. ARL achieves $O(\sqrt{SATH^3})$ regret, which is optimal with respect to $S$, $A$, and $T$, and its dependence on $H$ is the best (even for the usual stationary MDP) among existing model-free methods.

Machine Learning

Efficient Deep Learning of GMMs

Shirin Jalali, Carl Nuzman, Iraj Saniee (2019)

We show that a collection of Gaussian mixture models (GMMs) in $R^{n}$ can be optimally classified using $O(n)$ neurons in a neural network with two hidden layers (deep neural network), whereas a neural network with a single hidden layer (shallow neural network) would require at least $O(\exp(n))$ neurons or possibly exponentially large coefficients. Given the universality of the Gaussian distribution in the feature spaces of data, e.g., in speech, image and text, our result sheds light on the observed efficiency of deep neural networks in practical classification problems.

Machine Learning
