Disentangling Action Sequences: Discovering Correlated Samples

Published by Jiantao Wu

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Jiantao Wu - Lin Wang

Disentanglement is a highly desirable property of representations due to its similarity to human understanding and reasoning. It improves interpretability, supports performance on downstream tasks, and enables controllable generative models. However, this domain is challenged by the abstract notion of disentanglement and by incomplete theories to support unsupervised disentanglement learning. We demonstrate that the data itself, such as the orientation of images, rather than the factors, plays a crucial role in disentanglement, and that disentangled representations align the latent variables with the action sequences. We further introduce the concept of disentangling action sequences, which facilitates the description of the behaviours of existing disentangling approaches. An analogy for this process is discovering the commonalities between things and categorizing them. Furthermore, we analyze the inductive biases on the data and find that the latent information thresholds are correlated with the significance of the actions. For the supervised and unsupervised settings, we introduce two respective methods to measure the thresholds. We further propose a novel framework, the fractional variational autoencoder (FVAE), to disentangle the action sequences of different significance step by step. Experimental results on dSprites and 3D Chairs show that FVAE improves the stability of disentanglement.

134 - William F. Whitney , Rob Fergus 2019

We propose an unsupervised variational model for disentangling video into independent factors, i.e. each factor's future can be predicted from its past without considering the others. We show that our approach often learns factors which are interpretable as objects in a scene.

Machine Learning, Computer Vision and Pattern Recognition

TequilaGAN: How to easily identify GAN samples

61 - Rafael Valle , Wilson Cai , Anish Doshi 2018

In this paper we show strategies to easily identify fake samples generated with the Generative Adversarial Network framework. One strategy is based on the statistical analysis and comparison of raw pixel values and features extracted from them. The other strategy learns formal specifications from the real data and shows that fake samples violate the specifications of the real data. We show that fake samples produced with GANs have a universal signature that can be used to identify fake samples. We provide results on MNIST, CIFAR10, music and speech data.

Machine Learning, Computer Vision and Pattern Recognition

Knowledge Distillation with Adversarial Samples Supporting Decision Boundary

326 - Byeongho Heo , Minsik Lee , Sangdoo Yun 2018

Many recent works on knowledge distillation have provided ways to transfer the knowledge of a trained network for improving the learning process of a new one, but finding a good technique for knowledge distillation is still an open problem. In this paper, we provide a new perspective based on a decision boundary, which is one of the most important components of a classifier. The generalization performance of a classifier is closely related to the adequacy of its decision boundary, so a good classifier bears a good decision boundary. Therefore, transferring information closely related to the decision boundary can be a good attempt for knowledge distillation. To realize this goal, we utilize an adversarial attack to discover samples supporting a decision boundary. Based on this idea, to transfer more accurate information about the decision boundary, the proposed algorithm trains a student classifier based on the adversarial samples supporting the decision boundary. Experiments show that the proposed method indeed improves knowledge distillation and achieves state-of-the-art performance.

Machine Learning, Computer Vision and Pattern Recognition

Disentangling Controllable Object through Video Prediction Improves Visual Reinforcement Learning

84 - Yuanyi Zhong , Alexander Schwing , Jian Peng 2020

In many vision-based reinforcement learning (RL) problems, the agent controls a movable object in its visual field, e.g., the player's avatar in video games and the robotic arm in visual grasping and manipulation. Leveraging action-conditioned video prediction, we propose an end-to-end learning framework to disentangle the controllable object from the observation signal. The disentangled representation is shown to be useful for RL as additional observation channels to the agent. Experiments on a set of Atari games with the popular Double DQN algorithm demonstrate improved sample efficiency and game performance (from 222.8% to 261.4% measured in normalized game scores, with prediction bonus reward).

Machine Learning, Computer Vision and Pattern Recognition

Disentangling Neural Architectures and Weights: A Case Study in Supervised Classification

232 - Nicolo Colombo , Yang Gao 2020

The history of deep learning has shown that human-designed problem-specific networks can greatly improve the classification performance of general neural models. In most practical cases, however, choosing the optimal architecture for a given task remains a challenging problem. Recent architecture-search methods are able to automatically build neural models with strong performance but fail to fully appreciate the interaction between neural architecture and weights. This work investigates the problem of disentangling the role of the neural structure and its edge weights, by showing that well-trained architectures may not need any link-specific fine-tuning of the weights. We compare the performance of such weight-free networks (in our case these are binary networks with {0, 1}-valued weights) with random, weight-agnostic, pruned and standard fully connected networks. To find the optimal weight-agnostic network, we use a novel and computationally efficient method that translates the hard architecture-search problem into a feasible optimization problem. More specifically, we look at the optimal task-specific architectures as the optimal configuration of binary networks with {0, 1}-valued weights, which can be found through an approximate gradient descent strategy. Theoretical convergence guarantees of the proposed algorithm are obtained by bounding the error in the gradient approximation and its practical performance is evaluated on two real-world data sets. For measuring the structural similarities between different architectures, we use a novel spectral approach that allows us to underline the intrinsic differences between real-valued networks and weight-free architectures.

Machine Learning, Computer Vision and Pattern Recognition