Disentangling Video with Independent Prediction


Published by: William Whitney

Publication date: 2019

Research field: Informatics Engineering

Language: English

Authors: William F. Whitney, Rob Fergus


We propose an unsupervised variational model for disentangling video into independent factors, i.e., each factor's future can be predicted from its past without considering the others. We show that our approach often learns factors which are interpretable as objects in a scene.
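The independence property described in the abstract can be illustrated with a toy sketch. This is not the authors' variational model: it assumes the latent state is already split into factors and uses a hypothetical linear predictor per factor (`weights`, `predict_independently`, and the shapes are illustrative names and choices, not taken from the paper). It only shows what "predicted from its past without considering the others" means structurally.

```python
import numpy as np


def predict_independently(z, weights):
    """Predict each factor's next state from that factor's current state only.

    z: array of shape (num_factors, factor_dim), latent factors at time t.
    weights: array of shape (num_factors, factor_dim, factor_dim), one
        hypothetical linear predictor per factor (an assumption for
        illustration, not the paper's predictor).
    Returns an array of shape (num_factors, factor_dim) for time t + 1.
    """
    # einsum applies weights[k] to z[k] only, with no mixing across factors.
    return np.einsum("kij,kj->ki", weights, z)


rng = np.random.default_rng(0)
num_factors, factor_dim = 3, 4
weights = rng.normal(size=(num_factors, factor_dim, factor_dim))
z = rng.normal(size=(num_factors, factor_dim))

z_next = predict_independently(z, weights)

# Perturb only factor 0. Predictions for factors 1 and 2 stay the same,
# because their predictors never see factor 0.
z_perturbed = z.copy()
z_perturbed[0] += 1.0
z_next_perturbed = predict_independently(z_perturbed, weights)
```

In a learned model the per-factor predictors would typically be neural networks trained jointly with an encoder and decoder; the structural constraint, that predictor k receives only factor k, is the part this sketch is meant to convey.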


Yuanyi Zhong, Alexander Schwing, Jian Peng (2020)

In many vision-based reinforcement learning (RL) problems, the agent controls a movable object in its visual field, e.g., the player's avatar in video games and the robotic arm in visual grasping and manipulation. Leveraging action-conditioned video prediction, we propose an end-to-end learning framework to disentangle the controllable object from the observation signal. The disentangled representation is shown to be useful for RL as additional observation channels to the agent. Experiments on a set of Atari games with the popular Double DQN algorithm demonstrate improved sample efficiency and game performance (from 222.8% to 261.4% measured in normalized game scores, with prediction bonus reward).

Machine Learning, Computer Vision and Pattern Recognition

Training independent subnetworks for robust prediction

Marton Havasi, Rodolphe Jenatton, Stanislav Fort (2020)

Recent approaches to efficiently ensemble neural networks have shown that strong robustness and uncertainty performance can be achieved with a negligible gain in parameters over the original network. However, these methods still require multiple forward passes for prediction, leading to a significant computational cost. In this work, we show a surprising result: the benefits of using multiple predictions can be achieved 'for free' under a single model's forward pass. In particular, we show that, using a multi-input multi-output (MIMO) configuration, one can utilize a single model's capacity to train multiple subnetworks that independently learn the task at hand. By ensembling the predictions made by the subnetworks, we improve model robustness without increasing compute. We observe a significant improvement in negative log-likelihood, accuracy, and calibration error on CIFAR10, CIFAR100, ImageNet, and their out-of-distribution variants compared to previous methods.

Machine Learning, Computer Vision and Pattern Recognition

Structured Object-Aware Physics Prediction for Video Modeling and Planning

Jannik Kossen, Karl Stelzner, Marcel Hussing (2019)

When humans observe a physical system, they can easily locate objects, understand their interactions, and anticipate future behavior, even in settings with complicated and previously unseen interactions. For computers, however, learning such models from videos in an unsupervised fashion is an unsolved research problem. In this paper, we present STOVE, a novel state-space model for videos, which explicitly reasons about objects and their positions, velocities, and interactions. It is constructed by combining an image model and a dynamics model in a compositional manner and improves on previous work by reusing the dynamics model for inference, accelerating and regularizing training. STOVE predicts videos with convincing physical behavior over hundreds of timesteps, outperforms previous unsupervised models, and even approaches the performance of supervised baselines. We further demonstrate the strength of our model as a simulator for sample-efficient model-based control in a task with heavily interacting objects.

Machine Learning, Computer Vision and Pattern Recognition

Disentangling Action Sequences: Discovering Correlated Samples

Jiantao Wu, Lin Wang (2020)

Disentanglement is a highly desirable property of representation due to its similarity with human understanding and reasoning. This improves interpretability, enables the performance of downstream tasks, and enables controllable generative models. However, this domain is challenged by the abstract notion and incomplete theories to support unsupervised disentanglement learning. We demonstrate that the data itself, such as the orientation of images, plays a crucial role in disentanglement, rather than the factors, and that the disentangled representations align the latent variables with the action sequences. We further introduce the concept of disentangling action sequences, which facilitates the description of the behaviours of the existing disentangling approaches. An analogy for this process is discovering the commonality between things and categorizing them. Furthermore, we analyze the inductive biases on the data and find that the latent information thresholds are correlated with the significance of the actions. For the supervised and unsupervised settings, we respectively introduce two methods to measure the thresholds. We further propose a novel framework, fractional variational autoencoder (FVAE), to disentangle the action sequences with different significance step by step. Experimental results on dSprites and 3D Chairs show that FVAE improves the stability of disentanglement.

Machine Learning, Computer Vision and Pattern Recognition

Video Ladder Networks

Francesco Cricri, Xingyang Ni, Mikko Honkala (2016)

We present the Video Ladder Network (VLN) for efficiently generating future video frames. VLN is a neural encoder-decoder model augmented at all layers by both recurrent and feedforward lateral connections. At each layer, these connections form a lateral recurrent residual block, where the feedforward connection represents a skip connection and the recurrent connection represents the residual. Thanks to the recurrent connections, the decoder can exploit temporal summaries generated from all layers of the encoder. This way, the top layer is relieved from the pressure of modeling lower-level spatial and temporal details. Furthermore, we extend the basic version of VLN to incorporate ResNet-style residual blocks in the encoder and decoder, which help improve the prediction results. VLN is trained in a self-supervised regime on the Moving MNIST dataset, achieving competitive results while having a very simple structure and providing fast inference.

Machine Learning, Computer Vision and Pattern Recognition
