Multi-language machine translation without parallel corpora is challenging because there is no explicit supervision between languages. Existing unsupervised methods typically rely on topological properties of the language representations. We introduce a framework that instead uses the visual modality to align multiple languages, using images as the bridge between them. We estimate the cross-modal alignment between language and images, and use this estimate to guide the learning of cross-lingual representations. Our language representations are trained jointly in one model with a single stage. Experiments with fifty-two languages show that our method outperforms baselines on unsupervised word-level and sentence-level translation using retrieval.
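As a rough illustration of the kind of objective this abstract describes, the sketch below aligns sentence embeddings with image embeddings through a symmetric contrastive loss. The loss form, temperature, and encoder interfaces are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cross_modal_alignment_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired
    sentence/image embeddings, both of shape (batch, dim).
    Matched pairs sit on the diagonal of the similarity matrix."""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature   # (batch, batch)
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # Pull each sentence toward its paired image and vice versa,
    # pushing apart all other pairings in the batch.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```

Because sentences in every language are pulled toward the same visual embedding space, cross-lingual retrieval (and hence word- and sentence-level translation by nearest neighbor) falls out of the shared space without any parallel text.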
From just a short glance at a video, we can often tell whether a person's action is intentional or not. Can we train a model to recognize this? We introduce a dataset of in-the-wild videos of unintentional action, as well as a suite of tasks for recognizing, localizing, and anticipating its onset. We train a supervised neural network as a baseline and analyze its performance compared to human consistency on the tasks. We also investigate self-supervised representations that leverage natural signals in our dataset, and show the effectiveness of an approach that uses the intrinsic speed of video to perform competitively with highly-supervised pretraining. However, a significant gap between machine and human performance remains. The project website is available at https://oops.cs.columbia.edu
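The "intrinsic speed of video" signal mentioned above can be cast as a pretext task: subsample frames at different rates and train the network to predict which playback rate it is watching. The sketch below is a minimal illustration under that assumption; the rates, backbone, and head are hypothetical and not the paper's architecture.

```python
import torch
import torch.nn as nn

RATES = (1, 2, 4, 8)  # hypothetical frame-subsampling rates

def make_speed_example(frames, rate_idx):
    """Simulate a playback speed by subsampling a (T, C, H, W) frame
    tensor; the label is the index of the rate that was applied."""
    clip = frames[::RATES[rate_idx]]
    return clip, rate_idx

class SpeedPretextModel(nn.Module):
    """Any video backbone plus a linear head that predicts which
    playback rate produced the input clip."""
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone  # e.g. a 3D CNN returning (batch, feat_dim)
        self.head = nn.Linear(feat_dim, len(RATES))

    def forward(self, clip):
        return self.head(self.backbone(clip))
```

Representations learned from this label-free objective can then be probed or fine-tuned on the downstream tasks of recognizing, localizing, and anticipating the onset of unintentional action.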