Are We Overfitting to Experimental Setups in Recognition?

Published by: Aditya Kusupati

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Matthew Wallingford - Aditya Kusupati - Keivan Alizadeh-Vahid

Computer Vision and Pattern Recognition, Machine Learning

Enabling robust intelligence in the real world entails systems that offer continuous inference while learning from varying amounts of data and supervision. The machine learning community has organically broken down this challenging goal into manageable sub-tasks such as supervised, few-shot, and continual learning. In light of substantial progress on each sub-task, we pose the question: how well does this progress translate to more practical scenarios? To investigate this question, we construct a new framework, FLUID, which removes certain assumptions made by current experimental setups while integrating these sub-tasks via the following design choices: consuming sequential data, allowing for flexible training phases, being compute aware, and working in an open-world setting. Evaluating a broad set of methods on FLUID leads to new insights, including strong evidence that methods are overfitting to their experimental setup. For example, we find that representative few-shot methods are substantially worse than simple baselines, self-supervised representations from MoCo fail to learn new classes when the downstream task contains a mix of new and old classes, and pretraining largely mitigates the problem of catastrophic forgetting. Finally, we propose two new simple methods that outperform all other evaluated methods, which further calls into question our progress towards robust, real-world systems. Project page: https://raivn.cs.washington.edu/projects/FLUID/.
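
The four design choices named in the abstract (sequential data, flexible training phases, compute awareness, and an open-world label set) can be illustrated with a toy evaluation loop. The sketch below is not the FLUID implementation: the `NearestClassMean` learner, the `sequential_eval` function, the `train_every` parameter, and the use of "samples consumed in training" as a compute proxy are all hypothetical names and assumptions chosen for illustration only.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Tuple

# A sample is (feature vector, label). Labels are not known in advance.
Sample = Tuple[Sequence[float], Hashable]


class NearestClassMean:
    """Hypothetical toy learner: predicts the class whose running mean is closest."""

    def __init__(self) -> None:
        self.sums: Dict[Hashable, List[float]] = {}
        self.counts: Counter = Counter()

    def predict(self, x: Sequence[float]) -> Optional[Hashable]:
        if not self.sums:
            return None  # nothing learned yet, so no class can be predicted

        def sq_dist(label: Hashable) -> float:
            n = self.counts[label]
            return sum((a - s / n) ** 2 for a, s in zip(x, self.sums[label]))

        return min(self.sums, key=sq_dist)

    def update(self, batch: Iterable[Sample]) -> None:
        for x, y in batch:
            if y not in self.sums:  # open world: a new class may appear at any time
                self.sums[y] = [0.0] * len(x)
            self.sums[y] = [s + a for s, a in zip(self.sums[y], x)]
            self.counts[y] += 1


def sequential_eval(stream: Iterable[Sample], learner: NearestClassMean,
                    train_every: int = 1) -> Dict[str, int]:
    """Predict each sample before its label is revealed, then train in phases."""
    seen = set()
    buffer: List[Sample] = []
    stats = {"total": 0, "correct": 0, "novel_class_samples": 0, "updates": 0}
    for x, y in stream:
        pred = learner.predict(x)           # inference happens first
        stats["total"] += 1
        stats["correct"] += int(pred == y)
        if y not in seen:                   # first sample of a previously unseen class
            stats["novel_class_samples"] += 1
            seen.add(y)
        buffer.append((x, y))               # label is revealed after the prediction
        if len(buffer) >= train_every:      # the training schedule is a free choice
            learner.update(buffer)
            stats["updates"] += len(buffer)  # crude stand-in for a compute budget
            buffer = []
    return stats
```

As a usage example, a stream such as `[((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((0.1, 0.0), "a"), ((0.9, 1.0), "b"), ((5.0, 5.0), "c")]` can be passed to `sequential_eval` together with a fresh `NearestClassMean()`. Because every prediction is made before the label is seen, the first sample of each new class is necessarily misclassified, which is one reason a sequential, open-world protocol can rank methods differently from a fixed train/test split.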

Read also

Are we done with ImageNet?

Lucas Beyer, Olivier J. Henaff, Alexander Kolesnikov, and Xiaohua Zhai (2020)

Yes, and no. We ask whether recent progress on the ImageNet classification benchmark continues to represent meaningful generalization, or whether the community has started to overfit to the idiosyncrasies of its labeling procedure. We therefore develop a significantly more robust procedure for collecting human annotations of the ImageNet validation set. Using these new labels, we reassess the accuracy of recently proposed ImageNet classifiers, and find their gains to be substantially smaller than those reported on the original labels. Furthermore, we find the original ImageNet labels to no longer be the best predictors of this independently-collected set, indicating that their usefulness in evaluating vision models may be nearing an end. Nevertheless, we find our annotation procedure to have largely remedied the errors in the original labels, reinforcing ImageNet as a powerful benchmark for future research in visual recognition.

Computer Vision and Pattern Recognition, Machine Learning

FitVid: Overfitting in Pixel-Level Video Prediction

Mohammad Babaeizadeh, Mohammad Taghi Saffar, Suraj Nair (2021)

An agent that is capable of predicting what happens next can perform a variety of tasks through planning with no additional training. Furthermore, such an agent can internally represent the complex dynamics of the real world and therefore can acquire a representation useful for a variety of visual perception tasks. This makes predicting the future frames of a video, conditioned on the observed past and potentially future actions, an interesting task which remains exceptionally challenging despite many recent advances. Existing video prediction models have shown promising results on simple, narrow benchmarks, but they generate low-quality predictions on real-life datasets with more complicated dynamics or broader domains. There is a growing body of evidence that underfitting on the training data is one of the primary causes of the low-quality predictions. In this paper, we argue that the inefficient use of parameters in current video models is the main reason for underfitting. Therefore, we introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks while having a parameter count similar to that of the current state-of-the-art models. We analyze the consequences of overfitting, illustrating how it can produce unexpected outcomes such as generating high-quality output by repeating the training data, and how it can be mitigated using existing image augmentation techniques. As a result, FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.

Computer Vision and Pattern Recognition, Machine Learning

Are we asking the right questions in MovieQA?

Bhavan Jasani, Rohit Girdhar, Deva Ramanan (2019)

Joint vision and language tasks like visual question answering are fascinating because they explore high-level understanding, but at the same time, they can be more prone to language biases. In this paper, we explore the biases in the MovieQA dataset and propose a strikingly simple model which can exploit them. We find that using the right word embedding is of utmost importance. By using an appropriately trained word embedding, about half the Question-Answers (QAs) can be answered by looking at the questions and answers alone, completely ignoring narrative context from video clips, subtitles, and movie scripts. Compared to the best published papers on the leaderboard, our simple question + answer only model improves accuracy by 5% for the video + subtitle category, 5% for subtitles, 15% for DVS, and 6% for scripts.

Computer Vision and Pattern Recognition, Computation and Language

DAQ meta-software for HEP experimental setups

S. Ryzhikov (2020)

Meta-software for data acquisition (DAQ) is a new approach to designing DAQ systems for experimental setups in high energy physics (HEP). It abstracts away experiment-specific data processing logic, but reflects that logic through configuration. It is also intended to replace highly integrated DAQ software with a swarm of single-function components, orchestrated by universal meta-software.

High Energy Physics - Experiment, Systems and Control

Robustness and Overfitting Behavior of Implicit Background Models

Shirley Liu, Charles Lehman, Ghassan AlRegib (2020)

In this paper, we examine the overfitting behavior of image classification models modified with Implicit Background Estimation (SCrIBE), which transforms them into weakly supervised segmentation models that provide spatial domain visualizations without affecting performance. Using the segmentation masks, we derive an overfit detection criterion that does not require testing labels. In addition, we assess the change in model performance, calibration, and segmentation masks after applying data augmentations as overfitting reduction measures and testing on various types of distorted images.

Computer Vision and Pattern Recognition, Machine Learning