The Statistical Inefficiency of Sparse Coding for Images (or, One Gabor to Rule them All)


Published by James Bergstra

Publication date: 2011

Research field: Informatics Engineering

Paper language: English

Authors: James Bergstra, Aaron Courville, Yoshua Bengio

Computer Vision and Pattern Recognition; Artificial Intelligence


Abstract

Sparse coding is a proven principle for learning compact representations of images. However, sparse coding by itself often leads to very redundant dictionaries. With images, this often takes the form of similar edge detectors which are replicated many times at various positions, scales and orientations. An immediate consequence of this observation is that the estimation of the dictionary components is not statistically efficient. We propose a factored model in which factors of variation (e.g. position, scale and orientation) are untangled from the underlying Gabor-like filters. There is so much redundancy in sparse codes for natural images that our model requires only a single dictionary element (a Gabor-like edge detector) to outperform standard sparse coding. Our model scales naturally to arbitrary-sized images while achieving much greater statistical efficiency during learning. We validate this claim with a number of experiments showing, in part, superior compression of out-of-sample data using a sparse coding dictionary learned with only a single image.
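The redundancy described in the abstract can be illustrated with a minimal sketch. This is not the authors' factored model; it only shows how a full dictionary of edge detectors can be generated from a single Gabor template by varying position, scale and orientation. The function names `gabor` and `build_dictionary`, the patch size, the stride and the wavelength-to-sigma ratio are all illustrative assumptions.

```python
import numpy as np

def gabor(size, x0, y0, theta, sigma, wavelength):
    """Return a unit-norm size x size Gabor patch centred at (x0, y0)."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    dx, dy = xs - x0, ys - y0
    # Rotate coordinates so the cosine carrier varies along direction theta.
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    envelope = np.exp(-(u ** 2 + v ** 2) / (2.0 * sigma ** 2))
    patch = envelope * np.cos(2.0 * np.pi * u / wavelength)
    return patch / np.linalg.norm(patch)

def build_dictionary(size=16, stride=4, n_orient=4, sigmas=(2.0, 4.0)):
    """Replicate one Gabor template over positions, scales and orientations."""
    atoms = []
    for sigma in sigmas:                      # scale
        for k in range(n_orient):             # orientation
            theta = np.pi * k / n_orient
            for y0 in range(0, size, stride):       # vertical position
                for x0 in range(0, size, stride):   # horizontal position
                    # Assumed fixed ratio: wavelength = 2 * sigma.
                    atom = gabor(size, x0, y0, theta, sigma, 2.0 * sigma)
                    atoms.append(atom.ravel())
    return np.stack(atoms)

D = build_dictionary()
n_atoms, n_pixels = D.shape
print("atoms:", n_atoms, "pixels per atom:", n_pixels)
print("entries if every atom were learned independently:", n_atoms * n_pixels)
```

A standard sparse coding dictionary would have to estimate every entry of `D` separately, whereas here each row is determined by one shared template plus a handful of transformation parameters. That gap is the statistical inefficiency the abstract refers to.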


Read also

One TTS Alignment To Rule Them All

Rohan Badlani, Adrian Łańcucki, Kevin J. Shih (2021)

Speech-to-text alignment is a critical component of neural text-to-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words. Most non-autoregressive end-to-end TTS models rely on durations extracted from external sources. In this paper we leverage the alignment mechanism proposed in RAD-TTS as a generic alignment learning framework, easily applicable to a variety of neural TTS models. The framework combines the forward-sum algorithm, the Viterbi algorithm, and a simple and efficient static prior. In our experiments, the alignment learning framework improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS). Specifically, it improves alignment convergence speed of existing attention-based mechanisms, simplifies the training pipeline, and makes the models more robust to errors on long utterances. Most importantly, the framework improves the perceived speech synthesis quality, as judged by human evaluators.

Sound; Computation and Language; Machine Learning

Robust Quantization: One Model to Rule Them All

Moran Shkolnik, Brian Chmiel, Ron Banner (2020)

Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and the precise way quantization is performed. Robust quantization offers an alternative approach with improved tolerance to different classes of data-types and quantization policies. It opens up new exciting applications where the quantization process is not static and can vary to meet different circumstances and implementations. To address this issue, we propose a method that provides intrinsic robustness to the model against a broad range of quantization processes. Our method is motivated by theoretical arguments and enables us to store a single generic model capable of operating at various bit-widths and quantization policies. We validate our method's effectiveness on different ImageNet models.

Machine Learning; Computer Vision and Pattern Recognition

One Detector to Rule Them All: Towards a General Deepfake Attack Detection Framework

Shahroz Tariq, Sangyup Lee, Simon S. Woo (2021)

Deep learning-based video manipulation methods have become widely accessible to the masses. With little to no effort, people can quickly learn how to generate deepfake (DF) videos. While deep learning-based detection methods have been proposed to identify specific types of DFs, their performance suffers for other types of deepfake methods, including real-world deepfakes, on which they are not sufficiently trained. In other words, most of the proposed deep learning-based detection methods lack transferability and generalizability. Beyond detecting a single type of DF from benchmark deepfake datasets, we focus on developing a generalized approach to detect multiple types of DFs, including deepfakes from unknown generation methods such as DeepFake-in-the-Wild (DFW) videos. To better cope with unknown and unseen deepfakes, we introduce a Convolutional LSTM-based Residual Network (CLRNet), which adopts a unique model training strategy and explores spatial as well as temporal information in deepfakes. Through extensive experiments, we show that existing defense methods are not ready for real-world deployment. In contrast, our defense method (CLRNet) achieves far better generalization when detecting various benchmark deepfake methods (97.57% on average). Furthermore, we evaluate our approach with a high-quality DeepFake-in-the-Wild dataset, collected from the Internet, containing numerous videos and more than 150,000 frames. Our CLRNet model generalizes well to high-quality DFW videos, achieving 93.86% detection accuracy and outperforming existing state-of-the-art defense methods by a considerable margin.

Computer Vision and Pattern Recognition; Cryptography and Security

One Law To Rule Them All: The Radial Acceleration Relation of Galaxies

Federico Lelli (2016)

We study the link between baryons and dark matter in 240 galaxies with spatially resolved kinematic data. Our sample spans 9 dex in stellar mass and includes all morphological types. We consider (i) 153 late-type galaxies (LTGs; spirals and irregulars) with gas rotation curves from the SPARC database; (ii) 25 early-type galaxies (ETGs; ellipticals and lenticulars) with stellar and HI data from ATLAS^3D or X-ray data from Chandra; and (iii) 62 dwarf spheroidals (dSphs) with individual-star spectroscopy. We find that LTGs, ETGs, and classical dSphs follow the same radial acceleration relation: the observed acceleration (gobs) correlates with that expected from the distribution of baryons (gbar) over 4 dex. The relation coincides with the 1:1 line (no dark matter) at high accelerations but systematically deviates from unity below a critical scale of ~10^-10 m/s^2. The observed scatter is remarkably small (<0.13 dex) and largely driven by observational uncertainties. The residuals do not correlate with any global or local galaxy property (baryonic mass, gas fraction, radius, etc.). The radial acceleration relation is tantamount to a Natural Law: when the baryonic contribution is measured, the rotation curve follows, and vice versa. Including ultrafaint dSphs, the relation may extend by another 2 dex and possibly flatten at gbar<10^-12 m/s^2, but these data are significantly more uncertain. The radial acceleration relation subsumes and generalizes several well-known dynamical properties of galaxies, like the Tully-Fisher and Faber-Jackson relations, the baryon-halo conspiracies, and Renzo's rule.

Astrophysics of Galaxies

One Representation to Rule Them All: Identifying Out-of-Support Examples in Few-shot Learning with Generic Representations

Henry Kvinge, Scott Howland, Nico Courts (2021)

The field of few-shot learning has made remarkable strides in developing powerful models that can operate in the small data regime. Nearly all of these methods assume every unlabeled instance encountered will belong to a handful of known classes for which one has examples. This can be problematic for real-world use cases where one routinely finds none-of-the-above examples. In this paper we describe this challenge of identifying what we term out-of-support (OOS) examples. We describe how this problem is subtly different from out-of-distribution detection and describe a new method of identifying OOS examples within the Prototypical Networks framework using a fixed point which we call the generic representation. We show that our method outperforms other existing approaches in the literature as well as other approaches that we propose in this paper. Finally, we investigate how the use of such a generic point affects the geometry of a model's feature space.

Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition