ﻻ يوجد ملخص باللغة العربية
Virtually all of deep learning literature relies on the assumption of large amounts of available training data. Indeed, even the majority of few-shot learning methods rely on a large set of base classes for pretraining. This assumption, however, does not always hold. For some tasks, annotating a large number of classes can be infeasible, and even collecting the images themselves can be a challenge in some scenarios. In this paper, we study this problem and call it Small Data setting, in contrast to Big Data. To unlock the full potential of small data, we propose to augment the models with annotations for other related tasks, thus increasing their generalization abilities. In particular, we use the richly annotated scene parsing dataset ADE20K to construct our realistic Long-tail Recognition with Diverse Supervision (LRDS) benchmark by splitting the object categories into head and tail based on their distribution. Following the standard few-shot learning protocol, we use the head classes for representation learning and the tail classes for evaluation. Moreover, we further subsample the head categories and images to generate two novel settings which we call Scarce-Class and Scarce-Image, respectively corresponding to the shortage of samples for rare classes and training images. Finally, we analyze the effect of applying various additional supervision sources under the proposed settings. Our experiments demonstrate that densely labeling a small set of images can indeed largely remedy the small data constraints.
Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a foundation of many multimedia retrieval systems. A common approach is to rely on Product Quantization, which allows the storage of large vector databases in memory and efficient d
Lyman-$alpha$ (Ly$alpha$) is a powerful astrophysical probe. Not only is it ubiquitous at high redshifts, it is also a resonant line, making Ly$alpha$ photons scatter. This scattering process depends on the physical conditions of the gas through whic
Existing long-tailed recognition methods, aiming to train class-balance models from long-tailed data, generally assume the models would be evaluated on the uniform test class distribution. However, the practical test class distribution often violates
As tons of photos are being uploaded to public websites (e.g., Flickr, Bing, and Google) every day, learning from web data has become an increasingly popular research direction because of freely available web resources, which is also referred to as w
Recently, generative data-free quantization emerges as a practical approach that compresses the neural network to low bit-width without access to real data. It generates data to quantize the network by utilizing the batch normalization (BN) statistic