ﻻ يوجد ملخص باللغة العربية
Computer vision is difficult, partly because the desired mathematical function connecting input and output data is often complex, fuzzy and thus hard to learn. Coarse-to-fine (C2F) learning is a promising direction, but it remains unclear how it is applied to a wide range of vision problems. This paper presents a generalized C2F framework by making two technical contributions. First, we provide a unified way of C2F propagation, in which the coarse prediction (a class vector, a detected box, a segmentation mask, etc.) is encoded into a dense (pixel-level) matrix and concatenated to the original input, so that the fine model takes the same design of the coarse model but sees additional information. Second, we present a progressive training strategy which starts with feeding the ground-truth instead of the coarse output into the fine model, and gradually increases the fraction of coarse output, so that at the end of training the fine model is ready for testing. We also relate our approach to curriculum learning by showing that data difficulty keeps increasing during the training process. We apply our framework to three vision tasks including image classification, object localization and semantic segmentation, and demonstrate consistent accuracy gain compared to the baseline training strategy.
Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works mainly tackle this problem by focusing on how to locate the most discrim
Existing image-to-image transformation approaches primarily focus on synthesizing visually pleasing data. Generating images with correct identity labels is challenging yet much less explored. It is even more challenging to deal with image transformat
Fine-grained visual classification aims to recognize images belonging to multiple sub-categories within a same category. It is a challenging task due to the inherently subtle variations among highly-confused categories. Most existing methods only tak
Few-shot learning aims at rapidly adapting to novel categories with only a handful of samples at test time, which has been predominantly tackled with the idea of meta-learning. However, meta-learning approaches essentially learn across a variety of f
Facial landmark localization plays an important role in face recognition and analysis applications. In this paper, we give a brief introduction to a coarse-to-fine pipeline with neural networks and sequential regression. First, a global convolutional