ﻻ يوجد ملخص باللغة العربية
This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods. Our strategy outperforms all competing methods for retrieving or classifying images at a finer granularity than that available at train time. It also improves the accuracy for transfer learning tasks to fine-grained datasets, thereby establishing the new state of the art on five public benchmarks, like iNaturalist-2018.
Diabetic retinopathy (DR) is one of the leading causes of blindness. However, no specific symptoms of early DR lead to a delayed diagnosis, which results in disease progression in patients. To determine the disease severity levels, ophthalmologists n
The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient pr
Bilinear feature transformation has shown the state-of-the-art performance in learning fine-grained image representations. However, the computational cost to learn pairwise interactions between deep feature channels is prohibitively expensive, which
Image completion has achieved significant progress due to advances in generative adversarial networks (GANs). Albeit natural-looking, the synthesized contents still lack details, especially for scenes with complex structures or images with large hole
Significant progress has been made in recent years in image captioning, an active topic in the fields of vision and language. However, existing methods tend to yield overly general captions and consist of some of the most frequent words/phrases, resu