ﻻ يوجد ملخص باللغة العربية
Image annotation aims to annotate a given image with a variable number of class labels corresponding to diverse visual concepts. In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts ranging from object, scene to abstract concept; 2) how to annotate an image with the optimal number of class labels. To address the first issue, we propose a novel multi-scale deep model for extracting rich and discriminative features capable of representing a wide range of visual concepts. Specifically, a novel two-branch deep neural network architecture is proposed which comprises a very deep main network branch and a companion feature fusion network branch designed for fusing the multi-scale features computed from the main branch. The deep model is also made multi-modal by taking noisy user-provided tags as model input to complement the image input. For tackling the second issue, we introduce a label quantity prediction auxiliary task to the main label prediction task to explicitly estimate the optimal label number for a given image. Extensive experiments are carried out on two large-scale image annotation benchmark datasets and the results show that our method significantly outperforms the state-of-the-art.
Modern Earth Observation systems provide sensing data at different temporal and spatial resolutions. Among optical sensors, today the Sentinel-2 program supplies high-resolution temporal (every 5 days) and high spatial resolution (10m) images that ca
The widespread dissemination of forged images generated by Deepfake techniques has posed a serious threat to the trustworthiness of digital information. This demands effective approaches that can detect perceptually convincing Deepfakes generated by
Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much of the progresses have been made by the availability of face anti-spoofing benchmark datasets in recent years. However, existing face anti-spoofing bench
Person Re-identification (re-id) aims to match people across non-overlapping camera views in a public space. It is a challenging problem because many people captured in surveillance videos wear similar clothes. Consequently, the differences in their
Unsupervised evaluation of segmentation quality is a crucial step in image segmentation applications. Previous unsupervised evaluation methods usually lacked the adaptability to multi-scale segmentation. A scale-constrained evaluation method that eva