ترغب بنشر مسار تعليمي؟ اضغط هنا

Visual-Textual Association with Hardest and Semi-Hard Negative Pairs Mining for Person Search

107   0   0.0 ( 0 )
 نشر من قبل Zhen Liu
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Searching persons in large-scale image databases with the query of natural language description is a more practical important applications in video surveillance. Intuitively, for person search, the core issue should be visual-textual association, which is still an extremely challenging task, due to the contradiction between the high abstraction of textual description and the intuitive expression of visual images. However, for this task, while positive image-text pairs are always well provided, most existing methods doesnt tackle this problem effectively by mining more reasonable negative pairs. In this paper, we proposed a novel visual-textual association approach with visual and textual attention, and cross-modality hardest and semi-hard negative pair mining. In order to evaluate the effectiveness and feasibility of the proposed approach, we conduct extensive experiments on typical person search datasdet: CUHK-PEDES, in which our approach achieves the top1 score of 55.32% as a new state-of-the-art. Besides, we also evaluate the semi-hard pair mining approach in COCO caption dataset, and validate the effectiveness and complementarity of the methods.

قيم البحث

اقرأ أيضاً

97 - Xin Xu , Lei Liu , Weifeng Liu 2020
Annotating a large-scale image dataset is very tedious, yet necessary for training person re-identification models. To alleviate such a problem, we present an active hard sample mining framework via training an effective re-ID model with the least la beling efforts. Considering that hard samples can provide informative patterns, we first formulate an uncertainty estimation to actively select hard samples to iteratively train a re-ID model from scratch. Then, intra-diversity estimation is designed to reduce the redundant hard samples by maximizing their diversity. Moreover, we propose a computer-assisted identity recommendation module embedded in the active hard sample mining framework to help human annotators to rapidly and accurately label the selected samples. Extensive experiments were carried out to demonstrate the effectiveness of our method on several public datasets. Experimental results indicate that our method can reduce 57%, 63%, and 49% annotation efforts on the Market1501, MSMT17, and CUHK03, respectively, while maximizing the performance of the re-ID model.
281 - Meng Li , Lin Wu , Arnold Wiliem 2019
Histopathology image analysis can be considered as a Multiple instance learning (MIL) problem, where the whole slide histopathology image (WSI) is regarded as a bag of instances (i.e, patches) and the task is to predict a single class label to the WS I. However, in many real-life applications such as computational pathology, discovering the key instances that trigger the bag label is of great interest because it provides reasons for the decision made by the system. In this paper, we propose a deep convolutional neural network (CNN) model that addresses the primary task of a bag classification on a WSI and also learns to identify the response of each instance to provide interpretable results to the final prediction. We incorporate the attention mechanism into the proposed model to operate the transformation of instances and learn attention weights to allow us to find key patches. To perform a balanced training, we introduce adaptive weighing in each training bag to explicitly adjust the weight distribution in order to concentrate more on the contribution of hard samples. Based on the learned attention weights, we further develop a solution to boost the classification performance by generating the bags with hard negative instances. We conduct extensive experiments on colon and breast cancer histopathology data and show that our framework achieves state-of-the-art performance.
Person search has recently emerged as a challenging task that jointly addresses pedestrian detection and person re-identification. Existing approaches follow a fully supervised setting where both bounding box and identity annotations are available. H owever, annotating identities is labor-intensive, limiting the practicability and scalability of current frameworks. This paper inventively considers weakly supervised person search with only bounding box annotations. We proposed the first framework to address this novel task, namely Context-Guided Person Search (CGPS), by investigating three levels of context clues (i.e., detection, memory and scene) in unconstrained natural images. The first two are employed to promote local and global discriminative capabilities, while the latter enhances clustering accuracy. Despite its simple design, our CGPS boosts the baseline model by 8.3% in mAP on CUHK-SYSU. Surprisingly, it even achieves comparable performance to two-step person search models, while displaying higher efficiency. Our code is available at https://github.com/ljpadam/CGPS.
An important component of unsupervised learning by instance-based discrimination is a memory bank for storing a feature representation for each training sample in the dataset. In this paper, we introduce 3 improvements to the vanilla memory bank-base d formulation which brings massive accuracy gains: (a) Large mini-batch: we pull multiple augmentations for each sample within the same batch and show that this leads to better models and enhanced memory bank updates. (b) Consistency: we enforce the logits obtained by different augmentations of the same sample to be close without trying to enforce discrimination with respect to negative samples as proposed by previous approaches. (c) Hard negative mining: since instance discrimination is not meaningful for samples that are too visually similar, we devise a novel nearest neighbour approach for improving the memory bank that gradually merges extremely similar data samples that were previously forced to be apart by the instance level classification loss. Overall, our approach greatly improves the vanilla memory-bank based instance discrimination and outperforms all existing methods for both seen and unseen testing categories with cosine similarity.
133 - Youbao Tang , Ke Yan , Yuxing Tang 2019
Automatic lesion detection from computed tomography (CT) scans is an important task in medical imaging analysis. It is still very challenging due to similar appearances (e.g. intensity and texture) between lesions and other tissues, making it especia lly difficult to develop a universal lesion detector. Instead of developing a specific-type lesion detector, this work builds a Universal Lesion Detector (ULDor) based on Mask R-CNN, which is able to detect all different kinds of lesions from whole body parts. As a state-of-the-art object detector, Mask R-CNN adds a branch for predicting segmentation masks on each Region of Interest (RoI) to improve the detection performance. However, it is almost impossible to manually annotate a large-scale dataset with pixel-level lesion masks to train the Mask R-CNN for lesion detection. To address this problem, this work constructs a pseudo mask for each lesion region that can be considered as a surrogate of the real mask, based on which the Mask R-CNN is employed for lesion detection. On the other hand, this work proposes a hard negative example mining strategy to reduce the false positives for improving the detection performance. Experimental results on the NIH DeepLesion dataset demonstrate that the ULDor is enhanced using pseudo masks and the proposed hard negative example mining strategy and achieves a sensitivity of 86.21% with five false positives per image.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا