HAMBox: Delving into Online High-quality Anchors Mining for Detecting Outer Faces

50 0 0.0 ( 0 )

Download Cite

Added by Xu Tang

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Yang Liu - Xu Tang - Xiang Wu

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Current face detectors utilize anchors to frame a multi-task learning problem which combines classification and bounding box regression. Effective anchor design and anchor matching strategy enable face detectors to localize faces under large pose and scale variations. However, we observe that more than 80% correctly predicted bounding boxes are regressed from the unmatched anchors (the IoUs between anchors and target faces are lower than a threshold) in the inference phase. It indicates that these unmatched anchors perform excellent regression ability, but the existing methods neglect to learn from them. In this paper, we propose an Online High-quality Anchor Mining Strategy (HAMBox), which explicitly helps outer faces compensate with high-quality anchors. Our proposed HAMBox method could be a general strategy for anchor-based single-stage face detection. Experiments on various datasets, including WIDER FACE, FDDB, AFW and PASCAL Face, demonstrate the superiority of the proposed method. Furthermore, our team win the championship on the Face Detection test track of WIDER Face and Pedestrian Challenge 2019. We will release the codes with PaddlePaddle.

rate research

Delving into Localization Errors for Monocular 3D Object Detection

132 - Xinzhu Ma , Yinmin Zhang , Dan Xu 2021

Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving, while accurate 3D object detection from this kind of data is very challenging. In this work, by intensive diagnosis experiments, we quantify the impact introduced by each sub-task and found the `localization error is the vital factor in restricting monocular 3D detection. Besides, we also investigate the underlying reasons behind localization errors, analyze the issues they might bring, and propose three strategies. First, we revisit the misalignment between the center of the 2D bounding box and the projected center of the 3D object, which is a vital factor leading to low localization accuracy. Second, we observe that accurately localizing distant objects with existing technologies is almost impossible, while those samples will mislead the learned network. To this end, we propose to remove such samples from the training set for improving the overall performance of the detector. Lastly, we also propose a novel 3D IoU oriented loss for the size estimation of the object, which is not affected by `localization error. We conduct extensive experiments on the KITTI dataset, where the proposed method achieves real-time detection and outperforms previous methods by a large margin. The code will be made available at: https://github.com/xinzhuma/monodle.

Computer Vision and Pattern Recognition

Delving into Inter-Image Invariance for Unsupervised Visual Representations

104 - Jiahao Xie , Xiaohang Zhan , Ziwei Liu 2020

Contrastive learning has recently shown immense potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning. The learning typically uses rich intra-image transformations to construct positive pairs and then maximizes agreement using a contrastive loss. The merits of inter-image invariance, conversely, remain much less explored. One major obstacle to exploit inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs, and further derive effective supervision from them since there are no pair annotations available. In this work, we present a rigorous and comprehensive study on inter-image invariance learning from three main constituting components: pseudo-label maintenance, sampling strategy, and decision boundary design. Through carefully-designed comparisons and analysis, we propose a unified and generic framework that supports the integration of unsupervised intra- and inter-image invariance learning. With all the obtained recipes, our final model, namely InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. Codes will be released at https://github.com/open-mmlab/OpenSelfSup.

Computer Vision and Pattern Recognition Machine Learning

Detecting Photoshopped Faces by Scripting Photoshop

116 - Sheng-Yu Wang , Oliver Wang , Andrew Owens 2019

Most malicious photo manipulations are created using standard image editing tools, such as Adobe Photoshop. We present a method for detecting one very popular Photoshop manipulation -- image warping applied to human faces -- using a model trained entirely using fake images that were automatically generated by scripting Photoshop itself. We show that our model outperforms humans at the task of recognizing manipulated images, can predict the specific location of edits, and in some cases can be used to undo a manipulation to reconstruct the original, unedited image. We demonstrate that the system can be successfully applied to real, artist-created image manipulations.

Computer Vision and Pattern Recognition

Delving into Data: Effectively Substitute Training for Black-box Attack

78 - Wenxuan Wang , Bangjie Yin , Taiping Yao 2021

Deep models have shown their vulnerability when processing adversarial samples. As for the black-box attack, without access to the architecture and weights of the attacked model, training a substitute model for adversarial attacks has attracted wide attention. Previous substitute training approaches focus on stealing the knowledge of the target model based on real training data or synthetic data, without exploring what kind of data can further improve the transferability between the substitute and target models. In this paper, we propose a novel perspective substitute training that focuses on designing the distribution of data used in the knowledge stealing process. More specifically, a diverse data generation module is proposed to synthesize large-scale data with wide distribution. And adversarial substitute training strategy is introduced to focus on the data distributed near the decision boundary. The combination of these two modules can further boost the consistency of the substitute model and target model, which greatly improves the effectiveness of adversarial attack. Extensive experiments demonstrate the efficacy of our method against state-of-the-art competitors under non-target and target attack settings. Detailed visualization and analysis are also provided to help understand the advantage of our method.

Computer Vision and Pattern Recognition

Delving Deep into Liver Focal Lesion Detection: A Preliminary Study

133 - Jiechao Ma , Yingqian Chen , Yu Chen 2019

Hepatocellular carcinoma (HCC) is the second most frequent cause of malignancy-related death and is one of the diseases with the highest incidence in the world. Because the liver is the only organ in the human body that is supplied by two major vessels: the hepatic artery and the portal vein, various types of malignant tumors can spread from other organs to the liver. And due to the liver masses heterogeneous and diffusive shape, the tumor lesions are very difficult to be recognized, thus automatic lesion detection is necessary for the doctors with huge workloads. To assist doctors, this work uses the existing large-scale annotation medical image data to delve deep into liver lesion detection from multiple directions. To solve technical difficulties, such as the image-recognition task, traditional deep learning with convolution neural networks (CNNs) has been widely applied in recent years. However, this kind of neural network, such as Faster Regions with CNN features (R-CNN), cannot leverage the spatial information because it is applied in natural images (2D) rather than medical images (3D), such as computed tomography (CT) images. To address this issue, we propose a novel algorithm that is appropriate for liver CT imaging. Furthermore, according to radiologists experience in clinical diagnosis and the characteristics of CT images of liver cancer, a liver cancer-detection framework with CNN, including image processing, feature extraction, region proposal, image registration, and classification recognition, was proposed to facilitate the effective detection of liver lesions.

Computer Vision and Pattern Recognition