ترغب بنشر مسار تعليمي؟ اضغط هنا

Mining the Benefits of Two-stage and One-stage HOI Detection

151   0   0.0 ( 0 )
 نشر من قبل Yue Liao
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Two-stage methods have dominated Human-Object Interaction (HOI) detection for several years. Recently, one-stage HOI detection methods have become popular. In this paper, we aim to explore the essential pros and cons of two-stage and one-stage methods. With this as the goal, we find that conventional two-stage methods mainly suffer from positioning positive interactive human-object pairs, while one-stage methods are challenging to make an appropriate trade-off on multi-task learning, i.e., object detection, and interaction classification. Therefore, a core problem is how to take the essence and discard the dregs from the conventional two types of methods. To this end, we propose a novel one-stage framework with disentangling human-object detection and interaction classification in a cascade manner. In detail, we first design a human-object pair generator based on a state-of-the-art one-stage HOI detector by removing the interaction classification module or head and then design a relatively isolated interaction classifier to classify each human-object pair. Two cascade decoders in our proposed framework can focus on one specific task, detection or interaction classification. In terms of the specific implementation, we adopt a transformer-based HOI detector as our base model. The newly introduced disentangling paradigm outperforms existing methods by a large margin, with a significant relative mAP gain of 9.32% on HICO-Det.



قيم البحث

اقرأ أيضاً

93 - Xin Lu , Quanquan Li , Buyu Li 2020
Modern object detection methods can be divided into one-stage approaches and two-stage ones. One-stage detectors are more efficient owing to straightforward architectures, but the two-stage detectors still take the lead in accuracy. Although recent w ork try to improve the one-stage detectors by imitating the structural design of the two-stage ones, the accuracy gap is still significant. In this paper, we propose MimicDet, a novel and efficient framework to train a one-stage detector by directly mimic the two-stage features, aiming to bridge the accuracy gap between one-stage and two-stage detectors. Unlike conventional mimic methods, MimicDet has a shared backbone for one-stage and two-stage detectors, then it branches into two heads which are well designed to have compatible features for mimicking. Thus MimicDet can be end-to-end trained without the pre-train of the teacher network. And the cost does not increase much, which makes it practical to adopt large networks as backbones. We also make several specialized designs such as dual-path mimicking and staggered feature pyramid to facilitate the mimicking process. Experiments on the challenging COCO detection benchmark demonstrate the effectiveness of MimicDet. It achieves 46.1 mAP with ResNeXt-101 backbone on the COCO test-dev set, which significantly surpasses current state-of-the-art methods.
We develop a probabilistic interpretation of two-stage object detection. We show that this probabilistic interpretation motivates a number of common empirical training practices. It also suggests changes to two-stage detection pipelines. Specifically , the first stage should infer proper object-vs-background likelihoods, which should then inform the overall score of the detector. A standard region proposal network (RPN) cannot infer this likelihood sufficiently well, but many one-stage detectors can. We show how to build a probabilistic two-stage detector from any state-of-the-art one-stage detector. The resulting detectors are faster and more accurate than both their one- and two-stage precursors. Our detector achieves 56.4 mAP on COCO test-dev with single-scale testing, outperforming all published results. Using a lightweight backbone, our detector achieves 49.2 mAP on COCO at 33 fps on a Titan Xp, outperforming the popular YOLOv4 model.
Anomaly detection from a single image is challenging since anomaly data is always rare and can be with highly unpredictable types. With only anomaly-free data available, most existing methods train an AutoEncoder to reconstruct the input image and fi nd the difference between the input and output to identify the anomalous region. However, such methods face a potential problem - a coarse reconstruction generates extra image differences while a high-fidelity one may draw in the anomaly. In this paper, we solve this contradiction by proposing a two-stage approach, which generates high-fidelity yet anomaly-free reconstructions. Our Unsupervised Two-stage Anomaly Detection (UTAD) relies on two technical components, namely the Impression Extractor (IE-Net) and the Expert-Net. The IE-Net and Expert-Net accomplish the two-stage anomaly-free image reconstruction task while they also generate intuitive intermediate results, making the whole UTAD interpretable. Extensive experiments show that our method outperforms state-of-the-arts on four anomaly detection datasets with different types of real-world objects and textures.
One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks. In this work, we propose a Task-aligned One-stage Object Detection (TOOD) that explicitly aligns the two tasks in a learning-based manner. First, we design a novel Task-aligned Head (T-Head) which offers a better balance between learning task-interactive and task-specific features, as well as a greater flexibility to learn the alignment via a task-aligned predictor. Second, we propose Task Alignment Learning (TAL) to explicitly pull closer (or even unify) the optimal anchors for the two tasks during training via a designed sample assignment scheme and a task-aligned loss. Extensive experiments are conducted on MS-COCO, where TOOD achieves a 51.1 AP at single-model single-scale testing. This surpasses the recent one-stage detectors by a large margin, such as ATSS (47.7 AP), GFL (48.2 AP), and PAA (49.0 AP), with fewer parameters and FLOPs. Qualitative results also demonstrate the effectiveness of TOOD for better aligning the tasks of object classification and localization. Code is available at https://github.com/fcjian/TOOD.
141 - Yang Yang , Min Li , Bo Meng 2021
One-stage object detectors rely on a point feature to predict the detection results. However, the point feature often lacks the information of the whole object, thereby leading to a misalignment between the object and the point feature. Meanwhile, th e classification and regression tasks are sensitive to different object regions, but their features are spatially aligned. Both of these two problems hinder the detection performance. In order to solve these two problems, we propose a simple and plug-in operator that can generate aligned and disentangled features for each task, respectively, without breaking the fully convolutional manner. By predicting two task-aware point sets that are located in each sensitive region, the proposed operator can align the point feature with the object and disentangle the two tasks from the spatial dimension. We also reveal an interesting finding of the opposite effect of the long-range skip connection for classification and regression. On the basis of the Object-Aligned and Task-disentangled operator (OAT), we propose OAT-Net, which explicitly exploits point-set features for accurate detection results. Extensive experiments on the MS-COCO dataset show that OAT can consistently boost different state-of-the-art one-stage detectors by $sim$2 AP. Notably, OAT-Net with Res2Net-101-DCN backbone achieves 53.7 AP on the COCO test-dev.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا