Turbo Learning Framework for Human-Object Interactions Recognition and Human Pose Estimation

216 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Wentao Liu

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Wei Feng - Wentao Liu - Tong Li

الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Human-object interactions (HOI) recognition and pose estimation are two closely related tasks. Human pose is an essential cue for recognizing actions and localizing the interacted objects. Meanwhile, human action and their interacted objects localizations provide guidance for pose estimation. In this paper, we propose a turbo learning framework to perform HOI recognition and pose estimation simultaneously. First, two modules are designed to enforce message passing between the tasks, i.e. pose aware HOI recognition module and HOI guided pose estimation module. Then, these two modules form a closed loop to utilize the complementary information iteratively, which can be trained in an end-to-end manner. The proposed method achieves the state-of-the-art performance on two public benchmarks including Verbs in COCO (V-COCO) and HICO-DET datasets.

قيم البحث

130 - Mihai Fieraru , Anna Khoreva , Leonid Pishchulin 2018

Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist a lo t of difficult cases where even the state-of-the-art models fail to correctly localize all body joints. This motivates the need for an additional refinement step that addresses these challenging cases and can be easily applied on top of any existing method. In this work, we introduce a pose refinement network (PoseRefiner) which takes as input both the image and a given pose estimate and learns to directly predict a refined pose by jointly reasoning about the input-output space. In order for the network to learn to refine incorrect body joint predictions, we employ a novel data augmentation scheme for training, where we model hard human pose cases. We evaluate our approach on four popular large-scale pose estimation benchmarks such as MPII Single- and Multi-Person Pose Estimation, PoseTrack Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvement over the state of the art.

الرؤية الحاسوبية وتمييز الأنماط

TokenPose: Learning Keypoint Tokens for Human Pose Estimation

204 - Yanjie Li , Shoukui Zhang , Zhicheng Wang 2021

Human pose estimation deeply relies on visual clues and anatomical constraints between parts to locate keypoints. Most existing CNN-based methods do well in visual representation, however, lacking in the ability to explicitly learn the constraint rel ationships between keypoints. In this paper, we propose a novel approach based on Token representation for human Pose estimation~(TokenPose). In detail, each keypoint is explicitly embedded as a token to simultaneously learn constraint relationships and appearance cues from images. Extensive experiments show that the small and large TokenPose models are on par with state-of-the-art CNN-based counterparts while being more lightweight. Specifically, our TokenPose-S and TokenPose-L achieve $72.5$ AP and $75.8$ AP on COCO validation dataset respectively, with significant reduction in parameters ($downarrow80.6%$; $downarrow$ $56.8%$) and GFLOPs ($downarrow$ $75.3%$; $downarrow$ $24.7%$). Code is publicly available.

الرؤية الحاسوبية وتمييز الأنماط

Learning to Detect Human-Object Interactions

96 - Yu-Wei Chao , Yunfan Liu , Xieyang Liu 2017

We study the problem of detecting human-object interactions (HOI) in static images, defined as predicting a human and an object bounding box with an interaction class label that connects them. HOI detection is a fundamental problem in computer vision as it provides semantic information about the interactions among the detected objects. We introduce HICO-DET, a new large benchmark for HOI detection, by augmenting the current HICO classification benchmark with instance annotations. To solve the task, we propose Human-Object Region-based Convolutional Neural Networks (HO-RCNN). At the core of our HO-RCNN is the Interaction Pattern, a novel DNN input that characterizes the spatial relations between two bounding boxes. Experiments on HICO-DET demonstrate that our HO-RCNN, by exploiting human-object spatial relations through Interaction Patterns, significantly improves the performance of HOI detection over baseline approaches.

الرؤية الحاسوبية وتمييز الأنماط

Composite Localization for Human Pose Estimation

93 - ZiFan Chen , Xin Qin , Chao Yang 2021

The existing human pose estimation methods are confronted with inaccurate long-distance regression or high computational cost due to the complex learning objectives. This work proposes a novel deep learning framework for human pose estimation called composite localization to divide the complex learning objective into two simpler ones: a sparse heatmap to find the keypoints approximate location and two short-distance offsetmaps to obtain its final precise coordinates. To realize the framework, we construct two types of composite localization networks: CLNet-ResNet and CLNet-Hourglass. We evaluate the networks on three benchmark datasets, including the Leeds Sports Pose dataset, the MPII Human Pose dataset, and the COCO keypoints detection dataset. The experimental results show that our CLNet-ResNet50 outperforms SimpleBaseline by 1.14% with about 1/2 GFLOPs. Our CLNet-Hourglass outperforms the original stacked-hourglass by 4.45% on COCO.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Deep Learning-Based Human Pose Estimation: A Survey

190 - Ce Zheng , Wenhan Wu , Taojiannan Yang 2020

Human pose estimation aims to locate the human body parts and build human body representation (e.g., body skeleton) from input data such as images and videos. It has drawn increasing attention during the past decade and has been utilized in a wide ra nge of applications including human-computer interaction, motion analysis, augmented reality, and virtual reality. Although the recently developed deep learning-based solutions have achieved high performance in human pose estimation, there still remain challenges due to insufficient training data, depth ambiguities, and occlusion. The goal of this survey paper is to provide a comprehensive review of recent deep learning-based solutions for both 2D and 3D pose estimation via a systematic analysis and comparison of these solutions based on their input data and inference procedures. More than 240 research papers since 2014 are covered in this survey. Furthermore, 2D and 3D human pose estimation datasets and evaluation metrics are included. Quantitative performance comparisons of the reviewed methods on popular datasets are summarized and discussed. Finally, the challenges involved, applications, and future research directions are concluded. We also provide a regularly updated project page: url{https://github.com/zczcwh/DL-HPE}

الرؤية الحاسوبية وتمييز الأنماط الرسم الحاسوبي الوسائط المتعددة

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة اليرموك الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Turbo Learning Framework for Human-Object Interactions Recognition and Human Pose Estimation

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً