ﻻ يوجد ملخص باللغة العربية
The existing human pose estimation methods are confronted with inaccurate long-distance regression or high computational cost due to the complex learning objectives. This work proposes a novel deep learning framework for human pose estimation called composite localization to divide the complex learning objective into two simpler ones: a sparse heatmap to find the keypoints approximate location and two short-distance offsetmaps to obtain its final precise coordinates. To realize the framework, we construct two types of composite localization networks: CLNet-ResNet and CLNet-Hourglass. We evaluate the networks on three benchmark datasets, including the Leeds Sports Pose dataset, the MPII Human Pose dataset, and the COCO keypoints detection dataset. The experimental results show that our CLNet-ResNet50 outperforms SimpleBaseline by 1.14% with about 1/2 GFLOPs. Our CLNet-Hourglass outperforms the original stacked-hourglass by 4.45% on COCO.
We propose a new bottom-up method for multi-person 2D human pose estimation that is particularly well suited for urban mobility such as self-driving cars and delivery robots. The new method, PifPaf, uses a Part Intensity Field (PIF) to localize body
In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network which efficiently cap
Modern 3D human pose estimation techniques rely on deep networks, which require large amounts of training data. While weakly-supervised methods require less supervision, by utilizing 2D poses or multi-view imagery without annotations, they still need
Human pose estimation is a major computer vision problem with applications ranging from augmented reality and video capture to surveillance and movement tracking. In the medical context, the latter may be an important biomarker for neurological impai
Transformer architectures have become the model of choice in natural language processing and are now being introduced into computer vision tasks such as image classification, object detection, and semantic segmentation. However, in the field of human