ﻻ يوجد ملخص باللغة العربية
Human pose estimation deeply relies on visual clues and anatomical constraints between parts to locate keypoints. Most existing CNN-based methods do well in visual representation, however, lacking in the ability to explicitly learn the constraint relationships between keypoints. In this paper, we propose a novel approach based on Token representation for human Pose estimation~(TokenPose). In detail, each keypoint is explicitly embedded as a token to simultaneously learn constraint relationships and appearance cues from images. Extensive experiments show that the small and large TokenPose models are on par with state-of-the-art CNN-based counterparts while being more lightweight. Specifically, our TokenPose-S and TokenPose-L achieve $72.5$ AP and $75.8$ AP on COCO validation dataset respectively, with significant reduction in parameters ($downarrow80.6%$; $downarrow$ $56.8%$) and GFLOPs ($downarrow$ $75.3%$; $downarrow$ $24.7%$). Code is publicly available.
We propose a simple yet reliable bottom-up approach with a good trade-off between accuracy and efficiency for the problem of multi-person pose estimation. Given an image, we employ an Hourglass Network to infer all the keypoints from different person
The typical bottom-up human pose estimation framework includes two stages, keypoint detection and grouping. Most existing works focus on developing grouping algorithms, e.g., associative embedding, and pixel-wise keypoint regression that we adopt in
Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist a lo
In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network which efficiently cap
Human-object interactions (HOI) recognition and pose estimation are two closely related tasks. Human pose is an essential cue for recognizing actions and localizing the interacted objects. Meanwhile, human action and their interacted objects localiza