ﻻ يوجد ملخص باللغة العربية
In this work, we introduce the challenging problem of joint multi-person pose estimation and tracking of an unknown number of persons in unconstrained videos. Existing methods for multi-person pose estimation in images cannot be applied directly to this problem, since it also requires to solve the problem of person association over time in addition to the pose estimation for each person. We therefore propose a novel method that jointly models multi-person pose estimation and tracking in a single formulation. To this end, we represent body joint detections in a video by a spatio-temporal graph and solve an integer linear program to partition the graph into sub-graphs that correspond to plausible body pose trajectories for each person. The proposed approach implicitly handles occlusion and truncation of persons. Since the problem has not been addressed quantitatively in the literature, we introduce a challenging Multi-Person PoseTrack dataset, and also propose a completely unconstrained evaluation protocol that does not make any assumptions about the scale, size, location or the number of persons. Finally, we evaluate the proposed approach and several baseline methods on our new dataset.
Human poses and motions are important cues for analysis of videos with people and there is strong evidence that representations based on body pose are highly effective for a variety of tasks such as activity recognition, content retrieval and social
Despite of the recent success of neural networks for human pose estimation, current approaches are limited to pose estimation of a single person and cannot handle humans in groups or crowds. In this work, we propose a method that estimates the poses
This paper considers the task of articulated human pose estimation of multiple people in real world images. We propose an approach that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines. It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in t
Multi-person pose estimation in the wild is challenging. Although state-of-the-art human detectors have demonstrated good performance, small errors in localization and recognition are inevitable. These errors can cause failures for a single-person po