ﻻ يوجد ملخص باللغة العربية
This paper provides a comprehensive and exhaustive study of adversarial attacks on human pose estimation models and the evaluation of their robustness. Besides highlighting the important differences between well-studied classification and human pose-estimation systems w.r.t. adversarial attacks, we also provide deep insights into the design choices of pose-estimation systems to shape future work. We benchmark the robustness of several 2D single person pose-estimation architectures trained on multiple datasets, MPII and COCO. In doing so, we also explore the problem of attacking non-classification networks including regression based networks, which has been virtually unexplored in the past. par We find that compared to classification and semantic segmentation, human pose estimation architectures are relatively robust to adversarial attacks with the single-step attacks being surprisingly ineffective. Our study shows that the heatmap-based pose-estimation models are notably robust than their direct regression-based systems and that the systems which explicitly model anthropomorphic semantics of human body fare better than their other counterparts. Besides, targeted attacks are more difficult to obtain than un-targeted ones and some body-joints are easier to fool than the others. We present visualizations of universal perturbations to facilitate unprecedented insights into their workings on pose-estimation. Additionally, we show them to generalize well across different networks. Finally we perform a user study about perceptibility of these examples.
Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist a lo
The existing human pose estimation methods are confronted with inaccurate long-distance regression or high computational cost due to the complex learning objectives. This work proposes a novel deep learning framework for human pose estimation called
This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet. As existing datasets do not have whole-body annotations, previous methods
In the presence of annotated data, deep human pose estimation networks yield impressive performance. Nevertheless, annotating new data is extremely time-consuming, particularly in real-world conditions. Here, we address this by leveraging contrastive
Estimating three-dimensional human poses from the positions of two-dimensional joints has shown promising results.However, using two-dimensional joint coordinates as input loses more information than image-based approaches and results in ambiguity.In