ترغب بنشر مسار تعليمي؟ اضغط هنا

Think about boundary: Fusing multi-level boundary information for landmark heatmap regression

58   0   0.0 ( 0 )
 نشر من قبل Jinheng Xie
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Although current face alignment algorithms have obtained pretty good performances at predicting the location of facial landmarks, huge challenges remain for faces with severe occlusion and large pose variations, etc. On the contrary, semantic location of facial boundary is more likely to be reserved and estimated on these scenes. Therefore, we study a two-stage but end-to-end approach for exploring the relationship between the facial boundary and landmarks to get boundary-aware landmark predictions, which consists of two modules: the self-calibrated boundary estimation (SCBE) module and the boundary-aware landmark transform (BALT) module. In the SCBE module, we modify the stem layers and employ intermediate supervision to help generate high-quality facial boundary heatmaps. Boundary-aware features inherited from the SCBE module are integrated into the BALT module in a multi-scale fusion framework to better model the transformation from boundary to landmark heatmap. Experimental results conducted on the challenging benchmark datasets demonstrate that our approach outperforms state-of-the-art methods in the literature.



قيم البحث

اقرأ أيضاً

Although heatmap regression is considered a state-of-the-art method to locate facial landmarks, it suffers from huge spatial complexity and is prone to quantization error. To address this, we propose a novel attentive one-dimensional heatmap regressi on method for facial landmark localization. First, we predict two groups of 1D heatmaps to represent the marginal distributions of the x and y coordinates. These 1D heatmaps reduce spatial complexity significantly compared to current heatmap regression methods, which use 2D heatmaps to represent the joint distributions of x and y coordinates. With much lower spatial complexity, the proposed method can output high-resolution 1D heatmaps despite limited GPU memory, significantly alleviating the quantization error. Second, a co-attention mechanism is adopted to model the inherent spatial patterns existing in x and y coordinates, and therefore the joint distributions on the x and y axes are also captured. Third, based on the 1D heatmap structures, we propose a facial landmark detector capturing spatial patterns for landmark detection on an image; and a tracker further capturing temporal patterns with a temporal refinement mechanism for landmark tracking. Experimental results on four benchmark databases demonstrate the superiority of our method.
Deep learning methods have achieved excellent performance in pose estimation, but the lack of robustness causes the keypoints to change drastically between similar images. In view of this problem, a stable heatmap regression method is proposed to all eviate network vulnerability to small perturbations. We utilize the correlation between different rows and columns in a heatmap to alleviate the multi-peaks problem, and design a highly differentiated heatmap regression to make a keypoint discriminative from surrounding points. A maximum stability training loss is used to simplify the optimization difficulty when minimizing the prediction gap of two similar images. The proposed method achieves a significant advance in robustness over state-of-the-art approaches on two benchmark datasets and maintains high performance.
Heatmap regression has become the most prevalent choice for nowadays human pose estimation methods. The ground-truth heatmaps are usually constructed via covering all skeletal keypoints by 2D gaussian kernels. The standard deviations of these kernels are fixed. However, for bottom-up methods, which need to handle a large variance of human scales and labeling ambiguities, the current practice seems unreasonable. To better cope with these problems, we propose the scale-adaptive heatmap regression (SAHR) method, which can adaptively adjust the standard deviation for each keypoint. In this way, SAHR is more tolerant of various human scales and labeling ambiguities. However, SAHR may aggravate the imbalance between fore-background samples, which potentially hurts the improvement of SAHR. Thus, we further introduce the weight-adaptive heatmap regression (WAHR) to help balance the fore-background samples. Extensive experiments show that SAHR together with WAHR largely improves the accuracy of bottom-up human pose estimation. As a result, we finally outperform the state-of-the-art model by +1.5AP and achieve 72.0AP on COCO test-dev2017, which is com-arable with the performances of most top-down methods. Source codes are available at https://github.com/greatlog/SWAHR-HumanPose.
Estimating the 3D pose of a hand is an essential part of human-computer interaction. Estimating 3D pose using depth or multi-view sensors has become easier with recent advances in computer vision, however, regressing pose from a single RGB image is m uch less straightforward. The main difficulty arises from the fact that 3D pose requires some form of depth estimates, which are ambiguous given only an RGB image. In this paper we propose a new method for 3D hand pose estimation from a monocular image through a novel 2.5D pose representation. Our new representation estimates pose up to a scaling factor, which can be estimated additionally if a prior of the hand size is given. We implicitly learn depth maps and heatmap distributions with a novel CNN architecture. Our system achieves the state-of-the-art estimation of 2D and 3D hand pose on several challenging datasets in presence of severe occlusions.
The screen content images (SCIs) usually comprise various content types with sharp edges, in which the artifacts or distortions can be well sensed by the vanilla structure similarity measurement in a full reference manner. Nonetheless, almost all of the current SOTA structure similarity metrics are locally formulated in a single-level manner, while the true human visual system (HVS) follows the multi-level manner, and such mismatch could eventually prevent these metrics from achieving trustworthy quality assessment. To ameliorate, this paper advocates a novel solution to measure structure similarity globally from the perspective of sparse representation. To perform multi-level quality assessment in accordance with the real HVS, the above-mentioned global metric will be integrated with the conventional local ones by resorting to the newly devised selective deep fusion network. To validate its efficacy and effectiveness, we have compared our method with 12 SOTA methods over two widely-used large-scale public SCI datasets, and the quantitative results indicate that our method yields significantly higher consistency with subjective quality score than the currently leading works. Both the source code and data are also publicly available to gain widespread acceptance and facilitate new advancement and its validation.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا