Research papers, master's theses, and doctoral dissertations published by Junhao Zhang

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos

121 - Junhao Zhang , Yali Wang , Zhipeng Zhou 2021

Graph Convolutional Networks (GCNs) have been successfully used for 3D human pose estimation in videos. However, they are often built on a fixed human-joint affinity derived from the human skeleton. This may reduce the capacity of a GCN to adapt to complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos. Different from traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph convolution (DSG/DTG) to discover spatial/temporal human-joint affinity for each video exemplar, depending on the spatial distance/temporal movement similarity between human joints in that video. Hence, they can effectively identify which joints are spatially closer and/or have consistent motion, reducing depth ambiguity and/or motion uncertainty when lifting 2D pose to 3D pose. We conduct extensive experiments on three popular benchmarks, i.e., Human3.6M, HumanEva-I, and MPI-INF-3DHP, where DG-Net outperforms a number of recent SOTA approaches with fewer input frames and a smaller model size.
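As a rough illustration of what a distance-dependent joint affinity could look like, the sketch below links each joint to its nearest joints in a single frame and row-normalizes the result. It is not the DSG layer from the paper: the function name `dynamic_spatial_affinity`, the `(x, y)` input format, the k-nearest-neighbour rule, the self-loop, and the uniform normalization are all assumptions made for this example.

```python
import math


def dynamic_spatial_affinity(joints, k=2):
    """Return a row-normalized affinity matrix for one frame.

    joints: list of (x, y) joint coordinates (hypothetical input format).
    k: number of nearest joints linked to each joint (illustrative choice,
       not a value taken from the paper).
    """
    n = len(joints)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank the other joints by Euclidean distance to joint i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(joints[i], joints[j]),
        )
        # Keep a self-loop plus the k closest joints (assumed design).
        neighbours = [i] + others[:k]
        for j in neighbours:
            affinity[i][j] = 1.0 / len(neighbours)
    return affinity
```

Because the matrix is recomputed from the coordinates of each frame, the neighbourhood of a joint changes as the pose changes, which is the contrast with a fixed skeleton adjacency that the abstract draws.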

Computer Vision and Pattern Recognition

Investigate Indistinguishable Points in Semantic Segmentation of 3D Point Cloud

130 - Mingye Xu , Zhipeng Zhou , Junhao Zhang 2021

This paper investigates the indistinguishable points (those whose labels are difficult to predict) in semantic segmentation of large-scale 3D point clouds. The indistinguishable points consist of those located on complex boundaries, points with similar local textures but different categories, and points in small, isolated hard areas, all of which substantially harm the performance of 3D semantic segmentation. To address this challenge, we propose a novel Indistinguishable Area Focalization Network (IAF-Net), which selects indistinguishable points adaptively by utilizing hierarchical semantic features and enhances fine-grained features for points, especially the indistinguishable ones. We also introduce a multi-stage loss to improve the feature representation in a progressive way. Moreover, in order to analyze segmentation performance on indistinguishable areas, we propose a new evaluation metric called the Indistinguishable Points Based Metric (IPBM). Our IAF-Net achieves results comparable to the state of the art on several popular 3D point cloud datasets, e.g., S3DIS and ScanNet, and clearly outperforms other methods on IPBM.

Computer Vision and Pattern Recognition, Image and Video Processing

PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos

96 - Tianyu Luan , Yali Wang , Junhao Zhang 2021

The end-to-end Human Mesh Recovery (HMR) approach has been successfully used for 3D body reconstruction. However, most HMR-based frameworks reconstruct the human body by directly learning mesh parameters from images or videos, while lacking explicit guidance from the 3D human pose in the visual data. As a result, the generated mesh often exhibits an incorrect pose for complex activities. To tackle this problem, we propose to exploit 3D pose to calibrate the human mesh. Specifically, we develop two novel Pose Calibration frameworks, i.e., Serial PC-HMR and Parallel PC-HMR. By coupling advanced 3D pose estimators and HMR in a serial or parallel manner, these two frameworks can effectively correct the human mesh with the guidance of a concise pose calibration module. Furthermore, since the calibration module is designed via non-rigid pose transformation, our PC-HMR frameworks can flexibly handle bone length variations to alleviate misplacement in the calibrated mesh. Finally, our frameworks are based on a generic and complementary integration of data-driven learning and geometrical modeling. Via plug-and-play modules, they can be efficiently adapted for both image-based and video-based human mesh recovery. Additionally, they require no extra 3D pose annotations in the testing phase, which eases inference in practice. We perform extensive experiments on popular benchmarks, i.e., Human3.6M, 3DPW, and SURREAL, where our PC-HMR frameworks achieve SOTA results.

Computer Vision and Pattern Recognition

Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud

165 - Mutian Xu , Junhao Zhang , Zhipeng Zhou 2020

In 2D image processing, some attempts decompose images into high- and low-frequency components to describe edges and smooth parts, respectively. Similarly, the contour and flat areas of 3D objects, such as the boundary and seat area of a chair, describe different but complementary geometries. However, this distinction is lost in previous deep networks, which understand point clouds by treating all points or local patches equally. To solve this problem, we propose the Geometry-Disentangled Attention Network (GDANet). GDANet introduces a Geometry-Disentangle Module to dynamically disentangle point clouds into the contour and flat parts of 3D objects, denoted by sharp and gentle variation components, respectively. GDANet then exploits a Sharp-Gentle Complementary Attention Module that regards the features from the sharp and gentle variation components as two holistic representations and pays different attention to each while fusing them with the original point cloud features. In this way, our method captures and refines the holistic and complementary 3D geometric semantics from two distinct disentangled components to supplement the local information. Extensive experiments on 3D object classification and segmentation benchmarks demonstrate that GDANet achieves state-of-the-art results with fewer parameters. Code is released at https://github.com/mutianxu/GDANet.

Computer Vision and Pattern Recognition

On the relationship between Gaussian stochastic blockmodels and label propagation algorithms

61 - Junhao Zhang , Tongfei Chen , Junfeng Hu 2014

The problem of community detection has received great attention in recent years. Many methods have been proposed to discover communities in networks. In this paper, we propose a Gaussian stochastic blockmodel that uses Gaussian distributions to fit the weights of edges in networks for non-overlapping community detection. The maximum likelihood estimation of this model has the same objective function as general label propagation with node preference. The node preference of a specific vertex turns out to be a value proportional to the intra-community eigenvector centrality (the corresponding entry in the principal eigenvector of the adjacency matrix of the subgraph inside that vertex's community) under maximum likelihood estimation. Additionally, the maximum likelihood estimation of a constrained version of our model is closely related to another extension of the label propagation algorithm, namely, the label propagation algorithm under constraint. Experiments show that the proposed Gaussian stochastic blockmodel performs well on various benchmark networks.
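For readers unfamiliar with label propagation with node preference, the following minimal sketch shows the general update rule: each vertex adopts the label whose neighbours contribute the largest preference-weighted edge weight. It is only an illustration under stated assumptions. The function name `label_propagation`, the dictionary-based graph format, the fixed update order, and the tie-breaking rule are all hypothetical, and the preference values are passed in as fixed inputs rather than derived from eigenvector centrality as in the paper's analysis.

```python
def label_propagation(weights, preference, max_iters=100):
    """Asynchronous label propagation with node preference.

    weights: dict mapping node -> {neighbour: edge weight} (assumed format).
    preference: dict mapping node -> non-negative preference value
                (taken as given here; an assumption of this sketch).
    """
    # Start with every node in its own community.
    labels = {v: v for v in weights}
    for _ in range(max_iters):
        changed = False
        for v in sorted(weights):  # fixed order keeps the run deterministic
            scores = {}
            for u, w in weights[v].items():
                # Each neighbour votes for its label, scaled by its preference.
                scores[labels[u]] = scores.get(labels[u], 0.0) + preference[u] * w
            if not scores:
                continue  # isolated node keeps its own label
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            # Keep the current label on ties; otherwise take the smallest best label.
            new_label = labels[v] if labels[v] in candidates else candidates[0]
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    return labels
```

Setting every preference to 1.0 reduces this to ordinary weighted label propagation; on two triangles joined by a single weak edge, the sketch assigns one label per triangle.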

Social and Information Networks, Physics and Society
