Research papers and master's and doctoral theses published by Peng Zhou

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos

121 - Junhao Zhang , Yali Wang , Zhipeng Zhou 2021

Graph Convolution Networks (GCNs) have been successfully used for 3D human pose estimation in videos. However, they are often built on a fixed human-joint affinity that follows the human skeleton. This may reduce the capacity of a GCN to adapt to complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos. Different from traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph convolution (DSG/DTG) to discover spatial/temporal human-joint affinity for each video exemplar, depending on the spatial distance/temporal movement similarity between human joints in that video. Hence, they can effectively identify which joints are spatially closer and/or have consistent motion, reducing depth ambiguity and/or motion uncertainty when lifting 2D pose to 3D pose. We conduct extensive experiments on three popular benchmarks, namely Human3.6M, HumanEva-I, and MPI-INF-3DHP, where DG-Net outperforms a number of recent SOTA approaches with fewer input frames and a smaller model size.

Computer Vision and Pattern Recognition
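
The DG-Net abstract above describes deriving joint affinity from the spatial distance between joints rather than from a fixed skeleton. The following is a minimal sketch of that general idea only, assuming a softmax over negative pairwise distances; the function names, the `temperature` parameter, and the aggregation step are hypothetical and are not taken from the paper.

```python
import math


def spatial_affinity(joints, temperature=1.0):
    """Row-normalised affinity: closer joints get larger weights.

    Assumption: affinity is a softmax over negative Euclidean distance.
    The paper's DSG layer may compute affinity differently.
    """
    affinity = []
    for a in joints:
        scores = [math.exp(-math.dist(a, b) / temperature) for b in joints]
        total = sum(scores)
        affinity.append([s / total for s in scores])
    return affinity


def aggregate(features, affinity):
    """Each joint takes an affinity-weighted average of all joint features."""
    dims = len(features[0])
    return [
        [sum(w * f[d] for w, f in zip(row, features)) for d in range(dims)]
        for row in affinity
    ]


# Toy 2D joints: the first two are close together, the third is far away.
joints = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
affinity = spatial_affinity(joints)
mixed = aggregate([[1.0], [2.0], [3.0]], affinity)
```

Because the affinity is recomputed from the joint positions of each input, it changes from video to video, which is the contrast with a fixed skeleton graph that the abstract draws.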

Digging into Uncertainty in Self-supervised Multi-view Stereo

141 - Hongbin Xu , Zhipeng Zhou , Yali Wang 2021

Self-supervised multi-view stereo (MVS) with a pretext task of image reconstruction has achieved significant progress recently. However, previous methods are built upon intuitions and lack comprehensive explanations of the effectiveness of the pretext task in self-supervised MVS. To this end, we propose to estimate epistemic uncertainty in self-supervised MVS, accounting for what the model ignores. Specifically, the limitations can be categorized into two types: ambiguous supervision in the foreground and invalid supervision in the background. To address these issues, we propose a novel Uncertainty reduction Multi-view Stereo (U-MVS) framework for self-supervised learning. To alleviate ambiguous supervision in the foreground, we introduce an extra correspondence prior with a flow-depth consistency loss. The dense 2D correspondence of optical flows is used to regularize the 3D stereo correspondence in MVS. To handle the invalid supervision in the background, we use Monte-Carlo Dropout to acquire the uncertainty map and further filter the unreliable supervision signals on invalid regions. Extensive experiments on the DTU and Tanks and Temples benchmarks show that our U-MVS framework achieves the best performance among unsupervised MVS methods, with performance competitive with its supervised counterparts.

Computer Vision and Pattern Recognition
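
The U-MVS abstract above mentions using Monte-Carlo Dropout to obtain an uncertainty map and filter unreliable supervision. A minimal sketch of that general technique follows, assuming a toy linear model in place of a depth network; the function names, the dropout rate, and the variance threshold are all hypothetical and not the authors' settings.

```python
import random
import statistics


def predict_with_dropout(weights, features, drop_prob, rng):
    """One stochastic forward pass of a toy linear model with input dropout."""
    keep = 1.0 - drop_prob
    total = 0.0
    for w, x in zip(weights, features):
        if rng.random() < keep:
            total += w * x / keep  # inverted-dropout rescaling
    return total


def mc_dropout_uncertainty(weights, pixels, drop_prob=0.5, passes=200, seed=0):
    """Run several stochastic passes per pixel; return means and variances."""
    rng = random.Random(seed)
    means, variances = [], []
    for features in pixels:
        samples = [
            predict_with_dropout(weights, features, drop_prob, rng)
            for _ in range(passes)
        ]
        means.append(statistics.fmean(samples))
        variances.append(statistics.pvariance(samples))
    return means, variances


def reliable_mask(variances, threshold):
    """Keep only pixels whose predictive variance is at most the threshold."""
    return [v <= threshold for v in variances]


weights = [1.0, 2.0, 3.0]
pixels = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
means, variances = mc_dropout_uncertainty(weights, pixels)
mask = reliable_mask(variances, threshold=0.5)
```

The variance across passes serves as a per-pixel uncertainty estimate, and the mask marks which pixels would still contribute to a loss under this thresholding assumption.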

Sparse Bayesian Deep Learning for Dynamic System Identification

210 - Hongpeng Zhou , Chahine Ibrahim , Wei Xing Zheng 2021

This paper proposes a sparse Bayesian treatment of deep neural networks (DNNs) for system identification. Although DNNs show impressive approximation ability in various fields, several challenges still exist for system identification problems. First, DNNs are known to be so complex that they can easily overfit the training data. Second, the selection of the input regressors for system identification is nontrivial. Third, uncertainty quantification of the model parameters and predictions is necessary. The proposed Bayesian approach offers a principled way to alleviate the above challenges through marginal likelihood/model evidence approximation and the construction of structured group sparsity-inducing priors. The identification algorithm is derived as an iterative regularized optimization procedure that can be solved as efficiently as training typical DNNs. Furthermore, a practical calculation approach based on the Monte-Carlo integration method is derived to quantify the uncertainty of the parameters and predictions. The effectiveness of the proposed Bayesian approach is demonstrated on several linear and nonlinear system identification benchmarks, achieving good and competitive simulation accuracy.

Systems and Control; Machine Learning

Uniformly bounded fibred coarse embeddability and uniformly bounded a-T-menability

124 - Jianguo Zhang , Dapeng Zhou 2021

In this paper, we introduce the concept of uniformly bounded fibred coarse embeddability of metric spaces, generalizing the notion of fibred coarse embeddability defined by X. Chen, Q. Wang and G. Yu. Moreover, we show its relationship with uniformly bounded a-T-menability of groups. Finally, we give some examples to illustrate the differences between uniformly bounded fibred coarse embeddability and fibred coarse embeddability.

Functional Analysis; Group Theory; Metric Geometry

Optimizing the Numbers of Queries and Replies in Federated Learning with Differential Privacy

131 - Yipeng Zhou , Xuezheng Liu , Yao Fu 2021

Federated learning (FL) empowers distributed clients to collaboratively train a shared machine learning model by exchanging parameter information. Although FL can protect clients' raw data, malicious users can still recover the original data from disclosed parameters. To amend this flaw, differential privacy (DP) is incorporated into FL clients to perturb the original parameters, which, however, can significantly impair the accuracy of the trained model. In this work, we study a crucial question that has been largely overlooked by existing works: what are the optimal numbers of queries and replies in FL with DP so that the final model accuracy is maximized? In FL, the parameter server (PS) needs to query participating clients over multiple global iterations to complete training. Each client responds to a query from the PS by conducting a local iteration. Our work investigates how many times the PS should query clients and how many times each client should reply to the PS. We investigate the two most extensively used DP mechanisms (i.e., the Laplace and Gaussian mechanisms). Through convergence rate analysis, we can determine the optimal numbers of queries and replies in FL with DP so that the final model accuracy is maximized. Finally, extensive experiments are conducted with the publicly available datasets MNIST and FEMNIST to verify our analysis, and the results demonstrate that properly setting the numbers of queries and replies can significantly improve the final model accuracy in FL with DP.

Machine Learning; Cryptography and Security; Distributed, Parallel, and Cluster Computing
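
The abstract above turns on a trade-off: each reply a client sends consumes privacy budget, so more replies mean more noise per reply. The sketch below illustrates this with the Laplace mechanism, assuming a fixed total budget split evenly across replies (basic sequential composition). The even split and all function and parameter names are assumptions for illustration, not the paper's analysis.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noise_scale(sensitivity, total_epsilon, num_replies):
    """Laplace scale per reply when total_epsilon is split evenly.

    Assumption: each reply gets total_epsilon / num_replies, so the
    scale sensitivity / epsilon grows linearly with num_replies.
    """
    return sensitivity * num_replies / total_epsilon


def privatize(params, sensitivity, total_epsilon, num_replies, rng):
    """Add Laplace noise to every parameter of a single reply."""
    scale = noise_scale(sensitivity, total_epsilon, num_replies)
    return [p + laplace_noise(scale, rng) for p in params]


rng = random.Random(0)
few = privatize([0.1, 0.2, 0.3], 1.0, 2.0, num_replies=1, rng=rng)
many = privatize([0.1, 0.2, 0.3], 1.0, 2.0, num_replies=10, rng=rng)
```

Under this assumption, replying ten times instead of once makes each reply ten times noisier, which is why the number of replies cannot simply be increased to improve accuracy.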

An Efficient Training Approach for Very Large Scale Face Recognition

94 - Kai Wang , Shuo Wang , Zhipeng Zhou 2021

Face recognition has achieved significant progress in the deep-learning era thanks to ultra-large-scale, well-labeled datasets. However, training on ultra-large-scale datasets is time-consuming and consumes substantial hardware resources. Therefore, designing an efficient training approach is crucial. The heavy computational and memory costs mainly result from the high dimensionality of the Fully-Connected (FC) layer. Specifically, the dimensionality is determined by the number of face identities, which can be at the million level or even more. To this end, we propose a novel training approach for ultra-large-scale face datasets, termed Faster Face Classification (F$^2$C). In F$^2$C, we first define a Gallery Net and a Probe Net that are used to generate identity centers and extract face features for face recognition, respectively. Gallery Net has the same structure as Probe Net and inherits the parameters from Probe Net with a moving average paradigm. After that, to reduce the training time and hardware costs of the FC layer, we propose a Dynamic Class Pool (DCP) that stores the features from Gallery Net and calculates the inner product (logits) with positive samples (whose identities are in the DCP) in each mini-batch. DCP can be regarded as a substitute for the FC layer, but it is far smaller, thus greatly reducing the computational and memory costs. For negative samples (whose identities are not in the DCP), we minimize the cosine similarities between negative samples and those in the DCP. Then, to improve the update efficiency of the DCP's parameters, we design a dual data-loader, including identity-based and instance-based loaders, to generate a certain number of identities and samples in mini-batches.

Computer Vision and Pattern Recognition

Slashing Communication Traffic in Federated Learning by Transmitting Clustered Model Updates

78 - Laizhong Cui , Xiaoxin Su , Yipeng Zhou 2021

Federated Learning (FL) is an emerging decentralized learning framework through which multiple clients can collaboratively train a learning model. However, a major obstacle that impedes the wide deployment of FL lies in massive communication traffic. To train high-dimensional machine learning models (such as CNN models), heavy communication traffic can be incurred by exchanging model updates via the Internet between clients and the parameter server (PS), implying that network resources can be easily exhausted. Compressing model updates is an effective way to reduce the traffic volume. However, a flexible unbiased compression algorithm applicable to both uplink and downlink compression in FL is still absent from existing works. In this work, we devise the Model Update Compression by Soft Clustering (MUCSC) algorithm to compress model updates transmitted between clients and the PS. In MUCSC, it is only necessary to transmit cluster centroids and the cluster ID of each model update. Moreover, we prove that: 1) the compressed model updates are unbiased estimates of their original values, so that the convergence rate when transmitting compressed model updates is unchanged; 2) MUCSC can guarantee that the influence of the compression error on the model accuracy is minimized. Then, we further propose the boosted MUCSC (B-MUCSC) algorithm, a biased compression algorithm that can achieve an extremely high compression rate by grouping insignificant model updates into a super cluster. B-MUCSC is suitable for scenarios with very scarce network resources. Finally, we conduct extensive experiments with the CIFAR-10 and FEMNIST datasets to demonstrate that our algorithms can not only substantially reduce the volume of communication traffic in FL, but also improve the training efficiency in practical networks.

Machine Learning
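
The abstract above describes sending only cluster centroids plus one cluster ID per model update, with the decoded values being unbiased. One generic way to get that property is to assign each value at random to one of its two neighbouring centroids, with probabilities chosen so the expectation equals the original value. The sketch below shows this stochastic assignment under the assumption of evenly spaced centroids; it is not claimed to be the MUCSC algorithm, and all names are hypothetical.

```python
import bisect
import random


def make_centroids(updates, k):
    """Evenly spaced centroids spanning the updates (assumes k >= 2)."""
    lo, hi = min(updates), max(updates)
    if lo == hi:
        return [lo]
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k - 1)] + [hi]


def compress(updates, centroids, rng):
    """Return one centroid ID per update; the decoded value is unbiased."""
    ids = []
    for u in updates:
        hi = bisect.bisect_right(centroids, u)
        if hi == len(centroids):  # u equals the largest centroid
            ids.append(hi - 1)
            continue
        lo = hi - 1
        # Probability of rounding up so that E[decoded value] == u.
        p_up = (u - centroids[lo]) / (centroids[hi] - centroids[lo])
        ids.append(hi if rng.random() < p_up else lo)
    return ids


def decompress(ids, centroids):
    """Map each transmitted ID back to its centroid value."""
    return [centroids[i] for i in ids]


rng = random.Random(0)
updates = [0.0, 0.3, 0.55, 1.0]
centroids = make_centroids(updates, k=3)
decoded = decompress(compress(updates, centroids, rng), centroids)
```

Only `centroids` and the integer IDs would need to be transmitted, so the payload shrinks from one float per update to a few bits per update plus `k` floats.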

A Generalized Tunneling Current Formula for Metal/Insulator Heterojunctions under Large Bias and Finite Temperature

98 - Zenghua Cai , Menglin Huang , Peng Zhou 2021

The Fowler-Nordheim tunneling current formula has been widely used in the design of devices based on metal/insulator (metal/semiconductor) heterojunctions with triangular potential barriers, such as flash memory. Here we adopt the model that was used to derive the Landauer formula at finite temperature, the nearly-free electron approximation to describe the electronic states in the semi-infinite metal electrode, and the Wentzel-Kramers-Brillouin (WKB) approximation to describe the transmission coefficient, and derive a tunneling current formula for metal/insulator heterojunctions under large bias and electric field. In contrast to the Fowler-Nordheim formula, which is the zero-temperature limit, our formula is generalized to finite temperature (with the thermal excitation of electrons in the metal electrode considered) and to potential barriers beyond triangular ones, and may be used for the design of more complicated heterojunction devices based on carrier tunneling.

Mesoscale and Nanoscale Physics; Materials Science
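
For orientation, the standard textbook WKB transmission coefficient referred to in the abstract above, and its evaluation for a triangular barrier, can be written as follows. The symbols are assumptions introduced here, not the paper's notation, and this is the familiar zero-temperature exponent rather than the generalized formula the authors derive.

```latex
% Assumed symbols: m (electron effective mass), U(x) (barrier profile),
% E_x (electron energy along the tunneling direction), x_1, x_2 (classical
% turning points), \phi (barrier height above E_x), F (electric field), q (charge).
T(E_x) \approx \exp\!\left(-\frac{2}{\hbar}\int_{x_1}^{x_2}
  \sqrt{2m\,\bigl(U(x)-E_x\bigr)}\,dx\right)

% Triangular barrier: U(x) - E_x = \phi - qFx on 0 \le x \le \phi/(qF)
\int_{0}^{\phi/(qF)}\sqrt{2m\,(\phi-qFx)}\,dx
  = \frac{2\sqrt{2m}\,\phi^{3/2}}{3qF}
\quad\Longrightarrow\quad
T \approx \exp\!\left(-\frac{4\sqrt{2m}\,\phi^{3/2}}{3\hbar qF}\right)
```

The last expression is the exponential factor that appears in the Fowler-Nordheim formula; for a non-triangular barrier the integral has to be evaluated for the actual profile U(x).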

Investigate Indistinguishable Points in Semantic Segmentation of 3D Point Cloud

130 - Mingye Xu , Zhipeng Zhou , Junhao Zhang 2021

This paper investigates the indistinguishable points (those whose labels are difficult to predict) in semantic segmentation of large-scale 3D point clouds. The indistinguishable points consist of those located on complex boundaries, points with similar local textures but different categories, and points in isolated small hard areas, all of which largely harm the performance of 3D semantic segmentation. To address this challenge, we propose a novel Indistinguishable Area Focalization Network (IAF-Net), which selects indistinguishable points adaptively by utilizing hierarchical semantic features and enhances fine-grained features for points, especially the indistinguishable ones. We also introduce a multi-stage loss to improve the feature representation progressively. Moreover, in order to analyze the segmentation performance on indistinguishable areas, we propose a new evaluation metric called the Indistinguishable Points Based Metric (IPBM). Our IAF-Net achieves results comparable with the state of the art on several popular 3D point cloud datasets, e.g., S3DIS and ScanNet, and clearly outperforms other methods on IPBM.

Computer Vision and Pattern Recognition; Image and Video Processing

Frustrated Arrays of Nanomagnets for Efficient Reservoir Computing

106 - Alexander J. Edwards , Dhritiman Bhattacharya , Peng Zhou 2021

We simulated our nanomagnet reservoir computer (NMRC) design on benchmark tasks, demonstrating the NMRC's high memory content and expressibility. In support of the feasibility of this method, we fabricated a frustrated nanomagnet reservoir layer. Using this structure, we describe a low-power, low-area system with an area-energy-delay product $10^7$ lower than conventional RC systems, which is therefore promising for size, weight, and power (SWaP) constrained applications.

Neural and Evolutionary Computing; Emerging Technologies; Applied Physics
