Research papers and master's and doctoral theses published by Li Wang

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos

121 - Junhao Zhang, Yali Wang, Zhipeng Zhou 2021

Graph Convolution Networks (GCNs) have been successfully used for 3D human pose estimation in videos. However, they are often built on a fixed human-joint affinity defined by the human skeleton. This may limit the ability of a GCN to adapt to complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which dynamically identifies human-joint affinity and estimates 3D pose by adaptively learning spatial/temporal joint relations from videos. Unlike traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph convolution (DSG/DTG) to discover the spatial/temporal human-joint affinity of each video exemplar, based on the spatial distance/temporal movement similarity between human joints in that video. Hence, they can effectively identify which joints are spatially closer and/or move consistently, reducing depth ambiguity and/or motion uncertainty when lifting 2D pose to 3D pose. We conduct extensive experiments on three popular benchmarks, i.e., Human3.6M, HumanEva-I, and MPI-INF-3DHP, where DG-Net outperforms a number of recent SOTA approaches with fewer input frames and a smaller model size.

Computer Vision and Pattern Recognition
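The DG-Net abstract builds a per-video joint affinity from spatial distance (DSG) and temporal movement similarity (DTG) instead of the fixed skeleton. The sketch below illustrates that idea under stated assumptions. It is not the authors' DSG/DTG. The row-softmax normalization, the temperatures `tau`, cosine similarity of frame-to-frame displacements, and the toy 17-joint input are all hypothetical choices.

```python
import numpy as np


def _row_softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over each row, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def spatial_affinity(joints: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """(J, 2) joints of one frame -> (J, J) affinity; closer joints get larger weights."""
    dist = np.linalg.norm(joints[:, None, :] - joints[None, :, :], axis=-1)
    return _row_softmax(-dist / tau)


def temporal_affinity(trajectory: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """(T, J, 2) joint trajectories -> (J, J) affinity; joints that move alike get larger weights."""
    motion = np.diff(trajectory, axis=0)                        # (T-1, J, 2) per-frame displacement
    motion = motion.transpose(1, 0, 2).reshape(trajectory.shape[1], -1)
    unit = motion / (np.linalg.norm(motion, axis=1, keepdims=True) + 1e-8)
    return _row_softmax(unit @ unit.T / tau)                    # cosine similarity of motions


def graph_conv(features: np.ndarray, affinity: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: aggregate joints by affinity, project, then ReLU."""
    return np.maximum(affinity @ features @ weight, 0.0)


rng = np.random.default_rng(0)
trajectory = rng.normal(size=(9, 17, 2))   # 9 frames, 17 joints, 2D keypoints (toy input)
A_s = spatial_affinity(trajectory[-1])
A_t = temporal_affinity(trajectory)
hidden = graph_conv(trajectory[-1], A_s, rng.normal(size=(2, 8)))
print(A_s.shape, A_t.shape, hidden.shape)  # (17, 17) (17, 17) (17, 8)
```

Because both affinities are recomputed from each input, two clips with different poses get different graphs. A fixed skeleton adjacency cannot do that.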

Embedding Node Structural Role Identity Using Stress Majorization

199 - Lili Wang, Chenghan Huang, Weicheng Ma 2021

Nodes in networks may have one or more functions that determine their role in the system. As opposed to local proximity, which captures the local context of nodes, the role identity captures the functional role that nodes play in a network, such as being the center of a group or the bridge between two groups. This means that nodes far apart in a network can have similar structural role identities. Several recent works have explored methods for embedding the roles of nodes in networks. However, these methods all rely on either approximating or indirectly modeling structural equivalence. In this paper, we present a novel and flexible framework that uses stress majorization to transform the high-dimensional role identities in networks directly (without approximation or indirect modeling) into a low-dimensional embedding space. Our method is flexible in that it does not rely on any specific definition of structural similarity. We evaluated our method on node classification, clustering, and visualization tasks, using three real-world and five synthetic networks. Our experiments show that our framework outperforms existing methods in learning node role representations.

Social and Information Networks, Artificial Intelligence, Machine Learning
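Stress majorization here means the standard SMACOF iteration. Each step applies the Guttman transform, which never increases the stress between target dissimilarities and embedded distances. The sketch below shows the unit-weight version. The role descriptors are random placeholders: the abstract deliberately leaves the structural similarity definition open, so `roles` and the Euclidean dissimilarity `D` are assumptions, not the paper's choices.

```python
import numpy as np


def pairwise_dist(X: np.ndarray) -> np.ndarray:
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def stress(D: np.ndarray, X: np.ndarray) -> float:
    """Raw stress: sum over pairs i<j of (D_ij - ||x_i - x_j||)^2."""
    return float(np.sum(np.triu(D - pairwise_dist(X), k=1) ** 2))


def smacof(D: np.ndarray, dim: int = 2, n_iter: int = 200, seed: int = 0) -> np.ndarray:
    """Stress majorization (SMACOF) with unit weights.

    D: symmetric (n, n) target dissimilarities, here assumed to be distances
    between nodes' high-dimensional role descriptors.
    """
    n = D.shape[0]
    X = np.random.default_rng(seed).normal(size=(n, dim))
    for _ in range(n_iter):
        dist = pairwise_dist(X)
        with np.errstate(divide="ignore", invalid="ignore"):
            B = np.where(dist > 0, -D / dist, 0.0)   # off-diagonal entries of B(X)
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))           # rows of B(X) sum to zero
        X = B @ X / n                                  # Guttman transform (unit weights)
    return X


rng = np.random.default_rng(1)
roles = rng.random(size=(12, 6))   # hypothetical 6-dim role descriptors for 12 nodes
D = pairwise_dist(roles)
X0 = smacof(D, n_iter=0)           # random starting layout
X = smacof(D)
print(round(stress(D, X0), 3), "->", round(stress(D, X), 3))
```

Nothing in `smacof` looks at graph adjacency. Two nodes that are far apart in the network but have similar descriptors therefore land close together, which matches the abstract's point about distant nodes sharing a role.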

Graph Embedding via Diffusion-Wavelets-Based Node Feature Distribution Characterization

524 - Lili Wang, Chenghan Huang, Weicheng Ma 2021

Recent years have seen a rise in the development of representation learning methods for graph data. Most of these methods, however, focus on node-level representation learning at various scales (e.g., microscopic, mesoscopic, and macroscopic node embedding). In comparison, methods for representation learning on whole graphs are currently relatively scarce. In this paper, we propose a novel unsupervised whole-graph embedding method. Our method uses spectral graph wavelets to capture topological similarities between nodes on each k-hop sub-graph and uses them to learn embeddings for the whole graph. We evaluate our method against 12 well-known baselines on 4 real-world datasets and show that it achieves the best performance across all experiments, outperforming the current state of the art by a considerable margin.

Machine Learning, Artificial Intelligence, Social and Information Networks
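As a rough illustration of wavelet-based whole-graph characterization, the sketch below uses heat-kernel wavelets of the normalized Laplacian and summarizes each node's coefficient distribution with an empirical characteristic function. It then averages over nodes to get one permutation-invariant vector. This GraphWave-style recipe is a swapped-in technique. The paper's k-hop sub-graph construction is not reproduced, and the scales and sample points `ts` are assumptions.

```python
import numpy as np


def heat_wavelets(adj: np.ndarray, scale: float) -> np.ndarray:
    """Heat-kernel wavelets Psi_s = U exp(-s * Lambda) U^T of the normalized Laplacian."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    lam, U = np.linalg.eigh(lap)
    return (U * np.exp(-scale * lam)) @ U.T


def whole_graph_embedding(adj: np.ndarray, scales=(0.5, 2.0), n_points: int = 8) -> np.ndarray:
    """Pool per-node wavelet distributions into one permutation-invariant vector."""
    ts = np.linspace(0.0, 10.0, n_points)
    parts = []
    for s in scales:
        psi = heat_wavelets(adj, s)                            # column i: wavelet centred at node i
        phi = np.exp(1j * psi[:, :, None] * ts).mean(axis=0)   # (n, n_points) empirical char. function
        pooled = phi.mean(axis=0)                              # average over nodes
        parts += [pooled.real, pooled.imag]
    return np.concatenate(parts)


cycle = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)  # 6-node cycle graph
path = cycle.copy()
path[0, 5] = path[5, 0] = 0.0                                           # remove one edge
perm = np.random.default_rng(0).permutation(6)
relabelled = cycle[perm][:, perm]
emb = whole_graph_embedding(cycle)
print(emb.shape, np.allclose(emb, whole_graph_embedding(relabelled)))   # (32,) True
```

Relabelling the nodes leaves the embedding unchanged, while a structurally different graph (the path) yields a different vector. That is the property a whole-graph embedding needs.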

Enhancement of electron-positron pairs in combined potential wells with linear chirp frequency

146 - Li Wang, Lie-Juan Li, Melike Mohamedsedik 2021

The effect of a linearly chirped frequency on electron-positron pair production from vacuum in combined potential wells is investigated using computational quantum field theory. Numerical results for the electron number and energy spectrum under different frequency modulation parameters are obtained. Compared with a fixed frequency, frequency modulation is found to significantly enhance the number of created electrons. Especially at low frequencies, appropriate frequency modulation enhances multiphoton processes in pair creation, thereby promoting pair creation. However, for combined potential wells oscillating at high frequency, the number of created electrons decreases after frequency modulation owing to high-frequency suppression. Contour plots of the electron number as a function of frequency and frequency modulation parameter are given, which may provide a theoretical reference for possible experiments.

Quantum Physics, High Energy Physics - Phenomenology
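For orientation, a generic linear chirp and the simplest multiphoton threshold can be written as follows. The form of the driving term $V(x,t)$ is an assumption and not necessarily the paper's parametrization. The threshold ignores bound states in the wells, which lower the required energy in practice.

```latex
% Generic linear chirp: instantaneous frequency and accumulated phase
\omega(t) = \omega_0 + b\,t, \qquad
\phi(t) = \int_0^{t} \omega(t')\,dt' = \omega_0 t + \tfrac{1}{2}\, b\, t^{2}, \qquad
V(x,t) = V(x)\,\sin\phi(t)

% n-photon threshold across the mass gap (bound states ignored)
n\,\hbar\,\omega \gtrsim 2\,m c^{2}
\quad\Longrightarrow\quad
n_{\min} = \left\lceil \frac{2\,m c^{2}}{\hbar\,\omega} \right\rceil
```

For example, $\hbar\omega = 0.5\,mc^2$ gives $n_{\min} = 4$. With $b > 0$, the instantaneous frequency grows during the pulse, so fewer photons are needed at later times. This is one heuristic reading of why modulation helps most at low base frequency.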

Security analysis method for practical quantum key distribution with arbitrary encoding schemes

201 - Zehong Chang, Fumin Wang, Xiaoli Wang 2021

Quantum key distribution (QKD) has gradually become a crucial element of practical secure communication. Across different scenarios, the security analysis of real QKD systems is complicated. A universal method for calculating the secret key rate that accounts for realistic factors is still lacking. Such factors include multiple-degree-of-freedom encoding, asymmetric protocol structures, equipment flaws, and environmental noise. Based on correlations in the statistical data, we propose a security analysis method with no restriction on encoding schemes. The method trades off applicability against accuracy and can effectively analyze various existing QKD systems. We illustrate its capability by analyzing source flaws and a high-dimensional asymmetric protocol. The results indicate that our method can give tighter bounds than the Gottesman-Lo-Lütkenhaus-Preskill (GLLP) analysis and is useful for analyzing protocols with complex encoding structures. Our work has the potential to become a reference standard for the security analysis of practical QKD.

Quantum Physics, Cryptography and Security, Optics

Spin-dependent metalens with intensity-adjustable dual-focused vortex beams

421 - Qun Hao, Wenli Wang, Yao Hu 2021

Vortex beams carrying orbital angular momentum have attracted tremendous attention owing to their wide range of applications, from optical tweezers to quantum information processing. Metalenses, being ultra-compact and multifunctional devices, provide an ideal platform for designing vortex beams. A spin-dependent metalens adds a further degree of freedom for meeting practical requirements. By combining geometric phase and propagation phase, we propose and demonstrate an approach to design a spin-dependent metalens that generates dual-focused vortex beams along the longitudinal or transverse direction, i.e., metalenses with predesigned spin-dependent phase profiles. Under illumination by an elliptically polarized incident beam, two spin-dependent focused vortex beams can be observed, and their relative focal intensity can be easily adjusted by tuning the ellipticity of the incident beam. Moreover, we demonstrate that the separation between the two foci and their topological charges can be simultaneously tailored at will, which may have a profound impact on optical trapping and manipulation in photonics.

Optics
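The intensity control in the metalens abstract follows from decomposing an elliptically polarized beam into its two circular components. The sketch below makes that decomposition explicit, assuming each spin component feeds exactly one focus with equal efficiency. That efficiency assumption is an idealization, not a claim about the fabricated device. In closed form, the two normalized intensities are $(1 \pm \sin 2\chi)/2$ for ellipticity angle $\chi$.

```python
import numpy as np


def circular_intensities(chi: float) -> tuple[float, float]:
    """Split an elliptically polarized beam into its two circular components.

    chi: ellipticity angle in radians (0 = linear, +/-pi/4 = circular), major axis along x.
    Handedness labels are convention-dependent; here (1, +i)/sqrt(2) is called 'L'.
    Returns the normalized intensities (I_L, I_R), which sum to 1.
    """
    E = np.array([np.cos(chi), 1j * np.sin(chi)])   # Jones vector of the incident beam
    L = np.array([1.0, 1j]) / np.sqrt(2)
    R = np.array([1.0, -1j]) / np.sqrt(2)
    I_L = abs(np.vdot(L, E)) ** 2                     # np.vdot conjugates its first argument
    I_R = abs(np.vdot(R, E)) ** 2
    return float(I_L), float(I_R)


for chi in (0.0, np.pi / 8, np.pi / 4):
    I_L, I_R = circular_intensities(chi)
    print(f"chi={chi:.3f}  I_L={I_L:.3f}  I_R={I_R:.3f}")
```

Linear polarization ($\chi = 0$) splits power evenly between the foci, and circular polarization sends all of it to one focus. Intermediate ellipticities interpolate smoothly between these cases.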

Graph Attention Layer Evolves Semantic Segmentation for Road Pothole Detection: A Benchmark and Algorithms

107 - Rui Fan, Hengli Wang, Yuan Wang 2021

Existing road pothole detection approaches can be classified as computer vision-based or machine learning-based. The former typically employ 2-D image analysis/understanding or 3-D point cloud modeling and segmentation algorithms to detect road potholes from vision sensor data. The latter generally address road pothole detection using convolutional neural networks (CNNs) in an end-to-end manner. However, road potholes are not necessarily ubiquitous, and it is challenging to prepare a large, well-annotated dataset for CNN training. Consequently, while computer vision-based methods were the mainstream research trend in the past decade, machine learning-based methods were rarely discussed. Recently, we published the first stereo vision-based road pothole detection dataset and a novel disparity transformation algorithm, with which damaged and undamaged road areas can be clearly distinguished. However, no benchmarks are currently available for state-of-the-art (SoTA) CNNs trained using either disparity images or transformed disparity images. Therefore, in this paper, we first discuss the SoTA CNNs designed for semantic segmentation and evaluate their performance for road pothole detection through extensive experiments. Additionally, inspired by graph neural networks (GNNs), we propose a novel CNN layer, referred to as the graph attention layer (GAL), which can be easily deployed in any existing CNN to optimize image feature representations for semantic segmentation. Our experiments compare GAL-DeepLabv3+, our best-performing implementation, with nine SoTA CNNs on three modalities of training data: RGB images, disparity images, and transformed disparity images. The experimental results suggest that our proposed GAL-DeepLabv3+ achieves the best overall pothole detection accuracy on all training data modalities.

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
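The abstract does not spell out GAL's internals. As a hypothetical illustration of a "drop-in" graph attention layer for CNN features, the sketch below treats every pixel as a graph node and applies GAT-style attention: $e_{ij} = \mathrm{LeakyReLU}(a_\text{src}^\top z_i + a_\text{dst}^\top z_j)$, followed by a softmax over $j$. It adds the aggregated result back residually, so the output shape matches the input. The dense all-pairs attention, the residual connection, and the parameter names `proj`, `a_src`, `a_dst` are assumptions, not the paper's design.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def graph_attention_layer(fmap, proj, a_src, a_dst):
    """GAT-style attention treating every pixel of a (C, H, W) feature map as a graph node.

    proj: (C, C) shared linear projection; a_src, a_dst: (C,) attention vectors.
    Output has the input shape, so the layer can sit between existing CNN stages.
    """
    C, H, W = fmap.shape
    h = fmap.reshape(C, H * W).T                                       # (N, C) node features
    z = h @ proj                                                       # (N, C) projected features
    scores = leaky_relu((z @ a_src)[:, None] + (z @ a_dst)[None, :])   # (N, N) scores e_ij
    scores -= scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)                          # softmax over neighbours j
    out = h + alpha @ z                                                # residual + attention aggregation
    return out.T.reshape(C, H, W)


rng = np.random.default_rng(0)
C = 4
fmap = rng.normal(size=(C, 6, 8))   # toy backbone features
y = graph_attention_layer(fmap, rng.normal(size=(C, C)), rng.normal(size=C), rng.normal(size=C))
print(y.shape)  # (4, 6, 8)
```

Dense attention costs $O((HW)^2)$ memory, so a practical layer would likely restrict neighbourhoods or operate on downsampled features.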

Byzantine-Robust Federated Learning via Credibility Assessment on Non-IID Data

130 - Kun Zhai, Qiang Ren, Junli Wang 2021

Federated learning is a novel framework that enables resource-constrained edge devices to jointly learn a model, addressing the problems of data protection and data islands. However, standard federated learning is vulnerable to Byzantine attacks, which can cause the global model to be manipulated by the attacker or fail to converge. On non-IID data, current methods are not effective at defending against Byzantine attacks. In this paper, we propose a Byzantine-robust framework for federated learning via credibility assessment on non-IID data (BRCA). Credibility assessment is designed to detect Byzantine attacks by combining an adaptive anomaly detection model with data verification. Specifically, an adaptive mechanism is incorporated into the anomaly detection model for both its training and prediction. Simultaneously, a unified update algorithm is given to guarantee that the global model updates in a consistent direction. Our experiments on non-IID data demonstrate that BRCA is more robust to Byzantine attacks than conventional methods.

Machine Learning, Distributed, Parallel, and Cluster Computing
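To make "credibility assessment" concrete, the sketch below scores each client update and aggregates with those scores as weights. The score is the cosine similarity to the coordinate-wise median, which is a simple swapped-in heuristic. BRCA's adaptive anomaly detector, data verification, and unified update algorithm are not reproduced. The threshold `min_cred` and the attack scenario are assumptions.

```python
import numpy as np


def credibility_aggregate(updates: np.ndarray, min_cred: float = 0.0):
    """Credibility-weighted aggregation of flattened client updates, shape (K, P).

    Credibility here is the cosine similarity between each update and the
    coordinate-wise median (negative values clipped to 0). Clients at or below
    min_cred are excluded. Returns (aggregate, credibility).
    """
    ref = np.median(updates, axis=0)                          # robust reference direction
    denom = np.linalg.norm(updates, axis=1) * np.linalg.norm(ref) + 1e-12
    cred = np.clip(updates @ ref / denom, 0.0, None)
    cred[cred <= min_cred] = 0.0
    if cred.sum() == 0.0:
        return ref, cred                                      # nothing trusted: fall back to median
    return cred @ updates / cred.sum(), cred


rng = np.random.default_rng(0)
true_dir = np.ones(5)
honest = true_dir + 0.1 * rng.normal(size=(8, 5))   # 8 honest clients
byzantine = -10.0 * np.ones((2, 5))                 # 2 sign-flipping attackers
agg, cred = credibility_aggregate(np.vstack([honest, byzantine]))
print(np.round(cred, 2))
```

On strongly non-IID data, honest updates themselves diverge from the median, so a fixed similarity rule like this one can wrongly penalize them. This is the gap the abstract says an adaptive detector is meant to close.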

Digging into Uncertainty in Self-supervised Multi-view Stereo

141 - Hongbin Xu, Zhipeng Zhou, Yali Wang 2021

Self-supervised multi-view stereo (MVS) with a pretext task of image reconstruction has achieved significant progress recently. However, previous methods are built on intuition and lack a comprehensive explanation of why the pretext task is effective in self-supervised MVS. To this end, we propose to estimate epistemic uncertainty in self-supervised MVS, accounting for what the model ignores. Specifically, the limitations fall into two types: ambiguous supervision in the foreground and invalid supervision in the background. To address these issues, we propose a novel Uncertainty reduction Multi-view Stereo (U-MVS) framework for self-supervised learning. To alleviate ambiguous supervision in the foreground, we introduce an additional correspondence prior through a flow-depth consistency loss: the dense 2D correspondence of optical flow is used to regularize the 3D stereo correspondence in MVS. To handle invalid supervision in the background, we use Monte-Carlo Dropout to acquire an uncertainty map and filter out unreliable supervision signals in invalid regions. Extensive experiments on the DTU and Tanks and Temples benchmarks show that our U-MVS framework achieves the best performance among unsupervised MVS methods and is competitive with its supervised counterparts.

Computer Vision and Pattern Recognition
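Monte-Carlo Dropout keeps dropout active at inference, runs several stochastic forward passes, and reads the per-pixel variance as an uncertainty map. The sketch below shows that mechanism on a toy linear "depth head" standing in for a real MVS network. Keeping the lowest-variance 80% of pixels is an assumed filtering rule. The abstract does not state the threshold U-MVS uses.

```python
import numpy as np


def mc_dropout_predict(x: np.ndarray, weight: np.ndarray, p: float = 0.5, T: int = 50, seed: int = 0):
    """Run a toy linear head T times with dropout kept ON at inference.

    x: (N, F) per-pixel features; weight: (F,). Returns per-pixel mean and variance.
    """
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(T):
        keep = rng.random(x.shape) >= p                 # drop each feature with probability p
        preds.append((x * keep / (1.0 - p)) @ weight)    # inverted-dropout scaling
    preds = np.stack(preds)                              # (T, N) stochastic predictions
    return preds.mean(axis=0), preds.var(axis=0)


def reliable_mask(variance: np.ndarray, keep_fraction: float = 0.8) -> np.ndarray:
    """Keep supervision only on the least uncertain pixels (an assumed filtering rule)."""
    return variance <= np.quantile(variance, keep_fraction)


rng = np.random.default_rng(1)
features = rng.normal(size=(100, 16))   # 100 pixels, 16 features each (toy)
mean, var = mc_dropout_predict(features, rng.normal(size=16))
mask = reliable_mask(var)
print(mask.sum(), "of", mask.size, "pixels keep their self-supervised loss")
```

In training, the self-supervised reconstruction loss would be multiplied by `mask` so that high-uncertainty regions (e.g., background) stop contributing gradients.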

Progressive Coordinate Transforms for Monocular 3D Object Detection

117 - Li Wang, Li Zhang, Yi Zhu 2021

Recognizing and localizing objects in 3D space is a crucial ability for an AI agent to perceive its surrounding environment. While significant progress has been achieved with expensive LiDAR point clouds, 3D object detection from only a monocular image remains highly challenging. Existing alternatives for tackling this problem are either equipped with heavy networks to fuse RGB and depth information or empirically ineffective at processing millions of pseudo-LiDAR points. After in-depth examination, we find that these limitations are rooted in inaccurate object localization. In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations. Specifically, a localization boosting mechanism with a confidence-aware loss is introduced to progressively refine the localization prediction. In addition, semantic image representation is exploited to compensate for the usage of patch proposals. Despite being lightweight and simple, our strategy leads to substantial improvements on the KITTI and Waymo Open Dataset monocular 3D detection benchmarks. At the same time, PCT generalizes well to most coordinate-based 3D detection frameworks. The code is available at https://github.com/amazon-research/progressive-coordinate-transforms

Computer Vision and Pattern Recognition
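The two ingredients named in the PCT abstract, progressive localization refinement and a confidence-aware loss, can be illustrated with a toy sketch. The loss uses a common Laplace-style uncertainty weighting, $|y - \hat{y}|\,e^{-s} + s$. That form is an assumption, not necessarily the paper's definition. The refinement stage here is a hypothetical oracle that halves the residual, standing in for a learned network; consult the linked repository for the actual implementation.

```python
import numpy as np


def confidence_aware_l1(pred, target, log_sigma) -> float:
    """Uncertainty-weighted L1: |y - y_hat| * exp(-s) + s, averaged (assumed form).

    A large predicted log-uncertainty s down-weights the error but is penalized by +s.
    """
    pred, target, log_sigma = map(np.asarray, (pred, target, log_sigma))
    return float(np.mean(np.abs(pred - target) * np.exp(-log_sigma) + log_sigma))


def progressive_refine(initial, stages):
    """Apply residual refinement stages in order; each corrects the previous estimate."""
    estimates = [np.asarray(initial, dtype=float)]
    for stage in stages:
        estimates.append(estimates[-1] + stage(estimates[-1]))
    return estimates


target = np.array([2.0, 1.5, 30.0])   # toy object centre (x, y, z) in metres


def toy_stage(est: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a learned refiner: predicts half the remaining offset."""
    return 0.5 * (target - est)


steps = progressive_refine([0.0, 0.0, 20.0], [toy_stage] * 3)
errors = [confidence_aware_l1(e, target, 0.0) for e in steps]
print(np.round(errors, 3))
```

With `log_sigma = 0` the loss reduces to plain mean L1, which makes the per-stage error reduction easy to read off.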
