After DETR was proposed, this novel transformer-based detection paradigm, which performs several cross-attentions between object queries and feature maps to make predictions, has given rise to a series of transformer-based detection heads. These models update the object queries after each cross-attention; however, they do not renew the query position, which encodes the position information of the object queries. The model therefore needs extra learning to figure out the latest regions that the query position should represent and attend to. To fix this issue, we propose the Guided Query Position (GQPos) method, which iteratively embeds the latest location information of the object queries into the query position. Another problem of such transformer-based detection heads is the high complexity of performing attention on multi-scale feature maps, which hinders them from improving detection performance at all scales. We therefore propose a novel fusion scheme named Similar Attention (SiA): besides fusing the feature maps, SiA also fuses the attention weight maps, so that well-learned low-resolution attention weight maps accelerate the learning of the high-resolution ones. Our experiments show that GQPos improves the performance of a series of models, including DETR, SMCA, YoloS, and HoiTransformer, and that SiA consistently improves the performance of multi-scale transformer-based detection heads such as DETR and HoiTransformer.
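The query-position update can be illustrated with a short, hedged PyTorch sketch; the sine_embed and GuidedDecoderLayer names, shapes, and layer choices are assumptions for illustration, not the authors' code. After every decoder layer, the newest predicted box centers are re-encoded into the query position embedding so the next cross-attention is guided by up-to-date locations.

import torch
import torch.nn as nn

def sine_embed(xy, dim=256):
    """Map normalized (x, y) box centers in [0, 1] to a sinusoidal embedding."""
    half = dim // 4
    freqs = 10000 ** (torch.arange(half, dtype=torch.float32) / half)
    pos = xy.unsqueeze(-1) * 2 * torch.pi / freqs            # (N, Q, 2, half)
    emb = torch.cat([pos.sin(), pos.cos()], dim=-1)          # (N, Q, 2, 2*half)
    return emb.flatten(-2)                                   # (N, Q, dim)

class GuidedDecoderLayer(nn.Module):
    def __init__(self, d_model=256, nhead=8):
        super().__init__()
        self.layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.box_head = nn.Linear(d_model, 4)                # predicts (cx, cy, w, h)

    def forward(self, queries, query_pos, memory):
        out = self.layer(queries + query_pos, memory)
        boxes = self.box_head(out).sigmoid()
        # GQPos-style update: refresh the query position with the newest box
        # centers instead of reusing the initial, static positional embedding.
        new_query_pos = sine_embed(boxes[..., :2], queries.size(-1))
        return out, new_query_pos, boxes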
Most existing human pose estimation (HPE) methods exploit multi-scale information by fusing feature maps of four different spatial sizes, i.e., $1/4$, $1/8$, $1/16$, and $1/32$ of the input image. There are two drawbacks of this strategy: 1) feature maps of different spatial sizes may not be well aligned spatially, which potentially hurts the accuracy of keypoint localization; 2) these scales are fixed and inflexible, which may restrict the generalization ability over various human sizes. To address these issues, we propose an adaptive dilated convolution (ADC). It can generate and fuse multi-scale features of the same spatial size by setting different dilation rates for different channels. More importantly, these dilation rates are generated by a regression module. This enables ADC to adaptively adjust the fused scales, and thus ADC may generalize better to various human sizes. ADC can be trained end to end and easily plugged into existing methods. Extensive experiments show that ADC brings consistent improvements to various HPE methods. The source code will be released for further research.
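A simplified, hedged sketch of the adaptive-dilation idea (not the released implementation): instead of regressing continuous dilation rates, this toy version uses a small bank of dilated convolutions that all preserve the spatial size, and lets a tiny regression branch predict per-channel fusion weights over that bank.

import torch
import torch.nn as nn

class AdaptiveDilationBlock(nn.Module):
    def __init__(self, channels, rates=(1, 2, 3, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        )
        # Regression branch: per-channel weights over the dilation bank.
        self.rate_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * len(rates), 1),
        )
        self.num_rates = len(rates)

    def forward(self, x):
        n, c, _, _ = x.shape
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (N, R, C, H, W)
        w = self.rate_head(x).view(n, self.num_rates, c, 1, 1).softmax(dim=1)
        return (w * feats).sum(dim=1)  # fused multi-scale features, same size as x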
Despite the significant progress over the last 50 years in simulating flow problems using numerical discretization of the Navier-Stokes equations (NSE), we still cannot seamlessly incorporate noisy data into existing algorithms, mesh generation is complex, and we cannot tackle high-dimensional problems governed by parametrized NSE. Moreover, solving inverse flow problems is often prohibitively expensive and requires complex and expensive formulations and new computer codes. Here, we review flow physics-informed learning, which seamlessly integrates data and mathematical models and implements them using physics-informed neural networks (PINNs). We demonstrate the effectiveness of PINNs for inverse problems related to three-dimensional wake flows, supersonic flows, and biomedical flows.
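As a minimal illustration of the PINN formulation the review discusses, the sketch below assembles the residual of the 2D steady incompressible NSE with automatic differentiation; the network size, the viscosity value, and the nse_residual helper are assumptions chosen for brevity.

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 3))   # maps (x, y) -> (u, v, p)
nu = 0.01                               # kinematic viscosity (assumed value)

def grad(f, x):
    return torch.autograd.grad(f, x, torch.ones_like(f), create_graph=True)[0]

def nse_residual(xy):
    xy = xy.requires_grad_(True)
    u, v, p = net(xy).unbind(dim=-1)
    du, dv, dp = grad(u, xy), grad(v, xy), grad(p, xy)
    u_x, u_y = du[:, 0], du[:, 1]
    v_x, v_y = dv[:, 0], dv[:, 1]
    u_xx, u_yy = grad(u_x, xy)[:, 0], grad(u_y, xy)[:, 1]
    v_xx, v_yy = grad(v_x, xy)[:, 0], grad(v_y, xy)[:, 1]
    mom_x = u * u_x + v * u_y + dp[:, 0] - nu * (u_xx + u_yy)
    mom_y = u * v_x + v * v_y + dp[:, 1] - nu * (v_xx + v_yy)
    cont = u_x + v_y
    # The physics loss is the mean squared residual at collocation points,
    # added to the data-misfit loss on any available (possibly noisy) data.
    return (mom_x ** 2 + mom_y ** 2 + cont ** 2).mean()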
74 - Zhicheng Wang 2021
In previous work, we studied the Gan-Gross-Prasad problem for unipotent representations of finite classical groups. In this paper, we deduce the Gan-Gross-Prasad problem for arbitrary representations from the unipotent representations via the Lusztig correspondence.
Human pose estimation deeply relies on visual clues and anatomical constraints between parts to locate keypoints. Most existing CNN-based methods do well in visual representation, but lack the ability to explicitly learn the constraint relationships between keypoints. In this paper, we propose a novel approach based on Token representation for human Pose estimation (TokenPose). In detail, each keypoint is explicitly embedded as a token to simultaneously learn constraint relationships and appearance cues from images. Extensive experiments show that the small and large TokenPose models are on par with state-of-the-art CNN-based counterparts while being more lightweight. Specifically, our TokenPose-S and TokenPose-L achieve $72.5$ AP and $75.8$ AP on the COCO validation dataset, respectively, with a significant reduction in parameters ($\downarrow$80.6%; $\downarrow$56.8%) and GFLOPs ($\downarrow$75.3%; $\downarrow$24.7%). Code is publicly available.
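A hedged sketch of the token formulation (shapes, depth, and the TokenPoseSketch name are assumptions, not the released code): learnable keypoint tokens are appended to image patch tokens, processed jointly by a transformer encoder, and the output keypoint tokens are projected to per-keypoint heatmaps.

import torch
import torch.nn as nn

class TokenPoseSketch(nn.Module):
    def __init__(self, num_keypoints=17, dim=192, num_patches=256, heatmap_size=(64, 48)):
        super().__init__()
        self.kpt_tokens = nn.Parameter(torch.zeros(1, num_keypoints, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + num_keypoints, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=12)
        self.head = nn.Linear(dim, heatmap_size[0] * heatmap_size[1])
        self.heatmap_size = heatmap_size
        self.num_keypoints = num_keypoints

    def forward(self, patch_tokens):            # patch_tokens: (N, num_patches, dim)
        n = patch_tokens.size(0)
        tokens = torch.cat([self.kpt_tokens.expand(n, -1, -1), patch_tokens], dim=1)
        out = self.encoder(tokens + self.pos_embed)
        kpt_out = out[:, :self.num_keypoints]   # keypoint tokens carry constraints and appearance
        h, w = self.heatmap_size
        return self.head(kpt_out).view(n, self.num_keypoints, h, w)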
Occlusion is very challenging in pedestrian detection. In this paper, we propose a simple yet effective method named V2F-Net, which explicitly decomposes occluded pedestrian detection into visible region detection and full body estimation. V2F-Net consists of two sub-networks: the Visible region Detection Network (VDN) and the Full body Estimation Network (FEN). VDN localizes visible regions, and FEN estimates the full-body box on the basis of the visible box. Moreover, to further improve the full-body estimation, we propose a novel Embedding-based Part-aware Module (EPM). By supervising the visibility of each part, the network is encouraged to extract features with essential part information. We show the effectiveness of V2F-Net through experiments on two challenging datasets. V2F-Net achieves a 5.85% AP gain on CrowdHuman and a 2.24% MR$^{-2}$ improvement on CityPersons compared to the FPN baseline. Besides, the consistent gain on both one-stage and two-stage detectors validates the generalizability of our method.
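To illustrate the decomposition, here is a hedged sketch of an FEN-like head (the FullBodyEstimator name, feature dimension, and box parameterization are assumptions): it regresses the full-body box as offsets relative to each detected visible box.

import torch
import torch.nn as nn

class FullBodyEstimator(nn.Module):
    """FEN-like head: predicts (dx, dy, dw, dh) from visible-region features."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 4))

    def forward(self, visible_feats, visible_boxes):
        # visible_feats: (N, feat_dim), visible_boxes: (N, 4) in (cx, cy, w, h) format.
        dx, dy, dw, dh = self.fc(visible_feats).unbind(dim=-1)
        cx, cy, w, h = visible_boxes.unbind(dim=-1)
        full_cx, full_cy = cx + dx * w, cy + dy * h          # shift the center
        full_w, full_h = w * dw.exp(), h * dh.exp()          # scale width and height
        return torch.stack([full_cx, full_cy, full_w, full_h], dim=-1)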
Tensor-based methods have been widely studied to attack inverse problems in hyperspectral imaging, since a hyperspectral image (HSI) cube can be naturally represented as a third-order tensor, which perfectly retains the spatial information in the image. In this article, we extend the linear tensor method to the nonlinear tensor method and propose a nonlinear low-rank tensor unmixing algorithm to solve the generalized bilinear model (GBM). Specifically, the linear and nonlinear parts of the GBM can both be expressed as tensors. Furthermore, the low-rank structures of the abundance maps and the nonlinear interaction abundance maps are exploited by minimizing their nuclear norm, thus taking full advantage of the high spatial correlation in HSIs. Synthetic and real-data experiments show that the low-rank property of the abundance maps and nonlinear interaction abundance maps exploited in our method improves the performance of nonlinear unmixing. A MATLAB demo of this work will be available at https://github.com/LinaZhuang for the sake of reproducibility.
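The low-rank prior on the (nonlinear interaction) abundance maps can be illustrated with singular value thresholding, the proximal operator of the nuclear norm that ADMM-style solvers typically apply to such terms; the svt helper and the toy data below are illustrative assumptions, not the MATLAB demo.

import numpy as np

def svt(abundance_map, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(abundance_map, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (u * s_shrunk) @ vt

# Example: shrink a noisy, nearly rank-1 abundance map back toward low rank.
rng = np.random.default_rng(0)
low_rank = np.outer(rng.random(100), rng.random(100))
noisy = low_rank + 0.05 * rng.standard_normal((100, 100))
denoised = svt(noisy, tau=0.5)
print(np.linalg.matrix_rank(denoised, tol=1e-6))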
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multi-view posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods. Project page: https://ibrnet.github.io/
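The classic, fully differentiable volume rendering step the abstract refers to can be sketched as follows; the composite_ray helper and the toy inputs are assumptions, and the MLP/ray transformer that would predict the densities and colors upstream is omitted.

import torch

def composite_ray(sigma, rgb, deltas):
    """sigma: (S,), rgb: (S, 3), deltas: (S,) distances between consecutive samples."""
    alpha = 1.0 - torch.exp(-sigma * deltas)                   # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                    # contribution of each sample
    return (weights.unsqueeze(-1) * rgb).sum(dim=0)            # (3,) pixel color

# Toy usage with 64 samples along a single ray.
sigma = torch.rand(64)
rgb = torch.rand(64, 3)
deltas = torch.full((64,), 0.03)
print(composite_ray(sigma, rgb, deltas))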
Heatmap regression has become the most prevalent choice for current human pose estimation methods. The ground-truth heatmaps are usually constructed by covering all skeletal keypoints with 2D Gaussian kernels whose standard deviations are fixed. However, for bottom-up methods, which need to handle a large variance of human scales and labeling ambiguities, this practice seems unreasonable. To better cope with these problems, we propose the scale-adaptive heatmap regression (SAHR) method, which can adaptively adjust the standard deviation for each keypoint. In this way, SAHR is more tolerant of various human scales and labeling ambiguities. However, SAHR may aggravate the imbalance between foreground and background samples, which potentially limits the improvement of SAHR. Thus, we further introduce weight-adaptive heatmap regression (WAHR) to help balance the foreground and background samples. Extensive experiments show that SAHR together with WAHR largely improves the accuracy of bottom-up human pose estimation. As a result, we outperform the state-of-the-art model by +1.5 AP and achieve 72.0 AP on COCO test-dev2017, which is comparable with the performance of most top-down methods. Source code is available at https://github.com/greatlog/SWAHR-HumanPose.
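A hedged simplification of the scale-adaptive target construction (not the released SWAHR code): the base standard deviation is rescaled per keypoint by a predicted scale factor, so larger people receive wider supervision kernels.

import numpy as np

def gaussian_heatmap(shape, center, sigma0, scale=1.0):
    h, w = shape
    sigma = sigma0 * scale                       # adaptive standard deviation
    y, x = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))

# Same keypoint at two human scales: the larger scale yields a broader target.
small = gaussian_heatmap((128, 128), (64, 64), sigma0=2.0, scale=0.8)
large = gaussian_heatmap((128, 128), (64, 64), sigma0=2.0, scale=1.6)
print(small.sum(), large.sum())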
We develop a fast multi-fidelity modeling method for very complex correlations between high- and low-fidelity data by working in modal space to extract the proper correlation function. We apply this method to infer the amplitude of motion of a flexible marine riser in cross-flow, subject to vortex-induced vibrations (VIV). VIV are driven by an absolute instability in the flow, which imposes a frequency (Strouhal) law that requires matching with the impedance of the structure; this matching is easily achieved because of the rapid parametric variation of the added-mass force. As a result, the wavenumber of the riser's spatial response lies within narrow bands of uncertainty. Hence, an error in wavenumber prediction can cause significant phase-related errors in the shape of the amplitude of response along the riser, rendering the correlation between low- and high-fidelity data very complex. Working in modal space as outlined herein, dense low-fidelity data, provided by the semi-empirical computer code VIVA, can be correlated with few high-fidelity data, obtained from experiments or fully resolved CFD simulations, to correct both phase and amplitude and provide predictions that agree very well with the correct shape of the amplitude response. We also quantify the uncertainty in the prediction using Bayesian modeling and exploit this uncertainty to formulate an active learning strategy for the best possible location of the sensors providing the high-fidelity measurements.
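A hedged, self-contained sketch of the modal-space projection (the sinusoidal basis and synthetic responses are assumptions; the actual method learns a Bayesian correlation between the low- and high-fidelity modal coefficients rather than the trivial per-mode gain shown here):

import numpy as np

z = np.linspace(0.0, 1.0, 200)                     # span-wise coordinate along the riser
modes = np.stack([np.sin((k + 1) * np.pi * z) for k in range(5)], axis=1)

def modal_coeffs(response):
    """Least-squares projection of a response shape onto the modal basis."""
    return np.linalg.lstsq(modes, response, rcond=None)[0]

rng = np.random.default_rng(0)
low_fid = 0.8 * np.sin(3 * np.pi * z)              # dense low-fidelity shape (e.g., from VIVA)
high_fid = 1.0 * np.sin(3 * np.pi * z) + 0.02 * rng.standard_normal(z.size)

a_lo, a_hi = modal_coeffs(low_fid), modal_coeffs(high_fid)
gain = a_hi / np.where(np.abs(a_lo) > 1e-8, a_lo, 1.0)   # per-mode correction factor
corrected = modes @ (gain * a_lo)                         # corrected amplitude shape
print(np.abs(corrected - high_fid).max())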