أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Jianbo Liu

HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based Gesture Recognition

85 - Jianbo Liu , Ying Wang , Shiming Xiang 2021

Previous methods for skeleton-based gesture recognition mostly arrange the skeleton sequence into a pseudo picture or spatial-temporal graph and apply deep Convolutional Neural Network (CNN) or Graph Convolutional Network (GCN) for feature extraction . Although achieving superior results, these methods have inherent limitations in dynamically capturing local features of interactive hand parts, and the computing efficiency still remains a serious issue. In this work, the self-attention mechanism is introduced to alleviate this problem. Considering the hierarchical structure of hand joints, we propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition, which is based on pure self-attention without any CNN, RNN or GCN operators. Specifically, the joint self-attention module is used to capture spatial features of fingers, the finger self-attention module is designed to aggregate features of the whole hand. In terms of temporal features, the temporal self-attention module is utilized to capture the temporal dynamics of the fingers and the entire hand. Finally, these features are fused by the fusion self-attention module for gesture classification. Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.

الرؤية الحاسوبية وتمييز الأنماط

A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection

86 - Jianbo Liu , Sijie Ren , Yuanjie Zheng 2020

Both high-level and high-resolution feature representations are of great importance in various visual understanding tasks. To acquire high-resolution feature maps with high-level semantic information, one common strategy is to adopt dilated convoluti ons in the backbone networks to extract high-resolution feature maps, such as the dilatedFCN-based methods for semantic segmentation. However, due to many convolution operations are conducted on the high-resolution feature maps, such methods have large computational complexity and memory consumption. In this paper, we propose one novel holistically-guided decoder which is introduced to obtain the high-resolution semantic-rich feature maps via the multi-scale features from the encoder. The decoding is achieved via novel holistic codeword generation and codeword assembly operations, which take advantages of both the high-level and low-level features from the encoder features. With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation. The EfficientFCN achieves comparable or even better performance than state-of-the-art methods with only 1/3 of their computational costs for semantic segmentation on PASCAL Context, PASCAL VOC, ADE20K datasets. Meanwhile, the proposed HGD-FPN achieves $>2%$ higher mean Average Precision (mAP) when integrated into several object detection frameworks with ResNet-50 encoding backbones.

الرؤية الحاسوبية وتمييز الأنماط

Simultaneous measurements on the electron and X-ray spectra from laser-irradiated near-critical-density double-layer targets at relativistic intensity

92 - Jianbo Liu , Pengjie Wang , Yinren Shou 2020

We report the experimental results of simultaneous measurements on the electron and X-ray spectra from near-critical-density (NCD) double-layer targets irradiated by relativistic femtosecond pulses at the intensity of 5E19 W/cm^2. The dependence of t he electron and X-ray spectra on the density and thickness of the NCD layer was studied. For the optimal targets, electrons with temperature of 5.5 MeV and X-rays with critical energy of 5 keV were obtained. 2D particle-in-cell simulations based on the experimental parameters confirm the electrons are accelerated in the plasma channel through direct laser acceleration, resulting in temperature significantly higher than the pondermotive temperature. Bright X-rays are generated from betatron emission and Thomson backscattering before the electrons leave the double-layer targets.

فيزياء البلازما مسرع فيزياء

EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

89 - Jianbo Liu , Junjun He , Jiawei Zhang 2020

Both performance and efficiency are important to semantic segmentation. State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN), which adopt dilated convolutions in the backbone networks to extract high-resolution feature maps for achieving high-performance segmentation performance. However, due to many convolution operations are conducted on the high-resolution feature maps, such dilatedFCN-based methods result in large computational complexity and memory consumption. To balance the performance and efficiency, there also exist encoder-decoder structures that gradually recover the spatial information by combining multi-level feature maps from the encoder. However, the performances of existing encoder-decoder methods are far from comparable with the dilatedFCN-based methods. In this paper, we propose the EfficientFCN, whose backbone is a common ImageNet pre-trained network without any dilated convolution. A holistically-guided decoder is introduced to obtain the high-resolution semantic-rich feature maps via the multi-scale features from the encoder. The decoding task is converted to novel codebook generation and codeword assembly task, which takes advantages of the high-level and low-level features from the encoder. Such a framework achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost. Extensive experiments on PASCAL Context, PASCAL VOC, ADE20K validate the effectiveness of the proposed EfficientFCN.

الرؤية الحاسوبية وتمييز الأنماط

Learning to Predict Context-adaptive Convolution for Semantic Segmentation

91 - Jianbo Liu , Junjun He , Jimmy S. Ren 2020

Long-range contextual information is essential for achieving high-performance semantic segmentation. Previous feature re-weighting methods demonstrate that using global context for re-weighting feature channels can effectively improve the accuracy of semantic segmentation. However, the globally-sharing feature re-weighting vector might not be optimal for regions of different classes in the input image. In this paper, we propose a Context-adaptive Convolution Network (CaC-Net) to predict a spatially-varying feature weighting vector for each spatial location of the semantic feature maps. In CaC-Net, a set of context-adaptive convolution kernels are predicted from the global contextual information in a parameter-efficient manner. When used for convolution with the semantic feature maps, the predicted convolutional kernels can generate the spatially-varying feature weighting factors capturing both global and local contextual information. Comprehensive experimental results show that our CaC-Net achieves superior segmentation performance on three public datasets, PASCAL Context, PASCAL VOC 2012 and ADE20K.

الرؤية الحاسوبية وتمييز الأنماط

Learning Selfie-Friendly Abstraction from Artistic Style Images

81 - Yicun Liu , Jimmy Ren , Jianbo Liu 2018

Artistic style transfer can be thought as a process to generate differen

الرؤية الحاسوبية وتمييز الأنماط

Robust Tracking Using Region Proposal Networks

175 - Jimmy Ren , Zhiyang Yu , Jianbo Liu 2017

Recent advances in visual tracking showed that deep Convolutional Neural Networks (CNN) trained for image classification can be strong feature extractors for discriminative trackers. However, due to the drastic difference between image classification and tracking, extra treatments such as model ensemble and feature engineering must be carried out to bridge the two domains. Such procedures are either time consuming or hard to generalize well across datasets. In this paper we discovered that the internal structure of Region Proposal Network (RPN)s top layer feature can be utilized for robust visual tracking. We showed that such property has to be unleashed by a novel loss function which simultaneously considers classification accuracy and bounding box quality. Without ensemble and any extra treatment on feature maps, our proposed method achieved state-of-the-art results on several large scale benchmarks including OTB50, OTB100 and VOT2016. We will make our code publicly available.

الرؤية الحاسوبية وتمييز الأنماط

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد