أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Yuhang Li

Real World Robustness from Systematic Noise

89 - Yan Wang , Yuhang Li , Ruihao Gong 2021

Systematic error, which is not determined by chance, often refers to the inaccuracy (involving either the observation or measurement process) inherent to a system. In this paper, we exhibit some long-neglected but frequent-happening adversarial examp les caused by systematic error. More specifically, we find the trained neural network classifier can be fooled by inconsistent implementations of image decoding and resize. This tiny difference between these implementations often causes an accuracy drop from training to deployment. To benchmark these real-world adversarial examples, we propose ImageNet-S dataset, which enables researchers to measure a classifiers robustness to systematic error. For example, we find a normal ResNet-50 trained on ImageNet can have 1%-5% accuracy difference due to the systematic error. Together our evaluation and dataset may aid future work toward real-world robustness and practical generalization.

الرؤية الحاسوبية وتمييز الأنماط

Design of a Flying Humanoid Robot Based on Thrust Vector Control

129 - Yuhang Li , Yuhao Zhou , Junbin Huang 2021

Achieving short-distance flight helps improve the efficiency of humanoid robots moving in complex environments (e.g., crossing large obstacles or reaching high places) for rapid emergency missions. This study proposes a design of a flying humanoid ro bot named Jet-HR2. The robot has 10 joints driven by brushless motors and harmonic drives for locomotion. To overcome the challenge of the stable-attitude takeoff in small thrust-to-weight conditions, the robot was designed based on the concept of thrust vectoring. The propulsion system consists of four ducted fans, that is, two fixed on the waist of the robot and the other two mounted on the feet, for thrust vector control. The thrust vector is controlled by adjusting the attitude of the foot during the flight. A simplified model and control strategies are proposed to solve the problem of attitude instability caused by mass errors and joint position errors during takeoff. The experimental results show that the robots spin and dive behaviors during takeoff were effectively suppressed by controlling the thrust vector of the ducted fan on the foot. The robot successfully achieved takeoff at a thrust-to-weight ratio of 1.17 (17 kg / 20 kg) and maintained a stable attitude, reaching a takeoff height of over 1000 mm.

علم الروبوتات

Gain dynamics driven broadband Q-switched noise-like pulse in ultrafast fiber laser

206 - Ji Zhou , Yuhang Li , Qing Yang 2021

We investigate the buildup dynamics of broadband Q-switched noise-like pulse (QS-NLP) driven by slow gain dynamics in a microfiber-based passively mode-locked Yb-doped fiber laser. Based on shot-to-shot tracing of the transient optical spectra and qu alitatively reproduced numerial simulation, we demonstrate that slow gain dynamics is deeply involved in the onset of such complex temporal and spectral instabilities of QS-NLP. The proposed dynamic model in this work could contribute to deeper insight of such nonlinear dynamics and transient dynamics simulation in ultrafast fiber laser.

بصريات

A Free Lunch From ANN: Towards Efficient, Accurate Spiking Neural Networks Calibration

196 - Yuhang Li , Shikuang Deng , Xin Dong 2021

Spiking Neural Network (SNN) has been recognized as one of the next generation of neural networks. Conventionally, SNN can be converted from a pre-trained ANN by only replacing the ReLU activation to spike activation while keeping the parameters inta ct. Perhaps surprisingly, in this work we show that a proper way to calibrate the parameters during the conversion of ANN to SNN can bring significant improvements. We introduce SNN Calibration, a cheap but extraordinarily effective method by leveraging the knowledge within a pre-trained Artificial Neural Network (ANN). Starting by analyzing the conversion error and its propagation through layers theoretically, we propose the calibration algorithm that can correct the error layer-by-layer. The calibration only takes a handful number of training data and several minutes to finish. Moreover, our calibration algorithm can produce SNN with state-of-the-art architecture on the large-scale ImageNet dataset, including MobileNet and RegNet. Extensive experiments demonstrate the effectiveness and efficiency of our algorithm. For example, our advanced pipeline can increase up to 69% top-1 accuracy when converting MobileNet on ImageNet compared to baselines. Codes are released at https://github.com/yhhhli/SNN_Calibration.

التعلم الآلي

Act Like a Radiologist: Towards Reliable Multi-view Correspondence Reasoning for Mammogram Mass Detection

159 - Yuhang Liu , Fandong Zhang , Chaoqi Chen 2021

Mammogram mass detection is crucial for diagnosing and preventing the breast cancers in clinical practice. The complementary effect of multi-view mammogram images provides valuable information about the breast anatomical prior structure and is of gre at significance in digital mammography interpretation. However, unlike radiologists who can utilize the natural reasoning ability to identify masses based on multiple mammographic views, how to endow the existing object detection models with the capability of multi-view reasoning is vital for decision-making in clinical diagnosis but remains the boundary to explore. In this paper, we propose an Anatomy-aware Graph convolutional Network (AGN), which is tailored for mammogram mass detection and endows existing detection methods with multi-view reasoning ability. The proposed AGN consists of three steps. Firstly, we introduce a Bipartite Graph convolutional Network (BGN) to model the intrinsic geometric and semantic relations of ipsilateral views. Secondly, considering that the visual asymmetry of bilateral views is widely adopted in clinical practice to assist the diagnosis of breast lesions, we propose an Inception Graph convolutional Network (IGN) to model the structural similarities of bilateral views. Finally, based on the constructed graphs, the multi-view information is propagated through nodes methodically, which equips the features learned from the examined view with multi-view reasoning ability. Experiments on two standard benchmarks reveal that AGN significantly exceeds the state-of-the-art performance. Visualization results show that AGN provides interpretable visual cues for clinical diagnosis.

الرؤية الحاسوبية وتمييز الأنماط

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

361 - Yuhang Li , Ruihao Gong , Xu Tan 2021

We study the challenging task of neural network quantization without end-to-end retraining, called Post-training Quantization (PTQ). PTQ usually requires a small subset of training data but produces less powerful quantized models than Quantization-Aw are Training (QAT). In this work, we propose a novel PTQ framework, dubbed BRECQ, which pushes the limits of bitwidth in PTQ down to INT2 for the first time. BRECQ leverages the basic building blocks in neural networks and reconstructs them one-by-one. In a comprehensive theoretical study of the second-order error, we show that BRECQ achieves a good balance between cross-layer dependency and generalization error. To further employ the power of quantization, the mixed precision technique is incorporated in our framework by approximating the inter-layer and intra-layer sensitivity. Extensive experiments on various handcrafted and searched neural architectures are conducted for both image classification and object detection tasks. And for the first time we prove that, without bells and whistles, PTQ can attain 4-bit ResNet and MobileNetV2 comparable with QAT and enjoy 240 times faster production of quantized models. Codes are available at https://github.com/yhhhli/BRECQ.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط

MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing

200 - Yuhang Li , Feng Zhu , Ruihao Gong 2020

User data confidentiality protection is becoming a rising challenge in the present deep learning research. Without access to data, conventional data-driven model compression faces a higher risk of performance degradation. Recently, some works propose to generate images from a specific pretrained model to serve as training data. However, the inversion process only utilizes biased feature statistics stored in one model and is from low-dimension to high-dimension. As a consequence, it inevitably encounters the difficulties of generalizability and inexact inversion, which leads to unsatisfactory performance. To address these problems, we propose MixMix based on two simple yet effective techniques: (1) Feature Mixing: utilizes various models to construct a universal feature space for generalized inversion; (2) Data Mixing: mixes the synthesized images and labels to generate exact label information. We prove the effectiveness of MixMix from both theoretical and empirical perspectives. Extensive experiments show that MixMix outperforms existing methods on the mainstream compression tasks, including quantization, knowledge distillation, and pruning. Specifically, MixMix achieves up to 4% and 20% accuracy uplift on quantization and pruning, respectively, compared to existing data-free compression work.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط

DeepFacePencil: Creating Face Images from Freehand Sketches

313 - Yuhang Li , Xuejin Chen , Binxin Yang 2020

In this paper, we explore the task of generating photo-realistic face images from hand-drawn sketches. Existing image-to-image translation methods require a large-scale dataset of paired sketches and images for supervision. They typically utilize syn thesized edge maps of face images as training data. However, these synthesized edge maps strictly align with the edges of the corresponding face images, which limit their generalization ability to real hand-drawn sketches with vast stroke diversity. To address this problem, we propose DeepFacePencil, an effective tool that is able to generate photo-realistic face images from hand-drawn sketches, based on a novel dual generator image translation network during training. A novel spatial attention pooling (SAP) is designed to adaptively handle stroke distortions which are spatially varying to support various stroke styles and different levels of details. We conduct extensive experiments and the results demonstrate the superiority of our model over existing methods on both image quality and model generalization to hand-drawn sketches.

الرؤية الحاسوبية وتمييز الأنماط

Efficient Bitwidth Search for Practical Mixed Precision Neural Network

68 - Yuhang Li , Wei Wang , Haoli Bai 2020

Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks. Recent efforts propose to quantize weights and activations from different layers with different precision to improve the over all performance. However, it is challenging to find the optimal bitwidth (i.e., precision) for weights and activations of each layer efficiently. Meanwhile, it is yet unclear how to perform convolution for weights and activations of different precision efficiently on generic hardware platforms. To resolve these two issues, in this paper, we first propose an Efficient Bitwidth Search (EBS) algorithm, which reuses the meta weights for different quantization bitwidth and thus the strength for each candidate precision can be optimized directly w.r.t the objective without superfluous copies, reducing both the memory and computational cost significantly. Second, we propose a binary decomposition algorithm that converts weights and activations of different precision into binary matrices to make the mixed precision convolution efficient and practical. Experiment results on CIFAR10 and ImageNet datasets demonstrate our mixed precision QNN outperforms the handcrafted uniform bitwidth counterparts and other mixed precision techniques.

التعلم الآلي التعلم الالي

LinesToFacePhoto: Face Photo Generation from Lines with Conditional Self-Attention Generative Adversarial Network

95 - Yuhang Li , Xuejin Chen , Feng Wu 2019

In this paper, we explore the task of generating photo-realistic face images from lines. Previous methods based on conditional generative adversarial networks (cGANs) have shown their power to generate visually plausible images when a conditional ima ge and an output image share well-aligned structures. However, these models fail to synthesize face images with a whole set of well-defined structures, e.g. eyes, noses, mouths, etc., especially when the conditional line map lacks one or several parts. To address this problem, we propose a conditional self-attention generative adversarial network (CSAGAN). We introduce a conditional self-attention mechanism to cGANs to capture long-range dependencies between different regions in faces. We also build a multi-scale discriminator. The large-scale discriminator enforces the completeness of global structures and the small-scale discriminator encourages fine details, thereby enhancing the realism of generated face images. We evaluate the proposed model on the CelebA-HD dataset by two perceptual user studies and three quantitative metrics. The experiment results demonstrate that our method generates high-quality facial images while preserving facial structures. Our results outperform state-of-the-art methods both quantitatively and qualitatively.

الرؤية الحاسوبية وتمييز الأنماط معالجة الصور والفيديو

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد