394 - Keqi Deng, Songjun Cao, Long Ma, 2021
Recently, self-supervised pre-training has gained success in automatic speech recognition (ASR). However, considering the differences between speech accents in real scenarios, how to identify accents and use accent features to improve ASR remains challenging. In this paper, we employ the self-supervised pre-training method for both accent identification and accented speech recognition tasks. For the former task, a standard deviation constraint loss (SDC-loss) based end-to-end (E2E) architecture is proposed to identify accents within the same language. For the accented speech recognition task, we design an accent-dependent ASR system, which can utilize additional accent input features. Furthermore, we propose a frame-level accent feature, which is extracted based on the proposed accent identification model and can be dynamically adjusted. We pre-train our models using the 960-hour unlabeled LibriSpeech dataset and fine-tune them on the AESRC2020 speech dataset. The experimental results show that our proposed accent-dependent ASR system is significantly ahead of the AESRC2020 baseline and achieves a 6.5% relative word error rate (WER) reduction compared with our accent-independent ASR system.
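The abstract does not spell out the form of the SDC-loss, so the following is only a minimal sketch of one plausible reading: an utterance-level cross-entropy for accent classification plus a penalty on the standard deviation of frame-level accent posteriors, encouraging all frames of an utterance to agree on the accent. Function names, shapes, and the weighting factor are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a standard-deviation constraint (SDC) loss.
# Assumption: utterance-level cross-entropy plus a penalty on the standard
# deviation of frame-level accent posteriors across the utterance.
import torch
import torch.nn.functional as F

def sdc_loss(frame_logits, accent_label, lam=0.1):
    """frame_logits: (T, C) accent logits per frame; accent_label: scalar class id."""
    frame_probs = frame_logits.softmax(dim=-1)            # (T, C) frame posteriors
    utt_logit = frame_logits.mean(dim=0, keepdim=True)    # pooled utterance logit (1, C)
    ce = F.cross_entropy(utt_logit, accent_label.view(1))
    # Penalize disagreement across frames: per-class std of posteriors, averaged.
    sdc = frame_probs.std(dim=0).mean()
    return ce + lam * sdc

# Toy usage: 200 frames, 8 accent classes.
logits = torch.randn(200, 8, requires_grad=True)
label = torch.tensor(3)
loss = sdc_loss(logits, label)
loss.backward()
```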
It is appealing but challenging to achieve real-time deep neural network (DNN) inference on mobile devices, because even powerful modern mobile devices are considered "resource-constrained" when executing large-scale DNNs. This necessitates sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that can facilitate real-time inference on mobile devices while preserving high sparse-model accuracy. This paper designs a novel mobile inference acceleration framework, GRIM, that is General to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) and that achieves Real-time execution and high accuracy, leveraging fine-grained structured sparse model Inference and compiler optimizations for Mobiles. We start by proposing a new fine-grained structured sparsity scheme through Block-based Column-Row (BCR) pruning. Based on this new fine-grained structured sparsity, our GRIM framework consists of two parts: (a) the compiler optimization and code generation for real-time mobile inference; and (b) the BCR pruning optimizations for determining pruning hyperparameters and performing weight pruning. We compare GRIM with Alibaba MNN, TVM, TensorFlow-Lite, a sparse implementation based on CSR, PatDNN, and ESE (a representative FPGA inference acceleration framework for RNNs), and achieve up to 14.08x speedup.
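As a rough illustration of the block-based column-row (BCR) sparsity pattern described above, the sketch below splits a weight matrix into blocks and zeroes the lowest-magnitude columns inside each block (row pruning would be symmetric). The block size and per-block pruning ratio are assumed hyperparameters, not values from the paper.

```python
# Minimal NumPy sketch of block-based column-row (BCR) pruning:
# zero out the weakest whole columns inside each block of the weight matrix.
import numpy as np

def bcr_prune(weight, block=(16, 16), col_ratio=0.5):
    w = weight.copy()
    rows, cols = w.shape
    for r0 in range(0, rows, block[0]):
        for c0 in range(0, cols, block[1]):
            blk = w[r0:r0 + block[0], c0:c0 + block[1]]   # view into w
            col_norms = np.linalg.norm(blk, axis=0)
            k = int(col_ratio * blk.shape[1])
            if k > 0:
                prune_cols = np.argsort(col_norms)[:k]    # weakest columns in this block
                blk[:, prune_cols] = 0.0
    return w

w = np.random.randn(64, 64).astype(np.float32)
w_sparse = bcr_prune(w)
print("sparsity:", (w_sparse == 0).mean())
```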
For high spatial resolution (HSR) remote sensing images, bitemporal supervised learning always dominates change detection, using many pairwise labeled bitemporal images. However, it is very expensive and time-consuming to pairwise label large-scale bitemporal HSR remote sensing images. In this paper, we propose single-temporal supervised learning (STAR) for change detection from a new perspective of exploiting object changes in unpaired images as supervisory signals. STAR enables us to train a high-accuracy change detector using only unpaired labeled images and to generalize to real-world bitemporal images. To evaluate the effectiveness of STAR, we design a simple yet effective change detector called ChangeStar, which can reuse any deep semantic segmentation architecture via the ChangeMixin module. Comprehensive experimental results show that ChangeStar outperforms the baseline by a large margin under single-temporal supervision and achieves superior performance under bitemporal supervision. Code is available at https://github.com/Z-Zheng/ChangeStar
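A hedged sketch of the single-temporal supervision idea: two unpaired images with object masks are treated as a pseudo bitemporal pair, and the pseudo change label is the symmetric difference (XOR) of their masks. Shapes and function names are illustrative, not the authors' exact pipeline.

```python
# Pseudo bitemporal pair construction from unpaired single-temporal images:
# the change label is where exactly one of the two object masks is positive.
import numpy as np

def pseudo_bitemporal_pair(img_a, mask_a, img_b, mask_b):
    """img_*: (H, W, 3) arrays; mask_*: (H, W) binary object masks."""
    change_label = np.logical_xor(mask_a > 0, mask_b > 0).astype(np.uint8)
    return (img_a, img_b), change_label

H, W = 256, 256
img1, img2 = np.zeros((H, W, 3)), np.zeros((H, W, 3))
m1 = np.zeros((H, W), np.uint8); m1[50:100, 50:100] = 1
m2 = np.zeros((H, W), np.uint8); m2[80:130, 80:130] = 1
pair, label = pseudo_bitemporal_pair(img1, m1, img2, m2)
print("changed pixels:", int(label.sum()))
```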
416 - Xingjiao Wu, Tianlong Ma, Xin Li, 2021
Document layout analysis (DLA) aims to divide a document image into different types of regions. DLA plays an important role in document content understanding and information extraction systems. Exploring a method that can use less data for effective training contributes to the development of DLA. We consider Human-in-the-loop (HITL) collaborative intelligence for DLA. Our approach is inspired by the fact that HITL pushes the model to learn from unknown problems by adding a small amount of data based on knowledge. HITL typically selects key samples using confidence; however, using confidence to find key samples is not suitable for DLA tasks. We propose the Key Samples Selection (KSS) method to find key samples in high-level tasks (semantic segmentation) more accurately through agent collaboration, effectively reducing costs. Once selected, these key samples are passed to humans for active labeling, and the model is then updated with the labeled samples. Hence, we revisit the learning system from a reinforcement learning perspective and design a sample-based agent update strategy, which effectively improves the agents' ability to accept new samples. The method achieves significant improvements on two benchmarks (DSSE-200, from 77.1% to 86.3%, and CS-150, from 88.0% to 95.6%) using only 10% of the labeled data.
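The abstract does not detail how the agents collaborate, so the sketch below only illustrates the general idea of scoring samples by agent disagreement rather than raw confidence and sending the top-scoring images for human labeling. The two-agent setup, scoring rule, and labeling budget are assumptions.

```python
# Illustrative key-sample selection by agent disagreement on segmentation maps.
import numpy as np

def disagreement_score(pred_a, pred_b):
    """pred_*: (H, W) per-pixel class maps from two agents."""
    return float((pred_a != pred_b).mean())

def select_key_samples(preds_a, preds_b, budget):
    scores = [disagreement_score(a, b) for a, b in zip(preds_a, preds_b)]
    order = np.argsort(scores)[::-1]          # most disagreement first
    return order[:budget].tolist()            # indices to send for human labeling

# Toy example: 5 images, 2 agents, a labeling budget of 2.
rng = np.random.default_rng(0)
preds_a = [rng.integers(0, 4, (64, 64)) for _ in range(5)]
preds_b = [rng.integers(0, 4, (64, 64)) for _ in range(5)]
print(select_key_samples(preds_a, preds_b, budget=2))
```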
There have been long-standing controversies and inconsistencies over the experimental setup and the criteria for identifying the winning ticket in the literature. To reconcile these, we revisit the definition of the lottery ticket hypothesis, with comprehensive and more rigorous conditions. Under our new definition, we show concrete evidence to clarify whether the winning ticket exists across the major DNN architectures and/or applications. Through extensive experiments, we perform quantitative analysis of the correlations between winning tickets and various experimental factors, and empirically study the patterns in our observations. We find that the key training hyperparameters, such as learning rate and training epochs, as well as architecture characteristics such as capacity and residual connections, are all highly correlated with whether and when winning tickets can be identified. Based on our analysis, we summarize a guideline for parameter settings with regard to specific architecture characteristics, which we hope will catalyze research progress on the lottery ticket hypothesis.
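For context, the sketch below shows the iterative magnitude pruning (IMP) loop against which winning tickets are typically defined: train, prune the smallest weights globally, rewind the surviving weights to their initial values, and repeat. The exact protocol studied in the paper may differ; `train` here is a placeholder.

```python
# Sketch of one round of iterative magnitude pruning with weight rewinding.
import torch

def global_magnitude_mask(params, keep_ratio):
    flat = torch.cat([p.detach().abs().flatten() for p in params])
    k = int(keep_ratio * flat.numel())
    threshold = torch.topk(flat, k, largest=True).values.min()
    return [(p.detach().abs() >= threshold).float() for p in params]

def imp_round(model, init_state, keep_ratio, train):
    train(model)                                     # placeholder training loop
    masks = global_magnitude_mask(list(model.parameters()), keep_ratio)
    model.load_state_dict(init_state)                # rewind survivors to initialization
    with torch.no_grad():
        for p, m in zip(model.parameters(), masks):
            p.mul_(m)                                # apply the sparsity mask
    return masks

model = torch.nn.Linear(10, 2)
init = {k: v.clone() for k, v in model.state_dict().items()}
masks = imp_round(model, init, keep_ratio=0.5, train=lambda m: None)
```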
Fake news travels at unprecedented speed, reaches global audiences, and puts users and communities at great risk via social media platforms. Deep learning based models show good performance when trained on large amounts of labeled data on events of interest, whereas the performance of models tends to degrade on other events due to domain shift. Therefore, significant challenges are posed for existing detection approaches to detect fake news on emergent events, where large-scale labeled datasets are difficult to obtain. Moreover, adding knowledge from newly emergent events requires building a new model from scratch or continuing to fine-tune the model, which can be challenging, expensive, and unrealistic for real-world settings. To address these challenges, we propose an end-to-end fake news detection framework named MetaFEND, which is able to learn quickly to detect fake news on emergent events with a few verified posts. Specifically, the proposed model integrates meta-learning and neural process methods to enjoy the benefits of both approaches. In particular, a label embedding module and a hard attention mechanism are proposed to enhance effectiveness by handling categorical information and trimming irrelevant posts. Extensive experiments are conducted on multimedia datasets collected from Twitter and Weibo. The experimental results show that our proposed MetaFEND model can detect fake news on never-seen events effectively and outperforms the state-of-the-art methods.
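A rough MAML-style sketch of the event-level adaptation described above: for each event, the model takes one inner gradient step on a handful of verified posts (the support set), and the meta-update is computed on the remaining posts (the query set). The label embedding, hard attention, and neural-process components are omitted; the toy linear classifier and feature dimensions are assumptions.

```python
# Meta-learning loop sketch: inner-loop adaptation on verified posts of one
# event, outer-loop update from the loss on that event's query posts.
import torch
import torch.nn.functional as F

def adapt_and_loss(model, support_x, support_y, query_x, query_y, inner_lr=0.01):
    logits = model(support_x)
    inner_loss = F.cross_entropy(logits, support_y)
    grads = torch.autograd.grad(inner_loss, model.parameters(), create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]
    w, b = adapted                                   # adapted weight and bias
    query_logits = query_x @ w.t() + b               # functional forward pass
    return F.cross_entropy(query_logits, query_y)

model = torch.nn.Linear(32, 2)                       # toy post classifier over 32-d features
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sx, sy = torch.randn(5, 32), torch.randint(0, 2, (5,))    # 5 verified posts (support)
qx, qy = torch.randn(20, 32), torch.randint(0, 2, (20,))  # query posts of the same event
meta_loss = adapt_and_loss(model, sx, sy, qx, qy)
meta_opt.zero_grad()
meta_loss.backward()
meta_opt.step()
```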
Deep neural networks (DNNs) are effective in solving many real-world problems. Larger DNN models usually exhibit better quality (e.g., accuracy), but their excessive computation results in long training and inference times. Model sparsification can reduce the computation and memory cost while maintaining model quality. Most existing sparsification algorithms unidirectionally remove weights, while others randomly or greedily explore a small subset of weights in each layer. The inefficiency of these algorithms reduces the achievable sparsity level. In addition, many algorithms still require pre-trained dense models and thus suffer from large memory footprints and long training times. In this paper, we propose a novel scheduled grow-and-prune (GaP) methodology that does not require pre-training a dense model. It addresses the shortcomings of previous works by repeatedly growing a subset of layers to dense and then pruning them back to sparse after some training. Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks, such as image classification, object detection, 3D object part segmentation, and translation. They also outperform other state-of-the-art (SOTA) pruning methods, including pruning from pre-trained dense models. As an example, a 90% sparse ResNet-50 obtained via GaP achieves 77.9% top-1 accuracy on ImageNet, improving the SOTA result by 1.5%.
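A schematic of the scheduled grow-and-prune (GaP) loop: layers take turns being grown back to dense, trained for a while, and then magnitude-pruned back to the target sparsity. The round-robin schedule, the 80% sparsity value, and the placeholder `train` callable are illustrative assumptions, not the paper's exact settings.

```python
# Grow-and-prune schedule sketch over a list of layers.
import torch

def prune_layer(weight, sparsity):
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

def gap_step(layers, masks, grow_idx, train, sparsity=0.8):
    masks[grow_idx] = torch.ones_like(layers[grow_idx].weight)   # grow this layer to dense
    train(layers, masks)                                         # placeholder training phase
    with torch.no_grad():
        masks[grow_idx] = prune_layer(layers[grow_idx].weight, sparsity)
        layers[grow_idx].weight.mul_(masks[grow_idx])            # prune it back to sparse
    return masks

layers = [torch.nn.Linear(64, 64) for _ in range(4)]
masks = [prune_layer(l.weight, 0.8) for l in layers]
with torch.no_grad():                                            # start from a sparse model
    for l, m in zip(layers, masks):
        l.weight.mul_(m)
for step in range(8):                                            # round-robin grow schedule
    masks = gap_step(layers, masks, grow_idx=step % len(layers), train=lambda *a: None)
```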
Recent research demonstrated the promise of using resistive random access memory (ReRAM) as an emerging technology to perform inherently parallel analog-domain in-situ matrix-vector multiplication, the intensive and key computation in deep neural networks (DNNs). However, hardware failure, such as stuck-at-fault defects, is one of the main concerns that prevents ReRAM devices from being a feasible solution for real implementations. Existing solutions to this issue usually require an optimization to be conducted for each individual device, which is impractical for mass-produced products (e.g., IoT devices). In this paper, we rethink the value of weight pruning in ReRAM-based DNN design from the perspective of model fault tolerance, and we propose a differential mapping scheme to improve fault tolerance under a high stuck-on fault rate. Our method can tolerate an almost order-of-magnitude higher failure rate than the traditional two-column method on representative DNN tasks. More importantly, our method does not require extra hardware cost compared to the traditional two-column mapping scheme. The improvement is universal and does not require an optimization process for each individual device.
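A toy illustration of the generic differential (two-device) mapping idea, where each weight is represented as the difference of two conductances; the paper's actual fault-tolerant scheme and stuck-on fault model are more involved than this sketch, and the fault rate and conductance values below are assumptions.

```python
# Each weight w is mapped to a conductance pair so that w = g_pos - g_neg,
# then stuck-on faults are injected into both arrays to observe the error.
import numpy as np

def to_differential(w, g_max=1.0):
    g_pos = np.clip(w, 0, None) * g_max          # positive part of the weight
    g_neg = np.clip(-w, 0, None) * g_max         # negative part of the weight
    return g_pos, g_neg

def inject_stuck_on(g, fault_rate, g_on=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    faulty = rng.random(g.shape) < fault_rate    # cells stuck at the ON conductance
    return np.where(faulty, g_on, g)

w = np.random.default_rng(1).uniform(-1, 1, (8, 8))
g_pos, g_neg = to_differential(w)
g_pos_f = inject_stuck_on(g_pos, fault_rate=0.05)
g_neg_f = inject_stuck_on(g_neg, fault_rate=0.05)
w_eff = g_pos_f - g_neg_f                        # effective weights after faults
print("mean abs error:", np.abs(w_eff - w).mean())
```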
We present an algorithm for searching image collections using free-hand sketches that describe the appearance and relative positions of multiple objects. Sketch-based image retrieval (SBIR) methods predominantly match queries containing a single, dominant object, invariant to its position within an image. Our work exploits drawings as a concise and intuitive representation for specifying entire scene compositions. We train a convolutional neural network (CNN) to encode masked visual features from sketched objects, pooling these into a spatial descriptor that encodes the spatial relationships and appearances of objects in the composition. Training the CNN backbone as a Siamese network under a triplet loss yields a metric search embedding for measuring compositional similarity, which may be efficiently leveraged for visual search by applying product quantization.
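A minimal sketch of the Siamese/triplet training signal mentioned above: an anchor sketch-composition embedding is pulled toward its matching image embedding and pushed away from a non-matching one. The CNN encoders, masked-feature pooling, and product quantization stages are omitted; the embedding size and margin are assumptions.

```python
# Triplet loss over (sketch anchor, matching image, non-matching image) embeddings.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

anchor = torch.randn(16, 128)     # sketch-composition descriptors
positive = torch.randn(16, 128)   # matching image descriptors
negative = torch.randn(16, 128)   # non-matching image descriptors
print(triplet_loss(anchor, positive, negative).item())
```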
70 - Long Ma, Kabin Lin, Yinghua Qiu, 2021
High-performance osmotic energy conversion (OEC) with a perm-selective porous membrane requires both high ionic selectivity and high permeability simultaneously. Here, hydrodynamic slip is considered on the surfaces of nanopores to break the tradeoff between ionic selectivity and permeability, because it decreases the viscous friction at solid-liquid interfaces, which promotes ionic diffusion during OEC. Using simulations, the influences of individual slipping surfaces on OEC performance have been investigated, i.e., the slipping inner surface (surface_inner) and the exterior surfaces on the low- and high-concentration sides (surface_L and surface_H). Results show that the slipping surface_L is crucial for high-performance OEC. For nanopores of various lengths, the slipping surface_L simultaneously increases both the ionic permeability and the selectivity of nanopores, which results in significantly enhanced electric power and energy conversion efficiency. For nanopores longer than 30 nm, the slipping surface_inner plays a dominant role in the increase of electric power, but it induces a considerable decrease in energy conversion efficiency due to the enhanced transport of both cations and anions. Considering the difficulty of hydrodynamic slip modification of the surface_inner of nanopores, surface modification of the surface_L may be a better choice to achieve high-performance OEC. Our results provide feasible guidance for the design of porous membranes for high-performance osmotic energy harvesting.
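For reference, the figures of merit usually reported for nanopore osmotic energy conversion (not stated explicitly in the abstract) are the maximum output power on a matched external load and the ideal energy conversion efficiency. These are standard textbook relations rather than expressions from the paper, where I_os and V_os are the osmotic (short-circuit) current and (open-circuit) voltage and t_+ is the cation transference number that quantifies ionic selectivity.

```latex
P_{\max} = \frac{I_{\mathrm{os}}\, V_{\mathrm{os}}}{4},
\qquad
\eta_{\max} = \frac{(2 t_{+} - 1)^{2}}{2}
```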