Deep neural networks often have a huge number of parameters, which poses challenges for deployment in application scenarios with limited memory and computation capacity. Knowledge distillation is one approach to deriving compact models from bigger ones. However, it has been observed that a converged heavy teacher model is a strong constraint when learning a compact student network and can leave the optimization stuck in poor local optima. In this paper, we propose ProKT, a new model-agnostic method that projects the supervision signals of a teacher model into the student's parameter space. This projection is implemented by decomposing the training objective into local intermediate targets with an approximate mirror descent technique. The proposed method can be less sensitive to the quirks of optimization, which can result in a better local optimum. Experiments on both image and text datasets show that ProKT consistently achieves superior performance compared to other existing knowledge distillation methods.
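For orientation, the baseline that this abstract builds on can be sketched as follows. This is a minimal sketch of the conventional temperature-scaled distillation loss, not of ProKT itself, whose projection and mirror-descent steps are not specified in the abstract. The function names and the `temperature` and `alpha` defaults are assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Conventional distillation loss for one example (hypothetical helper).

    Combines a soft-target term, KL(teacher || student) at the given
    temperature, with the usual cross-entropy on the hard label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: how far the student's softened distribution is
    # from the teacher's softened distribution.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

if __name__ == "__main__":
    print(kd_loss([1.0, 2.0, 0.5], [0.5, 3.0, 0.2], label=1))
```

In this baseline the teacher's outputs are fixed targets; the abstract's point is that such fixed targets from a converged teacher may be hard for a small student to follow.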
111 - Xian Shi, Pan Zhou, Wei Chen 2021
Neural architecture search (NAS) has been successfully applied to tasks like image classification and language modeling for finding efficient high-performance network architectures. In the ASR field, especially end-to-end ASR, the related research is still in its infancy. In this work, we focus on applying NAS to the most popular manually designed model, Conformer, and propose an efficient ASR model searching method that benefits from the natural advantage of differentiable architecture search (Darts) in reducing computational overhead. We fuse the Darts mutator and Conformer blocks to form a complete search space, within which a modified architecture called the Darts-Conformer cell is found automatically. The entire search process on the AISHELL-1 dataset costs only 0.7 GPU days. By replacing the Conformer encoder with a stack of searched cells, we obtain an end-to-end ASR model (named Darts-Conformer) that outperforms the Conformer baseline by 4.7% on the open-source AISHELL-1 dataset. In addition, we verify that the architecture searched on a small dataset transfers to a larger 2k-hour dataset. To the best of our knowledge, this is the first successful attempt to apply gradient-based architecture search to the attention-based encoder-decoder ASR model.
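The continuous relaxation that makes differentiable architecture search cheap can be illustrated with a toy example. This is a minimal sketch of the general Darts idea on scalar inputs, assuming hypothetical candidate operations; it is not the Darts-Conformer search space, which the abstract does not detail.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, candidate_ops, alphas):
    """Softmax-weighted mixture of all candidate operations.

    During search, every candidate contributes to the output, so the
    architecture weights `alphas` can be trained by gradient descent.
    """
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops))

def derive_op(candidate_ops, alphas):
    """After search, keep only the candidate with the largest weight."""
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return candidate_ops[best]

if __name__ == "__main__":
    # Hypothetical candidates: identity, zero, and doubling.
    ops = [lambda x: x, lambda x: 0.0, lambda x: 2.0 * x]
    print(mixed_op(3.0, ops, [0.0, 0.0, 0.0]))
    print(derive_op(ops, [0.1, 0.2, 2.0])(3.0))
```

Because the discrete choice among operations is replaced by a weighted sum, a single training run can explore the whole search space, which is consistent with the low search cost the abstract reports.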
287 - Xian Shi, Lin Chen 2021
Quantum networks play a key role in many scenarios of quantum information theory. Here we consider quantum causal networks from the perspective of entropy. First, we present a revised smooth max-relative entropy of quantum combs, and then a lower and an upper bound on the type II error of hypothesis testing. Next, we present a lower bound on the smooth max-relative entropy for quantum combs with asymptotic equipartition. Finally, we consider the score used to quantify the performance of an operator, and present a quantity equal to the smooth asymptotic version of the performance of a quantum positive operator.
156 - Xian Shi, Fan Yu, Yizhou Lu 2021
The variety of accents has posed a big challenge to speech recognition. The Accented English Speech Recognition Challenge (AESRC2020) is designed to provide a common testbed and promote accent-related research. Two tracks are set in the challenge: English accent recognition (track 1) and accented English speech recognition (track 2). A set of 160 hours of accented English speech collected from 8 countries is released with labels as the training set. Another 20 hours of speech without labels is later released as the test set, including two unseen accents from two other countries that are used to test model generalization ability in track 2. We also provide baseline systems for the participants. This paper first reviews the released dataset, track setups, and baselines, and then summarizes the challenge results and the major techniques used in the submissions.
End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance. Among these models, the recurrent neural network transducer (RNN-T) has achieved significant progress in streaming on-device speech recognition because of its high accuracy and low latency. RNN-T adopts a prediction network to enhance language information, but its language modeling ability is limited because it still needs paired speech-text data for training. Further strengthening the language modeling ability through extra text data, such as shallow fusion with an external language model, brings only a small performance gain. Given that Mandarin Chinese is a character-based language and each character is pronounced as a tonal syllable, this paper proposes a novel cascade RNN-T approach to improve the language modeling ability of RNN-T. Our approach first uses an RNN-T to transform acoustic features into a syllable sequence, and then converts the syllable sequence into a character sequence through an RNN-T-based syllable-to-character converter. A rich text repository can thus be easily used to strengthen the language modeling ability. By introducing several important tricks, the cascade RNN-T approach surpasses the character-based RNN-T by a large margin on several Mandarin test sets, with much higher recognition quality and similar latency.
74 - Jiehong Lin, Xian Shi, Yuan Gao 2020
A point set is arguably the most direct approximation of an object or scene surface, yet in practice its acquisition often yields noisy, sparse, and possibly incomplete data, which restricts its use for high-quality surface recovery. Point set upsampling aims to increase the density and regularity of a point set so that a better surface recovery can be achieved. The problem is severely ill-posed and challenging, considering that the upsampling target itself is only an approximation of the underlying surface. Motivated to improve the surface approximation via point set upsampling, we identify the factors that are critical to this objective by pairing the surface approximation error bounds of the input and output point sets. This analysis suggests that, given a fixed budget of points in the upsampling result, more points should be distributed onto the surface regions where local curvatures are relatively high. To implement this motivation, we propose a novel design of a Curvature-ADaptive Point set Upsampling network (CAD-PU), the core of which is a module of curvature-adaptive feature expansion. To train CAD-PU, we follow the same motivation and propose geometrically intuitive surrogates that approximate discrete notions of surface curvature for the upsampled point set. We further integrate the proposed surrogates into an adversarial-learning-based curvature minimization objective, which gives a practically effective way of learning CAD-PU. We conduct thorough experiments that show the efficacy of our contributions and the advantages of our method over existing ones. Our implementation code is publicly available at https://github.com/JiehongLin/CAD-PU.
122 - Rong Ye, Wenxian Shi, Hao Zhou 2020
How can descriptions be generated from structured data organized in tables? Existing approaches using neural encoder-decoder models often suffer from a lack of diversity. We claim that an open set of templates is crucial for enriching phrase constructions and realizing varied generations. Learning such templates is prohibitive since it often requires a large paired <table, description> corpus, which is seldom available. This paper explores the problem of automatically learning reusable templates from paired and non-paired data. We propose the variational template machine (VTM), a novel method to generate text descriptions from data tables. Our contributions include: a) we carefully devise a specific model architecture and losses to explicitly disentangle text template and semantic content information in the latent spaces, and b) we utilize both small parallel data and large raw text without aligned tables to enrich the template learning. Experiments on datasets from a variety of domains show that VTM is able to generate more diverse descriptions while maintaining good fluency and quality.
The goal of Project GAUSS is to return samples from the dwarf planet Ceres. Ceres is the most accessible ocean world candidate and the largest reservoir of water in the inner solar system. It shows active cryovolcanism and hydrothermal activity in recent history that resulted in minerals not found on any other planet to date except in Earth's upper crust. The possible occurrence of a recent subsurface ocean on Ceres and the complex geochemistry suggest possible past habitability and even the potential for ongoing habitability. Aiming to answer a broad spectrum of questions about the origin and evolution of Ceres and its potential habitability, GAUSS will return samples from this possible ocean world for the first time. The project will address the following top-level scientific questions: 1) What is the origin of Ceres and the origin and transfer of water and other volatiles in the inner solar system? 2) What are the physical properties and internal structure of Ceres? What do they tell us about the evolutionary and aqueous alteration history of icy dwarf planets? 3) What are the astrobiological implications of Ceres? Was it habitable in the past and is it still today? 4) What are the mineralogical connections between Ceres and our current collections of primitive meteorites? GAUSS will first perform a high-resolution global remote sensing investigation, characterizing the geophysical and geochemical properties of Ceres. Candidate sampling sites will then be identified, and observation campaigns will be run for an in-depth assessment of the candidate sites. Once the sampling site is selected, a lander will be deployed on the surface to collect samples and return them to Earth in cryogenic conditions that preserve the volatile and organic composition as well as the original physical state as much as possible.