
This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.
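A minimal, hedged sketch of how such an inference-speed comparison can be measured in practice, assuming two arbitrary PyTorch acoustic encoders stand in for wav2vec 2.0 and SEW (these are placeholders, not the released checkpoints):

```python
# Hedged sketch: measure the real-time factor (inference time / audio duration)
# of two stand-in PyTorch encoders; the model objects are assumptions.
import time
import torch

@torch.no_grad()
def real_time_factor(model, seconds=10.0, sample_rate=16000, runs=20):
    """Average wall-clock inference time divided by the audio duration."""
    model.eval()
    wav = torch.randn(1, int(seconds * sample_rate))  # synthetic waveform
    model(wav)                                        # warm-up pass
    start = time.perf_counter()
    for _ in range(runs):
        model(wav)
    elapsed = (time.perf_counter() - start) / runs
    return elapsed / seconds

# speedup = real_time_factor(wav2vec2_like) / real_time_factor(sew_like)
```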
Automatic speech recognition (ASR) models make fewer errors when more surrounding speech is available as context. Unfortunately, acquiring a larger future context leads to higher latency, so there is an inevitable trade-off between speed and accuracy. Naively, to fit different latency requirements, one has to store multiple models and pick the one that best meets the constraint at hand. A more desirable approach is a single model that can dynamically adjust its latency under different constraints, which we refer to as Multi-mode ASR. A Multi-mode ASR model can fulfill various latency requirements during inference: when a larger latency becomes acceptable, the model can process a longer future context to achieve higher accuracy, and when the latency budget is tight, the model can rely less on future context while still achieving reliable accuracy. In pursuit of Multi-mode ASR, we propose Stochastic Future Context, a simple training procedure that samples one streaming configuration in each iteration. Through extensive experiments on the AISHELL-1 and LibriSpeech datasets, we show that a Multi-mode ASR model rivals, if not surpasses, a set of competitive streaming baselines trained with different latency budgets.
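The training procedure is simple enough to sketch. The following is a hedged illustration in which the model interface, the loss, and the set of streaming configurations are assumptions rather than the paper's exact choices: each step samples one future-context size and builds the matching look-ahead attention mask.

```python
# Hedged sketch of Stochastic Future Context training: every step samples one
# streaming configuration (how many future frames each position may attend to).
import random
import torch

def lookahead_mask(seq_len: int, future_context: int) -> torch.Tensor:
    """Boolean mask: position i may attend to positions j <= i + future_context."""
    idx = torch.arange(seq_len)
    return idx.unsqueeze(1) + future_context >= idx.unsqueeze(0)

def training_step(model, batch, loss_fn, optimizer, context_choices=(0, 2, 4, 8, None)):
    # None stands for the full-context (non-streaming) mode.
    c = random.choice(context_choices)
    seq_len = batch["features"].shape[1]
    mask = None if c is None else lookahead_mask(seq_len, c)
    logits = model(batch["features"], attn_mask=mask)  # model interface is assumed
    loss = loss_fn(logits, batch["targets"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```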
Ruihan Wu, Chuan Guo, Felix Wu (2021)
Most computer science conferences rely on paper bidding to assign reviewers to papers. Although paper bidding enables high-quality assignments in an era of unprecedented submission numbers, it also opens the door for dishonest reviewers to adversarially influence paper reviewing assignments. Anecdotal evidence suggests that some reviewers bid on papers by friends or colluding authors, even though these papers are outside their area of expertise, and recommend them for acceptance without considering the merit of the work. In this paper, we study the efficacy of such bid manipulation attacks and find that, indeed, they can jeopardize the integrity of the review process. We develop a novel approach for paper bidding and assignment that is much more robust against such attacks. We show empirically that our approach provides robustness even when dishonest reviewers collude, have full knowledge of the assignment system's internal workings, and have access to the system's inputs. In addition to being more robust, the quality of our paper review assignments is comparable to that of current, non-robust assignment approaches.
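For context, the standard (non-robust) pipeline that such attacks target combines bids and reviewer-paper similarity into an affinity matrix and solves an assignment problem. A toy sketch follows, simplified to one paper per reviewer for brevity; this is an illustration of the baseline being attacked, not the robust method proposed in the paper, and all names are illustrative.

```python
# Toy sketch of affinity-based reviewer-paper assignment (non-robust baseline).
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign(similarity: np.ndarray, bids: np.ndarray, bid_weight: float = 1.0):
    """similarity, bids: (num_reviewers, num_papers) matrices.
    Returns one paper index per reviewer, maximizing total affinity."""
    affinity = similarity + bid_weight * bids
    reviewers, papers = linear_sum_assignment(-affinity)  # negate to maximize
    return dict(zip(reviewers.tolist(), papers.tolist()))

# A dishonest bid simply raises bids[r, p] for a colluding pair (r, p), which can
# be enough to pull paper p into reviewer r's assignment regardless of expertise.
```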
This paper is a study of fine-tuning of BERT contextual representations, with a focus on commonly observed instabilities in few-sample scenarios. We identify several factors that cause this instability: the common use of a non-standard optimization method with biased gradient estimation; the limited applicability of significant parts of the BERT network to downstream tasks; and the prevalent practice of using a pre-determined and small number of training iterations. We empirically test the impact of these factors and identify alternative practices that resolve the commonly observed instability of the process. In light of these observations, we revisit recently proposed methods to improve few-sample fine-tuning with BERT and re-evaluate their effectiveness. Generally, we observe that the impact of these methods diminishes significantly with our modified process.
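A hedged sketch of the kind of alternative practices this points at, using a Hugging Face-style BERT classifier: an optimizer with bias correction and re-initialization of the top encoder layers (the attribute path model.bert.encoder.layer and the hyperparameters are assumptions, not prescribed values).

```python
# Hedged sketch: debiased optimizer + top-layer re-initialization for
# few-sample BERT fine-tuning; attribute paths and constants are assumptions.
import torch
from torch.optim import AdamW  # applies bias correction, unlike the original BERTAdam

def build_optimizer(model, lr=2e-5, weight_decay=0.01):
    return AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)

def reinit_top_layers(model, num_layers=2, init_std=0.02):
    """Re-initialize the top encoder layers, which transfer least to downstream tasks."""
    for layer in model.bert.encoder.layer[-num_layers:]:
        for module in layer.modules():
            if isinstance(module, torch.nn.Linear):
                module.weight.data.normal_(mean=0.0, std=init_std)
                if module.bias is not None:
                    module.bias.data.zero_()
            elif isinstance(module, torch.nn.LayerNorm):
                module.weight.data.fill_(1.0)
                module.bias.data.zero_()
```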
Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight convolution can perform competitively with the best reported self-attention results. Next, we introduce dynamic convolutions, which are simpler and more efficient than self-attention. We predict separate convolution kernels based solely on the current time step in order to determine the importance of context elements. The number of operations required by this approach scales linearly in the input length, whereas self-attention is quadratic. Experiments on large-scale machine translation, language modeling, and abstractive summarization show that dynamic convolutions improve over strong self-attention models. On the WMT'14 English-German test set, dynamic convolutions achieve a new state of the art of 29.7 BLEU.
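To make the mechanism concrete, here is a minimal sketch of a dynamic convolution layer: per-time-step, per-head kernel weights are predicted from the current input only, softmax-normalized over the kernel window, and applied to a causal window. It is a simplification (no GLU input projection or weight dropout), not the paper's full implementation.

```python
# Minimal sketch of a dynamic convolution layer (simplified from the paper).
import torch
import torch.nn.functional as F
from torch import nn

class DynamicConv(nn.Module):
    def __init__(self, dim, kernel_size=3, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.k, self.h = kernel_size, heads
        # Kernel weights are predicted from the current time step only.
        self.kernel_proj = nn.Linear(dim, heads * kernel_size)

    def forward(self, x):                      # x: (batch, time, dim)
        b, t, d = x.shape
        w = self.kernel_proj(x).view(b, t, self.h, self.k)
        w = F.softmax(w, dim=-1)               # normalize over the kernel window
        # Causal window: each step sees the current and k-1 previous steps.
        x_pad = F.pad(x, (0, 0, self.k - 1, 0))
        windows = x_pad.unfold(1, self.k, 1)   # (b, t, dim, k)
        windows = windows.reshape(b, t, self.h, d // self.h, self.k)
        out = torch.einsum("bthck,bthk->bthc", windows, w)
        return out.reshape(b, t, d)

layer = DynamicConv(dim=64, kernel_size=3, heads=4)
y = layer(torch.randn(2, 50, 64))              # (batch=2, time=50, dim=64)
```

Because the kernel has a fixed width k, the cost per output position is constant, which is where the linear (rather than quadratic) scaling in input length comes from.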
Classification problems have made significant progress due to the maturity of artificial intelligence (AI). However, differentiating items from categories without noticeable boundaries is still a huge challenge for machines, and one that is crucial for machines to be intelligent. In order to study fuzzy concepts in classification, we define and propose a globalness detection framework with a four-stage operational flow. We then demonstrate our framework on the inter-like graph of Facebook public pages together with their geo-locations. Our prediction algorithm achieves high precision (89%) and recall (88%) for local pages. We evaluate the results at both the state and country levels, finding that global node ratios are relatively high in states (NY, CA) that contain large, international cities. Several examples of global nodes are also shown and studied in this paper. It is our hope that our results unveil the value latent in every classification problem and provide a better understanding of global and local nodes in Online Social Networks (OSNs).
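As a clarification of what the reported numbers measure, below is a minimal sketch of precision and recall with "local" treated as the positive class; the label lists are made up for illustration and are not the paper's data.

```python
# Minimal sketch: precision/recall with "local" as the positive class.
def precision_recall(y_true, y_pred, positive="local"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

p, r = precision_recall(["local", "global", "local"], ["local", "local", "local"])
```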
