أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Sercan O. Arik

Controlling Neural Networks with Rule Representations

233 - Sungyong Seo , Sercan O. Arik , Jinsung Yoon 2021

We propose a novel training method to integrate rules into deep learning, in a way their strengths are controllable at inference. Deep Neural Networks with Controllable Rule Representations (DeepCTRL) incorporates a rule encoder into the model couple d with a rule-based objective, enabling a shared representation for decision making. DeepCTRL is agnostic to data type and model architecture. It can be applied to any kind of rule defined for inputs and outputs. The key aspect of DeepCTRL is that it does not require retraining to adapt the rule strength -- at inference, the user can adjust it based on the desired operation point on accuracy vs. rule verification ratio. In real-world domains where incorporating rules is critical -- such as Physics, Retail and Healthcare -- we show the effectiveness of DeepCTRL in teaching rules for deep learning. DeepCTRL improves the trust and reliability of the trained models by significantly increasing their rule verification ratio, while also providing accuracy gains at downstream tasks. Additionally, DeepCTRL enables novel use cases such as hypothesis testing of the rules on data samples, and unsupervised adaptation based on shared rules between datasets.

التعلم الآلي التعلم الالي

On Completeness-aware Concept-Based Explanations in Deep Neural Networks

110 - Chih-Kuan Yeh , Been Kim , Sercan O. Arik 2019

Human explanations of high-level decisions are often expressed in terms of key concepts the decisions are based on. In this paper, we study such concept-based explainability for Deep Neural Networks (DNNs). First, we define the notion of completeness , which quantifies how sufficient a particular set of concepts is in explaining a models prediction behavior based on the assumption that complete concept scores are sufficient statistics of the model prediction. Next, we propose a concept discovery method that aims to infer a complete set of concepts that are additionally encouraged to be interpretable, which addresses the limitations of existing methods on concept explanations. To define an importance score for each discovered concept, we adapt game-theoretic notions to aggregate over sets and propose ConceptSHAP. Via proposed metrics and user studies, on a synthetic dataset with apriori-known concept explanations, as well as on real-world image and language datasets, we validate the effectiveness of our method in finding concepts that are both complete in explaining the decisions and interpretable. (The code is released at https://github.com/chihkuanyeh/concept_exp)

التعلم الآلي التعلم الالي

Distilling Effective Supervision from Severe Label Noise

144 - Zizhao Zhang , Han Zhang , Sercan O. Arik 2019

Collecting large-scale data with clean labels for supervised training of neural networks is practically challenging. Although noisy labels are usually cheap to acquire, existing methods suffer a lot from label noise. This paper targets at the challen ge of robust training at high label noise regimes. The key insight to achieve this goal is to wisely leverage a small trusted set to estimate exemplar weights and pseudo labels for noisy data in order to reuse them for supervised training. We present a holistic framework to train deep neural networks in a way that is highly invulnerable to label noise. Our method sets the new state of the art on various types of label noise and achieves excellent performance on large-scale datasets with real-world label noise. For instance, on CIFAR100 with a $40%$ uniform noise ratio and only 10 trusted labeled data per class, our method achieves $80.2{pm}0.3%$ classification accuracy, where the error rate is only $1.4%$ higher than a neural network trained without label noise. Moreover, increasing the noise ratio to $80%$, our method still maintains a high accuracy of $75.5{pm}0.2%$, compared to the previous best accuracy $48.2%$. Source code available: https://github.com/google-research/google-research/tree/master/ieg

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التعلم الالي

Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

127 - Sercan O. Arik , Heewoo Jun , 2018

We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. Nonlinear interpolation in MCNN is employed with transposed convolution layers in parallel heads. MCNN achieves more than an order of magnitude higher compute intensity than commonly-used iterative algorithms like Griffin-Lim, yielding efficient utilization for modern multi-core processors, and very fast (more than 300x real-time) waveform synthesis. For training of MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality. We demonstrate that MCNN constitutes a very promising approach for high-quality speech synthesis, without any iterative algorithms or autoregression in computations.

أنظمة الصوت في الحاسوب التعلم الآلي معالجة الصوت والكلام

Neural Voice Cloning with a Few Samples

146 - Sercan O. Arik , Jitong Chen , Kainan Peng 2018

Voice cloning is a highly desired feature for personalized speech interfaces. Neural network based speech synthesis has been shown to generate high quality speech for a large number of speakers. In this paper, we introduce a neural voice cloning syst em that takes a few audio samples as input. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding is based on training a separate model to directly infer a new speaker embedding from cloning audios and to be used with a multi-speaker generative model. In terms of naturalness of the speech and its similarity to original speaker, both approaches can achieve good performance, even with very few cloning audios. While speaker adaptation can achieve better naturalness and similarity, the cloning time or required memory for the speaker encoding approach is significantly less, making it favorable for low-resource deployment.

الحساب واللغة التعلم الآلي أنظمة الصوت في الحاسوب

Deep Voice: Real-time Neural Text-to-Speech

98 - Sercan O. Arik , Mike Chrzanowski , Adam Coates 2017

We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmenta tion model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For the segmentation model, we propose a novel way of performing phoneme boundary detection with deep neural networks using connectionist temporal classification (CTC) loss. For the audio synthesis model, we implement a variant of WaveNet that requires fewer parameters and trains faster than the original. By using a neural network for each component, our system is simpler and more flexible than traditional text-to-speech systems, where each component requires laborious feature engineering and extensive domain expertise. Finally, we show that inference with our system can be performed faster than real time and describe optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x speedups over existing implementations.

الحساب واللغة التعلم الآلي الحوسبة العصبية والتطورية

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد