ترغب بنشر مسار تعليمي؟ اضغط هنا

One TTS Alignment To Rule Them All

145   0   0.0 ( 0 )
 نشر من قبل Rohan Badlani
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Speech-to-text alignment is a critical component of neural textto-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words. Most non-autoregressive endto-end TTS models rely on durations extracted from external sources. In this paper we leverage the alignment mechanism proposed in RAD-TTS as a generic alignment learning framework, easily applicable to a variety of neural TTS models. The framework combines forward-sum algorithm, the Viterbi algorithm, and a simple and efficient static prior. In our experiments, the alignment learning framework improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS). Specifically, it improves alignment convergence speed of existing attention-based mechanisms, simplifies the training pipeline, and makes the models more robust to errors on long utterances. Most importantly, the framework improves the perceived speech synthesis quality, as judged by human evaluators.



قيم البحث

اقرأ أيضاً

Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and precise way quantization is performed. Robust quantization offers an alternat ive approach with improved tolerance to different classes of data-types and quantization policies. It opens up new exciting applications where the quantization process is not static and can vary to meet different circumstances and implementations. To address this issue, we propose a method that provides intrinsic robustness to the model against a broad range of quantization processes. Our method is motivated by theoretical arguments and enables us to store a single generic model capable of operating at various bit-widths and quantization policies. We validate our methods effectiveness on different ImageNet models.
98 - Federico Lelli 2016
We study the link between baryons and dark matter in 240 galaxies with spatially resolved kinematic data. Our sample spans 9 dex in stellar mass and includes all morphological types. We consider (i) 153 late-type galaxies (LTGs; spirals and irregular s) with gas rotation curves from the SPARC database; (ii) 25 early-type galaxies (ETGs; ellipticals and lenticulars) with stellar and HI data from ATLAS^3D or X-ray data from Chandra; and (iii) 62 dwarf spheroidals (dSphs) with individual-star spectroscopy. We find that LTGs, ETGs, and classical dSphs follow the same radial acceleration relation: the observed acceleration (gobs) correlates with that expected from the distribution of baryons (gbar) over 4 dex. The relation coincides with the 1:1 line (no dark matter) at high accelerations but systematically deviates from unity below a critical scale of ~10^-10 m/s^2. The observed scatter is remarkably small (<0.13 dex) and largely driven by observational uncertainties. The residuals do not correlate with any global or local galaxy property (baryonic mass, gas fraction, radius, etc.). The radial acceleration relation is tantamount to a Natural Law: when the baryonic contribution is measured, the rotation curve follows, and vice versa. Including ultrafaint dSphs, the relation may extend by another 2 dex and possibly flatten at gbar<10^-12 m/s^2, but these data are significantly more uncertain. The radial acceleration relation subsumes and generalizes several well-known dynamical properties of galaxies, like the Tully-Fisher and Faber-Jackson relations, the baryon-halo conspiracies, and Renzos rule.
Deep learning-based video manipulation methods have become widely accessible to the masses. With little to no effort, people can quickly learn how to generate deepfake (DF) videos. While deep learning-based detection methods have been proposed to ide ntify specific types of DFs, their performance suffers for other types of deepfake methods, including real-world deepfakes, on which they are not sufficiently trained. In other words, most of the proposed deep learning-based detection methods lack transferability and generalizability. Beyond detecting a single type of DF from benchmark deepfake datasets, we focus on developing a generalized approach to detect multiple types of DFs, including deepfakes from unknown generation methods such as DeepFake-in-the-Wild (DFW) videos. To better cope with unknown and unseen deepfakes, we introduce a Convolutional LSTM-based Residual Network (CLRNet), which adopts a unique model training strategy and explores spatial as well as the temporal information in deepfakes. Through extensive experiments, we show that existing defense methods are not ready for real-world deployment. Whereas our defense method (CLRNet) achieves far better generalization when detecting various benchmark deepfake methods (97.57% on average). Furthermore, we evaluate our approach with a high-quality DeepFake-in-the-Wild dataset, collected from the Internet containing numerous videos and having more than 150,000 frames. Our CLRNet model demonstrated that it generalizes well against high-quality DFW videos by achieving 93.86% detection accuracy, outperforming existing state-of-the-art defense methods by a considerable margin.
The analytical solution of the three--dimensional linear pendulum in a rotating frame of reference is obtained, including Coriolis and centrifugal accelerations, and expressed in terms of initial conditions. This result offers the possibility of trea ting Foucault and Bravais pendula as trajectories of the very same system of equations, each of them with particular initial conditions. We compare with the common two--dimensional approximations in textbooks. A previously unnoticed pattern in the three--dimensional Foucault pendulum attractor is presented.
One Monad to Prove Them All is a modern fairy tale about curiosity and perseverance, two important properties of a successful PhD student. We follow the PhD student Mona on her adventure of proving properties about Haskell programs in the proof assis tant Coq. On the one hand, as a PhD student in computer science Mona observes an increasing demand for correct software products. In particular, because of the large amount of existing software, verifying existing software products becomes more important. Verifying programs in the functional programming language Haskell is no exception. On the other hand, Mona is delighted to see that communities in the area of theorem proving are becoming popular. Thus, Mona sets out to learn more about the interactive theorem prover Coq and verifying Haskell programs in Coq. To prove properties about a Haskell function in Coq, Mona has to translate the function into Coq code. As Coq programs have to be total and Haskell programs are often not, Mona has to model partiality explicitly in Coq. In her quest for a solution Mona finds an ancient manuscript that explains how properties about Haskell functions can be proven in the proof assistant Agda by translating Haskell programs into monadic Agda programs. By instantiating the monadic program with a concrete monad instance the proof can be performed in either a total or a partial setting. Mona discovers that the proposed transformation does not work in Coq due to a restriction in the termination checker. In fact the transformation does not work in Agda anymore as well, as the termination checker in Agda has been improved. We follow Mona on an educational journey through the land of functional programming where she learns about concepts like free monads and containers as well as basics and restrictions of proof assistants like Coq. These concepts are well-known individually, but their interplay gives rise to a solution for Monas problem based on the originally proposed monadic tranformation that has not been presented before. When Mona starts to test her approach by proving a statement about simple Haskell functions, she realizes that her approach has an additional advantage over the original idea in Agda. Monas final solution not only works for a specific monad instance but even allows her to prove monad-generic properties. Instead of proving properties over and over again for specific monad instances she is able to prove properties that hold for all monads representable by a container-based instance of the free monad. In order to strengthen her confidence in the practicability of her approach, Mona evaluates her approach in a case study that compares two implementations for queues. In order to share the results with other functional programmers the fairy tale is available as a literate Coq file. If you are a citizen of the land of functional programming or are at least familiar with its customs, had a journey that involved reasoning about functional programs of your own, or are just a curious soul looking for the next story about monads and proofs, then this tale is for you.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا