Research papers, master's theses and doctoral dissertations published by Rohit Kumar

Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition

89 - Anurenjan Purushothaman , Anirudh Sreeram , Rohit Kumar 2021

The task of speech recognition in far-field environments is adversely affected by reverberant artifacts that manifest as temporal smearing of the sub-band envelopes. In this paper, we develop a neural model for speech dereverberation using the long-term sub-band envelopes of speech. The sub-band envelopes are derived using frequency domain linear prediction (FDLP), which performs an autoregressive estimation of the Hilbert envelopes. The neural dereverberation model estimates an envelope gain which, when applied to reverberant signals, suppresses the late reflection components in the far-field signal. The dereverberated envelopes are used for feature extraction in speech recognition. Further, the sequence of steps involved in envelope dereverberation, feature extraction and acoustic modeling for ASR can be implemented as a single neural processing pipeline, which allows joint learning of the dereverberation network and the acoustic model. Several experiments are performed on the REVERB challenge dataset, the CHiME-3 dataset and the VOiCES dataset. In these experiments, the joint learning of envelope dereverberation and acoustic model yields significant performance improvements over the baseline ASR system based on log-mel spectrogram features as well as other past approaches for dereverberation (average relative improvements of 10-24% over the baseline system). A detailed analysis of the choice of hyper-parameters and the cost function involved in envelope dereverberation is also provided.

Audio and Speech Processing, Sound, Signal Processing
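
The FDLP step in the abstract above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes an orthonormal DCT-II, the autocorrelation method with a Levinson-Durbin recursion, and an evaluation grid that maps the all-pole model's frequency axis $[0, \pi]$ onto sample times. The function names, the model order and the test signal are illustrative choices.

```python
import numpy as np


def dct2(x):
    """Orthonormal DCT-II via an explicit cosine matrix (adequate for short signals)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    coeffs = np.cos(np.pi * k * (2 * t + 1) / (2 * n)) @ x
    coeffs[0] *= np.sqrt(1.0 / n)
    coeffs[1:] *= np.sqrt(2.0 / n)
    return coeffs


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns a (with a[0] = 1) and the residual energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = np.dot(a[:i], r[i:0:-1])  # sum_j a[j] * r[i - j]
        k = -acc / err                  # reflection coefficient
        a[: i + 1] = a[: i + 1] + k * a[i::-1]
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, band, order=20):
    """All-pole estimate of the squared Hilbert envelope of one sub-band.

    `band` = (lo, hi) selects DCT coefficients. Linear prediction is applied
    across those coefficients (in the frequency domain), so the model's
    power "spectrum" traces the temporal envelope of that band.
    """
    x = np.asarray(x, dtype=float)
    seg = dct2(x)[band[0]:band[1]]
    m = len(seg)
    r = np.array([np.dot(seg[: m - lag], seg[lag:]) for lag in range(order + 1)])
    a, err = levinson_durbin(r, order)
    n = len(x)
    omega = np.pi * (np.arange(n) + 0.5) / n  # sample t maps to omega = pi (t + 0.5) / n
    response = np.exp(-1j * np.outer(omega, np.arange(order + 1))) @ a
    return err / np.abs(response) ** 2


# Amplitude-modulated noise: quiet first half, loud second half.
rng = np.random.default_rng(0)
n = 512
amp = np.where(np.arange(n) < n // 2, 0.1, 1.0)
envelope = fdlp_envelope(amp * rng.standard_normal(n), band=(0, n))
print(envelope[: n // 2].mean() < envelope[n // 2 :].mean())
```

Restricting `band` to a slice of DCT coefficients gives a sub-band envelope while keeping the same time axis, which is why the evaluation grid uses the full signal length rather than the slice length.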

End-to-End Speech Recognition With Joint Dereverberation Of Sub-Band Autoregressive Envelopes

125 - Rohit Kumar , Anurenjan Purushothaman , Anirudh Sreeram 2021

End-to-end (E2E) automatic speech recognition (ASR) offers several advantages over previous approaches to recognizing speech. However, in reverberant conditions, E2E ASR is a challenging task as the long-term sub-band envelopes of the reverberant speech are temporally smeared. In this paper, we develop a feature enhancement approach using a neural model operating on sub-band temporal envelopes. The temporal envelopes are modeled using the framework of frequency domain linear prediction (FDLP). The neural enhancement model proposed in this paper performs an envelope gain based enhancement of temporal envelopes. The model architecture consists of a combination of convolutional and long short-term memory (LSTM) neural network layers. Further, the envelope dereverberation, feature extraction and acoustic modeling using transformer-based E2E ASR can all be jointly optimized for the speech recognition task. The joint optimization ensures that the dereverberation model targets the ASR cost function. We perform E2E speech recognition experiments on the REVERB challenge dataset as well as on the VOiCES dataset. In these experiments, the proposed joint modeling approach yields significant improvements compared to the baseline E2E ASR system (average relative improvements of 21% on the REVERB challenge dataset and about 10% on the VOiCES dataset).

Audio and Speech Processing

Ce-site dilution in the ferromagnetic Kondo lattice CeRh$_6$Ge$_4$

174 - Jia-Cheng Xu , Hang Su , Rohit Kumar 2021

The heavy fermion ferromagnet CeRh$_6$Ge$_4$ is the first example of a clean stoichiometric system where the ferromagnetic transition can be continuously suppressed by hydrostatic pressure to a quantum critical point. To reveal the outcome when the magnetic lattice of CeRh$_6$Ge$_4$ is diluted with non-magnetic atoms, this study reports comprehensive measurements of the physical properties of both single-crystal and polycrystalline samples of La$_x$Ce$_{1-x}$Rh$_6$Ge$_4$. With increasing $x$, the Curie temperature decreases, and no transition is observed for $x > 0.25$, while the system evolves from exhibiting coherent Kondo lattice behavior at low $x$ to the Kondo impurity scenario at large $x$. Moreover, non-Fermi liquid (NFL) behavior is observed over a wide doping range, which agrees well with the disordered Kondo model for $0.52 \leq x \leq 0.66$, while strange metal behavior is revealed in the vicinity of $x_c = 0.26$.

Strongly Correlated Electrons

MuSIC@Indiana: an effective tool for accurate measurement of fusion with low-intensity radioactive beams

388 - J. E. Johnstone , Rohit Kumar , S. Hudan 2021

The design, construction, and characterization of the Multi-Sampling Ionization Chamber, MuSIC@Indiana, are described. This detector provides efficient and accurate measurement of the fusion cross-section at near-barrier energies. The response of the detector to low-intensity beams of $^{17,18}$O, $^{19}$F, $^{23}$Na, $^{24,26}$Mg, $^{27}$Al, and $^{28}$Si at $E_{lab}$ = 50-60 MeV was examined. MuSIC@Indiana was commissioned by measuring the $^{18}$O+$^{12}$C fusion excitation function for $11 < E_{cm} < 20$ MeV using CH$_{4}$ gas. A simple, effective analysis cleanly distinguishes proton capture and two-body scattering events from fusion on carbon. With MuSIC@Indiana, 15 points on the excitation function are measured for a single incident beam energy. The resulting excitation function is shown to be in good agreement with literature data.

Instrumentation and Detectors, Nuclear Experiment
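
Two textbook relations sit behind an active-target fusion measurement of this kind: the non-relativistic conversion from laboratory to centre-of-mass energy, and the thin-target relation between counted fusion events and cross-section. The sketch below assumes mass numbers in place of exact masses, an ideal gas for the CH$_4$ target, and one carbon nucleus per molecule. It does not reproduce the MuSIC@Indiana analysis, which must also track the beam's energy loss through the gas segment by segment. All function names and input values are illustrative.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K


def ecm_from_elab(e_lab_mev, a_projectile, a_target):
    """Non-relativistic centre-of-mass energy, using mass numbers as masses."""
    return e_lab_mev * a_target / (a_projectile + a_target)


def thin_target_cross_section_mb(n_fusion, n_beam, pressure_pa, temperature_k,
                                 thickness_cm, target_atoms_per_molecule=1):
    """sigma = N_fus / (N_beam * n * dx) for one gas segment, returned in millibarn."""
    molecules_per_cm3 = pressure_pa / (K_B * temperature_k) * 1e-6  # m^-3 -> cm^-3
    areal_density = target_atoms_per_molecule * molecules_per_cm3 * thickness_cm  # cm^-2
    sigma_cm2 = n_fusion / (n_beam * areal_density)
    return sigma_cm2 / 1e-27  # 1 mb = 1e-27 cm^2


# 18O on 12C: a 50 MeV beam corresponds to E_cm = 50 * 12 / 30 = 20 MeV.
print(ecm_from_elab(50.0, 18, 12))
# Hypothetical segment: 100 torr CH4 at 293 K, 1 cm thick, 2 fusions per 1e6 ions.
print(thin_target_cross_section_mb(2, 1e6, 13332.2, 293.0, 1.0))
```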

Identifying malicious accounts in Blockchains using Domain Names and associated temporal properties

95 - Rohit Kumar Sachan , Rachit Agarwal , Sandeep Kumar Shukla 2021

The rise in the adoption of blockchain technology has led to increased illegal activities by cyber-criminals, costing billions of dollars. Many machine learning algorithms are applied to detect such illegal behavior. These algorithms are often trained on transaction behavior and, in some cases, on the vulnerabilities that exist in the system. In our approach, we study the feasibility of using metadata such as the Domain Name (DN) associated with an account in the blockchain to identify whether the account should be tagged malicious or not. Here, we leverage the temporal aspects attached to the DNs. Our results identify 144930 DNs that show malicious behavior, and out of these, 54114 DNs show persistent malicious behavior over time. Nonetheless, none of these identified malicious DNs were reported among the new officially tagged malicious blockchain DNs.

Cryptography and Security, Machine Learning
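
The abstract does not spell out how "persistent" malicious behavior is defined. One plausible reading, used here purely as an assumption, is that a DN is persistent if a classifier flags it in at least a given number of consecutive time windows. The sketch takes per-window flags as input (the classifier itself is out of scope); the function names, DN labels and threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def longest_flagged_run(flags: List[bool]) -> int:
    """Length of the longest run of consecutive windows flagged malicious."""
    run = best = 0
    for flagged in flags:
        run = run + 1 if flagged else 0
        best = max(best, run)
    return best


def split_malicious(history: Dict[str, List[bool]], min_run: int) -> Tuple[List[str], List[str]]:
    """Return (ever_malicious, persistent) DN lists under the assumed definition."""
    ever = sorted(dn for dn, flags in history.items() if any(flags))
    persistent = sorted(dn for dn in ever if longest_flagged_run(history[dn]) >= min_run)
    return ever, persistent


history = {
    "dn-a": [True, True, True, False],
    "dn-b": [True, False, True, False],
    "dn-c": [False, False, False, False],
}
print(split_malicious(history, min_run=3))
```

Requiring consecutive windows rather than a simple count is a design choice; a count-based or fraction-based rule would be an equally reasonable reading of "persistent over time".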

Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

95 - Anurenjan Purushothaman , Anirudh Sreeram , Rohit Kumar 2020

Automatic speech recognition in reverberant conditions is a challenging task as the long-term envelopes of the reverberant speech are temporally smeared. In this paper, we propose a neural model for the enhancement of sub-band temporal envelopes for dereverberation of speech. The temporal envelopes are derived using the autoregressive modeling framework of frequency domain linear prediction (FDLP). The neural enhancement model proposed in this paper performs an envelope gain based enhancement of temporal envelopes, and it consists of a series of convolutional and recurrent neural network layers. The enhanced sub-band envelopes are used to generate features for automatic speech recognition (ASR). The ASR experiments are performed on the REVERB challenge dataset as well as the CHiME-3 dataset. In these experiments, the proposed neural enhancement approach provides significant improvements over a baseline ASR system with beamformed audio (average relative word error rate improvements of 21% on the development set and about 11% on the evaluation set of the REVERB challenge dataset).

Audio and Speech Processing, Sound, Signal Processing
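
The gain-based enhancement and the subsequent feature step can be sketched as follows. Here the gain is passed in directly, standing in for the output of the convolutional/recurrent network, and the features are log energies of the enhanced envelopes integrated over short frames. The 25 ms window, the 10 ms hop and all names are assumptions for illustration, not values taken from the paper.

```python
import numpy as np


def apply_envelope_gain(envelopes, gain):
    """Scale sub-band envelopes (bands x samples) by a non-negative gain of the same shape."""
    if envelopes.shape != gain.shape:
        raise ValueError("gain must have the same shape as the envelopes")
    return envelopes * np.clip(gain, 0.0, None)


def frame_log_energies(envelopes, fs, win_s=0.025, hop_s=0.010, eps=1e-10):
    """Integrate each band's envelope over sliding frames and take the log."""
    win = int(round(win_s * fs))
    hop = int(round(hop_s * fs))
    n_frames = 1 + (envelopes.shape[1] - win) // hop
    feats = np.empty((n_frames, envelopes.shape[0]))
    for i in range(n_frames):
        frame = envelopes[:, i * hop : i * hop + win]
        feats[i] = np.log(frame.sum(axis=1) + eps)
    return feats  # frames x bands


fs = 8000
reverberant = np.ones((4, fs))         # 4 sub-bands, 1 s of (flat) envelope
gain = np.full_like(reverberant, 0.5)  # placeholder for a network-predicted gain
features = frame_log_energies(apply_envelope_gain(reverberant, gain), fs)
print(features.shape)
```

Applying the gain multiplicatively is equivalent to adding its logarithm to the log envelope, which is one reason gain-based enhancement composes naturally with log-energy features.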

Injectivity and range description of first $(k+1)$ integral moment transforms over $m$-tensor fields in $\mathbb{R}^n$

40 - Rohit Kumar Mishra , Suman Kumar Sahoo 2020

In this work, we prove a new decomposition result for rank $m$ symmetric tensor fields which generalizes the well-known solenoidal and potential decomposition of tensor fields. This decomposition is then used to describe the kernel and to prove an injectivity result for the first $(k+1)$ integral moment transforms of symmetric $m$-tensor fields in $\mathbb{R}^n$. Additionally, we present a range characterization for the first $(k+1)$ integral moment transforms in terms of John's equation.

Analysis of PDEs
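
For orientation, the moment transforms in the abstract above are usually defined as follows. This follows Sharafutdinov's standard conventions and may differ in notation from the paper. For a symmetric $m$-tensor field $f$ on $\mathbb{R}^n$ and $k = 0, 1, 2, \dots$,

$$
I^k f(x,\xi) = \int_{-\infty}^{\infty} t^k\, f_{i_1 \cdots i_m}(x + t\xi)\, \xi^{i_1} \cdots \xi^{i_m}\, dt,
\qquad (x,\xi) \in \mathbb{R}^n \times \left(\mathbb{R}^n \setminus \{0\}\right),
$$

and the "first $(k+1)$ moment transforms" are $I^0 f, \dots, I^k f$. The classical decomposition that the paper generalizes writes $f = {}^{s}f + dv$ with $\delta\,{}^{s}f = 0$, where $d$ is the symmetrized derivative and $\delta$ the divergence. Since $\langle dv(x+t\xi), \xi^m \rangle = \frac{d}{dt} \langle v(x+t\xi), \xi^{m-1} \rangle$, one has $I^0(dv) = 0$ for compactly supported $v$, so the ordinary ray transform $I^0$ alone cannot see the potential part; the higher moments are what carry the additional information.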

Generalized V-line transforms in 2D vector tomography

48 - Gaik Ambartsoumian , Mohammad Javad Latifi Jebelli , Rohit Kumar Mishra 2020

We study the inverse problem of recovering a vector field in $\mathbb{R}^2$ from a set of new generalized $V$-line transforms in three different ways. First, we introduce the longitudinal and transverse $V$-line transforms for vector fields in $\mathbb{R}^2$. We then give an explicit characterization of their respective kernels and show that they are complements of each other. We prove invertibility of each transform modulo its kernel and combine them to reconstruct the full vector field explicitly. In the second method, we combine the longitudinal and transverse $V$-line transforms with their corresponding first moment transforms and recover the full vector field from either pair. We show that the available data in each of these setups can be used to derive the signed $V$-line transform of both scalar components of the vector field, and use the known inversion of the latter. The final major result of this paper is the derivation of an exact closed-form formula for reconstruction of the full vector field in $\mathbb{R}^2$ from its star transform with weights. We solve this problem by relating the star transform of the vector field to the ordinary Radon transform of the scalar components of the field.

Classical Analysis and ODEs, Mathematical Physics, Analysis of PDEs
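
A numerical sketch can make the ray transforms in this abstract concrete. It assumes the following definitions, which should be checked against the paper: for a vertex $x$ and two fixed unit directions $u_1, u_2$, the longitudinal $V$-line transform integrates $f \cdot u_j$ along each ray $x + t u_j$, $t \ge 0$, and adds the two results; the transverse version uses $u_j^{\perp}$ instead. For a potential field $f = \nabla\varphi$ with $\varphi$ decaying at infinity, each longitudinal ray integral equals $-\varphi(x)$ by the fundamental theorem of calculus, so the longitudinal transform returns $-2\varphi(x)$, which the code checks by quadrature. Function names are hypothetical.

```python
import numpy as np


def ray_integral(field, vertex, direction, transverse=False, t_max=10.0, n=4001):
    """Trapezoidal approximation of int_0^t_max f(x + t u) . w dt, with w = u or u_perp."""
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    w = np.array([-u[1], u[0]]) if transverse else u
    t = np.linspace(0.0, t_max, n)
    points = np.asarray(vertex, dtype=float)[None, :] + t[:, None] * u[None, :]
    values = field(points) @ w
    dt = t[1] - t[0]
    return dt * (values.sum() - 0.5 * (values[0] + values[-1]))


def v_line_transform(field, vertex, u1, u2, transverse=False):
    """Sum of the two ray integrals from a common vertex (assumed definition)."""
    return (ray_integral(field, vertex, u1, transverse)
            + ray_integral(field, vertex, u2, transverse))


def phi(p):
    """Gaussian potential; p has shape (N, 2)."""
    return np.exp(-np.sum(p ** 2, axis=1))


def grad_phi(p):
    """Gradient of phi, shape (N, 2)."""
    return -2.0 * p * phi(p)[:, None]


vertex = np.array([0.3, -0.2])
approx = v_line_transform(grad_phi, vertex, (1.0, 1.0), (-1.0, 1.0))
exact = -2.0 * phi(vertex[None, :])[0]
print(approx, exact)
```

The rays are truncated at `t_max`, which is harmless here because the Gaussian potential is negligible beyond a few units from the origin; fields with slower decay would need a longer range or a change of variables.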

LEAP Submission to CHiME-6 ASR Challenge

98 - Anirudh Sreeram , Anurenjan Purushothaman , Rohit Kumar 2020

This paper reports the LEAP submission to the CHiME-6 challenge. The CHiME-6 Automatic Speech Recognition (ASR) challenge Track 1 involved the recognition of speech in noisy and reverberant acoustic conditions in home environments with multi-party interactions. For the challenge submission, the LEAP system used extensive data augmentation and a factorized time-delay neural network (TDNN) architecture. We also explored a neural architecture that interleaved the TDNN layers with LSTM layers. The submitted system improved on the Kaldi recipe by 2% relative in word error rate.

Audio and Speech Processing
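
A TDNN layer applies the same affine transform to frames spliced from a fixed temporal context, which is what gives it its time-delay structure. The NumPy sketch below shows that splicing for a single layer with an assumed context of $\{-1, 0, 1\}$ and a ReLU; it is not the LEAP system or the Kaldi recipe. In a factorized TDNN the weight matrix is additionally written as a product of two low-rank factors (Kaldi's TDNN-F also keeps one factor semi-orthogonal during training), which is omitted here. The function name and shapes are illustrative.

```python
import numpy as np


def tdnn_layer(x, weight, bias, context=(-1, 0, 1)):
    """One TDNN layer with a ReLU.

    x: (T, d_in) input frames; weight: (len(context) * d_in, d_out); bias: (d_out,).
    Assumes min(context) <= 0 <= max(context). Output frames whose context would
    fall outside [0, T) are dropped, so the result has
    T - (max(context) - min(context)) frames.
    """
    lo, hi = -min(context), max(context)
    spliced = np.stack([
        np.concatenate([x[t + c] for c in context])  # splice neighbouring frames
        for t in range(lo, x.shape[0] - hi)
    ])
    return np.maximum(spliced @ weight + bias, 0.0)


frames = np.arange(10, dtype=float).reshape(5, 2)  # 5 frames, 2-dim features
out = tdnn_layer(frames, weight=np.ones((6, 1)), bias=np.zeros(1))
print(out.ravel())
```

Stacking such layers with wider contexts (for example $\{-3, 0, 3\}$) in the upper layers lets the network see a long temporal span without a proportional growth in parameters.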

Beyond Fine-tuning: Few-Sample Sentence Embedding Transfer

307 - Siddhant Garg , Rohit Kumar Sharma , Yingyu Liang 2020

Fine-tuning (FT) pre-trained sentence embedding models on small datasets has been shown to have limitations. In this paper, we show that concatenating the embeddings from the pre-trained model with those from a simple sentence embedding model trained only on the target data can improve over the performance of FT for few-sample tasks. To this end, a linear classifier is trained on the combined embeddings, either by freezing the embedding model weights or by training the classifier and embedding models end-to-end. We perform evaluation on seven small datasets from NLP tasks and show that our approach with end-to-end training outperforms FT with negligible computational overhead. Further, we show that sophisticated combination techniques like CCA and KCCA do not work as well in practice as concatenation. We provide a theoretical analysis to explain this empirical observation.

Computation and Language
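
The core idea of the concatenation approach, a linear classifier over [pre-trained embedding; target-trained embedding], can be sketched with NumPy for the frozen-embedding variant. The embeddings here are synthetic stand-ins, the classifier is plain logistic regression trained by full-batch gradient descent, and all names and hyper-parameters are assumptions. The end-to-end variant described in the abstract would also update the embedding models.

```python
import numpy as np


def concat_embeddings(pretrained, target_trained):
    """Row-wise concatenation: (N, d1) and (N, d2) -> (N, d1 + d2)."""
    return np.concatenate([pretrained, target_trained], axis=1)


def train_logistic_regression(x, y, lr=0.5, epochs=500):
    """Full-batch gradient descent on the mean binary cross-entropy."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        residual = p - y
        w -= lr * x.T @ residual / len(y)
        b -= lr * residual.mean()
    return w, b


def predict(x, w, b):
    return (x @ w + b > 0).astype(int)


rng = np.random.default_rng(0)
y = np.array([0, 1] * 20)
pretrained = rng.standard_normal((40, 3))              # uninformative for this toy task
target_trained = np.where(y == 1, 1.0, -1.0)[:, None]  # separates the classes
features = concat_embeddings(pretrained, target_trained)
w, b = train_logistic_regression(features, y)
print((predict(features, w, b) == y).mean())
```

Because the classifier is linear over the concatenated vector, it can learn to weight whichever embedding is informative for the task, which is the intuition the abstract's comparison with CCA and KCCA builds on.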
