أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Anurag Kumar

NORESQA -- A Framework for Speech Quality Assessment using Non-Matching References

232 - Pranay Manocha , Buye Xu , Anurag Kumar 2021

The perceptual task of speech quality assessment (SQA) is a challenging task for machines to do. Objective SQA methods that rely on the availability of the corresponding clean reference have been the primary go-to approaches for SQA. Clearly, these m ethods fail in real-world scenarios where the ground truth clean references are not available. In recent years, non-intrusive methods that train neural networks to predict ratings or scores have attracted much attention, but they suffer from several shortcomings such as lack of robustness, reliance on labeled data for training and so on. In this work, we propose a new direction for speech quality assessment. Inspired by humans innate ability to compare and assess the quality of speech signals even when they have non-matching contents, we propose a novel framework that predicts a subjective relative quality score for the given speech signal with respect to any provided reference without using any subjective data. We show that neural networks trained using our framework produce scores that correlate well with subjective mean opinion scores (MOS) and are also competitive to methods such as DNSMOS, which explicitly relies on MOS from humans for training networks. Moreover, our method also provides a natural way to embed quality-related information in neural networks, which we show is helpful for downstream tasks such as speech enhancement.

معالجة الصوت والكلام أنظمة الصوت في الحاسوب

DPLM: A Deep Perceptual Spatial-Audio Localization Metric

157 - Pranay Manocha , Anurag Kumar , Buye Xu 2021

Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis driven technologies like augmented and virtual reality. However, they are challenging to set up, fatiguing for users, and expensive. In this work, w e tackle the problem of capturing the perceptual characteristics of localizing sounds. Specifically, we propose a framework for building a general purpose quality metric to assess spatial localization differences between two binaural recordings. We model localization similarity by utilizing activation-level distances from deep networks trained for direction of arrival (DOA) estimation. Our proposed metric (DPLM) outperforms baseline metrics on correlation with subjective ratings on a diverse set of datasets, even without the benefit of any human-labeled training data.

معالجة الصوت والكلام أنظمة الصوت في الحاسوب

Multi-Channel Speech Enhancement using Graph Neural Networks

88 - Panagiotis Tzirakis , Anurag Kumar , Jacob Donley 2021

Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones. Recently proposed methods tackle this problem by incorporating deep neural network models with spatial filtering tech niques such as the minimum variance distortionless response (MVDR) beamformer. In this paper, we introduce a different research direction by viewing each audio channel as a node lying in a non-Euclidean space and, specifically, a graph. This formulation allows us to apply graph neural networks (GNN) to find spatial correlations among the different channels (nodes). We utilize graph convolution networks (GCN) by incorporating them in the embedding space of a U-Net architecture. We use LibriSpeech dataset and simulate room acoustics data to extensively experiment with our approach using different array types, and number of microphones. Results indicate the superiority of our approach when compared to prior state-of-the-art method.

أنظمة الصوت في الحاسوب معالجة الصوت والكلام

A Closer Look at Weak Label Learning for Audio Events

86 - Ankit Shah , Anurag Kumar , Alexander G. Hauptmann 2018

Audio content analysis in terms of sound events is an important research problem for a variety of applications. Recently, the development of weak labeling approaches for audio or sound event detection (AED) and availability of large scale weakly labe led dataset have finally opened up the possibility of large scale AED. However, a deeper understanding of how weak labels affect the learning for sound events is still missing from literature. In this work, we first describe a CNN based approach for weakly supervised training of audio events. The approach follows some basic design principle desirable in a learning method relying on weakly labeled audio. We then describe important characteristics, which naturally arise in weakly supervised learning of sound events. We show how these aspects of weak labels affect the generalization of models. More specifically, we study how characteristics such as label density and corruption of labels affects weakly supervised training for audio events. We also study the feasibility of directly obtaining weak labeled data from the web without any manual label and compare it with a dataset which has been manually labeled. The analysis and understanding of these factors should be taken into picture in the development of future weak label learning methods. Audioset, a large scale weakly labeled dataset for sound events is used in our experiments.

أنظمة الصوت في الحاسوب التعلم الآلي الوسائط المتعددة

Content-based Representations of audio using Siamese neural networks

314 - Pranay Manocha , Rohan Badlani , Anurag Kumar 2017

In this paper, we focus on the problem of content-based retrieval for audio, which aims to retrieve all semantically similar audio recordings for a given audio clip query. This problem is similar to the problem of query by example of audio, which aim s to retrieve media samples from a database, which are similar to the user-provided example. We propose a novel approach which encodes the audio into a vector representation using Siamese Neural Networks. The goal is to obtain an encoding similar for files belonging to the same audio class, thus allowing retrieval of semantically similar audio. Using simple similarity measures such as those based on simple euclidean distance and cosine similarity we show that these representations can be very effectively used for retrieving recordings similar in audio content.

أنظمة الصوت في الحاسوب استرجاع المعلومات معالجة الصوت والكلام

Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording

311 - Benjamin Elizalde , Anurag Kumar , Ankit Shah 2016

In this paper we present our work on Task 1 Acoustic Scene Classi- fication and Task 3 Sound Event Detection in Real Life Recordings. Among our experiments we have low-level and high-level features, classifier optimization and other heuristics specif ic to each task. Our performance for both tasks improved the baseline from DCASE: for Task 1 we achieved an overall accuracy of 78.9% compared to the baseline of 72.6% and for Task 3 we achieved a Segment-Based Error Rate of 0.76 compared to the baseline of 0.91.

أنظمة الصوت في الحاسوب

An Analysis on the Inter-Cell Station Dependency Probability in an IEEE 802.11 Infrastructure WLANs

126 - Albert Sunny , Joy Kuri , Anurag Kumar 2013

In this document, we are primarily interested in computing the probabilities of various types of dependencies that can occur in a multi-cell infrastructure network.

بنية الشبكات والإنترنت نظرية المعلومات نظرية المعلومات

Analytical Models for Energy Consumption in Infrastructure WLAN STAs Carrying TCP Traffic

87 - Pranav Agrawal 2009

We develop analytical models for estimating the energy spent by stations (STAs) in infrastructure WLANs when performing TCP controlled file downloads. We focus on the energy spent in radio communication when the STAs are in the Continuously Active Mo de (CAM), or in the static Power Save Mode (PSM). Our approach is to develop accurate models for obtaining the fraction of times the STA radios spend in idling, receiving and transmitting. We discuss two traffic models for each mode of operation: (i) each STA performs one large file download, and (ii) the STAs perform short file transfers. We evaluate the rate of STA energy expenditure with long file downloads, and show that static PSM is worse than just using CAM. For short file downloads we compute the number of file downloads that can be completed with given battery capacity, and show that PSM performs better than CAM for this case. We provide a validation of our analytical models using the NS-2 simulator. In contrast to earlier work on analytical modeling of PSM, our models that capture the details of the interactions between the 802.11 MAC in PSM and certain aspects of TCP.

بنية الشبكات والإنترنت الأداء

Optimal Routing and Power Control for a Single Cell, Dense, Ad Hoc Wireless Network

270 - Venkatesh Ramaiyan , Anurag Kumar , 2009

We consider a dense, ad hoc wireless network, confined to a small region. The wireless network is operated as a single cell, i.e., only one successful transmission is supported at a time. Data packets are sent between sourcedestination pairs by multi hop relaying. We assume that nodes self-organise into a multihop network such that all hops are of length d meters, where d is a design parameter. There is a contention based multiaccess scheme, and it is assumed that every node always has data to send, either originated from it or a transit packet (saturation assumption). In this scenario, we seek to maximize a measure of the transport capacity of the network (measured in bit-meters per second) over power controls (in a fading environment) and over the hop distance d, subject to an average power constraint. We first argue that for a dense collection of nodes confined to a small region, single cell operation is efficient for single user decoding transceivers. Then, operating the dense ad hoc wireless network (described above) as a single cell, we study the hop length and power control that maximizes the transport capacity for a given network power constraint.

بنية الشبكات والإنترنت نظرية المعلومات نظرية المعلومات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد