New community

Subscribe to the gold package and get unlimited access to Shamra Academy

HumBug Zooniverse: a crowd-sourced acoustic mosquito dataset

64 0 0.0 ( 0 )

Download Cite

Added by Ivan Kiskin

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Ivan Kiskin - Adam D. Cobb - Lawrence Wang

Machine Learning Sound Audio and Speech Processing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Mosquitoes are the only known vector of malaria, which leads to hundreds of thousands of deaths each year. Understanding the number and location of potential mosquito vectors is of paramount importance to aid the reduction of malaria transmission cases. In recent years, deep learning has become widely used for bioacoustic classification tasks. In order to enable further research applications in this field, we release a new dataset of mosquito audio recordings. With over a thousand contributors, we obtained 195,434 labels of two second duration, of which approximately 10 percent signify mosquito events. We present an example use of the dataset, in which we train a convolutional neural network on log-Mel features, showcasing the information content of the labels. We hope this will become a vital resource for those researching all aspects of malaria, and add to the existing audio datasets for bioacoustic detection and signal processing.

rate research

Few-shot acoustic event detection via meta-learning

132 - Bowen Shi , Ming Sun , Krishna C. Puvvada 2020

We study few-shot acoustic event detection (AED) in this paper. Few-shot learning enables detection of new events with very limited labeled data. Compared to other research areas like computer vision, few-shot learning for audio recognition has been under-studied. We formulate few-shot AED problem and explore different ways of utilizing traditional supervised methods for this setting as well as a variety of meta-learning approaches, which are conventionally used to solve few-shot classification problem. Compared to supervised baselines, meta-learning models achieve superior performance, thus showing its effectiveness on generalization to new audio events. Our analysis including impact of initialization and domain discrepancy further validate the advantage of meta-learning approaches in few-shot AED.

Machine Learning Sound Audio and Speech Processing

Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition

161 - Yan Gao , Titouan Parcollet , Nicholas Lane 2020

Knowledge distillation has been widely used to compress existing deep learning models while preserving the performance on a wide range of applications. In the specific context of Automatic Speech Recognition (ASR), distillation from ensembles of acoustic models has recently shown promising results in increasing recognition performance. In this paper, we propose an extension of multi-teacher distillation methods to joint CTC-attention end-to-end ASR systems. We also introduce three novel distillation strategies. The core intuition behind them is to integrate the error rate metric to the teacher selection rather than solely focusing on the observed losses. In this way, we directly distill and optimize the student toward the relevant metric for speech recognition. We evaluate these strategies under a selection of training procedures on different datasets (TIMIT, Librispeech, Common Voice) and various languages (English, French, Italian). In particular, state-of-the-art error rates are reported on the Common Voice French, Italian and TIMIT datasets.

Machine Learning Sound Audio and Speech Processing

End2End Acoustic to Semantic Transduction

274 - Valentin Pelloin , Nathalie Camelin , Antoine Laurent 2021

In this paper, we propose a novel end-to-end sequence-to-sequence spoken language understanding model using an attention mechanism. It reliably selects contextual acoustic features in order to hypothesize semantic contents. An initial architecture capable of extracting all pronounced words and concepts from acoustic spans is designed and tested. With a shallow fusion language model, this system reaches a 13.6 concept error rate (CER) and an 18.5 concept value error rate (CVER) on the French MEDIA corpus, achieving an absolute 2.8 points reduction compared to the state-of-the-art. Then, an original model is proposed for hypothesizing concepts and their values. This transduction reaches a 15.4 CER and a 21.6 CVER without any new type of context.

Computation and Language Sound Audio and Speech Processing

EM-Based Channel Estimation from Crowd-Sourced RSSI Samples Corrupted by Noise and Interference

120 - Silvija Kokalj-Filipovic , Larry Greenstein 2015

We propose a method for estimating channel parameters from RSSI measurements and the lost packet count, which can work in the presence of losses due to both interference and signal attenuation below the noise floor. This is especially important in the wireless networks, such as vehicular, where propagation model changes with the density of nodes. The method is based on Stochastic Expectation Maximization, where the received data is modeled as a mixture of distributions (no/low interference and strong interference), incomplete (censored) due to packet losses. The PDFs in the mixture are Gamma, according to the commonly accepted model for wireless signal and interference power. This approach leverages the loss count as additional information, hence outperforming maximum likelihood estimation, which does not use this information (ML-), for a small number of received RSSI samples. Hence, it allows inexpensive on-line channel estimation from ad-hoc collected data. The method also outperforms ML- on uncensored data mixtures, as ML- assumes that samples are from a single-mode PDF.

Machine Learning

A DIY data acquisition system for acoustic field measurements under harsh conditions

51 - Steffen Buchholz , Mathias Lemke , Julius Reiss 2020

Monitoring active volcanos is an ongoing and important task helping to understand and predict volcanic eruptions. In recent years, analysing the acoustic properties of eruptions became more relevant. We present an inexpensive, lightweight, portable, easy to use and modular acoustic data acquisition system for field measurements that can record data with up to 100~kHz. The system is based on a Raspberry Pi 3 B running a custom build bare metal operating system. It connects to an external analog - digital converter with the microphone sensor. A GPS receiver allows the logging of the position and in addition the recording of a very accurate time signal synchronously to the acoustic data. With that, it is possible for multiple modules to effectively work as a single microphone array. The whole system can be build with low cost and demands only minimal technical infrastructure. We demonstrate a possible use of such a microphone array by deploying 20 modules on the active volcano textit{Stromboli} in the Aeolian Islands by Sicily, Italy. We use the collected acoustic data to indentify the sound source position for all recorded eruptions.

Geophysics Sound Audio and Speech Processing

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

HumBug Zooniverse: a crowd-sourced acoustic mosquito dataset

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions