HumBug Zooniverse: a crowd-sourced acoustic mosquito dataset

Published by Ivan Kiskin

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Ivan Kiskin - Adam D. Cobb - Lawrence Wang

Mosquitoes are the only known vector of malaria, which leads to hundreds of thousands of deaths each year. Understanding the number and location of potential mosquito vectors is of paramount importance to aid the reduction of malaria transmission cases. In recent years, deep learning has become widely used for bioacoustic classification tasks. In order to enable further research applications in this field, we release a new dataset of mosquito audio recordings. With over a thousand contributors, we obtained 195,434 labels of two-second duration, of which approximately 10 percent signify mosquito events. We present an example use of the dataset, in which we train a convolutional neural network on log-Mel features, showcasing the information content of the labels. We hope this will become a vital resource for those researching all aspects of malaria, and add to the existing audio datasets for bioacoustic detection and signal processing.
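The abstract mentions log-Mel features as the input to the convolutional network but gives no details. The following is a minimal sketch of how one frame of log-Mel features could be computed, using only the Python standard library. It is not the authors' pipeline: the function names, the HTK-style mel formula, the Hann window, the naive DFT, and all parameter values (8 kHz sample rate, 256-sample frame, 16 mel bands) are assumptions chosen for illustration.

```python
import cmath
import math

def hz_to_mel(f_hz):
    # HTK-style mel scale (one common convention, assumed here).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_edges_hz(n_mels, sample_rate):
    # n_mels + 2 edge frequencies, equally spaced on the mel scale up to Nyquist.
    mel_max = hz_to_mel(sample_rate / 2)
    return [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]

def power_spectrum(frame):
    # Naive DFT of one Hann-windowed frame; returns power for bins 0..N/2.
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spec.append(abs(s) ** 2)
    return spec

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters: filter m rises from edge m-1 to edge m, falls to edge m+1.
    edges = mel_edges_hz(n_mels, sample_rate)
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_frame(frame, sample_rate, n_mels=16, eps=1e-10):
    # Log of the mel-weighted power in each band; eps avoids log(0).
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + eps)
            for row in bank]

# Usage: a synthetic 600 Hz tone (illustrative only, not dataset audio).
sample_rate, n_samples, tone_hz = 8000, 256, 600.0
frame = [math.sin(2 * math.pi * tone_hz * i / sample_rate)
         for i in range(n_samples)]
features = log_mel_frame(frame, sample_rate)
peak_band = max(range(len(features)), key=lambda i: features[i])
print("strongest mel band index:", peak_band)
```

Stacking such frames over a two-second clip would give a two-dimensional time-by-band array of the kind a convolutional network can take as input. In practice an FFT-based audio library would replace the naive DFT used here for self-containment.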

Bowen Shi, Ming Sun, Krishna C. Puvvada - 2020

We study few-shot acoustic event detection (AED) in this paper. Few-shot learning enables detection of new events with very limited labeled data. Compared to other research areas like computer vision, few-shot learning for audio recognition has been under-studied. We formulate the few-shot AED problem and explore different ways of utilizing traditional supervised methods for this setting, as well as a variety of meta-learning approaches, which are conventionally used to solve few-shot classification problems. Compared to supervised baselines, meta-learning models achieve superior performance, thus showing their effectiveness in generalizing to new audio events. Our analysis, including the impact of initialization and domain discrepancy, further validates the advantage of meta-learning approaches in few-shot AED.

Machine learning, Computer sound systems, Audio and speech processing

Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition

Yan Gao, Titouan Parcollet, Nicholas Lane - 2020

Knowledge distillation has been widely used to compress existing deep learning models while preserving the performance on a wide range of applications. In the specific context of Automatic Speech Recognition (ASR), distillation from ensembles of acoustic models has recently shown promising results in increasing recognition performance. In this paper, we propose an extension of multi-teacher distillation methods to joint CTC-attention end-to-end ASR systems. We also introduce three novel distillation strategies. The core intuition behind them is to integrate the error rate metric into the teacher selection rather than solely focusing on the observed losses. In this way, we directly distill and optimize the student toward the relevant metric for speech recognition. We evaluate these strategies under a selection of training procedures on different datasets (TIMIT, Librispeech, Common Voice) and various languages (English, French, Italian). In particular, state-of-the-art error rates are reported on the Common Voice French, Italian and TIMIT datasets.

Machine learning, Computer sound systems, Audio and speech processing

End2End Acoustic to Semantic Transduction

Valentin Pelloin, Nathalie Camelin, Antoine Laurent - 2021

In this paper, we propose a novel end-to-end sequence-to-sequence spoken language understanding model using an attention mechanism. It reliably selects contextual acoustic features in order to hypothesize semantic contents. An initial architecture capable of extracting all pronounced words and concepts from acoustic spans is designed and tested. With a shallow fusion language model, this system reaches a 13.6 concept error rate (CER) and an 18.5 concept value error rate (CVER) on the French MEDIA corpus, achieving an absolute 2.8-point reduction compared to the state of the art. Then, an original model is proposed for hypothesizing concepts and their values. This transduction reaches a 15.4 CER and a 21.6 CVER without any new type of context.

Computation and language, Computer sound systems, Audio and speech processing

EM-Based Channel Estimation from Crowd-Sourced RSSI Samples Corrupted by Noise and Interference

Silvija Kokalj-Filipovic, Larry Greenstein - 2015

We propose a method for estimating channel parameters from RSSI measurements and the lost packet count, which can work in the presence of losses due to both interference and signal attenuation below the noise floor. This is especially important in wireless networks, such as vehicular ones, where the propagation model changes with the density of nodes. The method is based on Stochastic Expectation Maximization, where the received data is modeled as a mixture of distributions (no/low interference and strong interference), incomplete (censored) due to packet losses. The PDFs in the mixture are Gamma, according to the commonly accepted model for wireless signal and interference power. This approach leverages the loss count as additional information, hence outperforming maximum likelihood estimation, which does not use this information (ML-), for a small number of received RSSI samples. Hence, it allows inexpensive on-line channel estimation from ad-hoc collected data. The method also outperforms ML- on uncensored data mixtures, as ML- assumes that samples are from a single-mode PDF.

Machine learning

A DIY data acquisition system for acoustic field measurements under harsh conditions

Steffen Buchholz, Mathias Lemke, Julius Reiss - 2020

Monitoring active volcanoes is an ongoing and important task that helps to understand and predict volcanic eruptions. In recent years, analysing the acoustic properties of eruptions has become more relevant. We present an inexpensive, lightweight, portable, easy-to-use and modular acoustic data acquisition system for field measurements that can record data at up to 100 kHz. The system is based on a Raspberry Pi 3 B running a custom-built bare-metal operating system. It connects to an external analog-to-digital converter with the microphone sensor. A GPS receiver allows the logging of the position and, in addition, the recording of a very accurate time signal synchronously with the acoustic data. With that, it is possible for multiple modules to effectively work as a single microphone array. The whole system can be built at low cost and demands only minimal technical infrastructure. We demonstrate a possible use of such a microphone array by deploying 20 modules on the active volcano Stromboli in the Aeolian Islands near Sicily, Italy. We use the collected acoustic data to identify the sound source position for all recorded eruptions.

Geophysics, Computer sound systems, Audio and speech processing