
On Generative Spoken Language Modeling from Raw Audio


Publication date: 2021
Language: English
Created by Shamra Editor





Abstract We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and a set of metrics to automatically evaluate the learned representations at acoustic and linguistic levels for both encoding and generation. We set up baseline systems consisting of a discrete speech encoder (returning pseudo-text units), a generative language model (trained on pseudo-text), and a speech decoder (generating a waveform from pseudo-text), all trained without supervision, and validate the proposed metrics with human evaluation. Across 3 speech encoders (CPC, wav2vec 2.0, HuBERT), we find that the number of discrete units (50, 100, or 200) matters in a task-dependent and encoder-dependent way, and that some combinations approach text-based systems.
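The baseline pipeline described in the abstract (discrete speech encoder → pseudo-text units → language model → speech decoder) can be illustrated with a toy sketch. The following Python example is not the actual system: a small k-means quantizer stands in for the CPC/wav2vec 2.0/HuBERT features plus quantization, a bigram model stands in for the generative language model, and the "decoder" simply maps units back to centroid features.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(features, n_units=8, iters=10):
    """Toy discrete 'speech encoder': k-means over frame features,
    returning one pseudo-text unit id per frame (a stand-in for a
    real self-supervised encoder plus quantization)."""
    centroids = features[rng.choice(len(features), n_units, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
        units = dists.argmin(axis=1)
        for k in range(n_units):
            if (units == k).any():
                centroids[k] = features[units == k].mean(axis=0)
    return units, centroids

def train_bigram_lm(units, n_units=8):
    """Toy 'generative language model': smoothed bigram counts over units."""
    counts = np.ones((n_units, n_units))  # add-one smoothing
    for a, b in zip(units[:-1], units[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def generate(lm, start, length):
    """Sample a new pseudo-text unit sequence from the bigram model."""
    seq = [start]
    for _ in range(length - 1):
        seq.append(int(rng.choice(len(lm), p=lm[seq[-1]])))
    return seq

def decode(seq, centroids):
    """Toy 'speech decoder': map units back to (centroid) features."""
    return centroids[np.array(seq)]

features = rng.normal(size=(200, 4))      # 200 frames of fake acoustic features
units, centroids = encode(features)       # frames -> pseudo-text
lm = train_bigram_lm(units)               # language model on pseudo-text
sampled = generate(lm, start=int(units[0]), length=50)
decoded = decode(sampled, centroids)      # pseudo-text -> feature frames
print(len(units), len(sampled), decoded.shape)
```

In this toy setting, `n_units` plays the role of the 50/100/200 unit-inventory choice that the abstract reports as task- and encoder-dependent.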



Related research

The lack of publicly available evaluation data for low-resource languages limits progress in Spoken Language Understanding (SLU). As key tasks like intent classification and slot filling require abundant training data, it is desirable to reuse existing data in high-resource languages to develop models for low-resource scenarios. We introduce xSID, a new benchmark for cross-lingual (x) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect. To tackle the challenge, we propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer. We study two setups which differ by type and language coverage of the pre-trained embeddings. Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
Sound is an essential component of multimedia, and because it is used in many everyday applications such as television broadcasting and communication programs, audio signal processing techniques such as compression, enhancement, and noise reduction are necessary. Data compression aims to reduce the bit rate used by encoding information with fewer bits than the original representation for transmission and storage. In this process, unnecessary information is identified and removed, yielding a usable compressed representation that preserves the essentials rather than the minutest details. This research studies how to process sound and musical signals, a field that spans a wide range of applications: coding and digital compression for efficient transmission and storage on mobile phones and portable music players, modeling and reproduction of the sound of musical instruments and music halls, the harmonics of digital music, digital music editing, classification of music content, and more.
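As a concrete illustration of trading bit rate for fidelity (a generic sketch, not a method from the research above): μ-law companding, long used in telephony, stores each sample in 8 bits instead of 16, halving the bit rate at the cost of a small quantization error.

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Compress samples in [-1, 1] to 8-bit codes (mu-law companding)."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((y + 1) / 2 * mu).astype(np.uint8)

def mu_law_decode(codes, mu=255):
    """Expand 8-bit codes back to samples in [-1, 1]."""
    y = codes.astype(np.float64) / mu * 2 - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

t = np.linspace(0, 1, 8000)                  # one second at 8 kHz
signal = 0.5 * np.sin(2 * np.pi * 440 * t)   # 440 Hz test tone
codes = mu_law_encode(signal)                # 8 bits per sample
restored = mu_law_decode(codes)

# Half the bit rate of 16-bit PCM, with only a small reconstruction error.
print(codes.dtype, np.max(np.abs(restored - signal)))
```

The logarithmic companding curve allocates more quantization levels to quiet samples, which is why 256 levels suffice for intelligible speech.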
Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages. Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy, and thus impede the performance of SLU models. In this paper we focus on mitigating noise in augmented data. We develop a denoising training approach. Multiple models are trained with data produced by various augmented methods. Those models provide supervision signals to each other. The experimental results show that our method outperforms the existing state of the art by 3.05 and 4.24 percentage points on two benchmark datasets, respectively. The code will be open-sourced on GitHub.
The lecture presents an overview of data science and its relationship to statistics and machine learning, along with two case studies on the role of the data scientist in designing solutions based on extracting knowledge from large volumes of available data. It also reviews the main tasks at scientific conferences in which informatics students interested in this field can participate.
In this paper, we propose a new method to embed digital watermarks in audio files using the Discrete Wavelet Transform (DWT), along with a way to extract the watermark data. The method's efficiency is measured using the Peak Signal-to-Noise Ratio (PSNR) and the Normalized Correlation Coefficient (NC). The advantage of our method is its robustness against several attacks and against compression.
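The general idea of DWT-based audio watermarking can be sketched in a few lines; this is a minimal single-level Haar illustration, not the paper's actual embedding scheme, and `alpha` is an assumed embedding strength. Bits are added to detail coefficients and recovered by comparison with the original signal (a non-blind scheme), with PSNR measuring how little the audio is disturbed.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: split a signal into approximation and detail."""
    x = x[: len(x) // 2 * 2]
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    """Inverse one-level Haar DWT."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def embed(audio, bits, alpha=0.01):
    """Shift detail coefficients by +alpha for a 1 bit, -alpha for a 0 bit."""
    a, d = haar_dwt(audio)
    d = d.copy()
    d[: len(bits)] += alpha * (2 * np.asarray(bits) - 1)
    return haar_idwt(a, d)

def extract(watermarked, original, n_bits):
    """Recover bits from the sign of the detail-coefficient difference."""
    _, d_w = haar_dwt(watermarked)
    _, d_o = haar_dwt(original)
    return ((d_w - d_o)[:n_bits] > 0).astype(int)

def psnr(original, processed):
    """Peak Signal-to-Noise Ratio in dB (peak taken from the original)."""
    mse = np.mean((original - processed) ** 2)
    return 10 * np.log10(np.max(np.abs(original)) ** 2 / mse)

rng = np.random.default_rng(1)
audio = 0.8 * np.sin(2 * np.pi * np.arange(1024) * 0.01) + 0.05 * rng.normal(size=1024)
bits = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed(audio, bits)
print(psnr(audio, marked), extract(marked, audio, len(bits)))
```

A small `alpha` keeps the PSNR high (inaudible change) but makes the watermark more fragile; robustness against attacks and compression, as claimed in the paper, requires more elaborate embedding than this sketch.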
