This paper describes the model built for the SIGTYP 2021 Shared Task aimed at identifying 18 typologically different languages from speech recordings. Mel-frequency cepstral coefficients derived from audio files are transformed into spectrograms, which are then fed into a ResNet-50-based CNN architecture. The final model achieved validation and test accuracies of 0.73 and 0.53, respectively.
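The feature-extraction stage described above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation. The abstract does not give the frame size, hop, number of mel filters, number of coefficients, or how the coefficients are rendered as an image, so every such value below is an assumption, as are all function names. The ResNet-50 classifier itself is left out, since it would need a deep-learning framework.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (hypothetical helper)."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, expressed as fractional FFT bin positions
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([max(0.0, min((k - lo) / (mid - lo), (hi - k) / (hi - mid)))
                     for k in range(n_bins)])
    return bank


def power_spectrum(frame, n_fft):
    """Naive DFT power spectrum; slow but dependency-free."""
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, x in enumerate(frame))) ** 2 / n_fft
            for k in range(n_fft // 2 + 1)]


def mfcc(signal, sample_rate, n_fft=256, hop=128, n_filters=26, n_coeffs=13):
    """Return a list of frames, each holding n_coeffs cepstral coefficients.

    All default parameter values are assumptions, not taken from the paper.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_fft - 1))
              for n in range(n_fft)]  # Hamming window
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        power = power_spectrum(frame, n_fft)
        # Log mel energies; the small constant avoids log(0)
        log_mel = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                   for row in bank]
        # Unnormalized DCT-II decorrelates the log mel energies
        frames.append([sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                           for j, e in enumerate(log_mel))
                       for c in range(n_coeffs)])
    return frames


def to_image(features):
    """Min-max scale to 0..255; rows are coefficients, columns are frames."""
    flat = [v for row in features for v in row]
    lo, hi = min(flat), max(flat)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [[int(round((frame[c] - lo) * scale)) for frame in features]
            for c in range(len(features[0]))]


# Synthetic 440 Hz tone standing in for a speech recording (assumed input)
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(2048)]
features = mfcc(signal, sample_rate)
image = to_image(features)
```

In a full pipeline, an image such as `image` would typically be resized and replicated across three channels before being passed to a ResNet-50-style network; that step is also an assumption about the preprocessing, not a detail stated in the abstract.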
References used
https://aclanthology.org/