This paper describes the system developed by the Laboratoire d'analyse statistique des textes for the Dravidian Language Identification (DLI) shared task of VarDial 2021. The task is particularly difficult because the material consists of short YouTube comments, written in Roman script, from three closely related Dravidian languages, plus a fourth category made up of several other languages in varying proportions, all mixed with English. The proposed system is a logistic regression model whose only features are character n-grams with a maximum length of 5. After optimization of both the feature weighting and the classifier parameters, it ranked first in the challenge. Additional analyses underline the importance of this optimization, especially when effectiveness is measured by Macro-F1.
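The two ingredients named in the abstract, character n-gram features and the Macro-F1 measure, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `char_ngrams` and `macro_f1` are hypothetical, the features are raw counts (the abstract does not say which feature weighting was chosen), and the logistic regression classifier itself is not shown.

```python
from collections import Counter


def char_ngrams(text, max_n=5):
    """Count every character n-gram of length 1..max_n in `text`.

    Raw counts only; the feature weighting used in the paper is not
    specified in the abstract, so none is applied here (assumption).
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with gold labels `["a", "a", "a", "b"]` and predictions `["a", "a", "a", "a"]`, accuracy is 0.75 but `macro_f1` is only 3/7, because the missed minority class contributes an F1 of 0 with the same weight as the majority class. This is one reason tuning matters more when Macro-F1 is the target measure.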
References used
https://aclanthology.org/