Feature engineering is an important step in classical NLP pipelines, but machine learning engineers may not know which signals to look for when processing foreign-language text. The Russian Feature Extraction Toolkit (RFET) is a collection of feature extraction libraries bundled for ease of use by engineers who do not speak Russian. RFET's current feature set includes features applicable to social media genres of text and to computational social science tasks. We demonstrate the effectiveness of the tool by using it in a personality trait identification task. We compare the performance of Support Vector Machines (SVMs) trained with and without the features provided by RFET; we also compare them to an SVM with neural embedding features generated by Sentence-BERT.
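The idea of training a classifier "with and without" hand-crafted features can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the four surface features, and the regular expressions are hypothetical and are not RFET's API or its actual feature set. The sketch only shows how language-aware signals (for example, the bare `)))` smile common in Russian social media text) might be appended to a baseline bag-of-words vector before it is passed to a classifier such as an SVM.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical patterns chosen for illustration; not taken from RFET.
CYRILLIC = re.compile(r"[\u0400-\u04FF]")
# Western-style emoticons plus runs of bare parentheses, e.g. ")))".
EMOTICON = re.compile(r"[:;=]-?[)(DP]|\){2,}|\({2,}")
# A word character repeated three or more times, e.g. an elongated vowel.
ELONGATED = re.compile(r"(\w)\1{2,}")


def handcrafted_features(text: str) -> Dict[str, float]:
    """Compute a few surface features of a social media post."""
    letters = [ch for ch in text if ch.isalpha()]
    tokens = text.split()
    n_letters = len(letters) or 1  # avoid division by zero on empty input
    n_tokens = len(tokens) or 1
    return {
        "cyrillic_ratio": sum(1 for ch in letters if CYRILLIC.match(ch)) / n_letters,
        "emoticon_count": float(len(EMOTICON.findall(text))),
        "elongated_ratio": sum(1 for t in tokens if ELONGATED.search(t)) / n_tokens,
        "exclamation_count": float(text.count("!")),
    }


def bag_of_words(text: str, vocabulary: List[str]) -> List[float]:
    """Baseline representation: token counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return [float(counts[word]) for word in vocabulary]


def combined_vector(text: str, vocabulary: List[str]) -> List[float]:
    """Baseline vector followed by the hand-crafted features (sorted by name)."""
    feats = handcrafted_features(text)
    return bag_of_words(text, vocabulary) + [feats[name] for name in sorted(feats)]
```

Under these assumptions, the "without" condition would train on `bag_of_words` alone and the "with" condition on `combined_vector`, so any difference in classifier performance can be attributed to the added features.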
References used
https://aclanthology.org/