The dawn of the digital age has increased the demand for digital research resources, which must be processed and handled quickly by computers. Given the amount of data created by this digitization process, the design of tools that enable the analysis and management of data and metadata has become a relevant topic. In this context, the Multilingual Corpus of Survey Questionnaires (MCSQ) contributes to the creation and distribution of data for the Social Sciences and Humanities (SSH) following the FAIR (Findable, Accessible, Interoperable and Reusable) principles. It also provides functionalities, through an easy-to-use interface, for end users who are not familiar with programming. By simply applying the desired filters in the graphical interface, users can build linguistic resources for survey research and translation, such as translation memories, thus facilitating data access and usage.
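To make the idea of building a translation memory from filtered data concrete, here is a minimal Python sketch. It is not the MCSQ implementation or API. The `AlignedSegment` record, the `filter_segments` and `to_tmx` helpers, the filter fields (study, target language) and the toy sentences are all hypothetical. The sketch assumes aligned source/target segment pairs are already available and serializes the filtered pairs as TMX 1.4, a standard exchange format for translation memories.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class AlignedSegment:
    # Hypothetical record: one source segment paired with its translation.
    study: str
    source_lang: str
    target_lang: str
    source_text: str
    target_text: str


def filter_segments(
    segments: Iterable[AlignedSegment],
    study: Optional[str] = None,
    target_lang: Optional[str] = None,
) -> List[AlignedSegment]:
    """Keep segments matching the chosen filters; None means 'any value'."""
    return [
        s
        for s in segments
        if (study is None or s.study == study)
        and (target_lang is None or s.target_lang == target_lang)
    ]


def to_tmx(segments: Iterable[AlignedSegment], srclang: str = "en") -> str:
    """Serialize aligned segments as a minimal TMX 1.4 document."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<tmx version="1.4">',
        # The header attributes below are the ones TMX 1.4 marks as required.
        '  <header creationtool="sketch" creationtoolversion="0.1" '
        'segtype="sentence" o-tmf="none" adminlang="en" '
        f"srclang={quoteattr(srclang)} datatype=\"plaintext\"/>",
        "  <body>",
    ]
    for s in segments:
        # One translation unit (<tu>) holds one variant (<tuv>) per language.
        lines.append("    <tu>")
        lines.append(
            f"      <tuv xml:lang={quoteattr(s.source_lang)}>"
            f"<seg>{escape(s.source_text)}</seg></tuv>"
        )
        lines.append(
            f"      <tuv xml:lang={quoteattr(s.target_lang)}>"
            f"<seg>{escape(s.target_text)}</seg></tuv>"
        )
        lines.append("    </tu>")
    lines += ["  </body>", "</tmx>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    segments = [
        AlignedSegment("study_A", "en", "de", "How old are you?", "Wie alt sind Sie?"),
        AlignedSegment("study_A", "en", "es", "How old are you?", "¿Cuántos años tiene?"),
    ]
    # "Applying a filter" here means keeping only English-German pairs.
    print(to_tmx(filter_segments(segments, target_lang="de")))
```

Keeping filtering and serialization as separate functions mirrors the workflow described above: the user narrows the data first, and the export step only has to turn whatever remains into an interchange format.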