Do you want to publish a course? Click here

The Multilingual Corpus of Survey Questionnaires Query Interface

وجعة متعددة اللغات من استبيانات الاستبيان واجهة الاستعلام

299   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

The dawn of the digital age led to increasing demands for digital research resources, which shall be quickly processed and handled by computers. Due to the amount of data created by this digitization process, the design of tools that enable the analysis and management of data and metadata has become a relevant topic. In this context, the Multilingual Corpus of Survey Questionnaires (MCSQ) contributes to the creation and distribution of data for the Social Sciences and Humanities (SSH) following FAIR (Findable, Accessible, Interoperable and Reusable) principles, and provides functionalities for end-users that are not acquainted with programming through an easy-to-use interface. By simply applying the desired filters in the graphic interface, users can build linguistic resources for the survey research and translation areas, such as translation memories, thus facilitating data access and usage.



References used
https://aclanthology.org/
rate research

Read More

We present the first annotated corpus for multilingual analysis of potentially unfair clauses in online Terms of Service. The data set comprises a total of 100 contracts, obtained from 25 documents annotated in four different languages: English, Germ an, Italian, and Polish. For each contract, potentially unfair clauses for the consumer are annotated, for nine different unfairness categories. We show how a simple yet efficient annotation projection technique based on sentence embeddings could be used to automatically transfer annotations across languages.
In this paper, we present work in progress aimed at the development of a new image dataset with annotated objects. The Multilingual Image Corpus consists of an ontology of visual objects (based on WordNet) and a collection of thematically related ima ges annotated with segmentation masks and object classes. We identified 277 dominant classes and 1,037 parent and attribute classes, and grouped them into 10 thematic domains such as sport, medicine, education, food, security, etc. For the selected classes a large-scale web image search is being conducted in order to compile a substantial collection of high-quality copyright free images. The focus of the paper is the annotation protocol which we established to facilitate the annotation process: the Ontology of visual objects and the conventions for image selection and for object segmentation. The dataset is designed both for image classification and object detection and for semantic segmentation. In addition, the object annotations will be supplied with multilingual descriptions by using freely available wordnets.
Comment sections allow users to share their personal experiences, discuss and form different opinions, and build communities out of organic conversations. However, many comment sections present chronological ranking to all users. In this paper, I dis cuss personalization approaches in comment sections based on different objectives for newsrooms and researchers to consider. I propose algorithmic and interface designs when personalizing the presentation of comments based on different objectives including relevance, diversity, and education/background information. I further explain how transparency, user control, and comment type diversity could help users most benefit from the personalized interacting experience.
Multilingual pretrained language models are rapidly gaining popularity in NLP systems for non-English languages. Most of these models feature an important corpus sampling step in the process of accumulating training data in different languages, to en sure that the signal from better resourced languages does not drown out poorly resourced ones. In this study, we train multiple multilingual recurrent language models, based on the ELMo architecture, and analyse both the effect of varying corpus size ratios on downstream performance, as well as the performance difference between monolingual models for each language, and broader multilingual language models. As part of this effort, we also make these trained models available for public use.
In developing an online question-answering system for the medical domains, natural language inference (NLI) models play a central role in question matching and intention detection. However, which models are best for our datasets? Manually selecting o r tuning a model is time-consuming. Thus we experiment with automatically optimizing the model architectures on the task at hand via neural architecture search (NAS). First, we formulate a novel architecture search space based on the previous NAS literature, supporting cross-sentence attention (cross-attn) modeling. Second, we propose to modify the ENAS method to accelerate and stabilize the search results. We conduct extensive experiments on our two medical NLI tasks. Results show that our system can easily outperform the classical baseline models. We compare different NAS methods and demonstrate our approach provides the best results.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا