
To audit the robustness of named entity recognition (NER) models, we propose RockNER, a simple yet effective method to create natural adversarial examples. Specifically, at the entity level, we replace target entities with other entities of the same semantic class in Wikidata; at the context level, we use pre-trained language models (e.g., BERT) to generate word substitutions. Together, the two levels of attack produce natural adversarial examples that result in a shifted distribution from the training data on which our target models have been trained. We apply the proposed method to the OntoNotes dataset and create a new benchmark named OntoRock for evaluating the robustness of existing NER models via a systematic evaluation protocol. Our experiments and analysis reveal that even the best model has a significant performance drop, and these models seem to memorize in-domain entity patterns instead of reasoning from the context. Our work also studies the effects of a few simple data augmentation methods to improve the robustness of NER models.
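As a rough illustration of the context-level step, the sketch below masks non-entity tokens one at a time and asks a pre-trained masked language model for in-context replacements. It assumes the Hugging Face transformers library; the helper name perturb_context and the example sentence are illustrative and are not part of the RockNER release.

```python
# Sketch of a context-level attack: mask non-entity context words and let a
# pre-trained masked LM propose in-context substitutions (entities untouched).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

def perturb_context(tokens, entity_indices, top_k=1):
    """Replace one non-entity token at a time with a BERT-suggested word."""
    perturbed = list(tokens)
    for i, token in enumerate(tokens):
        if i in entity_indices:          # leave target entities untouched
            continue
        masked = list(perturbed)
        masked[i] = fill_mask.tokenizer.mask_token
        suggestions = fill_mask(" ".join(masked), top_k=top_k)
        candidate = suggestions[0]["token_str"].strip()
        if candidate.lower() != token.lower():
            perturbed[i] = candidate
    return perturbed

# Illustrative example: keep the entity "Obama" fixed and perturb its context.
print(perturb_context(["Obama", "visited", "the", "university"], entity_indices={0}))
```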
Code-Mixing (CM) is a common phenomenon in multilingual societies. CM plays a significant role in technology and medical fields where terminologies in the native language are not available or known. Language Identification (LID) of CM data will help solve NLP tasks such as Spell Checking, Named Entity Recognition, Part-of-Speech Tagging, and Semantic Parsing. In the current era of machine learning, a common problem for the above-mentioned tasks is the availability of labeled data to train models. In this paper, we introduce two manually annotated Telugu-English CM datasets (a Twitter dataset and a blog dataset). The Twitter dataset contains more romanization variability and misspelled words than the blog dataset. We perform extensive benchmarking of both classical and deep learning models for LID and compare them against existing models. We propose two architectures for language classification (Telugu and English) in CM data: (1) word-level classification and (2) sentence-level word-by-word classification, and compare these approaches, presenting two strong baselines for LID on these datasets.
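As an illustration of the first architecture, the sketch below trains a simple word-level classifier on character n-gram features, tagging each romanized token as Telugu or English independently of its neighbours. The scikit-learn setup and the tiny word list are illustrative assumptions, not the datasets or models from the paper.

```python
# Minimal word-level LID sketch: classify each token from character n-grams.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (illustrative only): romanized Telugu vs. English words.
train_words  = ["nenu", "vastanu", "repu", "meeting", "office", "tomorrow"]
train_labels = ["te", "te", "te", "en", "en", "en"]

clf = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
clf.fit(train_words, train_labels)

# Word-level classification: each token in a code-mixed sentence is tagged
# on its own, without sentence-level context.
sentence = "nenu repu office ki vastanu".split()
print(list(zip(sentence, clf.predict(sentence))))
```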
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge examples and falter in real-world scenarios. With Dynabench, dataset creation, model development, and model assessment can directly inform each other, leading to more robust and informative benchmarks. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform, and address potential objections to dynamic benchmarking as a new standard for the field.
We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction. Our method aims at simulating mistakes made by second language learners, and produces a wider range of non-native style language in comparison to a state-of-the-art baseline model. We carry out quantitative and qualitative evaluation. Our method is shown to outperform the baseline on data with a high proportion of errors.
Text simplification is a growing field with many potentially useful applications. Training text simplification algorithms generally requires a lot of annotated data; however, there are few corpora suitable for this task. We propose a new unsupervised method for aligning text based on Doc2Vec embeddings and a new alignment algorithm capable of aligning texts at different levels. Initial evaluation shows promising results for the new approach. We used the newly developed approach to create a new monolingual parallel corpus composed of the works of English early modern philosophers and their corresponding simplified versions.
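A minimal sketch of the embedding-based alignment idea, assuming gensim's Doc2Vec and a greedy cosine-similarity pairing; the toy sentences are illustrative, and the paper's multi-level alignment algorithm is not reproduced here.

```python
# Sketch: embed sentences from the original and simplified texts with Doc2Vec
# and align each original sentence to its most similar simplified counterpart.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from numpy import dot
from numpy.linalg import norm

original   = ["the nature of knowledge is disputed among philosophers",
              "liberty consists in doing what one desires"]
simplified = ["freedom means doing what you want",
              "philosophers disagree about what knowledge is"]

docs = [TaggedDocument(s.split(), [i]) for i, s in enumerate(original + simplified)]
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=100)

def embed(sentence):
    return model.infer_vector(sentence.split())

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

# Greedy alignment: pair each original sentence with the closest simplified one.
for s in original:
    best = max(simplified, key=lambda t: cosine(embed(s), embed(t)))
    print(f"{s!r} -> {best!r}")
```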
Meeting minutes record the subject matter discussed, decisions reached, and actions taken at a meeting. The importance of automatic minuting cannot be overstated. In this paper, we present a sliding window approach to the automatic generation of meeting minutes. It aims at addressing issues pertaining to the nature of spoken text, including lengthy transcripts and the lack of document structure, which make it difficult to identify salient content to be included in meeting minutes. Our approach combines a sliding window with a neural abstractive summarizer to navigate through the raw transcript and find salient content. The approach is evaluated on transcripts of natural meeting conversations, where we compare results obtained for human transcripts and two versions of automatic transcripts and discuss how and to what extent the summarizer succeeds at capturing salient content.
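A minimal sketch of how such a sliding window could be combined with an off-the-shelf abstractive summarizer: the transcript is broken into overlapping windows that fit the summarizer's input limit, each window is summarized, and the summaries are concatenated as candidate minutes. The window and stride sizes, the BART checkpoint, and the helper names are illustrative assumptions rather than the paper's configuration.

```python
# Sketch: sliding-window segmentation + neural abstractive summarization.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def sliding_windows(words, window=400, stride=300):
    """Yield overlapping chunks of `window` words, advancing by `stride`."""
    for start in range(0, max(len(words) - window, 0) + 1, stride):
        yield " ".join(words[start:start + window])

def summarize_transcript(transcript):
    """Summarize each window of a long transcript and join the results."""
    minutes = []
    for chunk in sliding_windows(transcript.split()):
        summary = summarizer(chunk, max_length=60, min_length=10, do_sample=False)
        minutes.append(summary[0]["summary_text"])
    return "\n".join(minutes)
```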
Appraisal theories explain how the cognitive evaluation of an event leads to a particular emotion. In contrast to theories of basic emotions or affect (valence/arousal), this theory has not received a lot of attention in natural language processing. Yet, in psychology it has been proven powerful: Smith and Ellsworth (1985) showed that the appraisal dimensions attention, certainty, anticipated effort, pleasantness, responsibility/control and situational control discriminate between (at least) 15 emotion classes. We study different annotation strategies for these dimensions, based on the event-focused enISEAR corpus (Troiano et al., 2019). We analyze two manual annotation settings: (1) showing the text to annotate while masking the experienced emotion label; (2) revealing the emotion associated with the text. Setting 2 enables the annotators to develop a more realistic intuition of the described event, while Setting 1 is a more standard annotation procedure, purely relying on text. We evaluate these strategies in two ways: by measuring inter-annotator agreement and by fine-tuning RoBERTa to predict appraisal variables. Our results show that knowledge of the emotion increases annotators' reliability. Further, we evaluate a purely automatic rule-based labeling strategy (inferring appraisal from annotated emotion classes). Training on automatically assigned labels leads to a competitive performance of our classifier, even when tested on manual annotations. This is an indicator that it might be possible to automatically create appraisal corpora for every domain for which emotion corpora already exist.
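A minimal sketch of such a fine-tuning setup, treating the six appraisal dimensions as a multi-label prediction task with RoBERTa; the dimension names, the toy example, and the label values are illustrative and do not reproduce the enISEAR-based data preparation described in the paper.

```python
# Sketch: fine-tune RoBERTa to predict appraisal dimensions jointly.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

APPRAISALS = ["attention", "certainty", "anticipated_effort",
              "pleasantness", "responsibility_control", "situational_control"]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=len(APPRAISALS),
    problem_type="multi_label_classification",
)

# One (text, appraisal-vector) pair; in practice these would come from the
# manually or automatically annotated corpus. Values here are made up.
text = "I felt guilty when I forgot my friend's birthday."
labels = torch.tensor([[1.0, 1.0, 0.0, 0.0, 1.0, 1.0]])

inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs, labels=labels)   # BCE loss over the six dimensions
outputs.loss.backward()                    # an optimizer step would follow
```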
Libraries and information centers are characterized by continual change and successive developments, notably the emergence of electronic libraries, which were an inevitable consequence of the evolution of information and communications. This has led to radical shifts in the mass storage and processing of information and in the media that carry it, and has changed the forms of information organization and exchange; there is no doubt that this has had a positive impact in providing appropriate and advanced information services to beneficiaries. Electronic libraries have made it possible to provide services that traditional libraries could not provide or carry out, and the features unique to the electronic library make its presence of great importance to users, librarians, and publishers alike.