Introducing CAD: the Contextual Abuse Dataset

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: contextual abuse dataset, introducing CAD, contextual abuse

Abstract

Online abuse can inflict harm on users and communities, making online spaces unsafe and toxic. Progress in automatically detecting and classifying abusive content is often held back by the lack of high-quality and detailed datasets. We introduce a new dataset of primarily English Reddit entries which addresses several limitations of prior work. It (1) contains six conceptually distinct primary categories as well as secondary categories, (2) has labels annotated in the context of the conversation thread, (3) contains rationales and (4) uses an expert-driven group-adjudication process for high-quality annotations. We report several baseline models to benchmark the work of future researchers. The annotated dataset, annotation guidelines, models and code are freely available.

References used

https://aclanthology.org/

Introducing a Large Tunisian Arabizi Dialectal Dataset for Sentiment Analysis

Association for Computational Linguistics, 2021 (article)

On various social media platforms, people tend to communicate informally, writing posts and comments in their local dialects. In Africa, more than 1500 dialects and languages exist. In particular, Tunisians talk and write informally using Latin letters and numbers rather than Arabic ones. In this paper, we introduce a large common-crawl-based Tunisian Arabizi dialectal dataset dedicated to sentiment analysis. The dataset consists of a total of 100k comments (about movies, politics, sports, etc.) annotated manually by Tunisian native speakers as positive, negative, or neutral. We evaluate our dataset on the sentiment analysis task using the multilingual version of Bidirectional Encoder Representations from Transformers (mBERT), a contextual language model, as an embedding technique, and then combine mBERT with a Convolutional Neural Network (CNN) as a classifier. The dataset is publicly available.

Keywords: Tunisian Arabizi dialectal, Arabizi dialectal dataset

The Swedish Winogender Dataset

Association for Computational Linguistics, 2021 (article)

We introduce the SweWinogender test set, a diagnostic dataset to measure gender bias in coreference resolution. It is modelled after the English Winogender benchmark, and is released with reference statistics on the distribution of men and women between occupations and the association between gender and occupation in modern corpus material. The paper discusses the design and creation of the dataset, and presents a small investigation of the supplementary statistics.

Keywords: Swedish Winogender dataset, Swedish Winogender, English Winogender benchmark

Enriching the E2E dataset

Association for Computational Linguistics, 2021 (article)

This study introduces an enriched version of the E2E dataset, one of the most popular language resources for data-to-text NLG. We extract intermediate representations for popular pipeline tasks such as discourse ordering, text structuring, lexicalization and referring expression generation, enabling researchers to rapidly develop and evaluate their data-to-text pipeline systems. The intermediate representations are extracted by aligning non-linguistic and text representations through a process called delexicalization, which consists of replacing input referring expressions to entities/attributes with placeholders. The enriched dataset is publicly available.
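As a rough illustration of the delexicalization step described above, the following minimal Python sketch replaces attribute values in a sentence with placeholders. The function name `delexicalize`, the `__SLOT__` placeholder format, and the sample restaurant record are assumptions made for illustration; they are not taken from the enriched E2E dataset or its code.

```python
def delexicalize(text, attributes):
    """Replace each attribute value found in `text` with a slot placeholder.

    `attributes` maps a slot name to its surface value (hypothetical format).
    """
    # Replace longer values first so a short value cannot clobber part of a longer one.
    for slot, value in sorted(attributes.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(value, f"__{slot.upper()}__")
    return text


# Hypothetical meaning representation and reference sentence.
record = {"name": "The Eagle", "food": "French", "area": "riverside"}
sentence = "The Eagle serves French food in the riverside area."

print(delexicalize(sentence, record))
# prints: __NAME__ serves __FOOD__ food in the __AREA__ area.
```

Exact string matching like this only handles values that appear verbatim in the text; aligning paraphrased references would need something more than this sketch.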

Keywords: dataset, enriching, NLG

Introducing Information Retrieval for Biomedical Informatics Students

Association for Computational Linguistics, 2021 (article)

Introducing biomedical informatics (BMI) students to natural language processing (NLP) requires balancing technical depth with practical know-how to address application-focused needs. We developed a set of three activities introducing introductory BMI students to information retrieval with NLP, covering document representation strategies and language models from TF-IDF to BERT. These activities provide students with hands-on experience targeted towards common use cases, and introduce fundamental components of NLP workflows for a wide variety of applications.
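To give a sense of the simplest document representation mentioned above, here is a minimal TF-IDF retrieval sketch using only the Python standard library. It is not the course material from the paper: the function names, the smoothed IDF formula, the whitespace tokenizer, and the three toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    counts = Counter(query.lower().split())
    q = {t: tf * idf[t] for t, tf in counts.items() if t in idf}  # drop unseen terms
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy corpus (hypothetical).
docs = [
    "aspirin reduces fever and pain",
    "insulin regulates blood glucose",
    "fever is a common symptom of infection",
]
print(rank("blood glucose", docs)[0])  # index of the best-matching document
```

A contextual model such as BERT would replace the sparse term vectors with dense embeddings, while the ranking by similarity would stay conceptually the same.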

Keywords: biomedical informatics students, biomedical informatics, introducing biomedical informatics

Introducing Mouse Actions into Interactive-Predictive Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

The quality of the translations generated by Machine Translation (MT) systems has improved greatly over the years, but we are still far from obtaining fully automatic high-quality translations. To generate them, translators make use of Computer-Assisted Translation (CAT) tools, among which we find the Interactive-Predictive Machine Translation (IPMT) systems. In this paper, we use bandit feedback as the main and only information needed to generate new predictions that correct the previous translations. The application of bandit feedback significantly reduces the number of words that the translator needs to type in an IPMT session. In conclusion, the use of this technique saves translators useful time and effort, and its performance improves with future advances in MT, so we recommend its application in current IPMT systems.

Keywords: introducing mouse actions, interactive-predictive neural machine
