Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

DreamDrug - A crowdsourced NER dataset for detecting drugs in darknet markets

Dreamdrug - مجموعة بيانات NER Growdsourced للكشف عن المخدرات في أسواق Darknet

623 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present DreamDrug, a crowdsourced dataset for detecting mentions of drugs in noisy user-generated item listings from darknet markets. Our dataset contains nearly 15,000 manually annotated drug entities in over 3,500 item listings scraped from the darknet market platform DreamMarket'' in 2017. We also train and evaluate baseline models for detecting these entities, using contextual language models fine-tuned in a few-shot setting and on the full dataset, and examine the effect of pretraining on in-domain unannotated corpora.

References used

https://aclanthology.org/

rate research

tWT--WT: A Dataset to Assert the Role of Target Entities for Detecting Stance of Tweets

679 - Association for Computation Linguistics 2021 مقالة

The stance detection task aims at detecting the stance of a tweet or a text for a target. These targets can be named entities or free-form sentences (claims). Though the task involves reasoning of the tweet with respect to a target, we find that it i s possible to achieve high accuracy on several publicly available Twitter stance detection datasets without looking at the target sentence. Specifically, a simple tweet classification model achieved human-level performance on the WT--WT dataset and more than two-third accuracy on various other datasets. We investigate the existence of biases in such datasets to find the potential spurious correlations of sentiment-stance relations and lexical choice associated with the stance category. Furthermore, we propose a new large dataset free of such biases and demonstrate its aptness on the existing stance detection systems. Our empirical findings show much scope for research on the stance detection task and proposes several considerations for creating future stance detection datasets.

assert the role stance detection datasets أكد الدور مجموعات بيانات الكشف عن الموقف صناعة حمض الفوسفور

RED: A Novel Dataset for Romanian Emotion Detection from Tweets

1280 - Association for Computation Linguistics 2021 مقالة

In Romanian language there are some resources for automatic text comprehension, but for Emotion Detection, not lexicon-based, there are none. To cover this gap, we extracted data from Twitter and created the first dataset containing tweets annotated with five types of emotions: joy, fear, sadness, anger and neutral, with the intent of being used for opinion mining and analysis tasks. In this article we present some features of our novel dataset, and create a benchmark to achieve the first supervised machine learning model for automatic Emotion Detection in Romanian short texts. We investigate the performance of four classical machine learning models: Multinomial Naive Bayes, Logistic Regression, Support Vector Classification and Linear Support Vector Classification. We also investigate more modern approaches like fastText, which makes use of subword information. Lastly, we fine-tune the Romanian BERT for text classification and our experiments show that the BERT-based model has the best performance for the task of Emotion Detection from Romanian tweets. Keywords: Emotion Detection, Twitter, Romanian, Supervised Machine Learning

التبعيات العالمية romanian emotion detection support vector classification الكشف عن العاطفة الرومانية دعم تصنيف ناقلات صناعة حمض الفوسفور

FANG-COVID: A New Large-Scale Benchmark Dataset for Fake News Detection in German

758 - Association for Computation Linguistics 2021 مقالة

As the world continues to fight the COVID-19 pandemic, it is simultaneously fighting an infodemic' -- a flood of disinformation and spread of conspiracy theories leading to health threats and the division of society. To combat this infodemic, there i s an urgent need for benchmark datasets that can help researchers develop and evaluate models geared towards automatic detection of disinformation. While there are increasing efforts to create adequate, open-source benchmark datasets for English, comparable resources are virtually unavailable for German, leaving research for the German language lagging significantly behind. In this paper, we introduce the new benchmark dataset FANG-COVID consisting of 28,056 real and 13,186 fake German news articles related to the COVID-19 pandemic as well as data on their propagation on Twitter. Furthermore, we propose an explainable textual- and social context-based model for fake news detection, compare its performance to black-box'' models and perform feature ablation to assess the relative importance of human-interpretable features in distinguishing fake news from authentic news.

large-scale benchmark dataset benchmark dataset benchmark dataset fang-covid مجموعة البيانات القياسية واسعة النطاق معيار DataSet. معيار DataSet Fang-Covid صناعة حمض الفوسفور المزيد..

A Dataset for Research on Modelling Depression Severity in Online Forum Data

1127 - Association for Computation Linguistics 2021 مقالة

People utilize online forums to either look for information or to contribute it. Because of their growing popularity, certain online forums have been created specifically to provide support, assistance, and opinions for people suffering from mental i llness. Depression is one of the most frequent psychological illnesses worldwide. People communicate more with online forums to find answers for their psychological disease. However, there is no mechanism to measure the severity of depression in each post and give higher importance to those who are diagnosed more severely depressed. Despite the fact that numerous researches based on online forum data and the identification of depression have been conducted, the severity of depression is rarely explored. In addition, the absence of datasets will stymie the development of novel diagnostic procedures for practitioners. From this study, we offer a dataset to support research on depression severity evaluation. The computational approach to measure an automatic process, identified severity of depression here is quite novel approach. Nonetheless, this elaborate measuring severity of depression in online forum posts is needed to ensure the measurement scales used in our research meets the expected norms of scientific research.

modelling depression severity online forum data modelling depression نمذجة الشدة الاكتئاب بيانات المنتدى عبر الإنترنت نمذجة الاكتئاب صناعة حمض الفوسفور المزيد..

A Large-Scale English Multi-Label Twitter Dataset for Cyberbullying and Online Abuse Detection

820 - Association for Computation Linguistics 2021 مقالة

In this paper, we introduce a new English Twitter-based dataset for cyberbullying detection and online abuse. Comprising 62,587 tweets, this dataset was sourced from Twitter using specific query terms designed to retrieve tweets with high probabiliti es of various forms of bullying and offensive content, including insult, trolling, profanity, sarcasm, threat, porn and exclusion. We recruited a pool of 17 annotators to perform fine-grained annotation on the dataset with each tweet annotated by three annotators. All our annotators are high school educated and frequent users of social media. Inter-rater agreement for the dataset as measured by Krippendorff's Alpha is 0.67. Analysis performed on the dataset confirmed common cyberbullying themes reported by other studies and revealed interesting relationships between the classes. The dataset was used to train a number of transformer-based deep learning models returning impressive results.

online abuse detection large-scale english multi-label english multi-label twitter اكتشاف إساءة الاستخدام عبر الإنترنت الترمية الإنجليزية متعددة الواسعة الإنجليزية متعددة التسمية تويتر صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

DreamDrug - A crowdsourced NER dataset for detecting drugs in darknet markets

Dreamdrug - مجموعة بيانات NER Growdsourced للكشف عن المخدرات في أسواق Darknet

Ask ChatGPT about the research

Read More

suggested questions