Classification of Censored Tweets in Chinese Language using XLNet

Added by the Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Language of the research: English

Created by Shamra Editor

Abstract

As technology advances, social media networks play a significant role in people's lives. Censorship is the suppression of speech, public communication, or other information, and it plays a major role on social media. Censored content may be considered harmful, sensitive, or inconvenient. Censorship is carried out by authorities such as institutions, governments, and other organizations. This paper presents a model that classifies tweets as censored or uncensored, framed as a binary classification task, and describes our submission to the Censorship shared task of the NLP4IF 2021 workshop. We experimented with several transformer-based pre-trained models, of which XLNet achieved the best accuracy. We fine-tuned the model for better performance, achieved reasonable accuracy, and calculated other performance metrics.
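As a rough illustration of the evaluation step mentioned in the abstract, the sketch below computes confusion counts and accuracy for a binary censored/uncensored classifier. The label encoding (1 = censored, 0 = uncensored), the function names, and the toy predictions are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels (assumed encoding: 1 = censored, 0 = uncensored).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(dict(counts))
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
```

Accuracy alone can mislead when the two classes are imbalanced, which is one reason a submission would typically report further metrics alongside it.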

References used

https://aclanthology.org/

Period Classification in Chinese Historical Texts

Association for Computational Linguistics, 2021 (article)

In this study, we examine language change in Chinese Biji through a classification task: classifying Ancient Chinese texts by time period. Specifically, we focus on a unique genre in classical Chinese literature: Biji (literally "notebook" or "brush notes"), i.e., collections of anecdotes, quotations, and anything else their authors considered noteworthy. Biji span hundreds of years across many dynasties and preserve informal language in written form. For these reasons, they are regarded as a good resource for investigating language change in Chinese (Fang, 2010). In this paper, we create a new dataset of 108 Biji across four dynasties. Based on the dataset, we first introduce a time period classification task for Chinese. Then we investigate different feature representation methods for classification. The results show that models using contextualized embeddings perform best. An analysis of the top features chosen by the word n-gram model (after bleaching proper nouns) confirms that these features are informative and correspond to observations and assumptions made by historical linguists.
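To make the "word n-gram features after bleaching proper nouns" idea concrete, here is a minimal sketch. It assumes pre-tokenized text and a known set of proper nouns; the placeholder token `<PN>`, the function names, and the toy sentence are hypothetical, and the paper's actual bleaching procedure may differ.

```python
from collections import Counter


def bleach(tokens, proper_nouns, placeholder="<PN>"):
    """Replace every proper noun with a generic placeholder token."""
    return [placeholder if tok in proper_nouns else tok for tok in tokens]


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens, max_n=2):
    """Count all word n-grams for n = 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(tokens, n))
    return counts


# Toy example with romanized tokens (illustrative only).
tokens = ["Dongpo", "visited", "Hangzhou", "and", "Dongpo", "wrote"]
proper_nouns = {"Dongpo", "Hangzhou"}

bleached = bleach(tokens, proper_nouns)
features = ngram_counts(bleached, max_n=2)
print(bleached)
print(features.most_common(3))
```

Bleaching keeps a classifier from keying on names of people and places that trivially identify a period, so the remaining top features are more likely to reflect actual language change.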

ancient chinese texts, chinese historical texts, classifying ancient chinese

Classification of COVID19 tweets using Machine Learning Approaches

Association for Computational Linguistics, 2021 (article)

This work describes our participation in the "Classification of COVID19 tweets containing symptoms" shared task, organized by the "Social Media Mining for Health Applications (SMM4H)" workshop. The paper describes two machine learning approaches used to build a three-class classification system that categorizes COVID19-related tweets into three classes, viz., self-reports, non-personal reports, and literature/news mentions. The steps for pre-processing tweets, extracting features, and developing the machine learning models are described in detail. When evaluated by the organizers, the two models achieved F1 scores of 0.93 and 0.92, respectively.

machine learning approaches, social media mining, machine learning

Classification of Tweets Self-reporting Adverse Pregnancy Outcomes and Potential COVID-19 Cases Using RoBERTa Transformers

Association for Computational Linguistics, 2021 (article)

This study describes our proposed model design for the SMM4H 2021 shared tasks. We fine-tune the RoBERTa transformer language model and its connected classifier to complete the tweet classification tasks for adverse pregnancy outcomes (Task 4) and potential COVID-19 cases (Task 5). The evaluation metric for both tasks is the F1-score of the positive class. For Task 4, our best score of 0.93 exceeded the mean score of 0.925. For Task 5, our best score of 0.75 exceeded the mean score of 0.745.
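The metric named above, the F1-score of the positive class, can be sketched as follows. The function name, the convention of returning 0.0 when precision and recall are both zero, and the toy labels are assumptions for illustration; the shared task's official scorer may handle edge cases differently.

```python
def positive_f1(y_true, y_pred, positive=1):
    """F1-score computed only for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # assumed convention for the undefined case
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 marks the positive class (e.g. a self-reported case).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(f"positive-class F1 = {positive_f1(y_true, y_pred):.3f}")
```

Scoring only the positive class suits tasks like these, where the class of interest is rare and correct negatives would otherwise dominate the score.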

self-reporting adverse pregnancy, tweets self-reporting adverse, adverse pregnancy outcomes

Incorporating Domain Knowledge into Language Transformers for Multi-Label Classification of Chinese Medical Questions

Association for Computational Linguistics, 2021 (article)

In this paper, we propose a knowledge infusion mechanism to incorporate domain knowledge into language transformers. Weakly supervised data is regarded as the main source for knowledge acquisition. We pre-train the language models to capture masked knowledge of focuses and aspects and then fine-tune them to obtain better performance on the downstream tasks. Due to the lack of publicly available datasets for multi-label classification of Chinese medical questions, we crawled questions from medical question/answer forums and manually annotated them using eight predefined classes: persons and organizations, symptom, cause, examination, disease, information, ingredient, and treatment. The resulting dataset contains a total of 1,814 questions with 2,340 labels, an average of 1.29 labels per question. We used Baidu Medical Encyclopedia as the knowledge resource. Two transformers, BERT and RoBERTa, were implemented to compare performance on our constructed datasets. Experimental results showed that our proposed model with the knowledge infusion mechanism achieves better performance regardless of which evaluation metric (Macro F1, Micro F1, Weighted F1, or Subset Accuracy) is considered.
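The multi-label metrics named in the abstract can be illustrated with a small sketch covering Subset Accuracy, Micro F1, and Macro F1 (Weighted F1 is omitted for brevity). The function names, the toy gold and predicted label sets, and the convention of scoring an undefined F1 as 0.0 are assumptions; only the class names and the dataset totals (1,814 questions, 2,340 labels) come from the abstract.

```python
def per_label_counts(y_true, y_pred, labels):
    """Tally tp/fp/fn for each label over sets of gold and predicted labels."""
    counts = {lab: {"tp": 0, "fp": 0, "fn": 0} for lab in labels}
    for true_set, pred_set in zip(y_true, y_pred):
        for lab in labels:
            if lab in true_set and lab in pred_set:
                counts[lab]["tp"] += 1
            elif lab in pred_set:
                counts[lab]["fp"] += 1
            elif lab in true_set:
                counts[lab]["fn"] += 1
    return counts


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # assumed convention when undefined


def micro_f1(counts):
    """Pool counts over all labels, then compute a single F1."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts):
    """Compute F1 per label, then take the unweighted mean."""
    return sum(f1(**c) for c in counts.values()) / len(counts)


def subset_accuracy(y_true, y_pred):
    """Fraction of questions whose predicted label set matches exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


labels = ["symptom", "cause", "treatment"]  # three of the eight classes
y_true = [{"symptom"}, {"symptom", "treatment"}, {"cause"}]
y_pred = [{"symptom"}, {"symptom"}, {"cause", "treatment"}]

counts = per_label_counts(y_true, y_pred, labels)
print(micro_f1(counts), macro_f1(counts), subset_accuracy(y_true, y_pred))

# Average labels per question from the totals reported in the abstract.
avg_labels = 2340 / 1814
print(round(avg_labels, 2))
```

Subset Accuracy is the strictest of the three because a question counts as correct only when every label matches, which is why it usually comes out lowest.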

incorporating domain knowledge, chinese medical questions, classification of chinese

Multi-Label Classification of Chinese Humor Texts Using Hypergraph Attention Networks

Association for Computational Linguistics, 2021 (article)

We use Hypergraph Attention Networks (HyperGAT) to recognize multiple labels of Chinese humor texts. We first represent a joke as a hypergraph. Sequential and semantic hyperedge structures are used to construct the hyperedges. Then, attention mechanisms are adopted to aggregate context information embedded in nodes and hyperedges. Finally, we use the trained HyperGAT to complete the multi-label classification task. Experimental results on the Chinese humor multi-label dataset showed that the HyperGAT model outperforms previous sequence-based (CNN, BiLSTM, FastText) and graph-based (Graph-CNN, TextGCN, Text Level GNN) deep learning models.
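The first step, representing a joke as a hypergraph, might look like the sketch below. It assumes that each sentence forms one sequential hyperedge over its words and that each semantic hyperedge groups words sharing a topic; the topic word lists, function name, and English toy joke are hypothetical, and the attention-based aggregation step is not shown.

```python
def build_hypergraph(sentences, topics):
    """Build nodes, typed hyperedges, and a node-by-hyperedge incidence matrix.

    sentences: list of token lists (one sequential hyperedge per sentence)
    topics: list of topic word lists (one semantic hyperedge per topic)
    """
    nodes = sorted({tok for sent in sentences for tok in sent})
    index = {tok: i for i, tok in enumerate(nodes)}

    hyperedges = [("sequential", set(sent)) for sent in sentences]
    for topic_words in topics:
        members = set(topic_words) & set(nodes)
        if members:  # skip topics with no word in this text
            hyperedges.append(("semantic", members))

    # incidence[i][j] == 1 when node i belongs to hyperedge j
    incidence = [[0] * len(hyperedges) for _ in nodes]
    for j, (_, members) in enumerate(hyperedges):
        for tok in members:
            incidence[index[tok]][j] = 1
    return nodes, hyperedges, incidence


# Toy two-sentence joke and hypothetical topic word lists.
sentences = [
    ["why", "did", "the", "chicken", "cross"],
    ["the", "chicken", "wanted", "lunch"],
]
topics = [["chicken", "lunch", "soup"], ["banana"]]

nodes, hyperedges, incidence = build_hypergraph(sentences, topics)
print(nodes)
print([kind for kind, _ in hyperedges])
print(incidence[nodes.index("chicken")])
```

Unlike an ordinary graph edge, a hyperedge can connect any number of nodes at once, so a whole sentence or topic is captured as a single relation.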

hypergraph attention networks, chinese humor texts, attention networks
