ReINTEL Challenge 2020: Exploiting Transfer Learning Models for Reliable Intelligence Identification on Vietnamese Social Network Sites


Added by Kim Thi-Thanh Nguyen

Publication date 2021

Fields Informatics Engineering

Research language English

Authors Kim Thi-Thanh Nguyen - Kiet Van Nguyen

Computation and Language


This paper presents the system we propose for the Reliable Intelligence Identification on Vietnamese Social Network Sites (ReINTEL) task of the Vietnamese Language and Speech Processing 2020 (VLSP 2020) Shared Task. For this task, VLSP 2020 provides a dataset of approximately 6,000 training news items/posts annotated with reliable or unreliable labels, and a test set consisting of 2,000 examples without labels. We conduct experiments with different transfer learning models, namely bert4news and PhoBERT, fine-tuned to predict whether a news item is reliable or not. In our experiments, we achieve an AUC score of 94.52% on the private test set from the ReINTEL organizers.
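For readers unfamiliar with the metric reported in the abstract, the following is a minimal Python sketch of how an AUC (area under the ROC curve) score can be computed from binary labels and model scores. It uses the pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc`, the toy data, and the convention that label 1 marks the positive class are illustrative assumptions, not taken from the paper or from the shared task's evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: the probability that a randomly chosen positive
    example is scored higher than a randomly chosen negative one.
    Ties between a positive and a negative score count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): label 1 is assumed to be the positive class.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1]
print(round(roc_auc(labels, scores), 4))  # 5 of 6 pairs are ordered correctly -> 0.8333
```

This quadratic-time version is meant for illustration only; on a test set of a few thousand examples a sort-based implementation would be the usual choice.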

Leveraging Transfer Learning for Reliable Intelligence Identification on Vietnamese SNSs (ReINTEL)

Trung-Hieu Tran, Long Phan, Truong-Son Nguyen, 2020

This paper proposes several transformer-based approaches for Reliable Intelligence Identification on Vietnamese social network sites at the VLSP 2020 evaluation campaign. We exploit both monolingual and multilingual pre-trained models. In addition, we use an ensemble method to improve the robustness of the different approaches. Our team achieved a ROC-AUC score of 0.9378 on the private test set, which is competitive with other participants.

Computation and Language Artificial Intelligence Machine Learning
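The abstract above mentions an ensemble method without specifying it. One common way to ensemble classifiers is to average their predicted probabilities, optionally with weights; the sketch below illustrates that idea only and is an assumption, not the authors' actual method. The function name `average_ensemble` and the sample probabilities are hypothetical.

```python
def average_ensemble(model_probs, weights=None):
    """Combine per-model probabilities by (weighted) averaging.

    model_probs: list of lists, one inner list of probabilities per model,
                 all of the same length (one probability per example).
    weights:     optional list of non-negative weights, one per model.
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_examples = len(model_probs[0])
    if any(len(p) != n_examples for p in model_probs):
        raise ValueError("all models must score the same number of examples")
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_examples)
    ]


# Hypothetical outputs of two fine-tuned models on three posts.
monolingual = [0.9, 0.2, 0.6]
multilingual = [0.7, 0.4, 0.5]
combined = average_ensemble([monolingual, multilingual])
print([round(p, 2) for p in combined])
```

Averaging tends to smooth out errors that the individual models do not share, which is one plausible reading of the robustness claim in the abstract.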

NLPBK at VLSP-2020 shared task: Compose transformer pretrained models for Reliable Intelligence Identification on Social network

Thanh Chinh Nguyen, Van Nha Nguyen, 2021

This paper describes our method for tuning a transformer-based pretrained model to adapt it to the Reliable Intelligence Identification on Vietnamese SNSs problem. We also propose a model that combines bert-base pretrained models with metadata features, such as the number of comments, the number of likes, and the images of SNS documents, among others, to improve results for the VLSP shared task: Reliable Intelligence Identification on Vietnamese SNSs. With appropriate training techniques, our model achieves 0.9392 ROC-AUC on the public test set, and the final version places in the top 2 by ROC-AUC (0.9513) on the private test set.

Computation and Language Information Retrieval

Error Analysis for Vietnamese Named Entity Recognition on Deep Neural Network Models

Binh An Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen, 2019

In recent years, Vietnamese Named Entity Recognition (NER) systems have achieved a major breakthrough through Deep Neural Network methods. This paper describes the primary errors of state-of-the-art NER systems for the Vietnamese language. We conduct experiments with BLSTM-CNN-CRF and BLSTM-CRF models using different word embeddings on the Vietnamese NER dataset, which was provided by VLSP in 2016 and is used to evaluate most current Vietnamese NER systems. BLSTM-CNN-CRF gives better results, so we analyze the errors of this model in detail. Our error analysis provides thorough insights for increasing the performance of NER for the Vietnamese language and improving the quality of the corpus in future work.

Computation and Language Machine Learning Neural and Evolutionary Computing

Emotion Recognition for Vietnamese Social Media Text

Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, 2019

Emotion recognition, or emotion prediction, is a finer-grained form, or a special case, of sentiment analysis. In this task, the result is expressed neither as a polarity (positive or negative) nor as a rating (from 1 to 5), but at a more detailed level of analysis, with categories such as sadness, enjoyment, anger, disgust, fear, and surprise. Emotion recognition plays a critical role in measuring the brand value of a product by recognizing specific emotions in customers' comments. In this study, we have achieved two targets. First, we built a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with exactly 6,927 emotion-annotated sentences, contributing to emotion recognition research in Vietnamese, which is a low-resource language in natural language processing (NLP). Second, we evaluated machine learning and deep neural network models on our UIT-VSMEC corpus. The CNN model achieved the highest performance, with a weighted F1-score of 59.74%. Our corpus is available at our research website.

Computation and Language

BANANA at WNUT-2020 Task 2: Identifying COVID-19 Information on Twitter by Combining Deep Learning and Transfer Learning Models

Tin Van Huynh, Luan Thanh Nguyen, Son T. Luu, 2020

The outbreak of the COVID-19 virus has had a significant impact on the health of people all over the world. It is therefore essential that everyone has access to continuous and accurate information about the disease. This paper describes our prediction system for WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. The dataset for this task contains 10,000 human-labeled tweets in English. The final prediction uses an ensemble of our three transformer and deep learning models. The experimental results show that our system achieves an F1 score of 88.81% for the INFORMATIVE label on the test set.

Computation and Language Social and Information Networks