Incorporating Domain Knowledge into Language Transformers for Multi-Label Classification of Chinese Medical Questions

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: incorporating domain knowledge; Chinese medical questions; classification of Chinese

Abstract

In this paper, we propose a knowledge infusion mechanism to incorporate domain knowledge into language transformers. Weakly supervised data is regarded as the main source for knowledge acquisition. We pre-train the language models to capture masked knowledge of focuses and aspects and then fine-tune them to obtain better performance on the downstream tasks. Due to the lack of publicly available datasets for multi-label classification of Chinese medical questions, we crawled questions from medical question/answer forums and manually annotated them using eight predefined classes: persons and organizations, symptom, cause, examination, disease, information, ingredient, and treatment. The resulting dataset contains a total of 1,814 questions with 2,340 labels, an average of 1.29 labels per question. We used Baidu Medical Encyclopedia as the knowledge resource. Two transformers, BERT and RoBERTa, were implemented to compare performance on our constructed datasets. Experimental results showed that our proposed model with the knowledge infusion mechanism achieves better performance regardless of which evaluation metric is considered: Macro F1, Micro F1, Weighted F1, or Subset Accuracy.
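The four evaluation metrics named in the abstract can be illustrated with a small sketch. The code below is not the authors' evaluation script: the toy gold and predicted label sets are invented for illustration, the function names are hypothetical, and labels that never occur in the gold or predicted sets are assumed to score an F1 of 0 (which lowers Macro F1 on tiny samples). Only the eight class names come from the abstract.

```python
from typing import Dict, List, Set

# The eight classes listed in the abstract.
LABELS = [
    "persons and organizations", "symptom", "cause", "examination",
    "disease", "information", "ingredient", "treatment",
]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(gold: List[Set[str]], pred: List[Set[str]],
                      labels: List[str]) -> Dict[str, float]:
    """Compute Macro, Micro, Weighted F1 and Subset Accuracy for label sets."""
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for g, p in zip(gold, pred):
        for label in labels:
            if label in g and label in p:
                tp[label] += 1
            elif label in p:
                fp[label] += 1
            elif label in g:
                fn[label] += 1

    per_label = {label: f1(tp[label], fp[label], fn[label]) for label in labels}
    support = {label: tp[label] + fn[label] for label in labels}  # gold count
    total_support = sum(support.values())

    return {
        # Unweighted mean of per-label F1.
        "macro_f1": sum(per_label.values()) / len(labels),
        # F1 over counts pooled across all labels.
        "micro_f1": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # Per-label F1 weighted by the number of gold instances of each label.
        "weighted_f1": (
            sum(per_label[label] * support[label] for label in labels) / total_support
            if total_support else 0.0
        ),
        # Fraction of questions whose predicted label set matches exactly.
        "subset_accuracy": sum(g == p for g, p in zip(gold, pred)) / len(gold),
    }


# Toy data (invented): four questions with gold and predicted label sets.
gold = [{"symptom", "treatment"}, {"disease"}, {"cause"}, {"symptom"}]
pred = [{"symptom"}, {"disease"}, {"cause", "symptom"}, {"symptom"}]

scores = multilabel_scores(gold, pred, LABELS)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Subset Accuracy is the strictest of the four, since a question counts as correct only when every one of its labels is predicted and no extra label is added.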

References used

https://aclanthology.org/

Incorporating medical knowledge in BERT for clinical relation extraction

Association for Computational Linguistics, 2021 (article)

In recent years, pre-trained language models (PLMs) such as BERT have proven to be very effective in diverse NLP tasks such as Information Extraction, Sentiment Analysis and Question Answering. Trained with massive general-domain text, these pre-trained language models capture rich syntactic, semantic and discourse information in the text. However, due to the differences between general and specific domain text (e.g., Wikipedia versus clinic notes), these models may not be ideal for domain-specific tasks (e.g., extracting clinical relations). Furthermore, it may require additional medical knowledge to understand clinical text properly. To solve these issues, in this research, we conduct a comprehensive examination of different techniques to add medical knowledge into a pre-trained BERT model for clinical relation extraction. Our best model outperforms the state-of-the-art systems on the benchmark i2b2/VA 2010 clinical relation extraction dataset.

Keywords: clinical relation extraction

Multi-Label Classification of Chinese Humor Texts Using Hypergraph Attention Networks

Association for Computational Linguistics, 2021 (article)

We use Hypergraph Attention Networks (HyperGAT) to recognize multiple labels of Chinese humor texts. We first represent a joke as a hypergraph. The sequential hyperedge and semantic hyperedge structures are used to construct hyperedges. Then, attention mechanisms are adopted to aggregate context information embedded in nodes and hyperedges. Finally, we use the trained HyperGAT to complete the multi-label classification task. Experimental results on the Chinese humor multi-label dataset showed that the HyperGAT model outperforms previous sequence-based (CNN, BiLSTM, FastText) and graph-based (Graph-CNN, TextGCN, Text Level GNN) deep learning models.

Keywords: hypergraph attention networks; Chinese humor texts; attention networks

Multi-label Diagnosis Classification of Swedish Discharge Summaries -- ICD-10 Code Assignment Using KB-BERT

Association for Computational Linguistics, 2021 (article)

The International Classification of Diseases (ICD) is a system for systematically recording patients' diagnoses. Clinicians or professional coders assign ICD codes to patients' medical records to facilitate funding, research, and administration. In most health facilities, clinical coding is a manual, time-demanding task that is prone to errors. A tool that automatically assigns ICD codes to free-text clinical notes could save time and reduce erroneous coding. While many previous studies have focused on ICD coding, research on Swedish patient records is scarce. This study explored different approaches to pairing Swedish clinical notes with ICD codes. KB-BERT, a BERT model pre-trained on Swedish text, was compared to the traditional supervised learning models Support Vector Machines, Decision Trees, and K-nearest Neighbours used as the baseline. When considering ICD codes grouped into ten blocks, the KB-BERT was superior to the baseline models, obtaining an F1-micro of 0.80 and an F1-macro of 0.58. When considering the 263 full ICD codes, the KB-BERT was outperformed by all baseline models at an F1-micro and F1-macro of zero. Wilcoxon signed-rank tests showed that the performance differences between the KB-BERT and the baseline models were statistically significant.

Keywords: Swedish discharge summaries; multi-label diagnosis classification; discharge summaries

Classification of Censored Tweets in Chinese Language using XLNet

Association for Computational Linguistics, 2021 (article)

With the growth of today's world and advanced technology, social media networks play a significant role in impacting human lives. Censorship is the suppression of speech, public transmission, or other details that play a vast role in social media. The content may be considered harmful, sensitive, or inconvenient. Authorities such as institutes, governments, and other organizations conduct censorship. This paper implements a model that classifies tweets as censored or uncensored, a binary classification task. The paper describes our submission to the Censorship shared task of the NLP4IF 2021 workshop. We used various transformer-based pre-trained models, among which XLNet achieved the best accuracy. We fine-tuned the model for better performance, achieved a reasonable accuracy, and calculated other performance metrics.

Keywords: Chinese language; tweets in Chinese

DyLex: Incorporating Dynamic Lexicons into BERT for Sequence Labeling

Association for Computational Linguistics, 2021 (article)

Incorporating lexical knowledge into deep learning models has been proved to be very effective for sequence labeling tasks. However, previous works commonly have difficulty dealing with large-scale dynamic lexicons which often cause excessive matching noise and problems of frequent updates. In this paper, we propose DyLex, a plug-in lexicon incorporation approach for BERT based sequence labeling tasks. Instead of leveraging embeddings of words in the lexicon as in conventional methods, we adopt word-agnostic tag embeddings to avoid re-training the representation while updating the lexicon. Moreover, we employ an effective supervised lexical knowledge denoising method to smooth out matching noise. Finally, we introduce a col-wise attention based knowledge fusion mechanism to guarantee the pluggability of the proposed framework. Experiments on ten datasets of three tasks show that the proposed framework achieves new SOTA, even with very large scale lexicons.

Keywords: sequence labeling tasks; incorporating dynamic lexicons
