نقدم خوارزمية استنادا إلى محولات متعددة الطبقات لتحديد ردود الفعل الدوائية الضارة (ADR) في بيانات وسائل التواصل الاجتماعي.يعتمد نموذجنا على خصائص المشكلة وخصائص ASTDDings Word السياقي لاستخراج وجهات نظرتين من المستندات.ثم يتم تدريب المصنف على كل طريقة عرض لتسمية مجموعة من المستندات غير المستخدمة لاستخدامها كتهيئة لتصنيف جديد في الرأي الآخر.أخيرا، يتم تدريب المصنف التهيئي في كل طريقة عرض باستخدام أمثلة التدريب الأولي.قمنا بتقييم نموذجنا في أكبر مجموعة بيانات ADR المتاحة للجمهور.تشهد التجارب أن نموذجنا يتفوق بشكل كبير على النماذج القائمة على المحولات مسبقا على البيانات الخاصة بالمجال.
We present an algorithm based on multi-layer transformers for identifying Adverse Drug Reactions (ADR) in social media data. Our model relies on the properties of the problem and the characteristics of contextual word embeddings to extract two views from documents. Then a classifier is trained on each view to label a set of unlabeled documents to be used as an initializer for a new classifier in the other view. Finally, the initialized classifier in each view is further trained using the initial training examples. We evaluated our model in the largest publicly available ADR dataset. The experiments testify that our model significantly outperforms the transformer-based models pretrained on domain-specific data.
References used
https://aclanthology.org/
User-generated texts include various types of stylistic properties, or noises. Such texts are not properly processed by existing morpheme analyzers or language models based on formal texts such as encyclopedias or news articles. In this paper, we pro
Unsupervised Data Augmentation (UDA) is a semisupervised technique that applies a consistency loss to penalize differences between a model's predictions on (a) observed (unlabeled) examples; and (b) corresponding noised' examples produced via data au
Adverse Drug Event (ADE) extraction models can rapidly examine large collections of social media texts, detecting mentions of drug-related adverse reactions and trigger medical investigations. However, despite the recent advances in NLP, it is curren
Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT). To evaluate and compare different MA/LN systems, we have constructed a publicly available Japanese UGT corpus. Our corpus comp
In most of neural machine translation distillation or stealing scenarios, the highest-scoring hypothesis of the target model (teacher) is used to train a new model (student). If reference translations are also available, then better hypotheses (with