Comparative Analysis of Fine-tuned Deep Learning Language Models for ICD-10 Classification Task for Bulgarian Language

Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: deep learning language, fine-tuned deep learning, Bulgarian

Abstract

The task of automatically encoding diagnoses into standard medical classifications and ontologies is of great importance in medicine, both to support physicians' daily work in preparing and reporting clinical documentation and for the automatic processing of clinical reports. In this paper we investigate the application and performance of different deep learning transformers for automatically encoding Bulgarian clinical texts in ICD-10. The comparative analysis attempts to determine which approach is more efficient for fine-tuning a pretrained BERT-family transformer to handle domain-specific terminology in a low-resource language such as Bulgarian. On one side, we use SlavicBERT and Multilingual BERT, which are pretrained on common Bulgarian vocabulary but lack medical terminology. On the other side, we use BioBERT, ClinicalBERT, SapBERT, and BlueBERT, which are pretrained on English medical terminology but lack training as Bulgarian language models and, moreover, lack Cyrillic vocabulary. In our study, all BERT models are fine-tuned on additional Bulgarian medical texts and then applied to the classification task of encoding Bulgarian medical diagnoses into ICD-10 codes. A large corpus of Bulgarian diagnoses annotated with ICD-10 codes is used for this classification task. Such an analysis gives a good indication of which models would be suitable for tasks of a similar type and domain. The experiments and evaluation results show that both approaches achieve comparable accuracy.
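The abstract compares both families of models by their accuracy in assigning ICD-10 codes. As a hedged illustration of how such a side-by-side comparison could be scored, the Python sketch below computes accuracy at two granularities: the full ICD-10 code and its 3-character category. It assumes each fine-tuned model has already produced one predicted code per diagnosis and that codes follow the WHO ICD-10 pattern of one letter plus two digits with an optional subcategory. The helper names (`normalize_icd10`, `icd10_category`, `accuracy`, `compare`), the model names `model_a` and `model_b`, and the toy codes are hypothetical; they are not taken from the paper, and the printed numbers are not its results.

```python
from typing import Dict, Sequence, Tuple


def normalize_icd10(code: str) -> str:
    """Upper-case an ICD-10 code and drop the dot, e.g. ' e11.9 ' -> 'E119'."""
    norm = code.strip().upper().replace(".", "")
    # Assumption: WHO ICD-10 categories are one letter plus two digits (A00-Z99).
    if len(norm) < 3 or not norm[0].isalpha() or not norm[1:3].isdigit():
        raise ValueError(f"not an ICD-10 code: {code!r}")
    return norm


def icd10_category(code: str) -> str:
    """Return the 3-character category of a code, e.g. 'E11.9' -> 'E11'."""
    return normalize_icd10(code)[:3]


def accuracy(gold: Sequence[str], pred: Sequence[str], level: str = "code") -> float:
    """Share of diagnoses whose predicted code matches the gold code.

    level="code" compares full normalized codes; level="category" compares
    only the 3-character category, a coarser and more forgiving target.
    """
    if level not in ("code", "category"):
        raise ValueError(f"unknown level: {level!r}")
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    key = normalize_icd10 if level == "code" else icd10_category
    hits = sum(key(g) == key(p) for g, p in zip(gold, pred))
    return hits / len(gold)


def compare(
    gold: Sequence[str], predictions_by_model: Dict[str, Sequence[str]]
) -> Dict[str, Tuple[float, float]]:
    """Map each (hypothetical) model name to (code accuracy, category accuracy)."""
    return {
        name: (accuracy(gold, preds, "code"), accuracy(gold, preds, "category"))
        for name, preds in predictions_by_model.items()
    }


if __name__ == "__main__":
    # Toy, hand-made labels for illustration only; not data from the paper.
    gold = ["E11.9", "I10", "J18.9", "K35.8"]
    predictions = {
        "model_a": ["E11.9", "I10", "J18.0", "K35.8"],
        "model_b": ["E11.9", "I11.9", "J18.9", "K37"],
    }
    for name, (code_acc, cat_acc) in compare(gold, predictions).items():
        print(f"{name}: code={code_acc:.2f} category={cat_acc:.2f}")
    # model_a: code=0.75 category=1.00
    # model_b: code=0.50 category=0.50
```

Reporting both levels is a design choice: category-level accuracy forgives confusions between sibling subcategories (such as J18.0 and J18.9), so two models can look closer at the category level than they do on exact codes.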

References used

https://aclanthology.org/

Related research

IAPUCP at SemEval-2021 Task 1: Stacking Fine-Tuned Transformers is Almost All You Need for Lexical Complexity Prediction

Association for Computational Linguistics, 2021 (article)

This paper describes our submission to SemEval-2021 Task 1: predicting the complexity score for single words. Our model leverages standard morphosyntactic and frequency-based features that proved helpful for Complex Word Identification (a related task), and combines them with predictions made by Transformer-based pre-trained models that were fine-tuned on the Shared Task data. Our submission system stacks all of these models with a LightGBM model on top. One novelty of our approach is the use of multi-task learning to fine-tune a pre-trained model for both Lexical Complexity Prediction and Word Sense Disambiguation. Our analysis shows that all independent models achieve good performance on the task, but that stacking them obtains a Pearson correlation of 0.7704, merely 0.018 points behind the winning submission.

Keywords: context representation, stacking fine-tuned transformers

Multi-label Diagnosis Classification of Swedish Discharge Summaries -- ICD-10 Code Assignment Using KB-BERT

Association for Computational Linguistics, 2021 (article)

The International Classification of Diseases (ICD) is a system for systematically recording patients' diagnoses. Clinicians or professional coders assign ICD codes to patients' medical records to facilitate funding, research, and administration. In most health facilities, clinical coding is a manual, time-demanding task that is prone to errors. A tool that automatically assigns ICD codes to free-text clinical notes could save time and reduce erroneous coding. While many previous studies have focused on ICD coding, research on Swedish patient records is scarce. This study explored different approaches to pairing Swedish clinical notes with ICD codes. KB-BERT, a BERT model pre-trained on Swedish text, was compared with the traditional supervised learning models Support Vector Machines, Decision Trees, and K-nearest Neighbours, used as baselines. When ICD codes were grouped into ten blocks, KB-BERT was superior to the baseline models, obtaining an F1-micro of 0.80 and an F1-macro of 0.58. When the 263 full ICD codes were considered, KB-BERT was outperformed by all baseline models, with an F1-micro and F1-macro of zero. Wilcoxon signed-rank tests showed that the performance differences between KB-BERT and the baseline models were statistically significant.

Keywords: Swedish discharge summaries, multi-label diagnosis classification, discharge summaries

Sociolectal Analysis of Pretrained Language Models

Association for Computational Linguistics, 2021 (article)

Using data from English cloze tests, in which subjects also self-reported their gender, age, education, and race, we examine performance differences of pretrained language models across demographic groups defined by these (protected) attributes. We demonstrate wide performance gaps across demographic groups and show that pretrained language models systematically disfavor young non-white male speakers; i.e., pretrained language models not only learn social biases (stereotypical associations) but also learn sociolectal biases, learning to speak more like some groups than like others. We show, however, that, with the exception of BERT models, larger pretrained language models reduce some of the performance gaps between majority and minority groups.

Keywords: masked language, transformer-based

Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data

Association for Computational Linguistics, 2021 (article)

Existing text classification methods mainly focus on a fixed label set, whereas many real-world applications require extending to new fine-grained classes as the number of samples per label increases. To accommodate such requirements, we introduce a new problem called coarse-to-fine grained classification, which aims to perform fine-grained classification on coarsely annotated data. Instead of asking for new fine-grained human annotations, we opt to leverage label surface names as the only human guidance and weave in rich pre-trained generative language models into the iterative weak supervision strategy. Specifically, we first propose a label-conditioned fine-tuning formulation to attune these generators for our task. Furthermore, we devise a regularization objective based on the coarse-fine label constraints derived from our problem setting, giving us even further improvements over the prior formulation. Our framework uses the fine-tuned generative models to sample pseudo-training data for training the classifier, and bootstraps on real unlabeled data for model refinement. Extensive experiments and case studies on two real-world datasets demonstrate superior performance over SOTA zero-shot classification baselines.

Keywords: coarsely-grained annotated data, fine-grained text classification

Image classification with Deep Convolutional Neural Network Using Tensorflow and Transfer of Learning

University of Baghdad, 2020 (article)

Deep learning algorithms have recently achieved considerable success, especially in the field of computer vision. This research aims to describe the classification method applied to a dataset of multiple types of images (Synthetic Aperture Radar (SAR) images and non-SAR images). For this classification, transfer learning was used, followed by fine-tuning. Architectures pre-trained on the well-known ImageNet image database were used: the VGG16 model served as a feature extractor, and a new classifier was trained on the extracted features. The dataset consists of five classes: the SAR image class (Houses) and the non-SAR image classes (Cats, Dogs, Horses, and Humans). The Convolutional Neural Network (CNN) was chosen for the training process because it produces high accuracy. The final accuracy reached 91.18% across the five classes. Per-class accuracy is reported as a percentage: the Cats class reached 99.6% and the Houses class 100%, while the other classes scored 90% or above on average.

Keywords: CNN, convolutional neural network, Synthetic Aperture Radar, transfer learning, TensorFlow, Visual Geometry Group
