An Ensemble Approach for Automatic Structuring of Radiology Reports

122 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Morteza Pourreza Shahri

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Morteza Pourreza Shahri - Amir Tahmasebi - Bingyang Ye

الحساب واللغة التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Automatic structuring of electronic medical records is of high demand for clinical workflow solutions to facilitate extraction, storage, and querying of patient care information. However, developing a scalable solution is extremely challenging, specifically for radiology reports, as most healthcare institutes use either no template or department/institute specific templates. Moreover, radiologists reporting style varies from one to another as sentences are telegraphic and do not follow general English grammar rules. We present an ensemble method that consolidates the predictions of three models, capturing various attributes of textual information for automatic labeling of sentences with section labels. These three models are: 1) Focus Sentence model, capturing context of the target sentence; 2) Surrounding Context model, capturing the neighboring context of the target sentence; and finally, 3) Formatting/Layout model, aimed at learning report formatting cues. We utilize Bi-directional LSTMs, followed by sentence encoders, to acquire the context. Furthermore, we define several features that incorporate the structure of reports. We compare our proposed approach against multiple baselines and state-of-the-art approaches on a proprietary dataset as well as 100 manually annotated radiology notes from the MIMIC-III dataset, which we are making publicly available. Our proposed approach significantly outperforms other approaches by achieving 97.1% accuracy.

قيم البحث

365 - Farhad Nooralahzadeh , Nicolas Perez Gonzalez , Thomas Frauenfelder 2021

Inspired by Curriculum Learning, we propose a consecutive (i.e., image-to-text-to-text) generation framework where we divide the problem of radiology report generation into two steps. Contrary to generating the full radiology report from the image at once, the model generates global concepts from the image in the first step and then reforms them into finer and coherent texts using a transformer architecture. We follow the transformer-based sequence-to-sequence paradigm at each step. We improve upon the state-of-the-art on two benchmark datasets.

الحساب واللغة

Automatic Structuring Of Semantic Web Services An Approach

139 - B. Kamala , J. M. Nandhini 2013

Ontologies have become the effective modeling for various applications and significantly in the semantic web. The difficulty of extracting information from the web, which was created mainly for visualising information, has driven the birth of the sem antic web, which will contain much more resources than the web and will attach machine-readable semantic information to these resources. Ontological bootstrapping on a set of predefined sources, such as web services, must address the problem of multiple, largely unrelated concepts. The web services consist of basically two components, Web Services Description Language (WSDL) descriptors and free text descriptors. The WSDL descriptor is evaluated using two methods, namely Term Frequency/Inverse Document Frequency (TF/IDF) and web context generation. The proposed bootstrapping ontological process integrates TF/IDF and web context generation and applies validation using the free text descriptor service, so that, it offers more accurate definition of ontologies. This paper uses ranking adaption model which predicts the rank for a collection of web service documents which leads to the automatic construction, enrichment and adaptation of ontologies.

استرجاع المعلومات

RadGraph: Extracting Clinical Entities and Relations from Radiology Reports

110 - Saahil Jain , Ashwin Agrawal , Adriel Saporta 2021

Extracting structured clinical information from free-text radiology reports can enable the use of radiology report information for a variety of critical healthcare applications. In our work, we present RadGraph, a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema we designed to structure radiology reports. We release a development dataset, which contains board-certified radiologist annotations for 500 radiology reports from the MIMIC-CXR dataset (14,579 entities and 10,889 relations), and a test dataset, which contains two independent sets of board-certified radiologist annotations for 100 radiology reports split equally across the MIMIC-CXR and CheXpert datasets. Using these datasets, we train and test a deep learning model, RadGraph Benchmark, that achieves a micro F1 of 0.82 and 0.73 on relation extraction on the MIMIC-CXR and CheXpert test sets respectively. Additionally, we release an inference dataset, which contains annotations automatically generated by RadGraph Benchmark across 220,763 MIMIC-CXR reports (around 6 million entities and 4 million relations) and 500 CheXpert reports (13,783 entities and 9,908 relations) with mappings to associated chest radiographs. Our freely available dataset can facilitate a wide range of research in medical natural language processing, as well as computer vision and multi-modal learning when linked to chest radiographs.

الحساب واللغة الذكاء الاصطناعي استرجاع المعلومات

A Natural Language Processing Pipeline of Chinese Free-text Radiology Reports for Liver Cancer Diagnosis

179 - Honglei Liu , Yan Xu , Zhiqiang Zhang 2020

Despite the rapid development of natural language processing (NLP) implementation in electronic medical records (EMRs), Chinese EMRs processing remains challenging due to the limited corpus and specific grammatical characteristics, especially for rad iology reports. In this study, we designed an NLP pipeline for the direct extraction of clinically relevant features from Chinese radiology reports, which is the first key step in computer-aided radiologic diagnosis. The pipeline was comprised of named entity recognition, synonyms normalization, and relationship extraction to finally derive the radiological features composed of one or more terms. In named entity recognition, we incorporated lexicon into deep learning model bidirectional long short-term memory-conditional random field (BiLSTM-CRF), and the model finally achieved an F1 score of 93.00%. With the extracted radiological features, least absolute shrinkage and selection operator and machine learning methods (support vector machine, random forest, decision tree, and logistic regression) were used to build the classifiers for liver cancer prediction. For liver cancer diagnosis, random forest had the highest predictive performance in liver cancer diagnosis (F1 score 86.97%, precision 87.71%, and recall 86.25%). This work was a comprehensive NLP study focusing on Chinese radiology reports and the application of NLP in cancer risk prediction. The proposed NLP pipeline for the radiological feature extraction could be easily implemented in other kinds of Chinese clinical texts and other disease predictive tasks.

الحساب واللغة

Towards automatic extractive text summarization of A-133 Single Audit reports with machine learning

189 - Vivian T. Chou , LeAnna Kent , Joel A. Gongora 2019

The rapid growth of text data has motivated the development of machine-learning based automatic text summarization strategies that concisely capture the essential ideas in a larger text. This study aimed to devise an extractive summarization method f or A-133 Single Audits, which assess if recipients of federal grants are compliant with program requirements for use of federal funding. Currently, these voluminous audits must be manually analyzed by officials for oversight, risk management, and prioritization purposes. Automated summarization has the potential to streamline these processes. Analysis focused on the Findings section of ~20,000 Single Audits spanning 2016-2018. Following text preprocessing and GloVe embedding, sentence-level k-means clustering was performed to partition sentences by topic and to establish the importance of each sentence. For each audit, key summary sentences were extracted by proximity to cluster centroids. Summaries were judged by non-expert human evaluation and compared to human-generated summaries using the ROUGE metric. Though the goal was to fully automate summarization of A-133 audits, human input was required at various stages due to large variability in audit writing style, content, and context. Examples of human inputs include the number of clusters, the choice to keep or discard certain clusters based on their content relevance, and the definition of a top sentence. Overall, this approach made progress towards automated extractive summaries of A-133 audits, with future work to focus on full automation and improving summary consistency. This work highlights the inherent difficulty and subjective nature of automated summarization in a real-world application.

الحساب واللغة التعلم الآلي التعلم الالي