أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Xiaosong Wang

Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays

74 - Xiaosong Wang , Ziyue Xu , Leo Tam 2021

Pre-trained models, e.g., from ImageNet, have proven to be effective in boosting the performance of many downstream applications. It is too demanding to acquire large-scale annotations to build such models for medical imaging. Meanwhile, there are nu merous clinical data (in the form of images and text reports) stored in the hospital information systems. The paired image-text data from the same patient study could be utilized for the pre-training task in a weakly supervised manner. However, the integrity, accessibility, and amount of such raw data vary across different institutes, e.g., paired vs. unpaired (image-only or text-only). In this work, we introduce an image-text pre-training framework that can learn from these raw data with mixed data inputs, i.e., paired image-text data, a mixture of paired and unpaired data. The unpaired data can be sourced from one or multiple institutes (e.g., images from one institute coupled with texts from another). Specifically, we propose a transformer-based training framework for jointly learning the representation of both the image and text data. In addition to the existing masked language modeling, multi-scale masked vision modeling is introduced as a self-supervised training task for image patch regeneration. We not only demonstrate the feasibility of pre-training across mixed data inputs but also illustrate the benefits of adopting such pre-trained models in 3 chest X-ray applications, i.e., classification, retrieval, and image regeneration. Superior results are reported in comparison to prior art using MIMIC-CXR, NIH14-CXR, and OpenI-CXR datasets.

الرؤية الحاسوبية وتمييز الأنماط

Transformer Query-Target Knowledge Discovery (TEND): Drug Discovery from CORD-19

278 - Leo K. Tam , Xiaosong Wang , Daguang Xu 2020

Previous work established skip-gram word2vec models could be used to mine knowledge in the materials science literature for the discovery of thermoelectrics. Recent transformer architectures have shown great progress in language modeling and associat ed fine-tuned tasks, but they have yet to be adapted for drug discovery. We present a RoBERTa transformer-based method that extends the masked language token prediction using query-target conditioning to treat the specificity challenge. The transformer discovery method entails several benefits over the word2vec method including domain-specific (antiviral) analogy performance, negation handling, and flexible query analysis (specific) and is demonstrated on influenza drug discovery. To stimulate COVID-19 research, we release an influenza clinical trials and antiviral analogies dataset used in conjunction with the COVID-19 Open Research Dataset Challenge (CORD-19) literature dataset in the study. We examine k-shot fine-tuning to improve the downstream analogies performance as well as to mine analogies for model explainability. Further, the query-target analysis is verified in a forward chaining analysis against the influenza drug clinical trials dataset, before adapted for COVID-19 drugs (combinations and side-effects) and on-going clinical trials. In consideration of the present topic, we release the model, dataset, and code.

الحساب واللغة استرجاع المعلومات التعلم الآلي

Learning Image Labels On-the-fly for Training Robust Classification Models

143 - Xiaosong Wang , Ziyue Xu , Dong Yang 2020

Current deep learning paradigms largely benefit from the tremendous amount of annotated data. However, the quality of the annotations often varies among labelers. Multi-observer studies have been conducted to study these annotation variances (by labe ling the same data for multiple times) and its effects on critical applications like medical image analysis. This process indeed adds an extra burden to the already tedious annotation work that usually requires professional training and expertise in the specific domains. On the other hand, automated annotation methods based on NLP algorithms have recently shown promise as a reasonable alternative, relying on the existing diagnostic reports of those images that are widely available in the clinical system. Compared to human labelers, different algorithms provide labels with varying qualities that are even noisier. In this paper, we show how noisy annotations (e.g., from different algorithm-based labelers) can be utilized together and mutually benefit the learning of classification tasks. Specifically, the concept of attention-on-label is introduced to sample better label sets on-the-fly as the training data. A meta-training based label-sampling module is designed to attend the labels that benefit the model learning the most through additional back-propagation processes. We apply the attention-on-label scheme on the classification task of a synthetic noisy CIFAR-10 dataset to prove the concept, and then demonstrate superior results (3-5% increase on average in multiple disease classification AUCs) on the chest x-ray images from a hospital-scale dataset (MIMIC-CXR) and hand-labeled dataset (OpenI) in comparison to regular training paradigms.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies

66 - Leo K. Tam , Xiaosong Wang , Evrim Turkbey 2020

Detecting clinically relevant objects in medical images is a challenge despite large datasets due to the lack of detailed labels. To address the label issue, we utilize the scene-level labels with a detection architecture that incorporates natural la nguage information. We present a challenging new set of radiologist paired bounding box and natural language annotations on the publicly available MIMIC-CXR dataset especially focussed on pneumonia and pneumothorax. Along with the dataset, we present a joint vision language weakly supervised transformer layer-selected one-stage dual head detection architecture (LITERATI) alongside strong baseline comparisons with class activation mapping (CAM), gradient CAM, and relevant implementations on the NIH ChestXray-14 and MIMIC-CXR dataset. Borrowing from advances in vision language architectures, the LITERATI method demonstrates joint image and referring expression (objects localized in the image using natural language) input for detection that scales in a purely weakly supervised fashion. The architectural modifications address three obstacles -- implementing a supervised vision and language detection method in a weakly supervised fashion, incorporating clinical referring expression natural language information, and generating high fidelity detections with map probabilities. Nevertheless, the challenging clinical nature of the radiologist annotations including subtle references, multi-instance specifications, and relatively verbose underlying medical reports, ensures the vision language detection task at scale remains stimulating for future investigation.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Multi-Domain Image Completion for Random Missing Input Data

325 - Liyue Shen , Wentao Zhu , Xiaosong Wang 2020

Multi-domain data are widely leveraged in vision applications taking advantage of complementary information from different modalities, e.g., brain tumor segmentation from multi-parametric magnetic resonance imaging (MRI). However, due to possible dat a corruption and different imaging protocols, the availability of images for each domain could vary amongst multiple data sources in practice, which makes it challenging to build a universal model with a varied set of input data. To tackle this problem, we propose a general approach to complete the random missing domain(s) data in real applications. Specifically, we develop a novel multi-domain image completion method that utilizes a generative adversarial network (GAN) with a representational disentanglement scheme to extract shared skeleton encoding and separate flesh encoding across multiple domains. We further illustrate that the learned representation in multi-domain image completion could be leveraged for high-level tasks, e.g., segmentation, by introducing a unified framework consisting of image completion and segmentation with a shared content encoder. The experiments demonstrate consistent performance improvement on three datasets for brain tumor segmentation, prostate segmentation, and facial expression image completion respectively.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Enhancing Foreground Boundaries for Medical Image Segmentation

177 - Dong Yang , Holger Roth , Xiaosong Wang 2020

Object segmentation plays an important role in the modern medical image analysis, which benefits clinical study, disease diagnosis, and surgery planning. Given the various modalities of medical images, the automated or semi-automated segmentation app roaches have been used to identify and parse organs, bones, tumors, and other regions-of-interest (ROI). However, these contemporary segmentation approaches tend to fail to predict the boundary areas of ROI, because of the fuzzy appearance contrast caused during the imaging procedure. To further improve the segmentation quality of boundary areas, we propose a boundary enhancement loss to enforce additional constraints on optimizing machine learning models. The proposed loss function is light-weighted and easy to implement without any pre- or post-processing. Our experimental results validate that our loss function are better than, or at least comparable to, other state-of-the-art loss functions in terms of segmentation accuracy.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط

Correlation via synthesis: end-to-end nodule image generation and radiogenomic map learning based on generative adversarial network

158 - Ziyue Xu , Xiaosong Wang , Hoo-Chang Shin 2019

Radiogenomic map linking image features and gene expression profiles is useful for noninvasively identifying molecular properties of a particular type of disease. Conventionally, such map is produced in three separate steps: 1) gene-clustering to met agenes, 2) image feature extraction, and 3) statistical correlation between metagenes and image features. Each step is independently performed and relies on arbitrary measurements. In this work, we investigate the potential of an end-to-end method fusing gene data with image features to generate synthetic image and learn radiogenomic map simultaneously. To achieve this goal, we develop a generative adversarial network (GAN) conditioned on both background images and gene expression profiles, synthesizing the corresponding image. Image and gene features are fused at different scales to ensure the realism and quality of the synthesized image. We tested our method on non-small cell lung cancer (NSCLC) dataset. Results demonstrate that the proposed method produces realistic synthetic images, and provides a promising way to find gene-image relationship in a holistic end-to-end manner.

الرؤية الحاسوبية وتمييز الأنماط معالجة الصور والفيديو

Attention-Guided Curriculum Learning for Weakly Supervised Classification and Localization of Thoracic Diseases on Chest Radiographs

329 - Yuxing Tang , Xiaosong Wang , Adam P. Harrison 2018

In this work, we exploit the task of joint classification and weakly supervised localization of thoracic diseases from chest radiographs, with only image-level disease labels coupled with disease severity-level (DSL) information of a subset. A convol utional neural network (CNN) based attention-guided curriculum learning (AGCL) framework is presented, which leverages the severity-level attributes mined from radiology reports. Images in order of difficulty (grouped by different severity-levels) are fed to CNN to boost the learning gradually. In addition, highly confident samples (measured by classification probabilities) and their corresponding class-conditional heatmaps (generated by the CNN) are extracted and further fed into the AGCL framework to guide the learning of more distinctive convolutional features in the next iteration. A two-path network architecture is designed to regress the heatmaps from selected seed samples in addition to the original classification task. The joint learning scheme can improve the classification and localization performance along with more seed samples for the next iteration. We demonstrate the effectiveness of this iterative refinement framework via extensive experimental evaluations on the publicly available ChestXray14 dataset. AGCL achieves over 5.7% (averaged over 14 diseases) increase in classification AUC and 7%/11% increases in Recall/Precision for the localization task compared to the state of the art.

الرؤية الحاسوبية وتمييز الأنماط

Deep Lesion Graphs in the Wild: Relationship Learning and Organization of Significant Radiology Image Findings in a Diverse Large-scale Lesion Database

54 - Ke Yan , Xiaosong Wang , Le Lu 2017

Radiologists in their daily work routinely find and annotate significant abnormalities on a large number of radiology images. Such abnormalities, or lesions, have collected over years and stored in hospitals picture archiving and communication system s. However, they are basically unsorted and lack semantic annotations like type and location. In this paper, we aim to organize and explore them by learning a deep feature representation for each lesion. A large-scale and comprehensive dataset, DeepLesion, is introduced for this task. DeepLesion contains bounding boxes and size measurements of over 32K lesions. To model their similarity relationship, we leverage multiple supervision information including types, self-supervised location coordinates and sizes. They require little manual annotation effort but describe useful attributes of the lesions. Then, a triplet network is utilized to learn lesion embeddings with a sequential sampling strategy to depict their hierarchical similarity structure. Experiments show promising qualitative and quantitative results on lesion retrieval, clustering, and classification. The learned embeddings can be further employed to build a lesion graph for various clinically useful applications. We propose algorithms for intra-patient lesion matching and missing annotation mining. Experimental results validate their effectiveness.

الرؤية الحاسوبية وتمييز الأنماط

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد