ﻻ يوجد ملخص باللغة العربية
Medical imaging plays a pivotal role in diagnosis and treatment in clinical practice. Inspired by the significant progress in automatic image captioning, various deep learning (DL)-based architectures have been proposed for generating radiology reports for medical images. However, model uncertainty (i.e., model reliability/confidence on report generation) is still an under-explored problem. In this paper, we propose a novel method to explicitly quantify both the visual uncertainty and the textual uncertainty for the task of radiology report generation. Such multi-modal uncertainties can sufficiently capture the model confidence scores at both the report-level and the sentence-level, and thus they are further leveraged to weight the losses for achieving more comprehensive model optimization. Our experimental results have demonstrated that our proposed method for model uncertainty characterization and estimation can provide more reliable confidence scores for radiology report generation, and our proposed uncertainty-weighted losses can achieve more comprehensive model optimization and result in state-of-the-art performance on a public radiology report dataset.
Automatically generating radiology reports can improve current clinical practice in diagnostic radiology. On one hand, it can relieve radiologists from the heavy burden of report writing; On the other hand, it can remind radiologists of abnormalities
Gathering manually annotated images for the purpose of training a predictive model is far more challenging in the medical domain than for natural images as it requires the expertise of qualified radiologists. We therefore propose to take advantage of
Existing image inpainting methods often produce artifacts when dealing with large holes in real applications. To address this challenge, we propose an iterative inpainting method with a feedback mechanism. Specifically, we introduce a deep generative
Inspired by Curriculum Learning, we propose a consecutive (i.e., image-to-text-to-text) generation framework where we divide the problem of radiology report generation into two steps. Contrary to generating the full radiology report from the image at
Video salient object detection (VSOD) aims to locate and segment the most attractive object by exploiting both spatial cues and temporal cues hidden in video sequences. However, spatial and temporal cues are often unreliable in real-world scenarios,