This paper addresses the task of detecting and localising fetal anatomical regions in 2D ultrasound images when only image-level labels are available at training time, i.e. without any localisation or segmentation information. We examine the use of convolutional neural network architectures coupled with soft proposal layers. The resulting network simultaneously performs anatomical region detection (classification) and localisation. We generate a proposal map describing the attention of the network for a particular class. The network is trained on 85,500 2D fetal ultrasound images and their associated labels. Labels correspond to six anatomical regions: head, spine, thorax, abdomen, limbs, and placenta. Detection achieves an average accuracy of 90% on individual regions, and we show that the proposal maps correlate well with relevant anatomical structures. This work is an essential step towards subsequent tasks such as fetal position and pose estimation, organ-specific segmentation, or image-guided navigation. Code and additional material are available at https://ntoussaint.github.io/fetalnav
Detecting acoustic shadows in ultrasound images is important in many clinical and engineering applications. Real-time feedback on acoustic shadows can guide sonographers to a standardized diagnostic viewing plane with minimal artifacts and can provide additional information for other automatic image analysis algorithms. However, automatically detecting shadow regions using learning-based algorithms is challenging because pixel-wise ground truth annotation of acoustic shadows is subjective and time-consuming. In this paper we propose a weakly supervised method for automatic confidence estimation of acoustic shadow regions. Our method generates a dense shadow-focused confidence map. A shadow-seg module is built to learn general shadow features for shadow segmentation, based on global image-level annotations as well as a small number of coarse pixel-wise shadow annotations. A transfer function is introduced to extend the obtained binary shadow segmentation to a reference confidence map. Additionally, a confidence estimation network is proposed to learn the mapping between input images and the reference confidence maps. This network predicts shadow confidence maps directly from input images during inference. We use evaluation metrics such as Dice and inter-class correlation to verify the effectiveness of our method. Our method is more consistent than human annotation, and outperforms the state of the art quantitatively in shadow segmentation and qualitatively in confidence estimation of shadow regions. We further demonstrate the applicability of our method by integrating shadow confidence maps into tasks such as ultrasound image classification, multi-view image fusion and automated biometric measurements.
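As a point of reference for the Dice overlap mentioned above, the following minimal sketch computes the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), between two binary masks. The function name `dice_coefficient`, the nested-list mask representation and the convention of returning 1.0 for two empty masks are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def dice_coefficient(pred: Mask, ref: Mask) -> float:
    """Standard Dice overlap, 2|A ∩ B| / (|A| + |B|), for two binary masks.

    Any non-zero entry is treated as foreground (e.g. a shadow pixel).
    """
    if len(pred) != len(ref) or any(len(p) != len(r) for p, r in zip(pred, ref)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    pred_area = 0
    ref_area = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            p_on, r_on = bool(p), bool(r)
            pred_area += int(p_on)
            ref_area += int(r_on)
            intersection += int(p_on and r_on)

    if pred_area + ref_area == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / (pred_area + ref_area)


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 0, 0]]
    reference = [[1, 0, 0], [0, 1, 0]]
    print(dice_coefficient(predicted, reference))
```

A value of 1 indicates identical masks and 0 indicates no overlap, so the score rewards agreement on foreground pixels only.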
Fetal brain magnetic resonance imaging (MRI) offers exquisite images of the developing brain but is not suitable for anomaly screening. For this, ultrasound (US) is employed. While expert sonographers are adept at reading US images, MR images are much easier for non-experts to interpret. Hence, in this paper we seek to produce images with MRI-like appearance directly from clinical US images. Our own clinical motivation is to find a way to communicate US findings to patients or clinical professionals unfamiliar with US, but such a capability is also potentially useful in medical image analysis, for instance for US-MRI registration or fusion. Our model is self-supervised and end-to-end trainable. Specifically, based on the assumption that the US and MRI data share a similar anatomical latent space, we first utilise an extractor to determine shared latent features, which are then used for data synthesis. Since paired data was unavailable for our study (and is rare in practice), we propose to enforce the distributions to be similar, instead of employing pixel-wise constraints, by adversarial learning in both the image domain and the latent space. Furthermore, we propose an adversarial structural constraint to regularise the anatomical structures between the two modalities during the synthesis. A cross-modal attention scheme is proposed to leverage non-local spatial correlations. The feasibility of the approach to produce realistic-looking MR images is demonstrated quantitatively and through a qualitative evaluation against real fetal MR images.
Ultrasound (US) imaging is highly effective with regard to both cost and versatility in real-time diagnosis; however, determination of fetal gender by US scan in the early stages of pregnancy is also a cause of sex-selective abortion. This work proposes a deep learning object detection approach to accurately mask fetal gender in US images in order to increase the accessibility of the technology. We demonstrate that the YOLOv5L architecture exhibits superior performance relative to other object detection models on this task. Our model achieves 45.8% AP[0.5:0.95], a 92% F1-score and a rate of 0.006 false positives per image on our test set. Furthermore, we introduce a bounding box delay rule based on frame-to-frame structural similarity to reduce the false negative rate by 85%, further improving masking reliability.
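The abstract does not spell out the bounding box delay rule, so the following is only a hypothetical sketch of one way such a rule could work: when the detector misses a frame, the last box is kept as long as the frame remains structurally similar to the previous one. The names `global_ssim` and `apply_delay_rule`, the threshold of 0.9, the `max_hold` limit and the single-window (global) form of the SSIM index are all assumptions for illustration, not the authors' settings.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]          # (x_min, y_min, x_max, y_max)
Frame = Sequence[Sequence[float]]        # grayscale image as nested lists


def global_ssim(a: Frame, b: Frame, data_range: float = 255.0) -> float:
    """SSIM index computed over the whole frame as a single window (simplification)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("frames must be non-empty and of equal size")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) * (x - mx) for x in xs) / n
    vy = sum((y - my) * (y - my) for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    c1 = (0.01 * data_range) ** 2   # usual SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


def apply_delay_rule(
    frames: Sequence[Frame],
    detections: Sequence[Optional[Box]],
    threshold: float = 0.9,   # assumed similarity threshold
    max_hold: int = 5,        # assumed cap on consecutive held frames
) -> List[Optional[Box]]:
    """Return the box to mask in each frame, holding the last box over missed detections."""
    output: List[Optional[Box]] = []
    held_box: Optional[Box] = None
    held_for = 0
    for i, detection in enumerate(detections):
        if detection is not None:
            held_box, held_for = detection, 0
        elif (
            held_box is not None
            and i > 0
            and held_for < max_hold
            and global_ssim(frames[i - 1], frames[i]) >= threshold
        ):
            held_for += 1      # scene barely changed: keep masking the same region
        else:
            held_box, held_for = None, 0
        output.append(held_box)
    return output


if __name__ == "__main__":
    still = [[10.0, 20.0], [30.0, 40.0]]
    changed = [[200.0, 0.0], [0.0, 200.0]]
    print(apply_delay_rule([still, still, changed], [(0, 0, 1, 1), None, None]))
```

Holding the box only while consecutive frames stay similar trades a small risk of stale masks for fewer unmasked frames, which is the kind of false negative reduction the abstract describes.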
Accurate and efficient catheter segmentation in 3D ultrasound (US) is essential for cardiac intervention. Currently, state-of-the-art segmentation algorithms are based on convolutional neural networks (CNNs), which achieve remarkable performance on standard Cartesian volumetric data. Nevertheless, these approaches suffer from low efficiency and GPU-unfriendly image sizes. Such difficulties and expensive hardware requirements become a bottleneck to building accurate and efficient segmentation models for real clinical applications. In this paper, we propose a novel Frustum ultrasound based catheter segmentation method. Specifically, Frustum ultrasound is a polar-coordinate-based image that contains the same information as the standard Cartesian image but is much smaller, which overcomes the efficiency bottleneck of conventional Cartesian images. However, the irregular and deformed Frustum images make accurate voxel-level annotation more laborious. To address this limitation, we propose a weakly supervised learning framework that needs only 3D bounding box annotations overlaying the region of interest to train the CNNs. Although bounding box annotations contain noisy and inaccurate labels that can mislead the model, this is addressed by the proposed pseudo-label generation scheme. The labels of training voxels are generated by combining class activation maps with line filtering and are iteratively updated during training. Our experimental results show that the proposed method achieves state-of-the-art performance with an efficiency of 0.25 seconds per volume. More crucially, Frustum image segmentation provides a much faster and cheaper solution for segmentation in 3D US images, which meets the demands of clinical applications.
Weakly Supervised Object Localization (WSOL) methods usually rely on fully convolutional networks in order to obtain class activation maps (CAMs) of targeted labels. However, these networks always highlight the most discriminative parts to perform the task, so the located areas are much smaller than the entire targeted objects. In this work, we propose a novel end-to-end model to enlarge CAMs generated from classification models, which can localize targeted objects more precisely. In detail, we add an additional module to traditional classification networks to extract foreground object proposals from images without classifying them into specific categories. We then set these normalized regions as unrestricted pixel-level mask supervision for the following classification task. We collect a set of images, defined as the Background Image Set, from the Internet. This set is much smaller than the targeted dataset but supports the extraction of foreground regions from different pictures surprisingly well. The extracted region is independent of the classification task and covers almost the entire object in each image rather than just a significant part. Therefore, these regions can serve as masks that supervise the response map generated from classification models to become larger and more precise. The method achieves state-of-the-art results on CUB-200-2011 in terms of Top-1 and Top-5 localization error while achieving a competitive result on ILSVRC2016 compared with other approaches.
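To make the starting point of this abstract concrete, the sketch below computes a plain class activation map as the classifier-weighted sum of the final feature maps and derives a bounding box by thresholding the min-max normalised map. It illustrates only the baseline CAM that the abstract sets out to enlarge, not the proposed module; the function names, the nested-list layout and the threshold values are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple

FeatureMaps = Sequence[Sequence[Sequence[float]]]   # K channels x H x W
Box = Tuple[int, int, int, int]                     # (x_min, y_min, x_max, y_max)


def class_activation_map(features: FeatureMaps, weights: Sequence[float]) -> List[List[float]]:
    """CAM for one class: sum over channels k of weights[k] * features[k][y][x]."""
    if len(features) != len(weights) or not features:
        raise ValueError("need exactly one weight per feature channel")
    height, width = len(features[0]), len(features[0][0])
    return [
        [sum(w * fmap[y][x] for w, fmap in zip(weights, features)) for x in range(width)]
        for y in range(height)
    ]


def bounding_box(cam: Sequence[Sequence[float]], threshold: float = 0.2) -> Optional[Box]:
    """Tightest box around pixels whose min-max normalised CAM value reaches the threshold."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi == lo:
        return None  # flat map: nothing to localise
    coords = [
        (x, y)
        for y, row in enumerate(cam)
        for x, value in enumerate(row)
        if (value - lo) / (hi - lo) >= threshold
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    feature_maps = [
        [[0.0, 0.0, 0.0], [0.0, 4.0, 2.0], [0.0, 0.0, 0.0]],
        [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
    ]
    cam = class_activation_map(feature_maps, weights=[1.0, 0.5])
    print(bounding_box(cam, threshold=0.6), bounding_box(cam, threshold=0.5))
```

In this toy example, the higher threshold keeps only the single most strongly activated pixel, which mirrors the observation that CAMs tend to cover just the most discriminative part of an object.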