ترغب بنشر مسار تعليمي؟ اضغط هنا

Deep learning algorithms mine knowledge from the training data and thus would likely inherit the datasets bias information. As a result, the obtained model would generalize poorly and even mislead the decision process in real-life applications. We pr opose to remove the bias information misused by the target task with a cross-sample adversarial debiasing (CSAD) method. CSAD explicitly extracts target and bias features disentangled from the latent representation generated by a feature extractor and then learns to discover and remove the correlation between the target and bias features. The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator. Moreover, we propose joint content and local structural representation learning to boost mutual information estimation for better performance. We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
A radiograph visualizes the internal anatomy of a patient through the use of X-ray, which projects 3D information onto a 2D plane. Hence, radiograph analysis naturally requires physicians to relate the prior about 3D human anatomy to 2D radiographs. Synthesizing novel radiographic views in a small range can assist physicians in interpreting anatomy more reliably; however, radiograph view synthesis is heavily ill-posed, lacking in paired data, and lacking in differentiable operations to leverage learning-based approaches. To address these problems, we use Computed Tomography (CT) for radiograph simulation and design a differentiable projection algorithm, which enables us to achieve geometrically consistent transformations between the radiography and CT domains. Our method, XraySyn, can synthesize novel views on real radiographs through a combination of realistic simulation and finetuning on real radiographs. To the best of our knowledge, this is the first work on radiograph view synthesis. We show that by gaining an understanding of radiography in 3D space, our method can be applied to radiograph bone extraction and suppression without groundtruth bone labels.
Example-guided image synthesis has recently been attempted to synthesize an image from a semantic label map and an exemplary image. In the task, the additional exemplar image provides the style guidance that controls the appearance of the synthesized output. Despite the controllability advantage, the existing models are designed on datasets with specific and roughly aligned objects. In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically different from the given label map. To this end, we first propose a Masked Spatial-Channel Attention (MSCA) module which models the correspondence between two arbitrary scenes via efficient decoupled attention. Next, we propose an end-to-end network for joint global and local feature alignment and synthesis. Finally, we propose a novel self-supervision task to enable training. Experiments on the large-scale and more diverse COCO-stuff dataset show significant improvements over the existing methods. Moreover, our approach provides interpretability and can be readily extended to other content manipulation tasks including style and spatial interpolation or extrapolation.
We present a novel unsupervised learning approach to image landmark discovery by incorporating the inter-subject landmark consistencies on facial images. This is achieved via an inter-subject mapping module that transforms original subject landmarks based on an auxiliary subject-related structure. To recover from the transformed images back to the original subject, the landmark detector is forced to learn spatial locations that contain the consistent semantic meanings both for the paired intra-subject images and between the paired inter-subject images. Our proposed method is extensively evaluated on two public facial image datasets (MAFL, AFLW) with various settings. Experimental results indicate that our method can extract the consistent landmarks for both datasets and achieve better performances compared to the previous state-of-the-art methods quantitatively and qualitatively.
Deep learning-based single image super-resolution (SISR) methods face various challenges when applied to 3D medical volumetric data (i.e., CT and MR images) due to the high memory cost and anisotropic resolution, which adversely affect their performa nce. Furthermore, mainstream SISR methods are designed to work over specific upsampling factors, which makes them ineffective in clinical practice. In this paper, we introduce a Spatially Aware Interpolation NeTwork (SAINT) for medical slice synthesis to alleviate the memory constraint that volumetric data poses. Compared to other super-resolution methods, SAINT utilizes voxel spacing information to provide desirable levels of details, and allows for the upsampling factor to be determined on the fly. Our evaluations based on 853 CT scans from four datasets that contain liver, colon, hepatic vessels, and kidneys show that SAINT consistently outperforms other SISR methods in terms of medical slice synthesis quality, while using only a single model to deal with different upsampling factors.
Metal artifact reduction (MAR) in computed tomography (CT) is a notoriously challenging task because the artifacts are structured and non-local in the image domain. However, they are inherently local in the sinogram domain. Thus, one possible approac h to MAR is to exploit the latter characteristic by learning to reduce artifacts in the sinogram. However, if we directly treat the metal-affected regions in sinogram as missing and replace them with the surrogate data generated by a neural network, the artifact-reduced CT images tend to be over-smoothed and distorted since fine-grained details within the metal-affected regions are completely ignored. In this work, we provide analytical investigation to the issue and propose to address the problem by (1) retaining the metal-affected regions in sinogram and (2) replacing the binarized metal trace with the metal mask projection such that the geometry information of metal implants is encoded. Extensive experiments on simulated datasets and expert evaluations on clinical images demonstrate that our novel network yields anatomically more precise artifact-reduced images than the state-of-the-art approaches, especially when metallic objects are large.
Spinal surgery planning necessitates automatic segmentation of vertebrae in cone-beam computed tomography (CBCT), an intraoperative imaging modality that is widely used in intervention. However, CBCT images are of low-quality and artifact-laden due t o noise, poor tissue contrast, and the presence of metallic objects, causing vertebra segmentation, even manually, a demanding task. In contrast, there exists a wealth of artifact-free, high quality CT images with vertebra annotations. This motivates us to build a CBCT vertebra segmentation model using unpaired CT images with annotations. To overcome the domain and artifact gaps between CBCT and CT, it is a must to address the three heterogeneous tasks of vertebra segmentation, artifact reduction and modality translation all together. To this, we propose a novel anatomy-aware artifact disentanglement and segmentation network (A$^3$DSegNet) that intensively leverages knowledge sharing of these three tasks to promote learning. Specifically, it takes a random pair of CBCT and CT images as the input and manipulates the synthesis and segmentation via different decoding combinations from the disentangled latent layers. Then, by proposing various forms of consistency among the synthesized images and among segmented vertebrae, the learning is achieved without paired (i.e., anatomically identical) data. Finally, we stack 2D slices together and build 3D networks on top to obtain final 3D segmentation result. Extensive experiments on a large number of clinical CBCT (21,364) and CT (17,089) images show that the proposed A$^3$DSegNet performs significantly better than state-of-the-art competing methods trained independently for each task and, remarkably, it achieves an average Dice coefficient of 0.926 for unpaired 3D CBCT vertebra segmentation.
Example-guided image synthesis has been recently attempted to synthesize an image from a semantic label map and an exemplary image. In the task, the additional exemplary image serves to provide style guidance that controls the appearance of the synth esized output. Despite the controllability advantage, the previous models are designed on datasets with specific and roughly aligned objects. In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically unaligned to the given label map. To this end, we first propose a new Masked Spatial-Channel Attention (MSCA) module which models the correspondence between two unstructured scenes via cross-attention. Next, we propose an end-to-end network for joint global and local feature alignment and synthesis. In addition, we propose a novel patch-based self-supervision scheme to enable training. Experiments on the large-scale CCOO-stuff dataset show significant improvements over existing methods. Moreover, our approach provides interpretability and can be readily extended to other tasks including style and spatial interpolation or extrapolation, as well as other content manipulation.
Automated whole slide image (WSI) tagging has become a growing demand due to the increasing volume and diversity of WSIs collected nowadays in histopathology. Various methods have been studied to classify WSIs with single tags but none of them focuse s on labeling WSIs with multiple tags. To this end, we propose a novel end-to-end trainable deep neural network named Patch Transformer which can effectively predict multiple slide-level tags from WSI patches based on both the correlations and the uniqueness between the tags. Specifically, the proposed method learns patch characteristics considering 1) patch-wise relations through a patch transformation module and 2) tag-wise uniqueness for each tagging task through a multi-tag attention module. Extensive experiments on a large and diverse dataset consisting of 4,920 WSIs prove the effectiveness of the proposed model.
Caricature generation is an interesting yet challenging task. The primary goal is to generate plausible caricatures with reasonable exaggerations given face images. Conventional caricature generation approaches mainly use low-level geometric transfor mations such as image warping to generate exaggerated images, which lack richness and diversity in terms of content and style. The recent progress in generative adversarial networks (GANs) makes it possible to learn an image-to-image transformation from data, so that richer contents and styles can be generated. However, directly applying the GAN-based models to this task leads to unsatisfactory results because there is a large variance in the caricature distribution. Moreover, some models require strictly paired training data which largely limits their usage scenarios. In this paper, we propose CariGAN overcome these problems. Instead of training on paired data, CariGAN learns transformations only from weakly paired images. Specifically, to enforce reasonable exaggeration and facial deformation, facial landmarks are adopted as an additional condition to constrain the generated image. Furthermore, an attention mechanism is introduced to encourage our model to focus on the key facial parts so that more vivid details in these regions can be generated. Finally, a Diversity Loss is proposed to encourage the model to produce diverse results to help alleviate the `mode collapse problem of the conventional GAN-based models. Extensive experiments on a new large-scale `WebCaricature dataset show that the proposed CariGAN can generate more plausible caricatures with larger diversity compared with the state-of-the-art models.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا