No Arabic abstract
Restoring the clean background from the superimposed images containing a noisy layer is the common crux of a classical category of tasks on image restoration such as image reflection removal, image deraining and image dehazing. These tasks are typically formulated and tackled individually due to the diverse and complicated appearance patterns of noise layers within the image. In this work we present the Deep-Masking Generative Network (DMGN), which is a unified framework for background restoration from the superimposed images and is able to cope with different types of noise. Our proposed DMGN follows a coarse-to-fine generative process: a coarse background image and a noise image are first generated in parallel, then the noise image is further leveraged to refine the background image to achieve a higher-quality background image. In particular, we design the novel Residual Deep-Masking Cell as the core operating unit for our DMGN to enhance the effective information and suppress the negative information during image generation via learning a gating mask to control the information flow. By iteratively employing this Residual Deep-Masking Cell, our proposed DMGN is able to generate both high-quality background image and noisy image progressively. Furthermore, we propose a two-pronged strategy to effectively leverage the generated noise image as contrasting cues to facilitate the refinement of the background image. Extensive experiments across three typical tasks for image background restoration, including image reflection removal, image rain steak removal and image dehazing, show that our DMGN consistently outperforms state-of-the-art methods specifically designed for each single task.
In medical applications, the same anatomical structures may be observed in multiple modalities despite the different image characteristics. Currently, most deep models for multimodal segmentation rely on paired registered images. However, multimodal paired registered images are difficult to obtain in many cases. Therefore, developing a model that can segment the target objects from different modalities with unpaired images is significant for many clinical applications. In this work, we propose a novel two-stream translation and segmentation unified attentional generative adversarial network (UAGAN), which can perform any-to-any image modality translation and segment the target objects simultaneously in the case where two or more modalities are available. The translation stream is used to capture modality-invariant features of the target anatomical structures. In addition, to focus on segmentation-related features, we add attentional blocks to extract valuable features from the translation stream. Experiments on three-modality brain tumor segmentation indicate that UAGAN outperforms the existing methods in most cases.
Ultrasound (US) imaging is highly effective with regards to both cost and versatility in real-time diagnosis; however, determination of fetal gender by US scan in the early stages of pregnancy is also a cause of sex-selective abortion. This work proposes a deep learning object detection approach to accurately mask fetal gender in US images in order to increase the accessibility of the technology. We demonstrate how the YOLOv5L architecture exhibits superior performance relative to other object detection models on this task. Our model achieves 45.8% AP[0.5:0.95], 92% F1-score and 0.006 False Positive Per Image rate on our test set. Furthermore, we introduce a bounding box delay rule based on frame-to-frame structural similarity to reduce the false negative rate by 85%, further improving masking reliability.
Modern deep neural network models are large and computationally intensive. One typical solution to this issue is model pruning. However, most current pruning algorithms depend on hand crafted rules or domain expertise. To overcome this problem, we propose a learning based auto pruning algorithm for deep neural network, which is inspired by recent automatic machine learning(AutoML). A two objectives problem that aims for the the weights and the best channels for each layer is first formulated. An alternative optimization approach is then proposed to derive the optimal channel numbers and weights simultaneously. In the process of pruning, we utilize a searchable hyperparameter, remaining ratio, to denote the number of channels in each convolution layer, and then a dynamic masking process is proposed to describe the corresponding channel evolution. To control the trade-off between the accuracy of a model and the pruning ratio of floating point operations, a novel loss function is further introduced. Preliminary experimental results on benchmark datasets demonstrate that our scheme achieves competitive results for neural network pruning.
Chinese character synthesis involves two related aspects, i.e., style maintenance and content consistency. Although some methods have achieved remarkable success in synthesizing a character with specified style from standard font, how to map characters to a specified style domain without losing their identifiability remains very challenging. In this paper, we propose a novel model named FontGAN, which integrates the character stylization and de-stylization into a unified framework. In our model, we decouple character images into style representation and content representation, which facilitates more precise control of these two types of variables, thereby improving the quality of the generated results. We also introduce two modules, namely, font consistency module (FCM) and content prior module (CPM). FCM exploits a category guided Kullback-Leibler loss to embedding the style representation into different Gaussian distributions. It constrains the characters of the same font in the training set globally. On the other hand, it enables our model to obtain style variables through sampling in testing phase. CPM provides content prior for the model to guide the content encoding process and alleviates the problem of stroke deficiency during de-stylization. Extensive experimental results on character stylization and de-stylization have demonstrated the effectiveness of our method.
Named Entity Recognition (NER) is the task of identifying spans that represent entities in sentences. Whether the entity spans are nested or discontinuous, the NER task can be categorized into the flat NER, nested NER, and discontinuous NER subtasks. These subtasks have been mainly solved by the token-level sequence labelling or span-level classification. However, these solutions can hardly tackle the three kinds of NER subtasks concurrently. To that end, we propose to formulate the NER subtasks as an entity span sequence generation task, which can be solved by a unified sequence-to-sequence (Seq2Seq) framework. Based on our unified framework, we can leverage the pre-trained Seq2Seq model to solve all three kinds of NER subtasks without the special design of the tagging schema or ways to enumerate spans. We exploit three types of entity representations to linearize entities into a sequence. Our proposed framework is easy-to-implement and achieves state-of-the-art (SoTA) or near SoTA performance on eight English NER datasets, including two flat NER datasets, three nested NER datasets, and three discontinuous NER datasets.