Semi-Supervised Video Deraining with Dynamical Rain Generator

99 0 0.0 ( 0 )

Download Cite

Added by Zongsheng Yue

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Zongsheng Yue - Jianwen Xie - Qian Zhao

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

While deep learning (DL)-based video deraining methods have achieved significant success recently, they still exist two major drawbacks. Firstly, most of them do not sufficiently model the characteristics of rain layers of rainy videos. In fact, the rain layers exhibit strong physical properties (e.g., direction, scale and thickness) in spatial dimension and natural continuities in temporal dimension, and thus can be generally modelled by the spatial-temporal process in statistics. Secondly, current DL-based methods seriously depend on the labeled synthetic training data, whose rain types are always deviated from those in unlabeled real data. Such gap between synthetic and real data sets leads to poor performance when applying them in real scenarios. Against these issues, this paper proposes a new semi-supervised video deraining method, in which a dynamic rain generator is employed to fit the rain layer, expecting to better depict its insightful characteristics. Specifically, such dynamic generator consists of one emission model and one transition model to simultaneously encode the spatially physical structure and temporally continuous changes of rain streaks, respectively, which both are parameterized as deep neural networks (DNNs). Further more, different prior formats are designed for the labeled synthetic and unlabeled real data, so as to fully exploit the common knowledge underlying them. Last but not least, we also design a Monte Carlo EM algorithm to solve this model. Extensive experiments are conducted to verify the superiorities of the proposed semi-supervised deraining model.

rate research

Semi-DerainGAN: A New Semi-supervised Single Image Deraining Network

342 - Yanyan Wei , Zhao Zhang , Yang Wang 2020

Removing the rain streaks from single image is still a challenging task, since the shapes and directions of rain streaks in the synthetic datasets are very different from real images. Although supervised deep deraining networks have obtained impressive results on synthetic datasets, they still cannot obtain satisfactory results on real images due to weak generalization of rain removal capacity, i.e., the pre-trained models usually cannot handle new shapes and directions that may lead to over-derained/under-derained results. In this paper, we propose a new semi-supervised GAN-based deraining network termed Semi-DerainGAN, which can use both synthetic and real rainy images in a uniform network using two supervised and unsupervised processes. Specifically, a semi-supervised rain streak learner termed SSRML sharing the same parameters of both processes is derived, which makes the real images contribute more rain streak information. To deliver better deraining results, we design a paired discriminator for distinguishing the real pairs from fake pairs. Note that we also contribute a new real-world rainy image dataset Real200 to alleviate the difference between the synthetic and real image do-mains. Extensive results on public datasets show that our model can obtain competitive performance, especially on real images.

Computer Vision and Pattern Recognition

Unpaired Adversarial Learning for Single Image Deraining with Rain-Space Contrastive Constraints

287 - Xiang Chen , Jinshan Pan , Kui Jiang 2021

Deep learning-based single image deraining (SID) with unpaired information is of immense importance, as relying on paired synthetic data often limits their generality and scalability in real-world applications. However, we noticed that direct employ of unpaired adversarial learning and cycle-consistency constraints in the SID task is insufficient to learn the underlying relationship from rainy input to clean outputs, since the domain knowledge between rainy and rain-free images is asymmetrical. To address such limitation, we develop an effective unpaired SID method which explores mutual properties of the unpaired exemplars by a contrastive learning manner in a GAN framework, named as CDR-GAN. The proposed method mainly consists of two cooperative branches: Bidirectional Translation Branch (BTB) and Contrastive Guidance Branch (CGB). Specifically, BTB takes full advantage of the circulatory architecture of adversarial consistency to exploit latent feature distributions and guide transfer ability between two domains by equipping it with bidirectional mapping. Simultaneously, CGB implicitly constrains the embeddings of different exemplars in rain space by encouraging the similar feature distributions closer while pushing the dissimilar further away, in order to better help rain removal and image restoration. During training, we explore several loss functions to further constrain the proposed CDR-GAN. Extensive experiments show that our method performs favorably against existing unpaired deraining approaches on both synthetic and real-world datasets, even outperforms several fully-supervised or semi-supervised models.

Computer Vision and Pattern Recognition

Multiview Pseudo-Labeling for Semi-supervised Learning from Video

118 - Bo Xiong , Haoqi Fan , Kristen Grauman 2021

We present a multiview pseudo-labeling approach to video learning, a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video. The complementary views help obtain more reliable pseudo-labels on unlabeled video, to learn stronger video representations than from purely supervised data. Though our method capitalizes on multiple views, it nonetheless trains a model that is shared across appearance and motion input and thus, by design, incurs no additional computation overhead at inference time. On multiple video recognition datasets, our method substantially outperforms its supervised counterpart, and compares favorably to previous work on standard benchmarks in self-supervised video representation learning.

Computer Vision and Pattern Recognition Artificial Intelligence Machine Learning

Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation

88 - Yuxi Li , Ning Xu , Jinlong Peng 2020

In this paper, we address several inadequacies of current video object segmentation pipelines. Firstly, a cyclic mechanism is incorporated to the standard semi-supervised process to produce more robust representations. By relying on the accurate reference mask in the starting frame, we show that the error propagation problem can be mitigated. Next, we introduce a simple gradient correction module, which extends the offline pipeline to an online method while maintaining the efficiency of the former. Finally we develop cycle effective receptive field (cycle-ERF) based on gradient correction to provide a new perspective into analyzing object-specific regions of interests. We conduct comprehensive experiments on challenging benchmarks of DAVIS17 and Youtube-VOS, demonstrating that the cyclic mechanism is beneficial to segmentation quality.

Computer Vision and Pattern Recognition

Semi-supervised Breast Lesion Detection in Ultrasound Video Based on Temporal Coherence

135 - Sihong Chen , Weiping Yu , Kai Ma 2019

Breast lesion detection in ultrasound video is critical for computer-aided diagnosis. However, detecting lesion in video is quite challenging due to the blurred lesion boundary, high similarity to soft tissue and lack of video annotations. In this paper, we propose a semi-supervised breast lesion detection method based on temporal coherence which can detect the lesion more accurately. We aggregate features extracted from the historical key frames with adaptive key-frame scheduling strategy. Our proposed method accomplishes the unlabeled videos detection task by leveraging the supervision information from a different set of labeled images. In addition, a new WarpNet is designed to replace both the traditional spatial warping and feature aggregation operation, leading to a tremendous increase in speed. Experiments on 1,060 2D ultrasound sequences demonstrate that our proposed method achieves state-of-the-art video detection result as 91.3% in mean average precision and 19 ms per frame on GPU, compared to a RetinaNet based detection method in 86.6% and 32 ms.

Computer Vision and Pattern Recognition