Semantic-guided Automatic Natural Image Matting with Trimap Generation Network and Light-weight Non-local Attention

159 0 0.0 ( 0 )

Download Cite

Added by Yuhongze Zhou

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Yuhongze Zhou - Liguang Zhou - Tin Lun Lam

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Natural image matting aims to precisely separate foreground objects from background using alpha matte. Fully automatic natural image matting without external annotation is challenging. Well-performed matting methods usually require accurate labor-intensive handcrafted trimap as extra input, while the performance of automatic trimap generation method of dilating foreground segmentation fluctuates with segmentation quality. Therefore, we argue that how to handle trade-off of additional information input is a major issue in automatic matting. This paper presents a semantic-guided automatic natural image matting pipeline with Trimap Generation Network and light-weight non-local attention, which does not need trimap and background as input. Specifically, guided by foreground segmentation, Trimap Generation Network estimates accurate trimap. Then, with estimated trimap as guidance, our light-weight Non-local Matting Network with Refinement produces final alpha matte, whose trimap-guided global aggregation attention block is equipped with stride downsampling convolution, reducing computation complexity and promoting performance. Experimental results show that our matting algorithm has competitive performance with state-of-the-art methods in both trimap-free and trimap-needed aspects.

rate research

CGNet: A Light-weight Context Guided Network for Semantic Segmentation

141 - Tianyi Wu , Sheng Tang , Rui Zhang 2018

The demand of applying semantic segmentation model on mobile devices has been increasing rapidly. Current state-of-the-art networks have enormous amount of parameters hence unsuitable for mobile devices, while other small memory footprint models follow the spirit of classification network and ignore the inherent characteristic of semantic segmentation. To tackle this problem, we propose a novel Context Guided Network (CGNet), which is a light-weight and efficient network for semantic segmentation. We first propose the Context Guided (CG) block, which learns the joint feature of both local feature and surrounding context, and further improves the joint feature with the global context. Based on the CG block, we develop CGNet which captures contextual information in all stages of the network and is specially tailored for increasing segmentation accuracy. CGNet is also elaborately designed to reduce the number of parameters and save memory footprint. Under an equivalent number of parameters, the proposed CGNet significantly outperforms existing segmentation networks. Extensive experiments on Cityscapes and CamVid datasets verify the effectiveness of the proposed approach. Specifically, without any post-processing and multi-scale testing, the proposed CGNet achieves 64.8% mean IoU on Cityscapes with less than 0.5 M parameters. The source code for the complete system can be found at https://github.com/wutianyiRosun/CGNet.

Computer Vision and Pattern Recognition

Attention-guided Temporally Coherent Video Object Matting

116 - Yunke Zhang , Chi Wang , Miaomiao Cui 2021

This paper proposes a novel deep learning-based video object matting method that can achieve temporally coherent matting results. Its key component is an attention-based temporal aggregation module that maximizes image matting networks strength for video matting networks. This module computes temporal correlations for pixels adjacent to each other along the time axis in feature space, which is robust against motion noises. We also design a novel loss term to train the attention weights, which drastically boosts the video matting performance. Besides, we show how to effectively solve the trimap generation problem by fine-tuning a state-of-the-art video object segmentation network with a sparse set of user-annotated keyframes. To facilitate video matting and trimap generation networks training, we construct a large-scale video matting dataset with 80 training and 28 validation foreground video clips with ground-truth alpha mattes. Experimental results show that our method can generate high-quality alpha mattes for various videos featuring appearance change, occlusion, and fast motion. Our code and dataset can be found at: https://github.com/yunkezhang/TCVOM

Computer Vision and Pattern Recognition

Mask Guided Matting via Progressive Refinement Network

88 - Qihang Yu , Jianming Zhang , He Zhang 2020

We propose Mask Guided (MG) Matting, a robust matting framework that takes a general coarse mask as guidance. MG Matting leverages a network (PRN) design which encourages the matting model to provide self-guidance to progressively refine the uncertain regions through the decoding process. A series of guidance mask perturbation operations are also introduced in the training to further enhance its robustness to external guidance. We show that PRN can generalize to unseen types of guidance masks such as trimap and low-quality alpha matte, making it suitable for various application pipelines. In addition, we revisit the foreground color prediction problem for matting and propose a surprisingly simple improvement to address the dataset issue. Evaluation on real and synthetic benchmarks shows that MG Matting achieves state-of-the-art performance using various types of guidance inputs. Code and models are available at https://github.com/yucornetto/MGMatting.

Computer Vision and Pattern Recognition

Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis

134 - Hao Tang , Xiaojuan Qi , Dan Xu 2020

We propose a novel Edge guided Generative Adversarial Network (EdgeGAN) for photo-realistic image synthesis from semantic layouts. Although considerable improvement has been achieved, the quality of synthesized images is far from satisfactory due to two largely unresolved challenges. First, the semantic labels do not provide detailed structural information, making it difficult to synthesize local details and structures. Second, the widely adopted CNN operations such as convolution, down-sampling and normalization usually cause spatial resolution loss and thus are unable to fully preserve the original semantic information, leading to semantically inconsistent results (e.g., missing small objects). To tackle the first challenge, we propose to use the edge as an intermediate representation which is further adopted to guide image generation via a proposed attention guided edge transfer module. Edge information is produced by a convolutional generator and introduces detailed structure information. Further, to preserve the semantic information, we design an effective module to selectively highlight class-dependent feature maps according to the original semantic layout. Extensive experiments on two challenging datasets show that the proposed EdgeGAN can generate significantly better results than state-of-the-art methods. The source code and trained models are available at https://github.com/Ha0Tang/EdgeGAN.

Computer Vision and Pattern Recognition Machine Learning Image and Video Processing

Unsupervised Attention-guided Image to Image Translation

297 - Youssef A. Mejjati , Christian Richardt , James Tompkin 2018

Current unsupervised image-to-image translation techniques struggle to focus their attention on individual objects without altering the background or the way multiple objects interact within a scene. Motivated by the important role of attention in human perception, we tackle this limitation by introducing unsupervised attention mechanisms that are jointly adversarialy trained with the generators and discriminators. We demonstrate qualitatively and quantitatively that our approach is able to attend to relevant regions in the image without requiring supervision, and that by doing so it achieves more realistic mappings compared to recent approaches.

Computer Vision and Pattern Recognition Artificial Intelligence