RectiNet-v2: A stacked network architecture for document image dewarping

211 0 0.0 ( 0 )

Download Cite

Added by Nibaran Das

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Hmrishav Bandyopadhyay - Tanmoy Dasgupta - Nibaran Das

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

With the advent of mobile and hand-held cameras, document images have found their way into almost every domain. Dewarping of these images for the removal of perspective distortions and folds is essential so that they can be understood by document recognition algorithms. For this, we propose an end-to-end CNN architecture that can produce distortion free document images from warped documents it takes as input. We train this model on warped document images simulated synthetically to compensate for lack of enough natural data. Our method is novel in the use of a bifurcated decoder with shared weights to prevent intermingling of grid coordinates, in the use of residual networks in the U-Net skip connections to allow flow of data from different receptive fields in the model, and in the use of a gated network to help the model focus on structure and line level detail of the document image. We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.

rate research

A Gated and Bifurcated Stacked U-Net Module for Document Image Dewarping

121 - Hmrishav Bandyopadhyay , Tanmoy Dasgupta , Nibaran Das 2020

Capturing images of documents is one of the easiest and most used methods of recording them. These images however, being captured with the help of handheld devices, often lead to undesirable distortions that are hard to remove. We propose a supervised Gated and Bifurcated Stacked U-Net module to predict a dewarping grid and create a distortion free image from the input. While the network is trained on synthetically warped document images, results are calculated on the basis of real world images. The novelty in our methods exists not only in a bifurcation of the U-Net to help eliminate the intermingling of the grid coordinates, but also in the use of a gated network which adds boundary and other minute line level details to the model. The end-to-end pipeline proposed by us achieves state-of-the-art performance on the DocUNet dataset after being trained on just 8 percent of the data used in previous methods.

Computer Vision and Pattern Recognition

T-Net: Deep Stacked Scale-Iteration Network for Image Dehazing

74 - Lirong Zheng , Yanshan Li , Kaihao Zhang 2021

Hazy images reduce the visibility of the image content, and haze will lead to failure in handling subsequent computer vision tasks. In this paper, we address the problem of image dehazing by proposing a dehazing network named T-Net, which consists of a backbone network based on the U-Net architecture and a dual attention module. And it can achieve multi-scale feature fusion by using skip connections with a new fusion strategy. Furthermore, by repeatedly unfolding the plain T-Net, Stack T-Net is proposed to take advantage of the dependence of deep features across stages via a recursive strategy. In order to reduce network parameters, the intra-stage recursive computation of ResNet is adopted in our Stack T-Net. And we take both the stage-wise result and the original hazy image as input to each T-Net and finally output the prediction of clean image. Experimental results on both synthetic and real-world images demonstrate that our plain T-Net and the advanced Stack T-Net perform favorably against the state-of-the-art dehazing algorithms, and show that our Stack T-Net could further improve the dehazing effect, demonstrating the effectiveness of the recursive strategy.

Computer Vision and Pattern Recognition

Clearing the Skies: A deep network architecture for single-image rain removal

132 - Xueyang Fu , Jiabin Huang , Xinghao Ding 2016

We introduce a deep network architecture called DerainNet for removing rain streaks from an image. Based on the deep convolutional neural network (CNN), we directly learn the mapping relationship between rainy and clean image detail layers from data. Because we do not possess the ground truth corresponding to real-world rainy images, we synthesize images with rain for training. In contrast to other common strategies that increase depth or breadth of the network, we use image processing domain knowledge to modify the objective function and improve deraining with a modestly-sized CNN. Specifically, we train our DerainNet on the detail (high-pass) layer rather than in the image domain. Though DerainNet is trained on synthetic data, we find that the learned network translates very effectively to real-world images for testing. Moreover, we augment the CNN framework with image enhancement to improve the visual results. Compared with state-of-the-art single image de-raining methods, our method has improved rain removal and much faster computation time after network training.

Computer Vision and Pattern Recognition

Stacked Deep Multi-Scale Hierarchical Network for Fast Bokeh Effect Rendering from a Single Image

166 - Saikat Dutta , Sourya Dipta Das , Nisarg A. Shah 2021

The Bokeh Effect is one of the most desirable effects in photography for rendering artistic and aesthetic photos. Usually, it requires a DSLR camera with different aperture and shutter settings and certain photography skills to generate this effect. In smartphones, computational methods and additional sensors are used to overcome the physical lens and sensor limitations to achieve such effect. Most of the existing methods utilized additional sensors data or pretrained network for fine depth estimation of the scene and sometimes use portrait segmentation pretrained network module to segment salient objects in the image. Because of these reasons, networks have many parameters, become runtime intensive and unable to run in mid-range devices. In this paper, we used an end-to-end Deep Multi-Scale Hierarchical Network (DMSHN) model for direct Bokeh effect rendering of images captured from the monocular camera. To further improve the perceptual quality of such effect, a stacked model consisting of two DMSHN modules is also proposed. Our model does not rely on any pretrained network module for Monocular Depth Estimation or Saliency Detection, thus significantly reducing the size of model and run time. Stacked DMSHN achieves state-of-the-art results on a large scale EBB! dataset with around 6x less runtime compared to the current state-of-the-art model in processing HD quality images.

Computer Vision and Pattern Recognition Artificial Intelligence

Stacked Autoencoders for Medical Image Search

160 - S. Sharma , I. Umar , L. Ospina 2016

Medical images can be a valuable resource for reliable information to support medical diagnosis. However, the large volume of medical images makes it challenging to retrieve relevant information given a particular scenario. To solve this challenge, content-based image retrieval (CBIR) attempts to characterize images (or image regions) with invariant content information in order to facilitate image search. This work presents a feature extraction technique for medical images using stacked autoencoders, which encode images to binary vectors. The technique is applied to the IRMA dataset, a collection of 14,410 x-ray images in order to demonstrate the ability of autoencoders to retrieve similar x-rays given test queries. Using IRMA dataset as a benchmark, it was found that stacked autoencoders gave excellent results with a retrieval error of 376 for 1,733 test images with a compression of 74.61%.

Computer Vision and Pattern Recognition