Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection

87 0 0.0 ( 0 )

Download Cite

Added by Shi-Xue Zhang

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Shi-Xue Zhang - Xiaobin Zhu - Chun Yang

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Arbitrary shape text detection is a challenging task due to the high complexity and variety of scene texts. In this work, we propose a novel adaptive boundary proposal network for arbitrary shape text detection, which can learn to directly produce accurate boundary for arbitrary shape text without any post-processing. Our method mainly consists of a boundary proposal model and an innovative adaptive boundary deformation model. The boundary proposal model constructed by multi-layer dilated convolutions is adopted to produce prior information (including classification map, distance field, and direction field) and coarse boundary proposals. The adaptive boundary deformation model is an encoder-decoder network, in which the encoder mainly consists of a Graph Convolutional Network (GCN) and a Recurrent Neural Network (RNN). It aims to perform boundary deformation in an iterative way for obtaining text instance shape guided by prior information from the boundary proposal model. In this way, our method can directly and efficiently generate accurate text boundaries without complex post-processing. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method.

rate research

Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection

126 - Shi-Xue Zhang , Xiaobin Zhu , Jie-Bo Hou 2020

Arbitrary shape text detection is a challenging task due to the high variety and complexity of scenes texts. In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection. In our method, an innovative local graph bridges a text proposal model via Convolutional Neural Network (CNN) and a deep relational reasoning network via Graph Convolutional Network (GCN), making our network end-to-end trainable. To be concrete, every text instance will be divided into a series of small rectangular components, and the geometry attributes (e.g., height, width, and orientation) of the small components will be estimated by our text proposal model. Given the geometry attributes, the local graph construction model can roughly establish linkages between different text components. For further reasoning and deducing the likelihood of linkages between the component and its neighbors, we adopt a graph-based network to perform deep relational reasoning on local graphs. Experiments on public available datasets demonstrate the state-of-the-art performance of our method.

Computer Vision and Pattern Recognition

Comprehensive Studies for Arbitrary-shape Scene Text Detection

132 - Pengwen Dai , Xiaochun Cao 2021

Numerous scene text detection methods have been proposed in recent years. Most of them declare they have achieved state-of-the-art performances. However, the performance comparison is unfair, due to lots of inconsistent settings (e.g., training data, backbone network, multi-scale feature fusion, evaluation protocols, etc.). These various settings would dissemble the pros and cons of the proposed core techniques. In this paper, we carefully examine and analyze the inconsistent settings, and propose a unified framework for the bottom-up based scene text detection methods. Under the unified framework, we ensure the consistent settings for non-core modules, and mainly investigate the representations of describing arbitrary-shape scene texts, e.g., regressing points on text contours, clustering pixels with predicted auxiliary information, grouping connected components with learned linkages, etc. With the comprehensive investigations and elaborate analyses, it not only cleans up the obstacle of understanding the performance differences between existing methods but also reveals the advantages and disadvantages of previous models under fair comparisons.

Computer Vision and Pattern Recognition

RayNet: Real-time Scene Arbitrary-shape Text Detection with Multiple Rays

101 - Chuang Yang , Mulin Chen , Qi Wang 2021

Existing object detection-based text detectors mainly concentrate on detecting horizontal and multioriented text. However, they do not pay enough attention to complex-shape text (curved or other irregularly shaped text). Recently, segmentation-based text detection methods have been introduced to deal with the complex-shape text; however, the pixel level processing increases the computational cost significantly. To further improve the accuracy and efficiency, we propose a novel detection framework for arbitrary-shape text detection, termed as RayNet. RayNet uses Center Point Set (CPS) and Ray Distance (RD) to fit text, where CPS is used to determine the text general position and the RD is combined with CPS to compute Ray Points (RP) to localize the text accurate shape. Since RP are disordered, we develop the Ray Points Connection (RPC) algorithm to reorder RP, which significantly improves the detection performance of complex-shape text. RayNet achieves impressive performance on existing curved text dataset (CTW1500) and quadrangle text dataset (ICDAR2015), which demonstrate its superiority against several state-of-the-art methods.

Computer Vision and Pattern Recognition

All you need is a second look: Towards Tighter Arbitrary shape text detection

142 - Meng Cao , Yuexian Zou 2020

Deep learning-based scene text detection methods have progressed substantially over the past years. However, there remain several problems to be solved. Generally, long curve text instances tend to be fragmented because of the limited receptive field size of CNN. Besides, simple representations using rectangle or quadrangle bounding boxes fall short when dealing with more challenging arbitrary-shaped texts. In addition, the scale of text instances varies greatly which leads to the difficulty of accurate prediction through a single segmentation network. To address these problems, we innovatively propose a two-stage segmentation based arbitrary text detector named textit{NASK} (textbf{N}eed textbf{A} textbf{S}econd lootextbf{K}). Specifically, textit{NASK} consists of a Text Instance Segmentation network namely textit{TIS} ((1^{st}) stage), a Text RoI Pooling module and a Fiducial pOint eXpression module termed as textit{FOX} ((2^{nd}) stage). Firstly, textit{TIS} conducts instance segmentation to obtain rectangle text proposals with a proposed Group Spatial and Channel Attention module (textit{GSCA}) to augment the feature expression. Then, Text RoI Pooling transforms these rectangles to the fixed size. Finally, textit{FOX} is introduced to reconstruct text instances with a more tighter representation using the predicted geometrical attributes including text center line, text line orientation, character scale and character orientation. Experimental results on two public benchmarks including textit{Total-Text} and textit{SCUT-CTW1500} have demonstrated that the proposed textit{NASK} achieves state-of-the-art results.

Computer Vision and Pattern Recognition

Proposal Relation Network for Temporal Action Detection

100 - Xiang Wang , Zhiwu Qing , Ziyuan Huang 2021

This technical report presents our solution for temporal action detection task in AcitivityNet Challenge 2021. The purpose of this task is to locate and identify actions of interest in long untrimmed videos. The crucial challenge of the task comes from that the temporal duration of action varies dramatically, and the target actions are typically embedded in a background of irrelevant activities. Our solution builds on BMN, and mainly contains three steps: 1) action classification and feature encoding by Slowfast, CSN and ViViT; 2) proposal generation. We improve BMN by embedding the proposed Proposal Relation Network (PRN), by which we can generate proposals of high quality; 3) action detection. We calculate the detection results by assigning the proposals with corresponding classification results. Finally, we ensemble the results under different settings and achieve 44.7% on the test set, which improves the champion result in ActivityNet 2020 by 1.9% in terms of average mAP.

Computer Vision and Pattern Recognition