ﻻ يوجد ملخص باللغة العربية
Text detection, the key technology for understanding scene text, has become an attractive research topic. For detecting various scene texts, researchers propose plenty of detectors with different advantages: detection-based models enjoy fast detection speed, and segmentation-based algorithms are not limited by text shapes. However, for most intelligent systems, the detector needs to detect arbitrary-shaped texts with high speed and accuracy simultaneously. Thus, in this study, we design an efficient pipeline named as MT, which can detect adhesive arbitrary-shaped texts with only a single binary mask in the inference stage. This paper presents the contributions on three aspects: (1) a light-weight detection framework is designed to speed up the inference process while keeping high detection accuracy; (2) a multi-perspective feature module is proposed to learn more discriminative representations to segment the mask accurately; (3) a multi-factor constraints IoU minimization loss is introduced for training the proposed model. The effectiveness of MT is evaluated on four real-world scene text datasets, and it surpasses all the state-of-the-art competitors to a large extent.
A novel framework named Markov Clustering Network (MCN) is proposed for fast and robust scene text detection. MCN predicts instance-level bounding boxes by firstly converting an image into a Stochastic Flow Graph (SFG) and then performing Markov Clus
Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge. However, vast majority of the existing methods detect text within local regions, t
Scene parsing is challenging as it aims to assign one of the semantic categories to each pixel in scene images. Thus, pixel-level features are desired for scene parsing. However, classification networks are dominated by the discriminative portion, so
Remote sensing (RS) scene classification is a challenging task to predict scene categories of RS images. RS images have two main characters: large intra-class variance caused by large resolution variance and confusing information from large geographi
Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem. Though achieving excellent performance, these methods usually neglect an important fact that text in images ar