ﻻ يوجد ملخص باللغة العربية
Scene text recognition has witnessed rapid development with the advance of convolutional neural networks. Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images. An intuitive solution is to introduce super-resolution techniques as pre-processing. However, conventional super-resolution methods in the literature mainly focus on reconstructing the detailed texture of natural images, which typically do not work well for text due to the unique characteristics of text. To tackle these problems, in this work, we propose a content-aware text super-resolution network to generate the information desired for text recognition. In particular, we design an end-to-end network that can perform super-resolution and text recognition simultaneously. Different from previous super-resolution methods, we use the loss of text recognition as the Text Perceptual Loss to guide the training of the super-resolution network, and thus it pays more attention to the text content, rather than the irrelevant background area. Extensive experiments on several challenging benchmarks demonstrate the effectiveness of our proposed method in restoring a sharp high-resolution image from a small blurred one, and show that the recognition performance clearly boosts up the performance of text recognizer. To our knowledge, this is the first work focusing on text super-resolution. Code will be released in https://github.com/xieenze/TextSR.
In this paper, we address the problem of having characters with different scales in scene text recognition. We propose a novel scale aware feature encoder (SAFE) that is designed specifically for encoding characters with different scales. SAFE is com
Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones. Recognizing low-resolution text images is challenging because they lose detailed content information, leading to poor recognition accuracy. An in
Many previous methods have demonstrated the importance of considering semantically relevant objects for carrying out video-based human activity recognition, yet none of the methods have harvested the power of large text corpora to relate the objects
While depth sensors are becoming increasingly popular, their spatial resolution often remains limited. Depth super-resolution therefore emerged as a solution to this problem. Despite much progress, state-of-the-art techniques suffer from two drawback
Deep convolutional networks have attracted great attention in image restoration and enhancement. Generally, restoration quality has been improved by building more and more convolutional block. However, these methods mostly learn a specific model to h