Handwriting Recognition of Historical Documents with few labeled data

95 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Edgard Chammas

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Edgard Chammas - Chafic Mokbel - Laurence Likforman-Sulem

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Historical documents present many challenges for offline handwriting recognition systems, among them, the segmentation and labeling steps. Carefully annotated textlines are needed to train an HTR system. In some scenarios, transcripts are only available at the paragraph level with no text-line information. In this work, we demonstrate how to train an HTR system with few labeled data. Specifically, we train a deep convolutional recurrent neural network (CRNN) system on only 10% of manually labeled text-line data from a dataset and propose an incremental training procedure that covers the rest of the data. Performance is further increased by augmenting the training set with specially crafted multiscale data. We also propose a model-based normalization scheme which considers the variability in the writing scale at the recognition phase. We apply this approach to the publicly available READ dataset. Our system achieved the second best result during the ICDAR2017 competition.

قيم البحث

118 - Hai Pham , Amrith Setlur , Saket Dingliwal 2020

Despite the advent of deep learning in computer vision, the general handwriting recognition problem is far from solved. Most existing approaches focus on handwriting datasets that have clearly written text and carefully segmented labels. In this pape r, we instead focus on learning handwritten characters from maintenance logs, a constrained setting where data is very limited and noisy. We break the problem into two consecutive stages of word segmentation and word recognition respectively and utilize data augmentation techniques to train both stages. Extensive comparisons with popular baselines for scene-text detection and word recognition show that our system achieves a lower error rate and is more suited to handle noisy and difficult documents

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Handwriting Recognition with Novelty

73 - Derek S. Prijatelj 2021

This paper introduces an agent-centric approach to handle novelty in the visual recognition domain of handwriting recognition (HWR). An ideal transcription agent would rival or surpass human perception, being able to recognize known and new character s in an image, and detect any stylistic changes that may occur within or across documents. A key confound is the presence of novelty, which has continued to stymie even the best machine learning-based algorithms for these tasks. In handwritten documents, novelty can be a change in writer, character attributes, writing attributes, or overall document appearance, among other things. Instead of looking at each aspect independently, we suggest that an integrated agent that can process known characters and novelties simultaneously is a better strategy. This paper formalizes the domain of handwriting recognition with novelty, describes a baseline agent, introduces an evaluation protocol with benchmark data, and provides experimentation to set the state-of-the-art. Results show feasibility for the agent-centric approach, but more work is needed to approach human-levels of reading ability, giving the HWR community a formal basis to build upon as they solve this challenging problem.

الرؤية الحاسوبية وتمييز الأنماط

Text Recognition in Real Scenarios with a Few Labeled Samples

220 - Jinghuang Lin , Zhanzhan Cheng , Fan Bai 2020

Scene text recognition (STR) is still a hot research topic in computer vision field due to its various applications. Existing works mainly focus on learning a general model with a huge number of synthetic text images to recognize unconstrained scene texts, and have achieved substantial progress. However, these methods are not quite applicable in many real-world scenarios where 1) high recognition accuracy is required, while 2) labeled samples are lacked. To tackle this challenging problem, this paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation between the synthetic source domain (with many synthetic labeled samples) and a specific target domain (with only some or a few real labeled samples). This is done by simultaneously learning each characters feature representation with an attention mechanism and establishing the corresponding character-level latent subspace with adversarial learning. Our approach can maximize the character-level confusion between the source domain and the target domain, thus achieves the sequence-level adaptation with even a small number of labeled samples in the target domain. Extensive experiments on various datasets show that our method significantly outperforms the finetuning scheme, and obtains comparable performance to the state-of-the-art STR methods.

الرؤية الحاسوبية وتمييز الأنماط

Few-Shot Object Recognition from Machine-Labeled Web Images

115 - Zhongwen Xu , Linchao Zhu , Yi Yang 2016

With the tremendous advances of Convolutional Neural Networks (ConvNets) on object recognition, we can now obtain reliable enough machine-labeled annotations easily by predictions from off-the-shelf ConvNets. In this work, we present an abstraction m emory based framework for few-shot learning, building upon machine-labeled image annotations. Our method takes some large-scale machine-annotated datasets (e.g., OpenImages) as an external memory bank. In the external memory bank, the information is stored in the memory slots with the form of key-value, where image feature is regarded as key and label embedding serves as value. When queried by the few-shot examples, our model selects visually similar data from the external memory bank, and writes the useful information obtained from related external data into another memory bank, i.e., abstraction memory. Long Short-Term Memory (LSTM) controllers and attention mechanisms are utilized to guarantee the data written to the abstraction memory is correlated to the query example. The abstraction memory concentrates information from the external memory bank, so that it makes the few-shot recognition effective. In the experiments, we firstly confirm that our model can learn to conduct few-shot object recognition on clean human-labeled data from ImageNet dataset. Then, we demonstrate that with our model, machine-labeled image annotations are very effective and abundant resources to perform object recognition on novel categories. Experimental results show that our proposed model with machine-labeled annotations achieves great performance, only with a gap of 1% between of the one with human-labeled annotations.

الرؤية الحاسوبية وتمييز الأنماط

Motion-Based Handwriting Recognition

105 - Junshen Kevin Chen , Wanze Xie , Yutong He 2021

We attempt to overcome the restriction of requiring a writing surface for handwriting recognition. In this study, we design a prototype of a stylus equipped with motion sensor, and utilizes gyroscopic and acceleration sensor reading to perform writte n letter classification using various deep learning techniques such as CNN and RNNs. We also explore various data augmentation techniques and their effects, reaching up to 86% accuracy.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي