ترغب بنشر مسار تعليمي؟ اضغط هنا

Zero-Shot Chinese Character Recognition with Stroke-Level Decomposition

92   0   0.0 ( 0 )
 نشر من قبل Chen Jingye
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Chinese character recognition has attracted much research interest due to its wide applications. Although it has been studied for many years, some issues in this field have not been completely resolved yet, e.g. the zero-shot problem. Previous character-based and radical-based methods have not fundamentally addressed the zero-shot problem since some characters or radicals in test sets may not appear in training sets under a data-hungry condition. Inspired by the fact that humans can generalize to know how to write characters unseen before if they have learned stroke orders of some characters, we propose a stroke-based method by decomposing each character into a sequence of strokes, which are the most basic units of Chinese characters. However, we observe that there is a one-to-many relationship between stroke sequences and Chinese characters. To tackle this challenge, we employ a matching-based strategy to transform the predicted stroke sequence to a specific character. We evaluate the proposed method on handwritten characters, printed artistic characters, and scene characters. The experimental results validate that the proposed method outperforms existing methods on both character zero-shot and radical zero-shot tasks. Moreover, the proposed method can be easily generalized to other languages whose characters can be decomposed into strokes.



قيم البحث

اقرأ أيضاً

Chinese characters have a huge set of character categories, more than 20,000 and the number is still increasing as more and more novel characters continue being created. However, the enormous characters can be decomposed into a compact set of about 5 00 fundamental and structural radicals. This paper introduces a novel radical analysis network (RAN) to recognize printed Chinese characters by identifying radicals and analyzing two-dimensional spatial structures among them. The proposed RAN first extracts visual features from input by employing convolutional neural networks as an encoder. Then a decoder based on recurrent neural networks is employed, aiming at generating captions of Chinese characters by detecting radicals and two-dimensional structures through a spatial attention mechanism. The manner of treating a Chinese character as a composition of radicals rather than a single character class largely reduces the size of vocabulary and enables RAN to possess the ability of recognizing unseen Chinese character classes, namely zero-shot learning.
Single online handwritten Chinese character recognition~(single OLHCCR) has achieved prominent performance. However, in real application scenarios, users always write multiple Chinese characters to form one complete sentence and the contextual inform ation within these characters holds the significant potential to improve the accuracy, robustness and efficiency of sentence-level OLHCCR. In this work, we first propose a simple and straightforward end-to-end network, namely vanilla compositional network~(VCN) to tackle the sentence-level OLHCCR. It couples convolutional neural network with sequence modeling architecture to exploit the handwritten characters previous contextual information. Although VCN performs much better than the state-of-the-art single OLHCCR model, it exposes high fragility when confronting with not well written characters such as sloppy writing, missing or broken strokes. To improve the robustness of sentence-level OLHCCR, we further propose a novel deep spatial-temporal fusion network~(DSTFN). It utilizes a pre-trained autoregresssive framework as the backbone component, which projects each Chinese character into word embeddings, and integrates the spatial glyph features of handwritten characters and their contextual information multiple times at multi-layer fusion module. We also construct a large-scale sentence-level handwriting dataset, named as CSOHD to evaluate models. Extensive experiment results demonstrate that DSTFN achieves the state-of-the-art performance, which presents strong robustness compared with VCN and exiting single OLHCCR models. The in-depth empirical analysis and case studies indicate that DSTFN can significantly improve the efficiency of handwriting input, with the handwritten Chinese character with incomplete strokes being recognized precisely.
The recognition of Chinese characters has always been a challenging task due to their huge variety and complex structures. The latest research proves that such an enormous character set can be decomposed into a collection of about 500 fundamental Chi nese radicals, and based on which this problem can be solved effectively. While with the constant advent of novel Chinese characters, the number of basic radicals is also expanding. The current methods that entirely rely on existing radicals are not flexible for identifying these novel characters and fail to recognize these Chinese characters without learning all of their radicals in the training stage. To this end, this paper proposes a novel Hippocampus-heuristic Character Recognition Network (HCRN), which references the way of hippocampus thinking, and can recognize unseen Chinese characters (namely zero-shot learning) only by training part of radicals. More specifically, the network architecture of HCRN is a new pseudo-siamese network designed by us, which can learn features from pairs of input training character samples and use them to predict unseen Chinese characters. The experimental results show that HCRN is robust and effective. It can accurately predict about 16,330 unseen testing Chinese characters relied on only 500 trained Chinese characters. The recognition accuracy of HCRN outperforms the state-of-the-art Chinese radical recognition approach by 15% (from 85.1% to 99.9%) for recognizing unseen Chinese characters.
Recently, great success has been achieved in offline handwritten Chinese character recognition by using deep learning methods. Chinese characters are mainly logographic and consist of basic radicals, however, previous research mostly treated each Chi nese character as a whole without explicitly considering its internal two-dimensional structure and radicals. In this study, we propose a novel radical analysis network with densely connected architecture (DenseRAN) to analyze Chinese character radicals and its two-dimensional structures simultaneously. DenseRAN first encodes input image to high-level visual features by employing DenseNet as an encoder. Then a decoder based on recurrent neural networks is employed, aiming at generating captions of Chinese characters by detecting radicals and two-dimensional structures through attention mechanism. The manner of treating a Chinese character as a composition of two-dimensional structures and radicals can reduce the size of vocabulary and enable DenseRAN to possess the capability of recognizing unseen Chinese character classes, only if the corresponding radicals have been seen in training set. Evaluated on ICDAR-2013 competition database, the proposed approach significantly outperforms whole-character modeling approach with a relative character error rate (CER) reduction of 18.54%. Meanwhile, for the case of recognizing 3277 unseen Chinese characters in CASIA-HWDB1.2 database, DenseRAN can achieve a character accuracy of about 41% while the traditional whole-character method has no capability to handle them.
We present a novel counterfactual framework for both Zero-Shot Learning (ZSL) and Open-Set Recognition (OSR), whose common challenge is generalizing to the unseen-classes by only training on the seen-classes. Our idea stems from the observation that the generated samples for unseen-classes are often out of the true distribution, which causes severe recognition rate imbalance between the seen-class (high) and unseen-class (low). We show that the key reason is that the generation is not Counterfactual Faithful, and thus we propose a faithful one, whose generation is from the sample-specific counterfactual question: What would the sample look like, if we set its class attribute to a certain class, while keeping its sample attribute unchanged? Thanks to the faithfulness, we can apply the Consistency Rule to perform unseen/seen binary classification, by asking: Would its counterfactual still look like itself? If ``yes, the sample is from a certain class, and ``no otherwise. Through extensive experiments on ZSL and OSR, we demonstrate that our framework effectively mitigates the seen/unseen imbalance and hence significantly improves the overall performance. Note that this framework is orthogonal to existing methods, thus, it can serve as a new baseline to evaluate how ZSL/OSR models generalize. Codes are available at https://github.com/yue-zhongqi/gcm-cf.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا