ﻻ يوجد ملخص باللغة العربية
In this paper, we develop a low than character feature embedding called radical embedding, and apply it on LSTM model for sentence segmentation of pre modern Chinese texts. The datasets includes over 150 classical Chinese books from 3 different dynasties and contains different literary styles. LSTM CRF model is a state of art method for the sequence labeling problem. Our new model adds a component of radical embedding, which leads to improved performances. Experimental results based on the aforementioned Chinese books demonstrates a better accuracy than earlier methods on sentence segmentation, especial in Tang Epitaph texts.
We consider three major text sources about the Tang Dynasty of China in our experiments that aim to segment text written in classical Chinese. These corpora include a collection of Tang Tomb Biographies, the New Tang Book, and the Old Tang Book. We s
Chinese word segmentation (CWS) is the basic of Chinese natural language processing (NLP). The quality of word segmentation will directly affect the rest of NLP tasks. Recently, with the artificial intelligence tide rising again, Long Short-Term Memo
Given the advantage and recent success of English character-level and subword-unit models in several NLP tasks, we consider the equivalent modeling problem for Chinese. Chinese script is logographic and many Chinese logograms are composed of common s
Previous traditional approaches to unsupervised Chinese word segmentation (CWS) can be roughly classified into discriminative and generative models. The former uses the carefully designed goodness measures for candidate segmentation, while the latter
There is a small but growing body of research on statistical scripts, models of event sequences that allow probabilistic inference of implicit events from documents. These systems operate on structured verb-argument events produced by an NLP pipeline