ﻻ يوجد ملخص باللغة العربية
Multi-criteria Chinese word segmentation is a promising but challenging task, which exploits several different segmentation criteria and mines their common underlying knowledge. In this paper, we propose a flexible multi-criteria learning for Chinese word segmentation. Usually, a segmentation criterion could be decomposed into multiple sub-criteria, which are shareable with other segmentation criteria. The process of word segmentation is a routing among these sub-criteria. From this perspective, we present Switch-LSTMs to segment words, which consist of several long short-term memory neural networks (LSTM), and a switcher to automatically switch the routing among these LSTMs. With these auto-switched LSTMs, our model provides a more flexible solution for multi-criteria CWS, which is also easy to transfer the learned knowledge to new criteria. Experiments show that our model obtains significant improvements on eight corpora with heterogeneous segmentation criteria, compared to the previous method and single-criterion learning.
Different linguistic perspectives causes many diverse segmentation criteria for Chinese word segmentation (CWS). Most existing methods focus on improve the performance for each single criterion. However, it is interesting to exploit these different c
Multi-criteria Chinese word segmentation (MCCWS) aims to exploit the relations among the multiple heterogeneous segmentation criteria and further improve the performance of each single criterion. Previous work usually regards MCCWS as different tasks
Chinese word segmentation (CWS) is a fundamental step of Chinese natural language processing. In this paper, we build a new toolkit, named PKUSEG, for multi-domain word segmentation. Unlike existing single-model toolkits, PKUSEG targets at multi-doma
Chinese word segmentation (CWS) is often regarded as a character-based sequence labeling task in most current works which have achieved great success with the help of powerful neural networks. However, these works neglect an important clue: Chinese c
Previous traditional approaches to unsupervised Chinese word segmentation (CWS) can be roughly classified into discriminative and generative models. The former uses the carefully designed goodness measures for candidate segmentation, while the latter