ﻻ يوجد ملخص باللغة العربية
Multi-criteria Chinese word segmentation (MCCWS) aims to exploit the relations among the multiple heterogeneous segmentation criteria and further improve the performance of each single criterion. Previous work usually regards MCCWS as different tasks, which are learned together under the multi-task learning framework. In this paper, we propose a concise but effective unified model for MCCWS, which is fully-shared for all the criteria. By leveraging the powerful ability of the Transformer encoder, the proposed unified model can segment Chinese text according to a unique criterion-token indicating the output criterion. Besides, the proposed unified model can segment both simplified and traditional Chinese and has an excellent transfer capability. Experiments on eight datasets with different criteria show that our model outperforms our single-criterion baseline model and other multi-criteria models. Source codes of this paper are available on Github https://github.com/acphile/MCCWS.
Multi-criteria Chinese word segmentation is a promising but challenging task, which exploits several different segmentation criteria and mines their common underlying knowledge. In this paper, we propose a flexible multi-criteria learning for Chinese
Different linguistic perspectives causes many diverse segmentation criteria for Chinese word segmentation (CWS). Most existing methods focus on improve the performance for each single criterion. However, it is interesting to exploit these different c
Chinese word segmentation and dependency parsing are two fundamental tasks for Chinese natural language processing. The dependency parsing is defined on word-level. Therefore word segmentation is the precondition of dependency parsing, which makes de
Chinese word segmentation (CWS) is a fundamental step of Chinese natural language processing. In this paper, we build a new toolkit, named PKUSEG, for multi-domain word segmentation. Unlike existing single-model toolkits, PKUSEG targets at multi-doma
Chinese word segmentation (CWS) is often regarded as a character-based sequence labeling task in most current works which have achieved great success with the help of powerful neural networks. However, these works neglect an important clue: Chinese c