A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing

Published by: Xipeng Qiu

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Hang Yan, Xipeng Qiu, Xuanjing Huang

Computation and Language, Artificial Intelligence

Chinese word segmentation and dependency parsing are two fundamental tasks in Chinese natural language processing. Dependency parsing is defined at the word level, so word segmentation is a precondition for it; as a result, dependency parsing suffers from error propagation and cannot directly use character-level pre-trained language models (such as BERT). In this paper, we propose a graph-based model that integrates Chinese word segmentation and dependency parsing. Unlike previous transition-based joint models, our proposed model is more concise and requires less feature engineering. Our graph-based joint model outperforms previous joint models and achieves state-of-the-art results in both Chinese word segmentation and dependency parsing. Moreover, when combined with BERT, our model substantially reduces the dependency-parsing performance gap between joint models and gold-segmented word-based models. Our code is publicly available at https://github.com/fastnlp/JointCwsParser.
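To make the idea of joining the two tasks concrete, here is a minimal sketch of how word boundaries could be read off a character-level dependency tree. The abstract above does not spell out the encoding, so everything here is an assumption for illustration: the `app` label for intra-word arcs, the convention that each non-final character of a word attaches to the next character, and the function name `segment_from_arcs`.

```python
from typing import List

APP = "app"  # hypothetical label marking an intra-word arc (an assumption)


def segment_from_arcs(chars: List[str], heads: List[int], labels: List[str]) -> List[str]:
    """Group characters into words from character-level arcs.

    heads[i] is the 1-based position of the head of character i + 1
    (0 means the artificial root). A character whose arc is labelled APP
    and points at the immediately following character is assumed to
    continue the current word; any other arc closes the word.
    """
    words: List[str] = []
    current = ""
    for i, ch in enumerate(chars):
        current += ch
        continues_word = labels[i] == APP and heads[i] == i + 2
        if not continues_word:
            words.append(current)
            current = ""
    if current:  # trailing characters whose arcs never closed a word
        words.append(current)
    return words


# Toy example: 喜 attaches to 欢 with the intra-word label, so they form one word.
chars = list("我喜欢猫")
heads = [3, 3, 0, 3]
labels = ["nsubj", APP, "root", "dobj"]
print(segment_from_arcs(chars, heads, labels))
```

Under this assumed encoding a single arc-and-label predictor yields both outputs: the intra-word arcs give the segmentation, and the remaining arcs give the word-level dependency tree.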

Xipeng Qiu, Hengzhi Pei, Hang Yan, 2019

Multi-criteria Chinese word segmentation (MCCWS) aims to exploit the relations among multiple heterogeneous segmentation criteria and thereby improve performance on each single criterion. Previous work usually treats the criteria in MCCWS as different tasks, which are learned together in a multi-task learning framework. In this paper, we propose a concise but effective unified model for MCCWS that is fully shared across all criteria. By leveraging the Transformer encoder, the proposed unified model segments Chinese text according to a unique criterion token that indicates the output criterion. In addition, the proposed unified model can segment both simplified and traditional Chinese and has excellent transfer capability. Experiments on eight datasets with different criteria show that our model outperforms our single-criterion baseline model and other multi-criteria models. The source code for this paper is available on GitHub: https://github.com/acphile/MCCWS.

Computation and Language, Artificial Intelligence
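The criterion-token idea in the abstract above can be illustrated with a small input-encoding sketch. It assumes the token is prepended to the character sequence before a shared encoder. The criterion names in `CRITERIA`, the `<name>` token format, and the helpers `build_vocab` and `encode` are all hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, Iterable, List

# Hypothetical criterion names; the abstract only states that eight datasets are used.
CRITERIA = ["pku", "msr", "as"]


def build_vocab(chars: Iterable[str], criteria: Iterable[str]) -> Dict[str, int]:
    """Build one shared vocabulary holding criterion tokens and characters."""
    vocab = {"<unk>": 0}
    for name in criteria:
        vocab[f"<{name}>"] = len(vocab)
    for ch in chars:
        vocab.setdefault(ch, len(vocab))
    return vocab


def encode(sentence: str, criterion: str, vocab: Dict[str, int]) -> List[int]:
    """Prepend the criterion token, then map every token to its id."""
    tokens = [f"<{criterion}>"] + list(sentence)
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


vocab = build_vocab("我喜欢猫", CRITERIA)
pku_ids = encode("我喜欢猫", "pku", vocab)
msr_ids = encode("我喜欢猫", "msr", vocab)
# The two inputs share every character id and differ only in the first position.
print(pku_ids)
print(msr_ids)
```

Because only the first position changes, all other parameters can be shared across criteria, which is the sense in which a model of this kind is "fully shared".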

A Feature-Enriched Neural Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

Xinchi Chen, Xipeng Qiu, Xuanjing Huang, 2016

Recently, neural network models for natural language processing tasks have received increasing attention for their ability to alleviate the burden of manual feature engineering. However, previous neural models cannot extract complicated feature compositions the way traditional methods with discrete features can. In this work, we propose a feature-enriched neural model for the joint Chinese word segmentation and part-of-speech tagging task. Specifically, to simulate the feature templates of traditional discrete-feature-based models, we use different filters to model complex compositional features with convolutional and pooling layers, and then exploit long-distance dependency information with a recurrent layer. Experimental results on five different datasets show the effectiveness of our proposed model.

Computation and Language

Scene Graph Parsing as Dependency Parsing

Yu-Siang Wang, Chenxi Liu, Xiaohui Zeng, 2018

In this paper, we study the problem of parsing structured knowledge graphs from textual descriptions. In particular, we consider the scene graph representation, which captures objects together with their attributes and relations; this representation has proved useful across a variety of vision and language applications. We begin by introducing an alternative but equivalent edge-centric view of scene graphs that connects them to dependency parses. Together with a careful redesign of the label and action space, we combine the two-stage pipeline used in prior work (generic dependency parsing followed by simple post-processing) into one, enabling end-to-end training. The scene graphs generated by our learned neural dependency parser achieve an F-score similarity of 49.67% to ground truth graphs on our evaluation set, surpassing the best previous approaches by 5%. We further demonstrate the effectiveness of our learned parser on image retrieval applications.

Computation and Language, Computer Vision and Pattern Recognition
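The scene-graph entry above reports an F-score similarity between predicted and ground-truth graphs. As a rough illustration only, the sketch below computes a set-based F-score over graph tuples. The exact metric used in that paper is not described in the abstract, so the exact-match rule, the tuple layout, and the name `f_score` are assumptions.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def f_score(pred: Set[Triple], gold: Set[Triple]) -> float:
    """Harmonic mean of precision and recall over exactly matching tuples."""
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy graphs: one relation is shared, one attribute differs.
gold = {("man", "ride", "horse"), ("horse", "attr", "brown")}
pred = {("man", "ride", "horse"), ("horse", "attr", "white")}
print(f_score(pred, gold))
```

A real evaluation may match tuples more leniently (for example through synonyms), which this exact-match sketch does not attempt.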

Transition-Based Dependency Parsing using Perceptron Learner

Rahul Radhakrishnan Iyer, Miguel Ballesteros, Chris Dyer, 2020

Syntactic parsing using dependency structures has become a standard technique in natural language processing, with many different parsing models, in particular data-driven models that can be trained on syntactically annotated corpora. In this paper, we tackle transition-based dependency parsing using a Perceptron Learner. Our proposed model, which adds more relevant features to the Perceptron Learner, outperforms a baseline arc-standard parser. It also achieves a higher UAS than the MALT and LSTM parsers. We also give possible ways to address the parsing of non-projective trees.

Computation and Language, Artificial Intelligence, Machine Learning
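The entry above mentions an arc-standard parser and the UAS metric. The sketch below shows how arc-standard transitions build a tree and how UAS (unlabeled attachment score) is computed. It does not model the Perceptron Learner or its features: the action sequence is supplied by hand, no preconditions are checked, and the names `run_arc_standard` and `uas` are illustrative.

```python
from typing import List


def run_arc_standard(n_words: int, actions: List[str]) -> List[int]:
    """Apply SHIFT / LEFT-ARC / RIGHT-ARC and return each word's head (0 = root)."""
    stack = [0]                           # 0 is the artificial root
    buffer = list(range(1, n_words + 1))  # word positions, 1-based
    heads = [-1] * (n_words + 1)          # heads[i] for word i; index 0 unused
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":           # second-from-top becomes a dependent of the top
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        elif act == "RIGHT-ARC":          # top becomes a dependent of the second-from-top
            dep = stack.pop()
            heads[dep] = stack[-1]
        else:
            raise ValueError(f"unknown action: {act}")
    return heads[1:]


def uas(pred: List[int], gold: List[int]) -> float:
    """Fraction of words whose predicted head equals the gold head."""
    correct = sum(p == g for p, g in zip(pred, gold))
    return correct / len(gold)


# "She eats fish": She <- eats, eats <- root, fish <- eats
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
pred = run_arc_standard(3, actions)
print(pred, uas(pred, [2, 0, 2]))
```

In a trained parser, a classifier such as a perceptron would choose each action from features of the current stack and buffer instead of reading them from a fixed list.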

Dependency Language Models for Transition-based Dependency Parsing

Juntao Yu, Bernd Bohnet, 2016

In this paper, we present an approach to improve the accuracy of a strong transition-based dependency parser by exploiting dependency language models that are extracted from a large parsed corpus. We integrate a small number of features based on the dependency language models into the parser. To demonstrate the effectiveness of the proposed approach, we evaluate our parser on standard English and Chinese data, on which the base parser already achieves competitive accuracy scores. Our enhanced parser achieves state-of-the-art accuracy on Chinese data and competitive results on English data. We gain a large absolute improvement of one point (UAS) on Chinese and 0.5 points on English.

Computation and Language