An Alignment-Agnostic Model for Chinese Text Error Correction

Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor

Abstract

This paper investigates how to correct three types of Chinese text errors that are common among native Chinese speakers: mistaken, missing, and redundant characters. Most existing models based on the detect-correct framework can correct mistaken characters but cannot handle missing or redundant characters, because such errors make the model output differ in length from the model input. Although Seq2Seq-based and sequence tagging methods provide solutions to all three error types and have achieved relatively good results in English, they do not perform well on Chinese according to our experiments. In this work, we propose a novel alignment-agnostic detect-correct framework that can handle both aligned and non-aligned text and can serve as a cold-start model when no annotated data are available. Experimental results on three datasets demonstrate that our method is effective and achieves better performance than most recently published models.
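To make the three error types concrete, the sketch below labels them by aligning an erroneous sentence with its correction using Python's standard `difflib.SequenceMatcher`. It only illustrates why a length-preserving detect-correct model covers just the "mistaken" case (a one-for-one substitution), while missing and redundant characters change the sentence length. It is not the alignment-agnostic model proposed in the paper, and the function name `classify_char_errors` is hypothetical.

```python
from difflib import SequenceMatcher


def classify_char_errors(source: str, target: str) -> list[tuple[str, str, str]]:
    """Label each non-matching region between an erroneous sentence and its correction.

    Returns (error_type, source_span, target_span) tuples, where error_type is
    "mistaken" (characters substituted), "missing" (characters absent from the
    source) or "redundant" (extra characters in the source).
    """
    errors = []
    # autojunk=False so that frequent characters are never treated as junk.
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        src, tgt = source[i1:i2], target[j1:j2]
        if tag == "replace":
            # Substitution: when both spans have equal length, this is the only
            # case a length-preserving detect-correct model can express.
            errors.append(("mistaken", src, tgt))
        elif tag == "insert":
            # The correction adds characters, so the output is longer than the input.
            errors.append(("missing", src, tgt))
        elif tag == "delete":
            # The correction drops characters, so the output is shorter than the input.
            errors.append(("redundant", src, tgt))
        # tag == "equal": nothing to report.
    return errors
```

For example, `classify_char_errors("我去学校", "我们去学校")` returns `[("missing", "", "们")]`, and `classify_char_errors("我门去学校", "我们去学校")` returns `[("mistaken", "门", "们")]`.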

References used

https://aclanthology.org/

Related research

Hierarchical Character Tagger for Short Text Spelling Error Correction

Association for Computational Linguistics, 2021 (article)

State-of-the-art approaches to the spelling error correction problem include Transformer-based Seq2Seq models, which require large training sets and suffer from slow inference, and sequence labeling models based on Transformer encoders like BERT, which involve a token-level label space and therefore a large pre-defined vocabulary dictionary. In this paper we present a Hierarchical Character Tagger model, or HCTagger, for short text spelling error correction. We use a pre-trained language model at the character level as a text encoder, and then predict character-level edits to transform the original text into its error-free form with a much smaller label space. For decoding, we propose a hierarchical multi-task approach to alleviate the issue of long-tail label distribution without introducing extra model parameters. Experiments on two public misspelling correction datasets demonstrate that HCTagger is accurate and much faster than many existing models.

Keywords: spelling error correction, text spelling error, hierarchical character tagger
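HCTagger's central idea is to predict one character-level edit per input character instead of generating the output token by token. The minimal sketch below shows only how such predicted edits could be applied to recover the corrected text. The tag names `KEEP`, `DELETE`, `REPLACE_<s>` and `APPEND_<s>` and the function `apply_char_edits` are hypothetical, not taken from the HCTagger paper, and the hierarchical multi-task decoder is not modeled.

```python
def apply_char_edits(text: str, tags: list[str]) -> str:
    """Apply one edit tag per input character (hypothetical tag scheme).

    KEEP        -> copy the character
    DELETE      -> drop the character
    REPLACE_<s> -> emit <s> instead of the character
    APPEND_<s>  -> copy the character, then emit <s> after it
    """
    if len(tags) != len(text):
        raise ValueError("expected exactly one tag per input character")
    out: list[str] = []
    for ch, tag in zip(text, tags):
        if tag == "KEEP":
            out.append(ch)
        elif tag == "DELETE":
            continue
        elif tag.startswith("REPLACE_"):
            out.append(tag[len("REPLACE_"):])
        elif tag.startswith("APPEND_"):
            out.append(ch)
            out.append(tag[len("APPEND_"):])
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    return "".join(out)


# Usage: fix two missing letters with APPEND_ tags.
tags = ["KEEP", "KEEP", "APPEND_l", "KEEP", "KEEP", "APPEND_o", "KEEP", "KEEP", "KEEP"]
print(apply_char_edits("helo wrld", tags))  # hello world
```

If each tag carries at most one character, the label space grows with the character set rather than with a word-piece vocabulary, which is the kind of reduction the abstract describes.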

LM-Critic: Language Models for Unsupervised Grammatical Error Correction

Association for Computational Linguistics, 2021 (article)

Grammatical error correction (GEC) requires a set of labeled ungrammatical / grammatical sentence pairs for training, but obtaining such annotation can be prohibitively expensive. Recently, the Break-It-Fix-It (BIFI) framework has demonstrated strong results on learning to repair a broken program without any labeled examples, but this relies on a perfect critic (e.g., a compiler) that returns whether an example is valid or not, which does not exist for the GEC task. In this work, we show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges a sentence to be grammatical if the LM assigns it a higher probability than its local perturbations. We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical / grammatical pairs for training a corrector. We evaluate our approach on GEC datasets on multiple domains (CoNLL-2014, BEA-2019, GMEG-wiki and GMEG-yahoo) and show that it outperforms existing methods in both the unsupervised setting (+7.7 F0.5) and the supervised setting (+0.5 F0.5).

Keywords: language models, unsupervised grammatical error
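The LM-Critic criterion (a sentence is judged grammatical if the LM scores it higher than its local perturbations) can be sketched in a few lines. This is a toy under explicit assumptions: a tiny add-one-smoothed word bigram model stands in for the pretrained LM, and the only local perturbation is swapping two adjacent words, whereas the paper uses a pretrained LM and a richer perturbation set. The names `train_bigram`, `swap_neighbors` and `lm_critic` are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Iterator


def train_bigram(corpus: Iterable[str]) -> Callable[[str], float]:
    """Toy stand-in for a pretrained LM: add-one-smoothed word bigram log-probability."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    vocab = {"</s>"}
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(toks[1:])
        contexts.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))

    def log_prob(sentence: str) -> float:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (contexts[a] + len(vocab)))
            for a, b in zip(toks[:-1], toks[1:])
        )

    return log_prob


def swap_neighbors(sentence: str) -> Iterator[str]:
    """Hypothetical local perturbation: swap each pair of adjacent words."""
    words = sentence.split()
    for i in range(len(words) - 1):
        yield " ".join(words[:i] + [words[i + 1], words[i]] + words[i + 2:])


def lm_critic(sentence: str, log_prob: Callable[[str], float],
              perturb: Callable[[str], Iterable[str]] = swap_neighbors) -> bool:
    """Judge `sentence` grammatical if it scores higher than every local perturbation."""
    base = log_prob(sentence)
    return all(base > log_prob(p) for p in perturb(sentence) if p != sentence)


lp = train_bigram(["the cat sat", "the dog sat", "a cat ran"])
print(lm_critic("the cat sat", lp))  # True
print(lm_critic("cat the sat", lp))  # False
```

Length-preserving swaps are used deliberately here: an unnormalized LM tends to prefer shorter strings, so comparing against word deletions with this toy scorer would make many well-formed sentences look ungrammatical.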

SpellBERT: A Lightweight Pretrained Model for Chinese Spelling Check

Association for Computational Linguistics, 2021 (article)

Chinese Spelling Check (CSC) is the task of detecting and correcting Chinese spelling errors. Many models use a predefined confusion set to learn a mapping between correct characters and their visually or phonetically similar misuses, but the mapping may be out-of-domain. To that end, we propose SpellBERT, a pretrained model with graph-based extra features that does not depend on a confusion set. To explicitly capture the two error patterns, we employ a graph neural network to introduce radical and pinyin information as visual and phonetic features. To better fuse these features with character representations, we devise pre-training tasks similar to masked language modeling. With this feature-rich pre-training, SpellBERT, at only half the size of BERT, shows competitive performance and achieves a state-of-the-art result on the OCR dataset, where most of the errors are not covered by the existing confusion set.

Keywords: Chinese spelling check, spelling check, Chinese spelling

Parallel Text Alignment and Monolingual Parallel Corpus Creation from Philosophical Texts for Text Simplification

Association for Computational Linguistics, 2021 (article)

Text simplification is a growing field with many potentially useful applications. Training text simplification algorithms generally requires a lot of annotated data; however, few corpora are suitable for this task. We propose a new unsupervised method for aligning text based on Doc2Vec embeddings and a new alignment algorithm, capable of aligning texts at different levels. Initial evaluation shows promising results for the new approach. We used the newly developed approach to create a new monolingual parallel corpus composed of the works of early modern English philosophers and their corresponding simplified versions.

Keywords: creation from philosophical, parallel corpus creation, philosophical texts
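The embedding-based alignment described for the philosophical-texts corpus can be illustrated with a simple monotonic matcher over precomputed paragraph or sentence vectors (for instance, vectors produced by gensim's `Doc2Vec.infer_vector`). This is a hypothetical greedy procedure for illustration only, not the alignment algorithm proposed in that paper; `align_units` and the 0.5 similarity threshold are assumptions.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def align_units(original: Sequence[Sequence[float]],
                simplified: Sequence[Sequence[float]],
                threshold: float = 0.5) -> list[tuple[int, int]]:
    """Greedy monotonic alignment of simplified units to original units.

    Each simplified unit (a paragraph or sentence embedding) is matched to the
    most similar original unit at or after the previous match, provided the
    cosine similarity reaches `threshold`. Several simplified units may map to
    the same original unit; unmatched simplified units are skipped.
    """
    pairs: list[tuple[int, int]] = []
    if not original:
        return pairs
    start = 0
    for j, vec in enumerate(simplified):
        best_i = max(range(start, len(original)),
                     key=lambda i: cosine(original[i], vec))
        if cosine(original[best_i], vec) >= threshold:
            pairs.append((best_i, j))
            start = best_i  # keep alignments in document order
    return pairs


print(align_units([[1, 0], [0, 1], [1, 0]], [[0, 1], [1, 0]]))  # [(1, 0), (2, 1)]
```

Without the `start` constraint, the second simplified unit would match original unit 0 instead of unit 2, breaking document order; enforcing monotonicity is a common simplifying choice for aligning a text with its simplified version.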

GECko+: a Grammatical and Discourse Error Correction Tool

Association for Computational Linguistics, 2021 (article)

We introduce GECko+, a web-based writing assistance tool for English that corrects errors at both the sentence and the discourse level. It is based on two state-of-the-art models for grammatical error correction and sentence ordering. GECko+ is available online as a web application that implements a pipeline combining the two models.

Keywords: discourse error correction, error correction tool, error correction
