
Spelling Correction for Russian: A Comparative Study of Datasets and Methods


Publication date: 2021. Research language: English.
 Created by Shamra Editor





We develop a minimally-supervised model for spelling correction and evaluate its performance on three datasets annotated for spelling errors in Russian. The first corpus is a dataset of Russian social media data that was recently used in a shared task on Russian spelling correction. The other two corpora contain texts produced by learners of Russian as a foreign language. Evaluating on three diverse datasets allows for a cross-corpus comparison. We compare the performance of the minimally-supervised model to two baseline models that do not use context for candidate re-ranking, as well as to a character-level statistical machine translation system with context-based re-ranking. We show that the minimally-supervised model outperforms all of the other models. We also present an analysis of the spelling errors and discuss the difficulty of the task compared to the spelling correction problem in English.
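The general pipeline the abstract contrasts (generate spelling candidates, then re-rank them using context, versus context-free baselines) can be sketched generically. This is not the paper's actual model: the tiny vocabulary, alphabet, and bigram counts below are illustrative assumptions.

```python
# A minimal sketch of candidate generation plus context-based re-ranking
# for spelling correction. Vocabulary and counts are toy assumptions.

VOCAB = {"привет", "мир", "как", "дела"}  # toy vocabulary (assumption)
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

def edits1(word):
    """All strings one edit (delete, replace, insert) away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    replaces = {l + c + r[1:] for l, r in splits if r for c in ALPHABET}
    inserts = {l + c + r for l, r in splits for c in ALPHABET}
    return deletes | replaces | inserts

def candidates(word):
    """Vocabulary words reachable within one edit, or the word itself."""
    if word in VOCAB:
        return {word}
    return (edits1(word) & VOCAB) or {word}

BIGRAMS = {("как", "дела"): 7, ("привет", "мир"): 5}  # toy context counts

def correct(prev_word, word):
    """Pick the candidate that best fits the preceding word."""
    return max(candidates(word), key=lambda w: BIGRAMS.get((prev_word, w), 0))
```

A context-free baseline would stop at `candidates`; the context-based re-ranking that the compared models differ on is the extra step in `correct`.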



We present a manually annotated lexical semantic change dataset for Russian: RuShiftEval. Its novelty is ensured by a single set of target words annotated for their diachronic semantic shifts across three time periods, while previous work either used only two time periods or different sets of target words. The paper describes the composition and annotation procedure for the dataset. In addition, it is shown how the ternary nature of RuShiftEval allows tracing specific diachronic trajectories: 'changed at a particular time period and stable afterwards' or 'was changing throughout all time periods'. Based on the analysis of the submissions to the recent shared task on semantic change detection for Russian, we argue that correctly identifying such trajectories can be an interesting sub-task itself.
GECko+: a Grammatical and Discourse Error Correction Tool. We introduce GECko+, a web-based writing assistance tool for English that corrects errors both at the sentence and at the discourse level. It is based on two state-of-the-art models for grammar error correction and sentence ordering. GECko+ is available online as a web application that implements a pipeline combining the two models.
Grammatical error correction (GEC) suffers from a lack of sufficient parallel data. Studies on GEC have proposed several methods to generate pseudo data, which comprise pairs of grammatical and artificially produced ungrammatical sentences. Currently, a mainstream approach to generate pseudo data is back-translation (BT). Most previous studies using BT have employed the same architecture for both the GEC and BT models. However, GEC models have different correction tendencies depending on the architecture of their models. Thus, in this study, we compare the correction tendencies of GEC models trained on pseudo data generated by three BT models with different architectures, namely, Transformer, CNN, and LSTM. The results confirm that the correction tendencies for each error type are different for every BT model. In addition, we investigate the correction tendencies when using a combination of pseudo data generated by different BT models. As a result, we find that the combination of different BT models improves or interpolates the performance of each error type compared with using a single BT model with different seeds.
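As a rough illustration of the pseudo-data idea (pairing each grammatical sentence with an artificially ungrammatical version), the sketch below uses a simple rule-based noiser as a stand-in for a learned back-translation model; the noiser itself is an assumption for illustration, not any of the BT architectures studied.

```python
# A minimal sketch of pseudo-data generation for GEC. Real back-translation
# corrupts grammatical sentences with a learned model; here a seeded
# rule-based noiser (drop one token, swap an adjacent pair) stands in.

import random

def add_noise(sentence, seed=0):
    """Return an artificially 'ungrammatical' copy of `sentence`."""
    rng = random.Random(seed)
    tokens = sentence.split()
    if len(tokens) > 2:
        tokens.pop(rng.randrange(len(tokens)))               # drop a token
        i = rng.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]  # swap a pair
    return " ".join(tokens)

def make_pseudo_pairs(corpus):
    """Pairs of (ungrammatical, grammatical) sentences for GEC training."""
    return [(add_noise(s), s) for s in corpus]
```

Each pair keeps the clean sentence as the training target, mirroring how BT-generated pseudo data is consumed by a GEC model.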
State-of-the-art approaches to spelling error correction problem include Transformer-based Seq2Seq models, which require large training sets and suffer from slow inference time; and sequence labeling models based on Transformer encoders like BERT, which involve token-level label space and therefore a large pre-defined vocabulary dictionary. In this paper we present a Hierarchical Character Tagger model, or HCTagger, for short text spelling error correction. We use a pre-trained language model at the character level as a text encoder, and then predict character-level edits to transform the original text into its error-free form with a much smaller label space. For decoding, we propose a hierarchical multi-task approach to alleviate the issue of long-tail label distribution without introducing extra model parameters. Experiments on two public misspelling correction datasets demonstrate that HCTagger is an accurate and much faster approach than many existing models.
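The character-level edit idea can be illustrated with a toy decoder that applies one edit tag per input character. The tag names ('K' keep, 'D' delete, 'R:x' replace) are assumptions made for this sketch, not HCTagger's actual label set.

```python
# A minimal sketch of applying character-level edit tags to correct a
# misspelled string. The tag scheme here is illustrative only.

def apply_edits(text, tags):
    """Apply one edit tag per input character: 'K' keeps the character,
    'D' deletes it, and 'R:x' replaces it with x."""
    out = []
    for ch, tag in zip(text, tags):
        if tag == "K":
            out.append(ch)
        elif tag == "D":
            continue                 # drop this character
        elif tag.startswith("R:"):
            out.append(tag[2:])      # substitute the tagged character
    return "".join(out)
```

The appeal of this formulation is the small label space: the model predicts a handful of edit operations per character instead of choosing among a whole token vocabulary.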
The objective of the research is to complete a theoretical and practical study related to coastal marine works in order to calculate the amounts of silt removed from harbor basins and entrances, and to present the methods and devices used in the performance of topographic surveys and the numerical methods used to calculate and compare quantities. In the theoretical part, the factors that lead to the formation of silt deposits in port basins, the methods of their removal, and the deepening of navigational pathways into and out of harbors were addressed. In the practical part, the measurement methods and topographic results were presented for at least two stages of the port's operation, at the beginning of operation and before the dredging process, after which the implemented quantities were calculated and compared in order to obtain final maritime plans and quantities. The research concluded with specific proposals on the methods of calculating the dredged quantities, the construction of the measured geodetic networks, the performance of topographic surveying below the water surface, and the identification of the software components related to the various marine works and ways of benefiting from them.

