This paper investigates how to correct Chinese text errors of three types: mistaken, missing, and redundant characters, which are common for native Chinese speakers. Most existing models based on the detect-correct framework can correct mistaken characters, but cannot handle missing or redundant characters because of the inconsistency between model inputs and outputs. Although Seq2Seq-based and sequence tagging methods provide solutions for all three error types and have achieved relatively good results in an English context, they do not perform well in a Chinese context according to our experiments. In our work, we propose a novel alignment-agnostic detect-correct framework that can handle both aligned and non-aligned text and can serve as a cold-start model when no annotated data are provided. Experimental results on three datasets demonstrate that our method is effective and achieves better performance than most recently published models.
State-of-the-art approaches to the spelling error correction problem include Transformer-based Seq2Seq models, which require large training sets and suffer from slow inference time, and sequence labeling models based on Transformer encoders such as BERT, which involve a token-level label space and therefore a large pre-defined vocabulary dictionary. In this paper we present a Hierarchical Character Tagger model, or HCTagger, for short-text spelling error correction. We use a pre-trained language model at the character level as a text encoder, and then predict character-level edits to transform the original text into its error-free form with a much smaller label space. For decoding, we propose a hierarchical multi-task approach to alleviate the issue of long-tail label distribution without introducing extra model parameters. Experiments on two public misspelling correction datasets demonstrate that HCTagger is accurate and much faster than many existing models.
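As a rough illustration of character-level edit tagging (the tag names below are hypothetical, not HCTagger's actual label scheme), a tagger predicts one edit per input character and a simple decoder applies them:

```python
# Toy decoder for character-level edit tags. The tag inventory here
# (KEEP / DELETE / REPLACE_<c> / APPEND_<c>) is illustrative only.

def apply_char_tags(text, tags):
    """Apply one edit tag per input character and return the corrected string.

    KEEP          -- copy the character unchanged
    DELETE        -- drop the character
    REPLACE_<c>   -- substitute <c> for the character
    APPEND_<c>    -- copy the character, then insert <c> after it
    """
    out = []
    for ch, tag in zip(text, tags):
        if tag == "KEEP":
            out.append(ch)
        elif tag == "DELETE":
            continue
        elif tag.startswith("REPLACE_"):
            out.append(tag[len("REPLACE_"):])
        elif tag.startswith("APPEND_"):
            out.append(ch)
            out.append(tag[len("APPEND_"):])
    return "".join(out)

# "helo" -> "hello": insert the missing "l" after the existing one.
corrected = apply_char_tags("helo", ["KEEP", "KEEP", "APPEND_l", "KEEP"])
```

Because every edit is anchored to an input character, the label space stays small compared with predicting whole tokens from a large vocabulary.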
Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks. While existing methods are effective at mitigating biases by linear projection, such methods are too aggressive: they not only remove bias, but also erase valuable information from word embeddings. We develop new measures for evaluating specific information retention that demonstrate the trade-off between bias removal and information retention. To address this challenge, we propose OSCaR (Orthogonal Subspace Correction and Rectification), a bias-mitigation method that focuses on disentangling biased associations between concepts instead of removing concepts wholesale. Our experiments on gender bias show that OSCaR is a well-balanced approach: semantic information is retained in the embeddings while bias is effectively mitigated.
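For contrast, the linear-projection debiasing that the abstract calls too aggressive can be sketched in a few lines: it removes a vector's entire component along a bias direction. (OSCaR itself applies a graded rotation that disentangles concepts instead; that operation is not shown here.)

```python
import numpy as np

def project_out(v, bias_dir):
    """Hard linear-projection debiasing: remove v's component along bias_dir."""
    b = bias_dir / np.linalg.norm(bias_dir)
    return v - np.dot(v, b) * b

# Toy 3-d vectors; real word embeddings would have hundreds of dimensions.
v = np.array([2.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 0.0])
debiased = project_out(v, b)
# The result is orthogonal to the bias direction -- and any useful signal
# that happened to lie along that direction is erased with the bias.
```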
We develop a minimally-supervised model for spelling correction and evaluate its performance on three datasets annotated for spelling errors in Russian. The first corpus is a dataset of Russian social media data that was recently used in a shared task on Russian spelling correction. The other two corpora contain texts produced by learners of Russian as a foreign language. Evaluating on three diverse datasets allows for a cross-corpus comparison. We compare the performance of the minimally-supervised model to two baseline models that do not use context for candidate re-ranking, as well as to a character-level statistical machine translation system with context-based re-ranking. We show that the minimally-supervised model outperforms all of the other models. We also present an analysis of the spelling errors and discuss the difficulty of the task compared to the spelling correction problem in English.
GECko+: a Grammatical and Discourse Error Correction Tool
We introduce GECko+, a web-based writing assistance tool for English that corrects errors both at the sentence and at the discourse level. It is based on two state-of-the-art models for grammar error correction and sentence ordering. GECko+ is available online as a web application that implements a pipeline combining the two models.
Historical corpora are known to contain errors introduced by the OCR (optical character recognition) methods used in the digitization process, which are often said to degrade the performance of NLP systems. Correcting these errors manually is a time-consuming process, and a large share of the automatic approaches have relied on rules or supervised machine learning. We build on previous work on fully automatic unsupervised extraction of parallel data to train a character-based sequence-to-sequence NMT (neural machine translation) model for OCR error correction designed for English, and adapt it to Finnish by proposing solutions that take the rich morphology of the language into account. Our new method shows increased performance while remaining fully unsupervised, with the added benefit of spelling normalisation. The source code and models are available on GitHub and Zenodo.
Grammatical error correction (GEC) suffers from a lack of sufficient parallel data. Studies on GEC have proposed several methods for generating pseudo data, which comprise pairs of grammatical and artificially produced ungrammatical sentences. Currently, a mainstream approach to generating pseudo data is back-translation (BT). Most previous studies using BT have employed the same architecture for both the GEC and BT models. However, GEC models have different correction tendencies depending on their architecture. Thus, in this study, we compare the correction tendencies of GEC models trained on pseudo data generated by three BT models with different architectures, namely Transformer, CNN, and LSTM. The results confirm that the correction tendencies for each error type differ across BT models. In addition, we investigate the correction tendencies when using a combination of pseudo data generated by different BT models. We find that combining different BT models improves or interpolates the performance on each error type compared with using a single BT model with different seeds.
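The pseudo-data pipeline described above can be sketched as follows. The rule-based `noise` function is a toy stand-in for a trained back-translation model (the study's actual BT models are Transformer, CNN, and LSTM seq2seq networks); pairing its output with the original clean sentences yields (source, target) training pairs for GEC.

```python
import random

def noise(sentence, rng):
    """Toy corruption step standing in for a trained BT model:
    drop, duplicate, or swap one word at a random position."""
    words = sentence.split()
    i = rng.randrange(len(words))
    op = rng.choice(["drop", "dup", "swap"])
    if op == "drop" and len(words) > 1:
        del words[i]
    elif op == "dup":
        words.insert(i, words[i])
    elif op == "swap" and i + 1 < len(words):
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def make_pseudo_data(clean_sentences, seed=0):
    """Pair each artificially corrupted sentence (source) with its
    grammatical original (target) to form GEC training data."""
    rng = random.Random(seed)
    return [(noise(s, rng), s) for s in clean_sentences]

pairs = make_pseudo_data(["she goes to school", "he likes apples"])
```

Different BT architectures produce systematically different corruption patterns, which is why the study compares them and their combinations.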
Handball belongs to a group of games characterized by a variety of basic offensive and defensive skills and by varied plans in both attack and defense. Notably, all attacking movements aim to finish with a shot against the opposing team, which is one of the most important duties in handball practice, and the result of a match depends on the accuracy of the technical performance of this skill. This motivated the selection of the shooting skill for this study, given its importance and its status as a key skill for players in this game.
The research sample consisted of 12 young players from the center of Lattakia Governorate. The experimental method was used with a single-group design.
The aim of the research is to prepare specific exercises to improve the accuracy of the shooting skill and to identify the role of these exercises in developing the level of that skill, which can give positive results after application.
The sample underwent a pre-test to measure shooting accuracy, then performed a set of specific exercises aimed at developing the shooting skill, followed by a post-test upon completion of the exercises.
Comparison of the sample's pre- and post-measurements showed that the specific exercises used in teaching and developing shooting accuracy resulted in a significant improvement in the level of the research sample.
The study recommended the use of training programs based on scientific foundations in the development of skillful performance in handball, as well as the need to use targeted and organized exercises and to take age groups and skill levels into account when using exercises to develop the shooting skill.
This study aims to analyze the effect of the spatial accuracy of control points on the geometric correction accuracy of images. This is done by applying tests on the same IKONOS image, where polynomial transformations were applied using sets of control points, each set with a different absolute accuracy. These points were extracted from a 1/1000 topographic map, from a georeferenced MOMS satellite image with a geometric accuracy of 2 m, and measured by GPS. The study showed that the most accurate geometric correction is obtained by using control points whose absolute accuracy is close to the spatial resolution of the image. It also showed that using more precise control points would not improve the accuracy of the geometric correction, because the measurement of these points on the image is limited by its spatial resolution.
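As a minimal sketch of this kind of polynomial correction (first-order/affine only, with made-up coordinates rather than the study's actual control points), the transformation can be fitted to ground control points by least squares and judged by its residual error:

```python
import numpy as np

def fit_affine(img_xy, ground_xy):
    """Fit x' = a0 + a1*x + a2*y (and likewise y') by least squares.
    Returns the per-point residuals and the fitted coefficients."""
    x, y = img_xy[:, 0], img_xy[:, 1]
    A = np.column_stack([np.ones_like(x), x, y])
    coeffs, *_ = np.linalg.lstsq(A, ground_xy, rcond=None)
    return A @ coeffs - ground_xy, coeffs

# Hypothetical GCPs: image pixel coordinates vs. ground coordinates (metres).
img = np.array([[0, 0], [100, 0], [0, 100], [100, 100]], dtype=float)
gnd = np.array([[500000, 3900000], [500200, 3900000],
                [500000, 3900200], [500200, 3900200]], dtype=float)

residuals, coeffs = fit_affine(img, gnd)
rmse = np.sqrt((residuals ** 2).mean())  # here ~0: the mapping is exactly affine
```

In practice the residual RMSE is bounded below by the absolute accuracy of the GCPs and by the image's spatial resolution, which is the trade-off the study quantifies.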
This study investigates the relationship between financial development and economic growth in Syria during the period 1980-2010. Financial development was measured by the credit granted to the private sector and by broad money (M2), whereas economic growth was measured by real gross domestic product per capita.