ترغب بنشر مسار تعليمي؟ اضغط هنا

Spelling Error Trends and Patterns in Sindhi

34   0   0.0 ( 0 )
 نشر من قبل Zeeshan Bhatti
 تاريخ النشر 2014
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Statistical error Correction technique is the most accurate and widely used approach today, but for a language like Sindhi which is a low resourced language the trained corporas are not available, so the statistical techniques are not possible at all. Instead a useful alternative would be to exploit various spelling error trends in Sindhi by using a Rule based approach. For designing such technique an essential prerequisite would be to study the various error patterns in a language. This pa per presents various studies of spelling error trends and their types in Sindhi Language. The research shows that the error trends common to all languages are also encountered in Sindhi but their do exist some error patters that are catered specifically to a Sindhi language.

قيم البحث

اقرأ أيضاً

284 - Sina Ahmadi 2018
Automatic spelling and grammatical correction systems are one of the most widely used tools within natural language applications. In this thesis, we assume the task of error correction as a type of monolingual machine translation where the source sen tence is potentially erroneous and the target sentence should be the corrected form of the input. Our main focus in this project is building neural network models for the task of error correction. In particular, we investigate sequence-to-sequence and attention-based models which have recently shown a higher performance than the state-of-the-art of many language processing problems. We demonstrate that neural machine translation models can be successfully applied to the task of error correction. While the experiments of this research are performed on an Arabic corpus, our methods in this thesis can be easily applied to any language.
Existing natural language processing systems are vulnerable to noisy inputs resulting from misspellings. On the contrary, humans can easily infer the corresponding correct words from their misspellings and surrounding context. Inspired by this, we ad dress the stand-alone spelling correction problem, which only corrects the spelling of each token without additional token insertion or deletion, by utilizing both spelling information and global context representations. We present a simple yet powerful solution that jointly detects and corrects misspellings as a sequence labeling task by fine-turning a pre-trained language model. Our solution outperforms the previous state-of-the-art result by 12.8% absolute F0.5 score.
A sequence-to-sequence learning with neural networks has empirically proven to be an effective framework for Chinese Spelling Correction (CSC), which takes a sentence with some spelling errors as input and outputs the corrected one. However, CSC mode ls may fail to correct spelling errors covered by the confusion sets, and also will encounter unseen ones. We propose a method, which continually identifies the weak spots of a model to generate more valuable training instances, and apply a task-specific pre-training strategy to enhance the model. The generated adversarial examples are gradually added to the training set. Experimental results show that such an adversarial training method combined with the pretraining strategy can improve both the generalization and robustness of multiple CSC models across three different datasets, achieving stateof-the-art performance for CSC task.
Chinese Spelling Check (CSC) is a task to detect and correct spelling errors in Chinese natural language. Existing methods have made attempts to incorporate the similarity knowledge between Chinese characters. However, they take the similarity knowle dge as either an external input resource or just heuristic rules. This paper proposes to incorporate phonological and visual similarity knowledge into language models for CSC via a specialized graph convolutional network (SpellGCN). The model builds a graph over the characters, and SpellGCN is learned to map this graph into a set of inter-dependent character classifiers. These classifiers are applied to the representations extracted by another network, such as BERT, enabling the whole network to be end-to-end trainable. Experiments (The dataset and all code for this paper are available at https://github.com/ACL2020SpellGCN/SpellGCN) are conducted on three human-annotated datasets. Our method achieves superior performance against previous models by a large margin.
It is proved in this work that exhaustively determining bad patterns in arbitrary, finite low-density parity-check (LDPC) codes, including stopping sets for binary erasure channels (BECs) and trapping sets (also known as near-codewords) for general m emoryless symmetric channels, is an NP-complete problem, and efficient algorithms are provided for codes of practical short lengths n~=500. By exploiting the sparse connectivity of LDPC codes, the stopping sets of size <=13 and the trapping sets of size <=11 can be efficiently exhaustively determined for the first time, and the resulting exhaustive list is of great importance for code analysis and finite code optimization. The featured tree-based narrowing search distinguishes this algorithm from existing ones for which inexhaustive methods are employed. One important byproduct is a pair of upper bounds on the bit-error rate (BER) & frame-error rate (FER) iterative decoding performance of arbitrary codes over BECs that can be evaluated for any value of the erasure probability, including both the waterfall and the error floor regions. The tightness of these upper bounds and the exhaustion capability of the proposed algorithm are proved when combining an optimal leaf-finding module with the tree-based search. These upper bounds also provide a worst-case-performance guarantee which is crucial to optimizing LDPC codes for extremely low error rate applications, e.g., optical/satellite communications. Extensive numerical experiments are conducted that include both randomly and algebraically constructed LDPC codes, the results of which demonstrate the superior efficiency of the exhaustion algorithm and its significant value for finite length code optimization.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا