Research papers, master's and doctoral theses about noisy user-generated

The Korean Morphologically Tight-Fitting Tokenizer for Noisy User-Generated Texts

316 - Association for Computational Linguistics 2021 Article

User-generated texts include various types of stylistic properties, or noise. Such texts are not properly processed by existing morpheme analyzers or language models based on formal texts such as encyclopedias or news articles. In this paper, we propose a simple morphologically tight-fitting tokenizer (K-MT) that can better process proper nouns, coinages, and internet slang, among other types of noise, in Korean user-generated texts. We tested our tokenizer by performing classification tasks on Korean user-generated movie review and hate speech datasets, and on the Korean Named Entity Recognition dataset. Through our tests, we found that K-MT is better suited to processing internet slang, proper nouns, and coinages than a morpheme analyzer and a character-level WordPiece tokenizer.

noisy user-generated texts, noisy user-generated, morphologically tight-fitting tokenizer

Google-trickers, Yaminjeongeum, and Leetspeak: An Empirical Taxonomy for Intentionally Noisy User-Generated Text

392 - Association for Computational Linguistics 2021 Article

WARNING: This article contains content that may offend readers. Strategies that insert intentional noise into text when posting it are commonly observed in the online space, and sometimes they aim to let only certain community users understand the genuine semantics. In this paper, we explore the purpose of such actions by categorizing them into tricks, memes, fillers, and codes, and organize the linguistic strategies that are used for each purpose. Through this, we identify that such strategies can be conducted by authors for multiple purposes, regarding the presence of stakeholders such as 'Peers' and 'Others'. We finally analyze how these strategies appear differently in each circumstance, along with the unified taxonomy accompanying examples.

natural language processing, text classification, intentionally noisy user-generated, noisy user-generated text, intentionally noisy, noisy text
