
Google-trickers, Yaminjeongeum, and Leetspeak: An Empirical Taxonomy for Intentionally Noisy User-Generated Text


Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English



Abstract

WARNING: This article contains content that may offend readers. Strategies that insert intentional noise into text when posting it are commonly observed online, and sometimes they aim to let only certain community users understand the genuine semantics. In this paper, we explore the purpose of such actions by categorizing them into tricks, memes, fillers, and codes, and organize the linguistic strategies that are used for each purpose. Through this, we identify that authors can employ such strategies for multiple purposes, with regard to the presence of stakeholders such as 'Peers' and 'Others'. We finally analyze how these strategies appear differently in each circumstance, and present a unified taxonomy with accompanying examples.

References used

https://aclanthology.org/


Understanding Model Robustness to User-generated Noisy Texts

Association for Computational Linguistics, 2021 (article)

Sensitivity of deep-neural models to input noise is known to be a challenging problem. In NLP, model performance often deteriorates with naturally occurring noise, such as spelling errors. To mitigate this issue, models may leverage artificially noised data. However, the amount and type of generated noise has so far been determined arbitrarily. We therefore propose to model the errors statistically from grammatical-error-correction corpora. We present a thorough evaluation of several state-of-the-art NLP systems' robustness in multiple languages, with tasks including morpho-syntactic analysis, named entity recognition, neural machine translation, a subset of the GLUE benchmark and reading comprehension. We also compare two approaches to address the performance drop: a) training the NLP models with noised data generated by our framework; and b) reducing the input noise with an external system for natural language correction. The code is released at https://github.com/ufal/kazitext.


User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization

Association for Computational Linguistics, 2021 (article)

Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT). To evaluate and compare different MA/LN systems, we have constructed a publicly available Japanese UGT corpus. Our corpus comprises 929 sentences annotated with morphological and normalization information, along with category information we classified for frequent UGT-specific phenomena. Experiments on the corpus demonstrated the low performance of existing MA/LN methods for non-general words and non-standard forms, indicating that the corpus would be a challenging benchmark for further research on UGT.


The Korean Morphologically Tight-Fitting Tokenizer for Noisy User-Generated Texts

Association for Computational Linguistics, 2021 (article)

User-generated texts include various types of stylistic properties, or noise. Such texts are not properly processed by existing morpheme analyzers or language models based on formal texts such as encyclopedias or news articles. In this paper, we propose a simple morphologically tight-fitting tokenizer (K-MT) that can better process proper nouns, coinages, and internet slang among other types of noise in Korean user-generated texts. We tested our tokenizer by performing classification tasks on Korean user-generated movie reviews and hate speech datasets, and the Korean Named Entity Recognition dataset. Through our tests, we found that K-MT is better suited to processing internet slang, proper nouns, and coinages than a morpheme analyzer and a character-level WordPiece tokenizer.


View Distillation with Unlabeled Data for Extracting Adverse Drug Effects from User-Generated Data

Association for Computational Linguistics, 2021 (article)

We present an algorithm based on multi-layer transformers for identifying Adverse Drug Reactions (ADR) in social media data. Our model relies on the properties of the problem and the characteristics of contextual word embeddings to extract two views from documents. Then a classifier is trained on each view to label a set of unlabeled documents to be used as an initializer for a new classifier in the other view. Finally, the initialized classifier in each view is further trained using the initial training examples. We evaluated our model on the largest publicly available ADR dataset. The experiments show that our model significantly outperforms the transformer-based models pretrained on domain-specific data.


Proposition of a methodology for producing orthophotos from Google Earth free browser images

Tishreen University, 2015 (research paper)

Orthorectification is the process of correcting imagery for geometric distortions, which can be caused by topography, camera geometry, and sensor-related errors. The output of orthorectification has the same geometric characteristics as a traditional map. However, obtaining aerial or satellite images is an expensive process requiring complex administrative procedures. In this study we propose taking advantage of the free images available in the Google Earth browser in order to produce an orthophoto. We then assess the horizontal accuracy of the resulting orthophoto to determine the limitations of its use in engineering applications such as map production and updating. The proposed methodology is based on the flight simulation process within Google Earth to acquire a stereoscopic pair of overlapping images. This pair is then oriented using control points. The oriented pair is used to generate a Digital Terrain Model (DTM) and to generate the orthophoto. Finally, we examine the accuracy of this orthophoto by comparing it with a topographic plan (scale 1/1000) and with a rectified satellite image of the same area.

Keywords: Satellite Images, Orthorectification, Digital Terrain Model, Google Earth
