
A Lexicon for Profane and Obscene Text Identification in Bengali


Publication date: 2021
Language: English
Created by Shamra Editor

Bengali is a low-resource language that lacks tools and resources for detecting profane and obscene textual content. Until now, no lexicon has existed for detecting obscenity in Bengali social media text. This study introduces a Bengali obscene lexicon of over 200 terms that can be considered filthy, slang, profane, or obscene. A semi-automatic methodology is presented for developing the lexicon that leverages an obscene corpus, word embeddings, and part-of-speech (POS) taggers. The developed lexicon achieves coverage of around 0.85 for obscene and profane content detection on an evaluation dataset. The experimental results imply that the lexicon is effective at identifying obscenity in Bengali social media content.
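As a rough illustration of the semi-automatic expansion step the abstract describes, the sketch below grows a small seed list of profane terms through embedding-space nearest neighbours and leaves candidates for POS filtering and manual review. The corpus path, seed placeholders, and similarity threshold are assumptions; the paper's actual corpus, tagger, and thresholds are not reproduced here.

```python
# Minimal sketch: expand a seed profanity list via word embeddings.
# "obscene_corpus.txt", the seed placeholders, and the 0.6 threshold
# are hypothetical stand-ins, not the paper's actual resources.
from gensim.models import Word2Vec

# A small manually curated set of Bengali profane seed words.
seed_terms = ["<seed_profane_term_1>", "<seed_profane_term_2>"]

# Train (or load) embeddings on an obscene-domain corpus,
# one whitespace-tokenized sentence per line.
sentences = [line.split() for line in open("obscene_corpus.txt", encoding="utf-8")]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=3)

# Collect nearest neighbours of each seed in embedding space.
candidates = set()
for seed in seed_terms:
    if seed in model.wv:
        for word, score in model.wv.most_similar(seed, topn=20):
            if score > 0.6:  # similarity cutoff (assumed value)
                candidates.add(word)

# Candidates would then be filtered with a Bengali POS tagger and
# verified manually before entering the lexicon.
print(sorted(candidates))
```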

Related research

Dictionary-based methods in sentiment analysis have received scholarly attention recently, the most comprehensive examples of which can be found in English. However, many other languages lack polarity dictionaries, or the existing ones are small in size, as in the case of SentiTurkNet, the first and only polarity dictionary in Turkish. Thus, this study aims to extend the content of SentiTurkNet by comparing the two available WordNets in Turkish, namely KeNet and the TR-wordnet of BalkaNet. To this end, a current Turkish polarity dictionary has been created from 76,825 synsets matching KeNet, where each synset has been annotated with one of three polarity labels: positive, negative, and neutral. Meanwhile, the comparison of KeNet and the TR-wordnet of BalkaNet has revealed weaknesses such as repetition of the same senses, missing merges of items belonging to the same synset, and redundant narrower versions of synsets, which are discussed in light of their potential to improve the current lexical databases of Turkish.
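To make the dictionary-based setting concrete, here is a minimal sketch of how a polarity dictionary of this kind is typically applied to text. The entries and the whitespace tokenizer are illustrative stand-ins, not actual SentiTurkNet or KeNet data, and a real Turkish system would lemmatize rather than split on spaces.

```python
# Hypothetical polarity entries; not taken from SentiTurkNet/KeNet.
polarity = {
    "güzel": "positive",
    "kötü": "negative",
    "masa": "neutral",
}

def score(text: str) -> str:
    """Majority-vote polarity from dictionary lookups."""
    tokens = text.lower().split()  # a real system would lemmatize Turkish
    pos = sum(polarity.get(t) == "positive" for t in tokens)
    neg = sum(polarity.get(t) == "negative" for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(score("güzel bir masa"))  # -> "positive"
```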
Quality Estimation (QE) for Machine Translation has been shown to reach relatively high accuracy in predicting sentence-level scores, relying on pretrained contextual embeddings and human-produced quality scores. However, the lack of explanations along with decisions made by end-to-end neural models makes the results difficult to interpret. Furthermore, word-level annotated datasets are rare due to the prohibitive effort required to produce them, although they could provide interpretable signals in addition to sentence-level QE outputs. In this paper, we propose a novel QE architecture which tackles both the word-level data scarcity and the interpretability limitations of recent approaches. Sentence-level and word-level components are jointly pretrained through an attention mechanism based on synthetic data and a set of MT metrics embedded in a common space. Our approach is evaluated on the Eval4NLP 2021 shared task, and our submissions reach first position in all language pairs. The extraction of metric-to-input attention weights shows that different metrics focus on different parts of the source and target text, providing strong rationales in the decision-making process of the QE model.
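The metric-attention idea can be sketched loosely in PyTorch: learned embeddings for each MT metric attend over token states, yielding inspectable metric-to-token weights alongside sentence- and word-level scores. The dimensions, metric count, and heads below are illustrative guesses, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MetricAttentionQE(nn.Module):
    """Toy analogue of attention over MT-metric embeddings (assumed sizes)."""
    def __init__(self, hidden=768, n_metrics=7):
        super().__init__()
        # One learned embedding per MT metric, in the token-state space.
        self.metric_emb = nn.Parameter(torch.randn(n_metrics, hidden))
        self.word_head = nn.Linear(hidden, 1)  # word-level score
        self.sent_head = nn.Linear(hidden, 1)  # sentence-level score

    def forward(self, token_states):  # (batch, seq, hidden)
        # Metric-to-token attention: which tokens each metric attends to.
        attn = torch.softmax(
            token_states @ self.metric_emb.T / token_states.size(-1) ** 0.5,
            dim=1,
        )  # (batch, seq, n_metrics)
        word_scores = self.word_head(token_states).squeeze(-1)
        sent_scores = self.sent_head(token_states.mean(dim=1)).squeeze(-1)
        return sent_scores, word_scores, attn

qe = MetricAttentionQE()
# Stand-in for contextual token states, e.g. from a pretrained encoder.
sent, words, attn = qe(torch.randn(2, 10, 768))
```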
A lexicon plays an essential role in natural language processing systems, and especially in machine translation systems, because it provides the system's components with the information necessary for translation. Although there has been a substantial amount of research in the natural language processing field, not enough attention has been given to the importance of the lexicon, and especially the Arabic lexicon.
This paper describes the model built for the SIGTYP 2021 Shared Task aimed at identifying 18 typologically different languages from speech recordings. Mel-frequency cepstral coefficients derived from audio files are transformed into spectrograms, which are then fed into a ResNet-50-based CNN architecture. The final model achieved validation and test accuracies of 0.73 and 0.53, respectively.
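A hedged sketch of that pipeline might look like the following: MFCC features rendered as a 2-D array and passed through an 18-way ResNet-50. The file name, MFCC settings, and channel-tiling step are assumptions, not the submission's exact preprocessing.

```python
import librosa
import torch
from torchvision.models import resnet50

# 1. Audio -> MFCC matrix (n_mfcc x frames); settings are assumed.
y, sr = librosa.load("recording.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
mfcc = (mfcc - mfcc.mean()) / (mfcc.std() + 1e-8)  # normalize

# 2. Tile to 3 channels so the ImageNet-shaped ResNet accepts it.
x = torch.tensor(mfcc, dtype=torch.float32)
x = x.unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)  # (1, 3, 40, frames)

# 3. ResNet-50 with an 18-way output head, one class per language.
model = resnet50(num_classes=18)
logits = model(x)  # (1, 18)
print(logits.argmax(dim=1))
```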
In this paper, we introduce FITAnnotator, a generic web-based tool for efficient text annotation. Benefiting from a fully modular architecture, FITAnnotator provides a systematic solution for annotating a variety of natural language processing tasks, including classification, sequence tagging, and semantic role annotation, regardless of the language. Three kinds of interfaces are provided to annotate instances, evaluate annotation quality, and manage the annotation task, serving annotators, reviewers, and managers, respectively. FITAnnotator also offers intelligent annotation by introducing a task-specific assistant that supports and guides the annotators based on active learning and incremental learning strategies. This assistant updates effectively from annotator feedback and easily handles incremental labeling scenarios.
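A common way to realize such an assistant is uncertainty (margin) sampling: pick the unlabeled instances the current model is least sure about and send them to annotators first. The sketch below uses synthetic data and a generic classifier; it is a standard illustration, not FITAnnotator's actual backend.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def next_batch_to_annotate(model, pool_X, k=10):
    """Pick the k pool instances with the smallest top-two-class margin."""
    probs = np.sort(model.predict_proba(pool_X), axis=1)
    margin = probs[:, -1] - probs[:, -2]
    return np.argsort(margin)[:k]  # smallest margin = most uncertain

# Synthetic stand-ins for labeled data and the unlabeled pool.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.integers(0, 3, 100)
model = LogisticRegression(max_iter=200).fit(X, y)
pool = rng.normal(size=(50, 5))

# After annotators label these, the model is refit (or updated
# incrementally) and the loop repeats.
print(next_batch_to_annotate(model, pool))
```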
