ترغب بنشر مسار تعليمي؟ اضغط هنا

Human Abnormality Detection Based on Bengali Text

119   0   0.0 ( 0 )
 نشر من قبل M. F. Mridha
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In the field of natural language processing and human-computer interaction, human attitudes and sentiments have attracted the researchers. However, in the field of human-computer interaction, human abnormality detection has not been investigated extensively and most works depend on image-based information. In natural language processing, effective meaning can potentially convey by all words. Each word may bring out difficult encounters because of their semantic connection with ideas or categories. In this paper, an efficient and effective human abnormality detection model is introduced, that only uses Bengali text. This proposed model can recognize whether the person is in a normal or abnormal state by analyzing their typed Bengali text. To the best of our knowledge, this is the first attempt in developing a text based human abnormality detection system. We have created our Bengali dataset (contains 2000 sentences) that is generated by voluntary conversations. We have performed the comparative analysis by using Naive Bayes and Support Vector Machine as classifiers. Two different feature extraction techniques count vector, and TF-IDF is used to experiment on our constructed dataset. We have achieved a maximum 89% accuracy and 92% F1-score with our constructed dataset in our experiment.



قيم البحث

اقرأ أيضاً

In recent years, emotion detection in text has become more popular due to its vast potential applications in marketing, political science, psychology, human-computer interaction, artificial intelligence, etc. In this work, we argue that current metho ds which are based on conventional machine learning models cannot grasp the intricacy of emotional language by ignoring the sequential nature of the text, and the context. These methods, therefore, are not sufficient to create an applicable and generalizable emotion detection methodology. Understanding these limitations, we present a new network based on a bidirectional GRU model to show that capturing more meaningful information from text can significantly improve the performance of these models. The results show significant improvement with an average of 26.8 point increase in F-measure on our test data and 38.6 increase on the totally new dataset.
Based on the sense definition of words available in the Bengali WordNet, an attempt is made to classify the Bengali sentences automatically into different groups in accordance with their underlying senses. The input sentences are collected from 50 di fferent categories of the Bengali text corpus developed in the TDIL project of the Govt. of India, while information about the different senses of particular ambiguous lexical item is collected from Bengali WordNet. In an experimental basis we have used Naive Bayes probabilistic model as a useful classifier of sentences. We have applied the algorithm over 1747 sentences that contain a particular Bengali lexical item which, because of its ambiguous nature, is able to trigger different senses that render sentences in different meanings. In our experiment we have achieved around 84% accurate result on the sense classification over the total input sentences. We have analyzed those residual sentences that did not comply with our experiment and did affect the results to note that in many cases, wrong syntactic structures and less semantic information are the main hurdles in semantic classification of sentences. The applicational relevance of this study is attested in automatic text classification, machine learning, information extraction, and word sense disambiguation.
In recent years, large neural networks for natural language generation (NLG) have made leaps and bounds in their ability to generate fluent text. However, the tasks of evaluating quality differences between NLG systems and understanding how humans pe rceive the generated text remain both crucial and difficult. In this system demonstration, we present Real or Fake Text (RoFT), a website that tackles both of these challenges by inviting users to try their hand at detecting machine-generated text in a variety of domains. We introduce a novel evaluation task based on detecting the boundary at which a text passage that starts off human-written transitions to being machine-generated. We show preliminary results of using RoFT to evaluate detection of machine-generated news articles.
The exponential growths of social media and micro-blogging sites not only provide platforms for empowering freedom of expressions and individual voices, but also enables people to express anti-social behaviour like online harassment, cyberbullying, a nd hate speech. Numerous works have been proposed to utilize textual data for social and anti-social behaviour analysis, by predicting the contexts mostly for highly-resourced languages like English. However, some languages are under-resourced, e.g., South Asian languages like Bengali, that lack computational resources for accurate natural language processing (NLP). In this paper, we propose an explainable approach for hate speech detection from the under-resourced Bengali language, which we called DeepHateExplainer. Bengali texts are first comprehensively preprocessed, before classifying them into political, personal, geopolitical, and religious hates using a neural ensemble method of transformer-based neural architectures (i.e., monolingual Bangla BERT-base, multilingual BERT-cased/uncased, and XLM-RoBERTa). Important(most and least) terms are then identified using sensitivity analysis and layer-wise relevance propagation(LRP), before providing human-interpretable explanations. Finally, we compute comprehensiveness and sufficiency scores to measure the quality of explanations w.r.t faithfulness. Evaluations against machine learning~(linear and tree-based models) and neural networks (i.e., CNN, Bi-LSTM, and Conv-LSTM with word embeddings) baselines yield F1-scores of 78%, 91%, 89%, and 84%, for political, personal, geopolitical, and religious hates, respectively, outperforming both ML and DNN baselines.
Exponential growths of social media and micro-blogging sites not only provide platforms for empowering freedom of expressions and individual voices but also enables people to express anti-social behaviour like online harassment, cyberbullying, and ha te speech. Numerous works have been proposed to utilize these data for social and anti-social behaviours analysis, document characterization, and sentiment analysis by predicting the contexts mostly for highly resourced languages such as English. However, there are languages that are under-resources, e.g., South Asian languages like Bengali, Tamil, Assamese, Telugu that lack of computational resources for the NLP tasks. In this paper, we provide several classification benchmarks for Bengali, an under-resourced language. We prepared three datasets of expressing hate, commonly used topics, and opinions for hate speech detection, document classification, and sentiment analysis, respectively. We built the largest Bengali word embedding models to date based on 250 million articles, which we call BengFastText. We perform three different experiments, covering document classification, sentiment analysis, and hate speech detection. We incorporate word embeddings into a Multichannel Convolutional-LSTM (MConv-LSTM) network for predicting different types of hate speech, document classification, and sentiment analysis. Experiments demonstrate that BengFastText can capture the semantics of words from respective contexts correctly. Evaluations against several baseline embedding models, e.g., Word2Vec and GloVe yield up to 92.30%, 82.25%, and 90.45% F1-scores in case of document classification, sentiment analysis, and hate speech detection, respectively during 5-fold cross-validation tests.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا