Do you want to publish a course? Click here

Document Classification using Bayesian networks

تصنيف المستندات باستخدام شبكات بيز

2508   0   114   0 ( 0 )
 Publication date 2016
and research's language is العربية
 Created by Bana Omar




Ask ChatGPT about the research

No English abstract

References used
Stuart J. Russell , Peter Norvig ," Artificial Intelligence A Modern Approach" , Third Edition, New Jersey ,2010
Alfonso Eduardo Romero Lopez,Document Classification Models based on Bayesian Network
Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete ," Clustering terms in the Bayesian network retrieval model: a new approach with two term-layers
Dimitri P. Bertsekas , John N. Tsitsiklis," Introduction to Probability", Second Edition
rate research

Read More

The mix-up method (Zhang et al., 2017), one of the methods for data augmentation, is known to be easy to implement and highly effective. Although the mix-up method is intended for image identification, it can also be applied to natural language proce ssing. In this paper, we attempt to apply the mix-up method to a document classification task using bidirectional encoder representations from transformers (BERT) (Devlin et al., 2018). Since BERT allows for two-sentence input, we concatenated word sequences from two documents with different labels and used the multi-class output as the supervised data with a one-hot vector. In an experiment using the livedoor news corpus, which is Japanese, we compared the accuracy of document classification using two methods for selecting documents to be concatenated with that of ordinary document classification. As a result, we found that the proposed method is better than the normal classification when the documents with labels shortages are mixed preferentially. This indicates that how to choose documents for mix-up has a significant impact on the results.
A major challenge in analysing social me-dia data belonging to languages that use non-English script is its code-mixed nature. Recentresearch has presented state-of-the-art contex-tual embedding models (both monolingual s.a.BERT and multilingual s.a. XLM-R) as apromising approach. In this paper, we showthat the performance of such embedding mod-els depends on multiple factors, such as thelevel of code-mixing in the dataset, and thesize of the training dataset. We empiricallyshow that a newly introduced Capsule+biGRUclassifier could outperform a classifier built onthe English-BERT as well as XLM-R just witha training dataset of about 6500 samples forthe Sinhala-English code-mixed data.
اخترنا في هذا المشروع العمل على تطوير نظام يقوم بتصنيف المستندات العربية حسب محتواها, يقوم هذه النظام بالتحليل اللفظي لكلمات المستند ثم إجراء عملية Stemming"رد الأفعال إلى أصلها" ثم تطبيق عملية إحصائية على المستند في مرحلة تدريب النظام ثم بالاعتماد على خوارزميات في الذكاء الصنعي يتم تصنيف المستند حسب محتواه ضمن عناقيد
We use Hypergraph Attention Networks (HyperGAT) to recognize multiple labels of Chinese humor texts. We firstly represent a joke as a hypergraph. The sequential hyperedge and semantic hyperedge structures are used to construct hyperedges. Then, atten tion mechanisms are adopted to aggregate context information embedded in nodes and hyperedges. Finally, we use trained HyperGAT to complete the multi-label classification task. Experimental results on the Chinese humor multi-label dataset showed that HyperGAT model outperforms previous sequence-based (CNN, BiLSTM, FastText) and graph-based (Graph-CNN, TextGCN, Text Level GNN) deep learning models.
In recent years, the problem of classifying objects in images has increased by using deep learning as a result of the industrial sector requirements. Despite of many algorithms used in this field, such as Deep Learning Neural Network DNN and Convolut ional Neural Network CNN, the proposed systems to address this problem Lack of comprehensive solution to the difficulties of long training time and floating memory during the training process, low rating classification. Convolutional Neural Networks (CNNs), which are the most used algorithms for this task, were a mathematical pattern for analyzing images data. A new deep-traversal network pattern was proposed to solve the above problems. The aim of the research is to demonstrate the performance of the recognition system using CNNs networks on the available memory and training time by adapting appropriate variables for the bypass network. The database used in this research is CIFAR10, which consists of 60000 colorful images belonging to ten categories, as every 6,000 images are for a class of these items. Where there are 50,000 training images and 10,000 test tubes. When tested on a sample of selected images from the CIFAR10 database, the model achieved a rating classification of 98.87%.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا