Document Classification using Bayesian networks

تصنيف المستندات باستخدام شبكات بيز

Added by Tishreen University, graduation project

Publication date 2016

Research language: Arabic

Authors: بشار محمد (supervisor) - بانة عمر (student) - آلاء حسين (student) - حسام ديب (student) - علي محمد (student)

Created by Bana Omar

Abstract

Information retrieval has become one of the most important issues and challenges of our time. This is a natural consequence of rapid technological development and of the enormous progress in human thought, research, and scientific study across all branches of knowledge, together with the accompanying growth in the volume of information to the point where it is hard to control and handle. Our project therefore aims to present an information retrieval system that classifies documents according to their content. Because the information retrieval process involves a degree of uncertainty at every stage, we rely on Bayesian networks for classification. These are probabilistic networks that convert information into cause-and-effect relationships, and they are considered one of the most promising methods for handling uncertainty.

We begin by introducing the fundamentals of Bayesian networks and explaining a set of algorithms for constructing them, along with the inference algorithms used (of two kinds, exact and approximate). The system performs a set of preprocessing operations on the document texts, then applies statistical and probabilistic operations during the training phase to obtain the Bayesian network structure that fits the training data. An input document is then classified using a set of exact inference algorithms on the resulting Bayesian network.

Since the performance of any information retrieval system usually becomes more accurate when it uses the relationships between the terms contained in a document collection, we take two types of relationships into account when building the network:
1. Relationships between terms.
2. Relationships between terms and classes.
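
As a rough illustration of the preprocessing and counting stages mentioned in the abstract, the following Python sketch tokenizes text, removes stop words, applies a toy suffix-stripping stemmer, and counts how many documents of each class contain each term. The stop-word list, the suffix rules, and all function names are assumptions made for this example; they are not taken from the project.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list and suffix rules, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are"}
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def strip_suffix(token):
    """Toy stemmer: remove the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem what remains."""
    return [strip_suffix(t) for t in tokenize(text) if t not in STOP_WORDS]


def term_class_counts(labeled_docs):
    """Count, per class, how many documents contain each preprocessed term."""
    counts = defaultdict(Counter)
    for text, label in labeled_docs:
        # A set is used so each term counts at most once per document.
        counts[label].update(set(preprocess(text)))
    return counts
```

Counts of this kind are the raw material from which term-class and term-term conditional probabilities could later be estimated; a real system would replace the toy stemmer with one suited to the corpus language.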

Research summary

This paper addresses document classification using probabilistic Bayesian networks. It opens with an introduction to the importance of information retrieval and the challenges it faces given the enormous volume of digital data made available every day. The study aims to present a document classification system based on Bayesian networks, which are probabilistic networks that represent causal relationships between variables. It explains the fundamentals of Bayesian networks, the algorithms for constructing them, and the probabilistic inference used in them. The paper also compares the classical and Bayesian schools of statistics, and presents a set of text preprocessing operations such as tokenization, stop-word removal, and stemming. It further includes a detailed explanation of Bayesian network construction algorithms such as Hill Climbing and the Chow and Liu algorithm, as well as exact inference algorithms such as Variable Elimination and approximate inference algorithms such as sampling methods. In the practical part, a new Bayesian network model is presented, consisting of two term layers and a class layer, and the model is tested on a set of manually classified Reuters articles. The results show that introducing relationships between words increases classification accuracy, although the large number of nodes and relationships may slow down training and classification. The paper concludes with recommendations for improving the model's performance in the future.
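
To make the exact-inference step concrete, here is a minimal sketch of Variable Elimination over binary variables. The three-node chain T1 → T2 → C (two term nodes and one class node) and every probability in it are hypothetical values chosen for illustration; they are not the model or the data from the paper.

```python
from itertools import product


def multiply(f, g):
    """Pointwise product of two factors; a factor is (variables, table)."""
    fv, ft = f
    gv, gt = g
    variables = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for values in product((0, 1), repeat=len(variables)):
        row = dict(zip(variables, values))
        table[values] = ft[tuple(row[v] for v in fv)] * gt[tuple(row[v] for v in gv)]
    return (variables, table)


def sum_out(f, var):
    """Marginalize a variable out of a factor."""
    fv, ft = f
    keep = tuple(v for v in fv if v != var)
    table = {}
    for values, p in ft.items():
        key = tuple(val for v, val in zip(fv, values) if v != var)
        table[key] = table.get(key, 0.0) + p
    return (keep, table)


def restrict(f, var, value):
    """Fix an observed variable to its value and drop it from the factor."""
    fv, ft = f
    if var not in fv:
        return f
    idx = fv.index(var)
    keep = fv[:idx] + fv[idx + 1:]
    table = {k[:idx] + k[idx + 1:]: p for k, p in ft.items() if k[idx] == value}
    return (keep, table)


def variable_elimination(factors, query, evidence, order):
    """Return P(query | evidence) as {value: probability}.

    `order` must list every variable other than the query and the evidence.
    """
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    for var in order:
        related = [f for f in factors if var in f[0]]
        if not related:
            continue
        rest = [f for f in factors if var not in f[0]]
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(prod, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    assert result[0] == (query,), "elimination order left extra variables"
    total = sum(result[1].values())
    return {k[0]: p / total for k, p in result[1].items()}


# Hypothetical conditional probability tables for the chain T1 -> T2 -> C.
p_t1 = (("T1",), {(0,): 0.7, (1,): 0.3})
p_t2 = (("T1", "T2"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
p_c = (("T2", "C"), {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9})

# P(C=1 | T1=1) = 0.8 * 0.9 + 0.2 * 0.2 = 0.76
posterior = variable_elimination([p_t1, p_t2, p_c], "C", {"T1": 1}, ["T2"])
```

The cost of each elimination step depends on the size of the intermediate factor, which is why the elimination order matters in networks with many interconnected term nodes.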

Critical review

This paper is comprehensive and detailed in its treatment of document classification using probabilistic Bayesian networks. Nevertheless, some constructive criticism can be offered to improve the study. First, it may be useful to provide more practical examples showing how the different algorithms are applied in various contexts. Second, the practical part could be improved by adding more detail on how the thresholds and the criteria used to evaluate performance were chosen. Third, although the study covers many inference algorithms, it could be improved by comparing their performance with other recent machine learning techniques. Fourth, the study would benefit from a deeper analysis of the effect of training data size on classification accuracy and execution speed. Finally, more detailed recommendations could be given on how to improve the stemming and text preprocessing algorithms to increase classification accuracy.

Questions related to the research

What is the main benefit of using Bayesian networks in document classification?

The main benefit of using Bayesian networks in document classification is their ability to represent causal relationships between variables probabilistically, which allows accurate inference even when the data is incomplete or uncertain.

What algorithms are used to build Bayesian networks?

The algorithms used to build Bayesian networks include Hill Climbing, the Chow and Liu algorithm, the K2 algorithm, and the TPDA algorithm.
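
As one example of structure learning, the core of the Chow and Liu algorithm can be sketched as follows: compute the empirical mutual information between every pair of variables, then keep a maximum-weight spanning tree over those weights. The sketch assumes binary variables stored as rows of 0/1 values, uses Kruskal's method for the spanning tree, and stops at the undirected tree skeleton (the full algorithm then orients the edges away from a chosen root). Function names are illustrative only.

```python
import math
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between binary columns i and j."""
    n = len(data)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for row in data if row[i] == a and row[j] == b) / n
            p_a = sum(1 for row in data if row[i] == a) / n
            p_b = sum(1 for row in data if row[j] == b) / n
            if p_ab > 0:  # zero-probability cells contribute nothing
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def chow_liu_tree(data, num_vars):
    """Return the edges of a maximum-weight spanning tree over pairwise MI."""
    weighted = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(num_vars), 2)),
        reverse=True,
    )
    parent = list(range(num_vars))  # union-find forest for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for _weight, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not create a cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges
```

Because the result is a tree, each variable ends up with at most one parent, which keeps the conditional probability tables small at the cost of ignoring higher-order dependencies.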

How can the accuracy of document classification with Bayesian networks be improved?

The accuracy of document classification with Bayesian networks can be improved by introducing relationships between words into the network structure, using exact inference algorithms, and improving the stemming and text preprocessing algorithms.

What challenges face the use of Bayesian networks in document classification?

The challenges include the large number of nodes and relationships, which may slow down training and classification; the need for high-quality training data; and the complexity of the computations in large networks.
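
A short calculation shows why the number of relationships matters. Assuming all variables are binary (an assumption made here for simplicity), a node with $k$ parents needs one conditional probability per parent configuration, so the parameter count grows exponentially in the number of parents:

```latex
\[
  \text{parameters}(X_i) = 2^{\,|\mathrm{Pa}(X_i)|},
  \qquad
  \text{parameters}(\text{network}) = \sum_{i=1}^{n} 2^{\,|\mathrm{Pa}(X_i)|}
\]
\[
  \text{e.g. a class node with } 20 \text{ binary term parents: } 2^{20} = 1{,}048{,}576 \text{ table rows}
\]
```

For comparison, an unrestricted joint distribution over $n$ binary variables needs $2^{n} - 1$ parameters, so limiting the number of parents per node is what keeps a Bayesian network tractable.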

Keywords

Bayesian networks, document classification, information retrieval, inference algorithms, text preprocessing, machine learning

Application of Mix-Up Method in Document Classification Task Using BERT

Association for Computational Linguistics, 2021, article

The mix-up method (Zhang et al., 2017), one of the methods for data augmentation, is known to be easy to implement and highly effective. Although the mix-up method is intended for image identification, it can also be applied to natural language processing. In this paper, we attempt to apply the mix-up method to a document classification task using bidirectional encoder representations from transformers (BERT) (Devlin et al., 2018). Since BERT allows for two-sentence input, we concatenated word sequences from two documents with different labels and used the multi-class output as the supervised data with a one-hot vector. In an experiment using the livedoor news corpus, which is in Japanese, we compared the accuracy of document classification using two methods for selecting the documents to be concatenated with that of ordinary document classification. We found that the proposed method outperforms normal classification when documents with label shortages are mixed preferentially. This indicates that the choice of documents for mix-up has a significant impact on the results.

mix-up method, document classification task, document classification

Classification of Code-Mixed Text Using Capsule Networks

Association for Computational Linguistics, 2021, article

A major challenge in analysing social media data belonging to languages that use non-English script is its code-mixed nature. Recent research has presented state-of-the-art contextual embedding models (both monolingual, such as BERT, and multilingual, such as XLM-R) as a promising approach. In this paper, we show that the performance of such embedding models depends on multiple factors, such as the level of code-mixing in the dataset and the size of the training dataset. We empirically show that a newly introduced Capsule+biGRU classifier could outperform a classifier built on English-BERT as well as XLM-R with a training dataset of only about 6500 samples for the Sinhala-English code-mixed data.

capsule networks, code-mixed text, text using capsule

Arabic documents classification system

Tishreen University, 2012, graduation project

In this project we chose to develop a system that classifies Arabic documents according to their content. The system performs lexical analysis of the words in a document, then stemming (reducing verbs to their roots), then applies a statistical procedure to the document during the training phase. Finally, relying on artificial intelligence algorithms, the document is classified by content into clusters.

Machine learning, NLP, support vector machine, fuzzy system, Arabic NLP

Multi-Label Classification of Chinese Humor Texts Using Hypergraph Attention Networks

Association for Computational Linguistics, 2021, article

We use Hypergraph Attention Networks (HyperGAT) to recognize multiple labels of Chinese humor texts. We first represent a joke as a hypergraph. The sequential hyperedge and semantic hyperedge structures are used to construct hyperedges. Then, attention mechanisms are adopted to aggregate context information embedded in nodes and hyperedges. Finally, we use the trained HyperGAT to complete the multi-label classification task. Experimental results on the Chinese humor multi-label dataset showed that the HyperGAT model outperforms previous sequence-based (CNN, BiLSTM, FastText) and graph-based (Graph-CNN, TextGCN, Text Level GNN) deep learning models.

hypergraph attention networks, Chinese humor texts, attention networks

Performance of Objects Classification System in an Image using Convolutional Neural Networks

Tishreen University, 2019, research paper

In recent years, interest in classifying objects in images with deep learning has grown, driven by the requirements of the industrial sector. Although many algorithms are used in this field, such as Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN), the systems proposed to address this problem lack a comprehensive solution to the difficulties of long training time, floating memory during the training process, and a low classification rate. Convolutional Neural Networks (CNNs), the most widely used algorithms for this task, provide a mathematical model for analyzing image data. A new deep-traversal network pattern was proposed to solve the above problems. The aim of the research is to demonstrate the performance of a recognition system using CNNs with respect to available memory and training time by adapting appropriate variables for the bypass network. The database used in this research is CIFAR10, which consists of 60,000 color images belonging to ten categories, with 6,000 images per class, split into 50,000 training images and 10,000 test images. When tested on a sample of images selected from the CIFAR10 database, the model achieved a classification rate of 98.87%.

Neural networks, artificial intelligence, image classification, convolutional neural networks
