Text classification problem

Original title (Arabic): تصنيف النصوص

Added by Damascus University (research seminar)

Publication date 2018

Research language: Arabic

Authors: شيماء الشحمة (student), يسرى البياتي (student), محمد عمار الكيلاني (student)

Created by Shaymaa Shahma

Tags: text classification, machine learning

Abstract

Text classification is one of the important areas of natural language processing. The classification problem has been widely studied in data mining, machine learning, databases, and information retrieval, with applications in many diverse fields, such as target marketing, medical diagnosis, newsgroup filtering, document organization, and topic identification. In areas such as computer vision, there is a strong consensus on a general way of designing models, neural networks, and other established methodologies. By contrast, text classification still lacks such a general approach in many areas. In this paper, we aim to provide a comprehensive survey of a variety of methodologies and algorithms used to classify texts, together with their improvements. We focus on the main general approaches to text classification and their use cases.

Research summary

This research addresses text classification, one of the important areas of natural language processing. It aims to provide a comprehensive survey of the methodologies and algorithms used in text classification, with a focus on the improvements made to them. These methodologies include the manual approach, such as a bag of keywords; the statistical approach, using algorithms such as Naïve Bayes and Support Vector Machine; decision trees; and neural networks, such as recurrent and convolutional neural networks. The research shows that text classification still lacks a generally accepted method, unlike other fields such as computer vision. It also highlights challenges facing the field, such as data complexity and the need to improve accuracy. The research aims to provide a comprehensive reference that can later be used to develop text classification techniques and to improve Arabic scientific content in this field.
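
As an illustration of the statistical approach named above, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, written in plain Python. It is not taken from the paper: the class name, the whitespace tokenizer, and the toy training sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                # Smoothed log likelihood of the token given the class.
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train_texts = [
    "the team won the match",
    "the player scored a goal",
    "the patient received a diagnosis",
    "the doctor prescribed a treatment",
]
train_labels = ["sports", "sports", "medicine", "medicine"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("the doctor examined the patient"))  # prints "medicine"
```

Working in log space avoids numeric underflow on longer documents, and the smoothing term keeps unseen words from driving a class probability to zero.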

Critical review

Critical study: Although the research is comprehensive and covers a wide range of methodologies and algorithms, it lacks real-world applied examples showing how these algorithms are used in actual projects. It also concentrates heavily on theory, without a practical analysis of results or a comparison of the performance of the different algorithms in specific contexts. In addition, more attention could be given to the practical challenges of applying these algorithms in real environments and to ways of overcoming them. It would also be useful to include case studies or industry examples that illustrate the practical benefits of text classification in fields such as marketing or medicine.

Questions related to the research

What are the main methodologies used in text classification?

The main methodologies include the manual approach, such as a bag of keywords; the statistical approach, using algorithms such as Naïve Bayes and Support Vector Machine; decision trees; and neural networks, such as recurrent and convolutional neural networks.
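
The manual bag-of-keywords approach mentioned in this answer can be sketched in a few lines. This is only an illustrative reading of the idea, not the paper's procedure: the function name, the keyword sets, and the "unknown" fallback label are hypothetical.

```python
def classify_by_keywords(text, keyword_sets, default="unknown"):
    """Pick the label whose hand-written keyword set overlaps the text most."""
    tokens = set(text.lower().split())
    scores = {label: len(tokens & keywords)
              for label, keywords in keyword_sets.items()}
    best_label = max(scores, key=scores.get)
    # Fall back to the default label when no keyword matched at all.
    return best_label if scores[best_label] > 0 else default


# Hypothetical, manually curated keyword lists.
KEYWORDS = {
    "marketing": {"discount", "offer", "sale", "customer"},
    "medicine": {"patient", "diagnosis", "treatment", "symptom"},
}

print(classify_by_keywords("special offer and discount for every customer", KEYWORDS))
# prints "marketing"
```

Such a classifier needs no training data, but its quality depends entirely on how well the keyword lists are maintained.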

What are the main challenges facing text classification?

The main challenges include data complexity, the need to improve accuracy, and the availability of sufficient and suitable training data, in addition to challenges related to understanding context and meaning in texts.

How can the performance of text classification algorithms be improved?

Performance can be improved through better feature extraction, dimensionality reduction, and parameter tuning, and by using techniques such as word embeddings and advanced neural networks such as LSTM and CNN.
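
One common example of improved feature extraction is TF-IDF weighting, which down-weights words that occur in most documents. The answer above does not name TF-IDF, so treat this as an assumed example; the function name and the two sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # term frequency * inverse document frequency
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = ["the cat sat", "the dog barked"]
vectors = tfidf_vectors(docs)
print(vectors[0])
```

In this sketch a term that appears in every document receives weight zero, so dropping zero or near-zero weights is one crude way to shrink the feature space.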

What are the practical benefits of text classification in different fields?

The practical benefits of text classification include improving the quality of information services, targeted marketing, medical diagnosis, news filtering, document organization, identifying the topic of news articles, and sentiment analysis.

Keywords

text classification, natural language processing, neural networks, machine learning algorithms, text analysis, text mining

References used

https://link.springer.com/chapter/10.1007%2F978-1-4614-3223-4_6

Study about Arabic Text Documents Classification using Ontologies

Al-Baath University, 2014 (research paper)

In this paper, we introduce an algorithm for grouping Arabic documents in order to build an ontology and its words. We run the algorithm on five ontologies using Java. Processing the documents yields 338667 words, together with their weights for each ontology. The algorithm proved effective at improving the performance of the classifiers tested in this study (SVM, NB), compared with earlier classifier results for the Arabic language.
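
The abstract does not describe how the word weights are used, so the following is only a hypothetical sketch of one way per-ontology word weights could drive classification: sum the weights of a document's words for each ontology and pick the highest total. It is not the authors' algorithm (which was implemented in Java), and the weight tables are invented.

```python
def score_document(text, ontology_weights):
    """Sum the weights of the document's words, separately for each ontology."""
    tokens = text.lower().split()
    return {
        name: sum(weights.get(token, 0.0) for token in tokens)
        for name, weights in ontology_weights.items()
    }


def assign_ontology(text, ontology_weights):
    """Return the ontology with the highest total word weight."""
    scores = score_document(text, ontology_weights)
    return max(scores, key=scores.get)


# Invented weights for two toy ontologies (assumption, not data from the paper).
WEIGHTS = {
    "sports": {"match": 2.0, "goal": 1.5},
    "health": {"patient": 2.0, "clinic": 1.0},
}

print(assign_ontology("the patient visited the clinic", WEIGHTS))  # prints "health"
```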

ontology, Arabic language, semantic web, document classification, text categorization, text mining, SVM, NB

Period Classification in Chinese Historical Texts

Association for Computational Linguistics, 2021 (article)

In this study, we examine language change in Chinese Biji through a classification task: classifying Ancient Chinese texts by time period. Specifically, we focus on a unique genre in classical Chinese literature: Biji (literally "notebook" or "brush notes"), i.e., collections of anecdotes, quotations, etc., anything authors consider noteworthy. Biji span hundreds of years across many dynasties and conserve informal language in written form. For these reasons, they are regarded as a good resource for investigating language change in Chinese (Fang, 2010). In this paper, we create a new dataset of 108 Biji across four dynasties. Based on the dataset, we first introduce a time period classification task for Chinese. Then we investigate different feature representation methods for classification. The results show that models using contextualized embeddings perform best. An analysis of the top features chosen by the word n-gram model (after bleaching proper nouns) confirms that these features are informative and correspond to observations and assumptions made by historical linguists.
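
To make the word n-gram features concrete, here is a minimal sketch that counts word n-grams after replacing proper nouns with a placeholder. It assumes that "bleaching" means masking those tokens, that the text is already segmented into words, and that the proper nouns are known in advance; the function names, the placeholder, and the sample tokens are hypothetical and do not come from the paper.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """All contiguous word n-grams of length n, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleach(tokens, proper_nouns, placeholder="<PROPN>"):
    """Replace every known proper noun with a generic placeholder."""
    return [placeholder if token in proper_nouns else token for token in tokens]


def ngram_features(tokens, proper_nouns, max_n=2):
    """Count word n-grams (n = 1..max_n) over the bleached token sequence."""
    bleached = bleach(tokens, proper_nouns)
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(word_ngrams(bleached, n))
    return features


# Toy, pre-segmented input; real Chinese text would need word segmentation first.
tokens = ["li", "bai", "wrote", "poems"]
features = ngram_features(tokens, proper_nouns={"li", "bai"})
print(features.most_common(3))
```

Masking names keeps a classifier from simply memorizing which people or places belong to which period.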

Ancient Chinese texts, Chinese historical texts, classifying Ancient Chinese

Monolingual Word Sense Alignment as a Classification Problem

Association for Computational Linguistics, 2021 (article)

Words are defined based on their meanings in various ways in different resources. Aligning word senses across monolingual lexicographic resources increases domain coverage and enables integration and incorporation of data. In this paper, we explore the application of classification methods using manually extracted features along with representation learning techniques in the task of word sense alignment and semantic relationship detection. We demonstrate that the performance of classification methods varies dramatically with the type of semantic relationship, owing to the nature of the task, but outperforms the previous experiments.

word sense alignment, classification problem, monolingual word sense

Self-supervised Regularization for Text Classification

Association for Computational Linguistics, 2021 (article)

Text classification is a widely studied problem and has broad applications. In many real-world problems, the number of texts for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., 2019a) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. The SSL task is unsupervised: it is defined purely on input texts without using any human-provided labels. Training a model using an SSL task can prevent the model from being overfitted to a limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/UCSD-AI4H/SSReg.
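
The abstract does not state the training objective. A common way to perform a supervised task and a self-supervised task simultaneously is to minimize a weighted sum of their losses, and the formula below is written under that assumption. The symbols (shared parameters \theta, classification loss \mathcal{L}_{\text{cls}}, self-supervised loss \mathcal{L}_{\text{ssl}}, and trade-off weight \lambda) are notation chosen for this sketch and may differ from the paper's.

```latex
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\text{cls}}(\theta) \;+\; \lambda \, \mathcal{L}_{\text{ssl}}(\theta),
\qquad \lambda \ge 0
```

Under this reading, \lambda = 0 recovers ordinary supervised training, while larger values make the self-supervised term act as a stronger regularizer.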

SSL, text classification

Practical Transformer-based Multilingual Text Classification

Association for Computational Linguistics, 2021 (article)

Transformer-based methods are appealing for multilingual text classification, but common research benchmarks like XNLI (Conneau et al., 2018) do not reflect the data availability and task variety of industry applications. We present an empirical comparison of transformer-based text classification models in a variety of practical monolingual and multilingual pretraining and fine-tuning settings. We evaluate these methods on two distinct tasks in five different languages. Departing from prior work, our results show that multilingual language models can outperform monolingual ones in some downstream tasks and target languages. We additionally show that practical modifications such as task- and domain-adaptive pretraining and data augmentation can improve classification performance without the need for additional labeled data.

multilingual text classification, transformer-based text classification
