Topic Detection and Tracking

اكتشاف الموضوع وتتبعه

Added by Tishreen University (graduation project)

Publication date 2016

Research language: Arabic

Authors: دريد عبد الله (student), رهام البودي (student), بشار محمد (supervisor), علاء الناطور (student)

Created by Doried Abd-Allah

Abstract

As a result of the tremendous progress in science and technology and the wide spread of the Internet, human knowledge has come within everyone's reach. Yet with this enormous volume of information, readers find themselves scattered among numerous sources and lost in this vast space. This information explosion called for means of control that organize the information, arrange it under broad headings, and track it. Hence the technical community began moving toward a new field called topic detection and tracking. The concept is widely applied to social networks, news, scientific articles, and much more. In news, thousands of agencies often broadcast tens of thousands of stories about the same event, which prompted news portals, led by Google News, to apply a topic detection and tracking system.

Such a system is concerned with a set of tasks defined by DARPA. The first, story segmentation, monitors a stream of contiguous textual stories to find the boundaries between one story and the next. The second, link detection, answers the question: do two given stories discuss the same topic or event? The third, topic tracking, monitors a stream of stories to find those that discuss a user-defined topic. The fourth, first story detection, identifies stories that discuss new events as soon as they arrive. The last, topic detection, separates a mixed set of stories into topics without any prior knowledge of those topics; that is, it groups the stories that discuss one topic into the same cluster.

In this project we implement and evaluate the last four tasks. Stories are received in real time and preprocessed (linguistic processing and so on); they are then represented as vectors, the words of each story are weighted, and a set of words is selected to represent the story. For topic representation we test different forms, such as a vector representation or a representation by stories. We also discuss using different measures for representing stories and measuring their similarity, and we test using a story's title and date as features in addition to its set of words.
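The pipeline described above (term weighting, vector representation, similarity measurement, first story detection) can be illustrated with a minimal sketch. It assumes TF-IDF weights computed over a small batch, cosine similarity, and a fixed novelty threshold of 0.2. These choices, and the names `tfidf_vectors`, `cosine`, and `first_stories`, are illustrative assumptions and not the project's actual implementation.

```python
import math
from collections import Counter


def tfidf_vectors(stories):
    """Turn tokenized stories into sparse TF-IDF vectors (term -> weight)."""
    n = len(stories)
    # Document frequency: number of stories that contain each term.
    df = Counter(term for story in stories for term in set(story))
    vectors = []
    for story in stories:
        tf = Counter(story)
        # Assumed weighting: raw term count times (log(n / df) + 1).
        vectors.append({t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_stories(vectors, threshold=0.2):
    """Flag a story as a first story if no earlier story is similar enough."""
    flags = []
    for i, vec in enumerate(vectors):
        best = max((cosine(vec, prev) for prev in vectors[:i]), default=0.0)
        flags.append(best < threshold)
    return flags


# Hypothetical toy stream: two stories on one event, then an unrelated one.
stories = [
    "earthquake hits city rescue teams search".split(),
    "rescue teams search city after earthquake".split(),
    "central bank raises interest rates".split(),
]
vectors = tfidf_vectors(stories)
print(first_stories(vectors))  # [True, False, True]
```

The same `cosine` function also covers link detection (compare two stories against a threshold) and a simple form of topic tracking (compare each incoming story against a topic vector). The dependence on `threshold` is the sensitivity the abstract refers to when it discusses threshold selection.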
We also present our own method for normalizing the similarities between stories and reducing the effect of threshold selection in the system, and we show the remarkable improvements this method yields. These make it possible to build a topic detection and tracking system without worrying about setting the threshold at all, which has long been the biggest challenge for this type of system. We describe our application of state-of-the-art clustering algorithms to the topic detection task, and show how we modified the affinity matrix in the proposed spectral clustering algorithm and used a different normalization method adapted to our system, which improved clustering performance from 0.89 to 0.97 as measured by F-measure.
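A clustering F-measure can be defined in several ways, and the abstract does not say which variant was used. A minimal sketch, assuming the common class-weighted definition in which each reference topic is matched to its best-scoring cluster (the function name `clustering_f_measure` and the toy labels are hypothetical):

```python
from collections import Counter


def clustering_f_measure(true_topics, cluster_ids):
    """Class-weighted F-measure of a clustering against reference topics.

    For each reference topic, take the best F1 over all clusters,
    then average those values weighted by topic size.
    """
    n = len(true_topics)
    topic_sizes = Counter(true_topics)
    cluster_sizes = Counter(cluster_ids)
    # Number of stories shared by each (topic, cluster) pair.
    overlap = Counter(zip(true_topics, cluster_ids))

    total = 0.0
    for topic, t_size in topic_sizes.items():
        best = 0.0
        for cluster, c_size in cluster_sizes.items():
            shared = overlap[(topic, cluster)]
            if shared == 0:
                continue
            precision = shared / c_size
            recall = shared / t_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (t_size / n) * best
    return total


# Hypothetical example: five stories, two reference topics, two clusters.
truth = ["quake", "quake", "quake", "bank", "bank"]
found = [0, 0, 1, 1, 1]
print(round(clustering_f_measure(truth, found), 3))  # 0.8
```

A perfect clustering scores 1.0 under this definition, so values such as 0.89 and 0.97 would indicate how closely the discovered clusters match the reference topics.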

Artificial intelligence review:

Research summary

This paper addresses topic detection and tracking, a system that aims to organize, classify, and track the vast amount of information available on the Internet effectively. The system is applied in several domains, such as news, social networks, and scientific articles. It comprises several main tasks: story segmentation, link detection between stories, topic tracking, first story detection, and topic detection. The goal of the project is to build an integrated system that performs these tasks and to evaluate its performance on real data. Advanced techniques such as spectral clustering algorithms and the weak affinity matrix were used to improve the system's performance and reduce the effect of thresholds on decisions. The experimental results showed that the system can achieve high performance with good decision stability, making it an effective tool for organizing and tracking information in real time.
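For reference, the standard formulation of spectral clustering that affinity-matrix modifications usually start from can be written out. The equations below give the textbook normalized setup of Ng, Jordan, and Weiss and the local scaling of Zelnik-Manor and Perona. They are not the project's modified affinity matrix, which this page does not specify. The symbol $d(s_i, s_j)$ is an assumed distance between stories $s_i$ and $s_j$, and $s_{i,K}$ denotes the $K$-th nearest neighbor of $s_i$.

```latex
% Gaussian affinity with a single global scale \sigma
A_{ij} = \exp\!\left(-\frac{d(s_i, s_j)^2}{2\sigma^2}\right) \quad (i \neq j), \qquad A_{ii} = 0

% Degree matrix and normalized affinity
D_{ii} = \sum_{j} A_{ij}, \qquad L = D^{-1/2} A \, D^{-1/2}

% Self-tuning variant: one local scale per story
A_{ij} = \exp\!\left(-\frac{d(s_i, s_j)^2}{\sigma_i \sigma_j}\right), \qquad \sigma_i = d(s_i, s_{i,K})
```

In both variants, the $k$ largest eigenvectors of $L$ are stacked as columns, each row is normalized to unit length, and the rows are clustered with k-means.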

Critical review

This paper is an important step in the field of topic detection and tracking, offering innovative solutions to improve system performance and reduce the effect of thresholds on decisions. However, some points may need further research and development. For example, it may be useful to explore the effect of modern machine learning techniques, such as deep neural networks, on the system's accuracy. In addition, data collection could be improved and extended to cover multiple, diverse sources to ensure the system's comprehensiveness. Finally, a comparative study between the proposed system and other existing systems on the market would help identify strengths and weaknesses more precisely.

Questions related to the research

What are the main tasks performed by a topic detection and tracking system?

The main tasks are story segmentation, link detection between stories, topic tracking, first story detection, and topic detection.

What techniques are used to improve the performance of the topic detection and tracking system?

Spectral clustering algorithms and the weak affinity matrix were used to improve the system's performance and reduce the effect of thresholds on decisions.

What is the main goal of the project presented in the paper?

The main goal is to build an integrated system that performs topic detection and tracking, and to evaluate its performance on real data.

What experimental results did the paper reach?

The experimental results showed that the system can achieve high performance with good decision stability, making it an effective tool for organizing and tracking information in real time.

Keywords

topic detection, topic tracking, spectral clustering, weak affinity matrix, information organization, thresholds, news stories

References used

Allan, J., Carbonell, J., Doddington, G., Yamron, J., & Yang, Y. (1998). Topic Detection and Tracking Pilot Study: Final Report. UMass Amherst, CMU, DARPA, and Dragon Systems.

Allan, J., Lavrenko, V., & Connell, M. E. (2003, September). A month to topic detection and tracking in Hindi. ACM Journal.

Bauhaus-Universität Weimar. (n.d.). Clusters Evaluation. Retrieved July 9, 2016, from Bauhaus-Universität Weimar: http://www.uni-weimar.de/medien/webis/teaching/lecturenotes/machine-learning/unit-en-cluster-analysis-evaluation.pdf

EL. Bhissy, K., EL. Faleet, F., & Ashour, W. (2014). Spectral Clustering Using Optimized Gaussian Kernel. International Journal of Artificial Intelligence and Applications for Smart Devices.

Fiscus, J. G., & Doddington, G. R. (2002). Topic Detection and Tracking Evaluation Overview. NIST publications.

Hiemstra, D. (2006). LANGUAGE MODELS. Retrieved July 9, 2016, from Universiteit Twente: http://doc.utwente.nl/64831/1/eds-lm-draft.pdf

Liu, X. (2011, December). Topic Detection with Hypergraph Partition. Journal of software.

Strang, G. (2016). Introduction to Linear Algebra (5th ed., pp. 283-297). MIT.

Wayne, C. L. (1998). Topic Detection & Tracking (TDT) Overview & Perspective. Retrieved July 8, 2016, from National Institute of Standards and Technology: http://www.itl.nist.gov/iad/mig/publications/proceedings/darpa98/html/tdt10/tdt10.htm

Ng, A. Y., Jordan, M. I., & Weiss, Y. (2001). On Spectral Clustering: Analysis and an Algorithm. Neural Information Processing Systems.

Zelnik-Manor, L., & Perona, P. (2004). Self-Tuning Spectral Clustering. Neural Information Processing Systems.

Generalisability of Topic Models in Cross-corpora Abusive Language Detection

Association for Computational Linguistics, 2021 (article)

Rapidly changing social media content calls for robust and generalisable abuse detection models. However, the state-of-the-art supervised models display degraded performance when they are evaluated on abusive comments that differ from the training corpus. We investigate if the performance of supervised models for cross-corpora abuse detection can be improved by incorporating additional information from topic models, as the latter can infer the latent topic mixtures from unseen samples. In particular, we combine topical information with representations from a model tuned for classifying abusive comments. Our performance analysis reveals that topic models are able to capture abuse-related topics that can transfer across corpora, and result in improved generalisability.

Topic Model or Topic Twaddle? Re-evaluating Semantic Interpretability Measures

Association for Computational Linguistics, 2021 (article)

When developing topic models, a critical question that should be asked is: How well will this model work in an applied setting? Because standard performance evaluation of topic interpretability uses automated measures modeled on human evaluation tests that are dissimilar to applied usage, these models' generalizability remains in question. In this paper, we probe the issue of validity in topic model evaluation and assess how informative coherence measures are for specialized collections used in an applied setting. Informed by the literature, we propose four understandings of interpretability. We evaluate these using a novel experimental framework reflective of varied applied settings, including human evaluations using open labeling, typical of applied research. These evaluations show that for some specialized collections, standard coherence measures may not inform the most appropriate topic model or the optimal number of topics, and current interpretability performance validation methods are challenged as a means to confirm model quality in the absence of ground truth data.

Keywords: topic twaddle

Spurious Correlations in Cross-Topic Argument Mining

Association for Computational Linguistics, 2021 (article)

Recent work in cross-topic argument mining attempts to learn models that generalise across topics rather than merely relying on within-topic spurious correlations. We examine the effectiveness of this approach by analysing the output of single-task and multi-task models for cross-topic argument mining, through a combination of linear approximations of their decision boundaries, manual feature grouping, challenge examples, and ablations across the input vocabulary. Surprisingly, we show that cross-topic models still rely mostly on spurious correlations and only generalise within closely related topics, e.g., a model trained only on closed-class words and a few common open-class words outperforms a state-of-the-art cross-topic model on distant target topics.

Keywords: cross-topic argument mining, argument mining, cross-topic argument

Apples to Apples: A Systematic Evaluation of Topic Models

Association for Computational Linguistics, 2021 (article)

From statistical to neural models, a wide variety of topic modelling algorithms have been proposed in the literature. However, because of the diversity of datasets and metrics, there have not been many efforts to systematically compare their performance on the same benchmarks and under the same conditions. In this paper, we present a selection of 9 topic modelling techniques from the state of the art reflecting a diversity of approaches to the task, an overview of the different metrics used to compare their performance, and the challenges of conducting such a comparison. We empirically evaluate the performance of these models on different settings reflecting a variety of real-life conditions in terms of dataset size, number of topics, and distribution of topics, following identical preprocessing and evaluation processes. Using both metrics that rely on the intrinsic characteristics of the dataset (different coherence metrics), as well as external knowledge (word embeddings and ground-truth topic labels), our experiments reveal several shortcomings regarding the common practices in topic models evaluation.

Keywords: systematic evaluation, topic models evaluation, apples to apples

TeMoTopic: Temporal Mosaic Visualisation of Topic Distribution, Keywords, and Context

Association for Computational Linguistics, 2021 (article)

In this paper we present TeMoTopic, a visualization component for temporal exploration of topics in text corpora. TeMoTopic uses the temporal mosaic metaphor to present topics as a timeline of stacked bars along with related keywords for each topic. The visualization serves as an overview of the temporal distribution of topics, along with the keyword contents of the topics, which collectively support detail-on-demand interactions with the source text of the corpora. Through these interactions and the use of keyword highlighting, the content related to each topic and its change over time can be explored.

Keywords: temporal mosaic visualisation, mosaic visualisation, temporal mosaic
