An Information Retrieval System for the Arabic Language

Published at Tishreen University in 2014. The paper is written in Arabic.

Abstract

Studies on the computational processing of Arabic are of great importance given how widely the language is used. In this study we address Arabic language processing through an information retrieval system for Arabic documents. The core idea of the system is to analyze Arabic documents and texts, build indexes of the terms they contain, and then derive weight vectors that represent these documents. A query is later processed and compared against these vectors to obtain the documents that match it. Stemming the terms that occur in the documents yielded better retrieval performance, and we review several stemming algorithms proposed in previous studies. Document clustering is an important addition: it lets the user see documents that are similar to the search results and related to the submitted query. In the practical part, we built a desktop information retrieval system that reads texts of various types and displays the results together with their corresponding clusters.
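The pipeline described in the abstract (term indexing, weight vectors, query comparison) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which weighting scheme or similarity measure the system uses, so TF-IDF weights and cosine similarity are swapped in as a common choice, and `light_stem` with its `PREFIXES` list is a hypothetical toy stemmer, not one of the stemming algorithms the paper reviews. The sample documents are invented.

```python
import math
from collections import Counter, defaultdict

# Hypothetical prefix list for a toy light stemmer (an assumption,
# not one of the stemming algorithms reviewed in the paper).
PREFIXES = ("وال", "بال", "كال", "فال", "ال")


def light_stem(token):
    """Strip one common prefix if at least three letters remain."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            return token[len(prefix):]
    return token


def tokenize(text):
    """Split on whitespace and stem each token."""
    return [light_stem(t) for t in text.split()]


def build_index(docs):
    """Build an inverted index, IDF table, and TF-IDF vector per document."""
    index = defaultdict(dict)  # term -> {doc_id: term frequency}
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    n_docs = len(docs)
    # Smoothed IDF so that terms occurring in every document keep a weight.
    idf = {t: math.log(n_docs / len(p)) + 1.0 for t, p in index.items()}
    vectors = defaultdict(dict)  # doc_id -> {term: weight}
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            vectors[doc_id][term] = tf * idf[term]
    return dict(index), idf, dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, idf, vectors):
    """Rank documents by cosine similarity to the query vector."""
    q_tf = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items() if t in idf}
    scored = ((doc_id, cosine(q_vec, vec)) for doc_id, vec in vectors.items())
    return sorted((p for p in scored if p[1] > 0.0), key=lambda p: -p[1])


# Invented sample documents, used only to exercise the sketch.
docs = {
    "d1": "استرجاع المعلومات من المستندات",
    "d2": "عنقدة المستندات المتشابهة",
    "d3": "الطقس اليوم مشمس",
}
index, idf, vectors = build_index(docs)
results = search("استرجاع معلومات", idf, vectors)
print(results)
```

Because stemming strips the definite article, the query term "معلومات" matches "المعلومات" in the first document, which shows in miniature why stemming can improve retrieval. The same document vectors could also serve as input to a clustering step, which is left out of this sketch.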

