This report describes the course Evaluation of NLP Systems, taught to Computational Linguistics undergraduate students during the winter semester 2020/21 at the University of Potsdam, Germany. It was a discussion-based seminar covering different aspects of evaluation in NLP: paradigms, common procedures, data annotation, metrics and measurements, statistical significance testing, as well as best practices and common approaches in specific NLP tasks and applications.
References used
https://aclanthology.org/
Although Natural Language Processing is at the core of many tools young people use in their everyday life, high school curricula (in Italy) do not include any computational linguistics education. This lack of exposure makes the use of such tools less
We outline the Great Misalignment Problem in natural language processing research: the problem definition is not in line with the proposed method, and the human evaluation is in line with neither the definition nor the method. We st
HCI and NLP traditionally focus on different evaluation methods. While HCI involves a small number of people directly and deeply, NLP traditionally relies on standardized benchmark evaluations that involve a larger number of people indirectly. We pre
Despite state-of-the-art performance, NLP systems can be fragile in real-world situations. This is often due to insufficient understanding of the capabilities and limitations of models and the heavy reliance on standard evaluation benchmarks. Researc
SemEval is the primary venue in the NLP community for the proposal of new challenges and for the systematic empirical evaluation of NLP systems. This paper provides a systematic quantitative analysis of SemEval aiming to evidence the patterns of the