الملخص نتخذ خطوة نحو معالجة تمثيل القارة الأفريقية في أبحاث NLP من خلال جلب مختلف أصحاب المصلحة من أصحاب المصلحة في إنشاء بيانات كبيرة متاحة للجمهور وعالية الجودة للتعرف على الكيان المسمى (NER) في عشرة لغات أفريقية.إننا نقوم بالتفصيل خصائص هذه اللغات لمساعدة الباحثين والممارسين على فهم التحديات التي يفرضونها على مهام NER.نقوم بتحليل مجموعات البيانات لدينا وإجراء تقييم تجريبي واسع النطاق للطرق الحكومية في جميع إعدادات التعلم الإشراف والنقل.أخيرا، نطلق سراح البيانات والرمز والنماذج لإلهام البحوث المستقبلية على الأفريقية NLP.1
Abstract We take a step towards addressing the under- representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state- of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.1
References used
https://aclanthology.org/
Current work in named entity recognition (NER) shows that data augmentation techniques can produce more robust models. However, most existing techniques focus on augmenting in-domain data in low-resource scenarios where annotated data is quite limite
We explore the application of state-of-the-art NER algorithms to ASR-generated call center transcripts. Previous work in this domain focused on the use of a BiLSTM-CRF model which relied on Flair embeddings; however, such a model is unwieldy in terms
The use of Named Entity Recognition (NER) over archaic Arabic texts is steadily increasing. However, most tools have been either developed for modern English or trained over English language documents and are limited over historical Arabic text. Even
To address a looming crisis of unreproducible evaluation for named entity recognition, we propose guidelines and introduce SeqScore, a software package to improve reproducibility. The guidelines we propose are extremely simple and center around trans
Named Entity Recognition is an essential task in natural language processing to detect entities and classify them into predetermined categories. An entity is a meaningful word, or phrase that refers to proper nouns. Named Entities play an important r