This paper describes the GLAUx project ("the Greek Language Automated"), an ongoing effort to develop a large, long-term diachronic corpus of Greek that covers sixteen centuries of literary and non-literary material, annotated with NLP methods. After an overview of related corpus projects and a discussion of the general architecture of the corpus, it zooms in on several larger methodological issues in the design of historical corpora: the encoding of textual variants, the handling of extralinguistic variation, and the annotation of linguistic ambiguity. Finally, the long- and short-term perspectives of the project are discussed.
References used
https://aclanthology.org/