
A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek


Publication date: 2021
Language: English





This paper presents a pilot study on automatic linguistic preprocessing of Ancient and Byzantine Greek, and more specifically on morphological analysis. To this end, a novel subword-based BERT language model was trained on a varied corpus of Modern, Ancient and Post-classical Greek texts. Subsequently, the resulting BERT embeddings were incorporated to train a fine-grained Part-of-Speech tagger for Ancient and Byzantine Greek. In addition, a corpus of Greek Epigrams was manually annotated, and the resulting gold standard was used to evaluate the performance of the morphological analyser on Byzantine Greek. The experimental results show a very good perplexity score (4.9) for the BERT language model and state-of-the-art performance for the fine-grained Part-of-Speech tagger, both on in-domain data (treebanks containing a mixture of Classical and Medieval Greek) and on the newly created Byzantine Greek gold standard data set. The language models and associated code are available at https://github.com/pranaydeeps/Ancient-Greek-BERT
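As a rough illustration of how the released resources might be used, the sketch below loads the model as a masked language model with Hugging Face transformers and predicts a masked subword. The Hub identifier "pranaydeeps/Ancient-Greek-BERT" is an assumption inferred from the repository name; the repository's own instructions are authoritative for the exact loading procedure.

```python
# Hedged sketch: loading the released Ancient Greek BERT as a masked
# language model with Hugging Face transformers. The Hub identifier
# "pranaydeeps/Ancient-Greek-BERT" is assumed from the repository name;
# see https://github.com/pranaydeeps/Ancient-Greek-BERT for exact usage.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "pranaydeeps/Ancient-Greek-BERT"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

# Predict the masked subword in the opening line of the Odyssey.
text = f"ἄνδρα μοι ἔννεπε, {tokenizer.mask_token}, πολύτροπον"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```

The same prediction could be obtained with a fill-mask pipeline in one call; the manual version makes explicit the masked-language-model objective behind the reported perplexity score.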



Related research


This research studies the impact of additions intended to improve the properties of limestone mortar so that it becomes comparable to the hydraulic lime used in other countries. It employs additions similar to those used in old mortars, as well as modern additions such as fibreglass, to limit unwanted shrinkage when limestone mortar is used, especially in arid environments.
In this paper, we introduce the Greek version of the automatic annotation tool ERRANT (Bryant et al., 2017), which we named ELERRANT. ERRANT functions as a rule-based error type classifier and was used as the main evaluation tool of the systems participating in the BEA-2019 (Bryant et al., 2019) shared task. Here, we discuss grammatical and morphological differences between English and Greek and how these differences affected the development of ELERRANT. We also introduce the first Greek Native Corpus (GNC) and the Greek WikiEdits Corpus (GWE), two new evaluation datasets with errors from native Greek learners and Wikipedia Talk Pages edits respectively. These two datasets are used for the evaluation of ELERRANT. This paper is a sole fragment of a bigger picture which illustrates the attempt to solve the problem of low-resource languages in NLP, in our case Greek.
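For orientation, the sketch below shows how the original English ERRANT toolkit is typically used to extract and classify edits between an original and a corrected sentence. ELERRANT adapts this rule-based classification to Greek, and its own interface may differ, so the calls are illustrative of ERRANT-style usage rather than of ELERRANT itself.

```python
# Illustrative sketch of ERRANT-style edit extraction and classification.
# English ERRANT is shown; ELERRANT applies the same idea to Greek and may
# expose a different interface. Requires the errant package and a spaCy model.
import errant

annotator = errant.load("en")

orig = annotator.parse("He have three book .")
cor = annotator.parse("He has three books .")

for edit in annotator.annotate(orig, cor):
    # Each edit carries its span, the correction and a rule-based error
    # type such as R:VERB:SVA or R:NOUN:NUM.
    print(edit.o_str, "->", edit.c_str, "|", edit.type)
```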
This paper presents EstBERT, a large pretrained transformer-based language-specific BERT model for Estonian. Recent work has evaluated multilingual BERT models on Estonian tasks and found them to outperform the baselines. Still, based on existing studies on other languages, a language-specific BERT model is expected to improve over the multilingual ones. We first describe the EstBERT pretraining process and then present the models' results based on the finetuned EstBERT for multiple NLP tasks, including POS and morphological tagging, dependency parsing, named entity recognition and text classification. The evaluation results show that the models based on EstBERT outperform multilingual BERT models on five tasks out of seven, providing further evidence that training language-specific BERT models is still useful, even when multilingual models are available.
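Fine-tuning such a language-specific BERT for one of the listed tasks, POS tagging, amounts to token classification. The sketch below shows the general setup with Hugging Face transformers; the Hub identifier "tartuNLP/EstBERT" and the tiny tag set are assumptions for demonstration only.

```python
# Rough sketch of BERT-based POS tagging as token classification.
# "tartuNLP/EstBERT" is an assumed Hub identifier and POS_TAGS is a toy
# label set; a real setup would use the full tag inventory and a training loop.
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "tartuNLP/EstBERT"                      # assumed identifier
POS_TAGS = ["NOUN", "VERB", "ADJ", "PRON", "PUNCT"]  # toy label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME,
    num_labels=len(POS_TAGS),
    id2label=dict(enumerate(POS_TAGS)),
)

# Word-level gold labels must be aligned to subword tokens before training:
# tokenize pre-split words and map each subword back to its word index.
words = ["Tere", ",", "maailm", "!"]
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
print(encoding.word_ids())  # e.g. [None, 0, 1, 2, 3, None] incl. [CLS]/[SEP]
```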
Understanding idioms is important in NLP. In this paper, we study to what extent a pre-trained BERT model can encode the meaning of a potentially idiomatic expression (PIE) in a given context. We make use of a few existing datasets and perform two probing tasks: PIE usage classification and idiom paraphrase identification. Our experimental results suggest that BERT can indeed separate the literal and idiomatic usages of a PIE with high accuracy. It is also able to encode the idiomatic meaning of a PIE to some extent.
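To make the probing idea concrete, here is a minimal sketch, using a generic English BERT and a made-up example rather than the paper's datasets, of extracting a contextual representation for a PIE that could then be fed to a simple usage classifier.

```python
# Sketch of a probing setup in the spirit described above: extract BERT's
# contextual representation of a potentially idiomatic expression (PIE)
# and feed it to a simple classifier. Model and example are placeholders.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def pie_embedding(sentence: str, pie: str) -> torch.Tensor:
    """Mean-pool the last-layer vectors of the subwords that make up the PIE."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]          # (seq_len, 768)
    pie_ids = tokenizer(pie, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # Locate the PIE's subword span inside the sentence (first match).
    for i in range(len(ids) - len(pie_ids) + 1):
        if ids[i:i + len(pie_ids)] == pie_ids:
            return hidden[i:i + len(pie_ids)].mean(dim=0)
    raise ValueError("PIE not found in sentence")

vec = pie_embedding("He finally kicked the bucket last night.", "kicked the bucket")
print(vec.shape)  # torch.Size([768]) -> input to a logistic-regression probe
```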
The use of pretrained language models, fine-tuned to perform a specific downstream task, has become widespread in NLP. Using a generic language model in specialized domains may, however, be sub-optimal due to differences in language use and vocabulary. In this paper, it is investigated whether an existing, generic language model for Swedish can be improved for the clinical domain through continued pretraining with clinical text. The generic and domain-specific language models are fine-tuned and evaluated on three representative clinical NLP tasks: (i) identifying protected health information, (ii) assigning ICD-10 diagnosis codes to discharge summaries, and (iii) sentence-level uncertainty prediction. The results show that continued pretraining on in-domain data leads to improved performance on all three downstream tasks, indicating that there is a potential added value of domain-specific language models for clinical NLP.
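The continued-pretraining step described here is, in essence, further masked-language-model training on in-domain text. A compressed sketch follows, in which the checkpoint name "KB/bert-base-swedish-cased", the file clinical_notes.txt and all hyperparameters are placeholders rather than the paper's actual setup.

```python
# Hedged sketch of continued (domain-adaptive) MLM pretraining on clinical
# text. The checkpoint "KB/bert-base-swedish-cased", the input file and the
# hyperparameters are placeholders, not the configuration used in the paper.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

CHECKPOINT = "KB/bert-base-swedish-cased"   # assumed generic Swedish BERT

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForMaskedLM.from_pretrained(CHECKPOINT)

# In-domain raw text, one note per line (placeholder file name).
raw = load_dataset("text", data_files={"train": "clinical_notes.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="swedish-clinical-bert",
                           per_device_train_batch_size=16,
                           num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()   # the adapted model is then fine-tuned per downstream task
```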
