New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Survey and reproduction of computational approaches to dating of historical texts

مسح واستنساخ النهج الحسابية التي يرجع تاريخها إلى النصوص التاريخية

495 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

survey and reproduction reproduction of computational مسح والاستنساخ الاستنساخ الحاسوبية تاريخي صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Finding the year of writing for a historical text is of crucial importance to historical research. However, the year of original creation is rarely explicitly stated and must be inferred from the text content, historical records, and codicological clues. Given a transcribed text, machine learning has successfully been used to estimate the year of production. In this paper, we present an overview of several estimation approaches for historical text archives spanning from the 12th century until today.

References used

https://aclanthology.org/

rate research

Period Classification in Chinese Historical Texts

349 - Association for Computation Linguistics 2021 مقالة

In this study, we study language change in Chinese Biji by using a classification task: classifying Ancient Chinese texts by time periods. Specifically, we focus on a unique genre in classical Chinese literature: Biji (literally notebook'' or brush n otes''), i.e., collections of anecdotes, quotations, etc., anything authors consider noteworthy, Biji span hundreds of years across many dynasties and conserve informal language in written form. For these reasons, they are regarded as a good resource for investigating language change in Chinese (Fang, 2010). In this paper, we create a new dataset of 108 Biji across four dynasties. Based on the dataset, we first introduce a time period classification task for Chinese. Then we investigate different feature representation methods for classification. The results show that models using contextualized embeddings perform best. An analysis of the top features chosen by the word n-gram model (after bleaching proper nouns) confirms that these features are informative and correspond to observations and assumptions made by historical linguists.

ancient chinese texts chinese historical texts classifying ancient chinese النصوص الصينية القديمة النصوص التاريخية الصينية تصنيف الصينيين القديم صناعة حمض الفوسفور المزيد..

Batavia asked for advice. Pretrained language models for Named Entity Recognition in historical texts.

502 - Association for Computation Linguistics 2021 مقالة

Pretrained language models like BERT have advanced the state of the art for many NLP tasks. For resource-rich languages, one has the choice between a number of language-specific models, while multilingual models are also worth considering. These mode ls are well known for their crosslingual performance, but have also shown competitive in-language performance on some tasks. We consider monolingual and multilingual models from the perspective of historical texts, and in particular for texts enriched with editorial notes: how do language models deal with the historical and editorial content in these texts? We present a new Named Entity Recognition dataset for Dutch based on 17th and 18th century United East India Company (VOC) reports extended with modern editorial notes. Our experiments with multilingual and Dutch pretrained language models confirm the crosslingual abilities of multilingual models while showing that all language models can leverage mixed-variant data. In particular, language models successfully incorporate notes for the prediction of entities in historical texts. We also find that multilingual models outperform monolingual models on our data, but that this superiority is linked to the task at hand: multilingual models lose their advantage when confronted with more semantical tasks.

شمي صناعة حمض الفوسفور

Dynamic Ensembles in Named Entity Recognition for Historical Arabic Texts

346 - Association for Computation Linguistics 2021 مقالة

The use of Named Entity Recognition (NER) over archaic Arabic texts is steadily increasing. However, most tools have been either developed for modern English or trained over English language documents and are limited over historical Arabic text. Even Arabic NER tools are often trained on modern web-sourced text, making their fit for a historical task questionable. To mitigate historic Arabic NER resource scarcity, we propose a dynamic ensemble model utilizing several learners. The dynamic aspect is achieved by utilizing predictors and features over NER algorithm results that identify which have performed better on a specific task in real-time. We evaluate our approach against state-of-the-art Arabic NER and static ensemble methods over a novel historical Arabic NER task we have created. Our results show that our approach improves upon the state-of-the-art and reaches a 0.8 F-score on this challenging task.

عربي قياسي صناعة حمض الفوسفور

Comparing Approaches to Dravidian Language Identification

673 - Association for Computation Linguistics 2021 مقالة

This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at VarDial 2021 workshop. The DLI training set includes 16,674 YouTube comments written in Roman script containing code-mixed text w ith English and one of the three South Dravidian languages: Kannada, Malayalam, and Tamil. We submitted results generated using two models, a Naive Bayes classifier with adaptive language models, which has shown to obtain competitive performance in many language and dialect identification tasks, and a transformer-based model which is widely regarded as the state-of-the-art in a number of NLP tasks. Our first submission was sent in the closed submission track using only the training set provided by the shared task organisers, whereas the second submission is considered to be open as it used a pretrained model trained with external data. Our team attained shared second position in the shared task with the submission based on Naive Bayes. Our results reinforce the idea that deep learning methods are not as competitive in language identification related tasks as they are in many other text classification tasks.

الرومانية بيرت south dravidian languages comparing approaches جنوب درافيدي اللغات مقارنة الأساليب صناعة حمض الفوسفور

A Survey of Approaches to Automatic Question Generation:from 2019 to Early 2021

203 - Association for Computation Linguistics 2021 مقالة

To provide analysis of recent researches of automatic question generation from text,we surveyed 9 papers between 2019 to early 2021, retrieved from Paper with Code(PwC). Our research follows the survey reported by Kurdi et al.(2020), in which analysi s of 93 papers from 2014 to early2019 are provided. We analyzed the 9papers from aspects including: (1) purpose of question generation, (2) generation method, and (3) evaluation. We found that recent approaches tend to rely on semantic information and Transformer-based models are attracting increasing interest since they are more efficient. On the other hand,since there isn't any widely acknowledged automatic evaluation metric designed for question generation, researchers adopt metrics of other natural language processing tasks to compare different systems.

automatic question generation question generation automatic question جيل السؤال التلقائي جيل السؤال السؤال التلقائي صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Survey and reproduction of computational approaches to dating of historical texts

مسح واستنساخ النهج الحسابية التي يرجع تاريخها إلى النصوص التاريخية

Ask ChatGPT about the research

Read More

suggested questions