
Applications based on scholarly data are of ever-increasing importance. This disadvantages areas where high-quality data and compatible systems are not available, such as non-English publications. To help mitigate this imbalance, we use Cyrillic-script publications from the CORE collection to create a high-quality data set for metadata extraction. We use this data to train and evaluate sequence labeling models that extract title and author information. Retraining GROBID on our data, we observe significant improvements in precision and recall, and we achieve even better results with a self-developed model. We make our data set, covering over 15,000 publications, as well as our source code freely available.
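
The abstract above frames title and author extraction as token-level sequence labeling. A minimal sketch of such a setup is given below, assuming a BIO label scheme and a multilingual BERT encoder; the label set, model choice, and example header tokens are illustrative assumptions rather than the paper's actual configuration, and the classification head would still have to be fine-tuned on the released data set.

```python
# Illustrative sketch: BIO tagging of publication-header tokens for
# title/author extraction. Label scheme and model are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-TITLE", "I-TITLE", "B-AUTHOR", "I-AUTHOR"]  # assumed scheme

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(LABELS)
)  # head is randomly initialized here; it needs fine-tuning on labeled headers

# Hypothetical header tokens from a Cyrillic publication front page.
tokens = ["Методы", "извлечения", "метаданных", "И.", "Иванов"]
enc = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits            # (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()

# Map wordpiece-level predictions back to the first sub-token of each word.
word_ids = enc.word_ids(batch_index=0)
seen = set()
for pos, wid in enumerate(word_ids):
    if wid is not None and wid not in seen:
        seen.add(wid)
        print(tokens[wid], LABELS[pred_ids[pos]])
```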
Large pretrained models have seen enormous success in extractive summarization tasks. In this work, we investigate the influence of pretraining on a BERT-based extractive summarization system for scientific documents. We derive significant performance improvements using an intermediate pretraining step that leverages existing summarization datasets and report state-of-the-art results on a recently released scientific summarization dataset, SciTLDR. We systematically analyze the intermediate pretraining step by varying the size and domain of the pretraining corpus, changing the length of the input sequence in the target task, and varying target tasks. We also investigate how intermediate pretraining interacts with contextualized word embeddings trained on different domains.
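
The intermediate pretraining described in this abstract amounts to two consecutive fine-tuning stages: first on an existing, larger summarization corpus, then on the scientific target task. The sketch below illustrates that two-stage loop for a simple BERT-based sentence-selection model; the scoring head, the dataset placeholders, and the hyperparameters are assumptions for illustration only, not the paper's system.

```python
# Illustrative sketch of a two-stage ("intermediate pretraining") recipe:
# stage 1 fine-tunes a sentence-selection head on a generic summarization
# corpus, stage 2 continues training on the scientific target task.
import torch
import torch.nn as nn
from transformers import AutoModel

class ExtractiveScorer(nn.Module):
    """Scores each sentence for inclusion in the extractive summary."""
    def __init__(self, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        # Use the [CLS] representation of each individually encoded sentence.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]        # (batch, hidden)
        return self.head(cls).squeeze(-1)        # (batch,) inclusion logits

def train_stage(model, batches, epochs, lr):
    """One training stage; `batches` yields (input_ids, mask, labels)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for input_ids, mask, labels in batches:
            opt.zero_grad()
            loss = loss_fn(model(input_ids, mask), labels.float())
            loss.backward()
            opt.step()

# Usage sketch with hypothetical data loaders:
# model = ExtractiveScorer()
# train_stage(model, generic_summ_batches, epochs=1, lr=2e-5)  # intermediate stage
# train_stage(model, scitldr_batches, epochs=3, lr=2e-5)       # target task
```

Keeping both stages on the same sentence-level objective is what lets the intermediate corpus transfer directly; only the data source changes between the two calls.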