Punctuation restoration is a fundamental requirement for the readability of text produced by Automatic Speech Recognition (ASR) systems. Most contemporary solutions predict only a few of the most frequent marks, such as periods, commas, and question marks, and allow only one mark per word. Written language, however, uses a much larger set of punctuation characters (parentheses, hyphens, etc.) and combinations of them (for example, a parenthesis followed by a dot). Such comprehensive punctuation cannot always be unambiguously reduced to a basic set of the most frequent marks. In this work, we evaluate several methods on the comprehensive punctuation reconstruction task. We conduct experiments on parallel corpora in two languages, English and Polish, which have relatively simple and relatively complex morphology, respectively. We also investigate how building a model on comprehensive punctuation affects the quality of the basic punctuation restoration task.
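To make the difference between basic and comprehensive punctuation concrete, the following is a minimal sketch of how per-word labels might be extracted from written text. It is only an illustration under stated assumptions, not the method evaluated in this work: the names `comprehensive_labels`, `reduce_to_basic`, and `BASIC_MARKS` are hypothetical, and the basic set is assumed to be the three marks named above.

```python
import re

# Assumed "basic" label set: the three marks named in the abstract.
BASIC_MARKS = {".", ",", "?"}

# Split a whitespace token into leading punctuation, word, trailing punctuation.
TOKEN = re.compile(r"^(\W*)(.*?)(\W*)$")


def comprehensive_labels(text):
    """Return (word, leading_marks, trailing_marks) for each word in text.

    Every punctuation character is kept, so a word can carry a combination
    such as ")." instead of a single mark.
    """
    labels = []
    for token in text.split():
        lead, word, trail = TOKEN.match(token).groups()
        if word:  # skip tokens that consist of punctuation only
            labels.append((word.lower(), lead, trail))
    return labels


def reduce_to_basic(trailing_marks):
    """Collapse a trailing punctuation string to at most one basic mark."""
    for mark in trailing_marks:
        if mark in BASIC_MARKS:
            return mark
    return ""


if __name__ == "__main__":
    for word, lead, trail in comprehensive_labels("It works (mostly). Really?"):
        print(word, repr(lead), repr(trail), "->", repr(reduce_to_basic(trail)))
```

In this sketch the word "mostly" carries the labels `"("` and `")."`, while the basic reduction keeps only `"."`. The labels `")."` and `"."` collapse to the same basic mark, so the original punctuation cannot be recovered from the reduced label, which is one way to read the claim that comprehensive punctuation is not always unambiguously reducible to a basic set.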
References used
https://aclanthology.org/