تقدير أنظمة التشابه الدلالي النصي (STS) درجة تشابه معنى بين جملتين.تقدر أنظمة STS عبر اللغات درجة تشابه معنى بين جملتين، كل منها بلغة مختلفة.عادة ما تستخدم الخوارزميات الحديثة عادة نهجا بالغضب بشدة، يصعب استخدامه لغات ضعف الموارد.ومع ذلك، يحتاج أي نهج للحصول على بيانات التقييم لتأكيد النتائج.من أجل تبسيط عملية التقييم لغات ضعف الموارد (من حيث مجموعات بيانات تقييم STS)، نقدم مجموعات بيانات جديدة ل STS عبر اللغات والأحمر غير المباشر لغات دون بيانات التقييم هذه.نقدم أيضا نتائج العديد من الطرق الحديثة على هذه البيانات التي يمكن استخدامها كأساس للحصول على مزيد من البحث.نعتقد أن هذه المقالة لن تمد فقط أبحاث STS الحالية فقط إلى لغات أخرى، ولكنها ستشجع أيضا المنافسة على هذه بيانات التقييم الجديدة.
Semantic textual similarity (STS) systems estimate the degree of the meaning similarity between two sentences. Cross-lingual STS systems estimate the degree of the meaning similarity between two sentences, each in a different language. State-of-the-art algorithms usually employ a strongly supervised, resource-rich approach difficult to use for poorly-resourced languages. However, any approach needs to have evaluation data to confirm the results. In order to simplify the evaluation process for poorly-resourced languages (in terms of STS evaluation datasets), we present new datasets for cross-lingual and monolingual STS for languages without this evaluation data. We also present the results of several state-of-the-art methods on these data which can be used as a baseline for further research. We believe that this article will not only extend the current STS research to other languages, but will also encourage competition on this new evaluation data.
References used
https://aclanthology.org/
ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for the evaluation of abstractive summarization systems as it relies on lexical overlap between the gold standard and the generated summaries. This limitation
We propose a method to distill a language-agnostic meaning embedding from a multilingual sentence encoder. By removing language-specific information from the original embedding, we retrieve an embedding that fully represents the sentence's meaning. T
While cross-lingual techniques are finding increasing success in a wide range of Natural Language Processing tasks, their application to Semantic Role Labeling (SRL) has been strongly limited by the fact that each language adopts its own linguistic f
Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages. Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languag
Text Similarity is an important task in several application fields, such as information retrieval, plagiarism detection, machine translation, topic detection, text classification, text summarization and others. Finding similarity between two texts, p