This paper describes our system for the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering, which asks participants to group inflected forms together according to their underlying lemma without the aid of annotated training data. We employ agglomerative clustering to group word forms together using a metric that combines an orthographic distance and a semantic distance from word embeddings. We experiment with two variations of an edit distance-based model for quantifying orthographic distance, but, due to time constraints, our system does not improve over the shared task's baseline system.
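A minimal sketch of the clustering approach described above, assuming a simple linear interpolation between a length-normalized Levenshtein distance and the cosine distance between word embeddings; the function names, the weighting scheme (`alpha`), and the toy embeddings are illustrative assumptions, not the exact model used in the submission.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cosine


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def combined_distance(w1, w2, emb, alpha=0.5):
    """Interpolate normalized edit distance with embedding cosine distance."""
    ortho = edit_distance(w1, w2) / max(len(w1), len(w2))
    sem = cosine(emb[w1], emb[w2])  # semantic distance from word embeddings
    return alpha * ortho + (1 - alpha) * sem


words = ["walk", "walked", "walking", "run", "running"]
# Toy random vectors standing in for real pre-trained word embeddings.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in words}

# Condensed pairwise distance matrix, then average-linkage agglomerative clustering.
dists = [combined_distance(words[i], words[j], emb)
         for i in range(len(words)) for j in range(i + 1, len(words))]
Z = linkage(np.array(dists), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into 2 clusters
print(dict(zip(words, labels)))
```

In practice the number of clusters (or the dendrogram cut threshold) would be tuned per language, and the embeddings would come from a model trained on the shared task's raw text corpus.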