Paradigm Clustering with Weighted Edit Distance

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

This paper describes our system for the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering, which asks participants to group inflected forms together according to their underlying lemma without the aid of annotated training data. We employ agglomerative clustering to group word forms together using a metric that combines an orthographic distance and a semantic distance from word embeddings. We experiment with two variations of an edit distance-based model for quantifying orthographic distance, but, due to time constraints, our system does not improve over the shared task's baseline system.
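The approach in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the edit costs (`ins`, `dele`, `sub`), the mixing weight `alpha`, the merge `threshold`, the average-linkage criterion, and the toy embeddings are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math

def weighted_edit_distance(a, b, ins=1.0, dele=1.0, sub=1.0):
    """Levenshtein distance with per-operation costs (costs are assumed values)."""
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * dele]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + dele,                          # delete ca
                cur[j - 1] + ins,                        # insert cb
                prev[j - 1] + (0.0 if ca == cb else sub),  # match or substitute
            ))
        prev = cur
    return prev[-1]

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def combined_distance(w1, w2, emb, alpha=0.5):
    """Mix a length-normalised orthographic distance with a semantic distance."""
    orth = weighted_edit_distance(w1, w2) / max(len(w1), len(w2), 1)
    sem = cosine_distance(emb[w1], emb[w2])
    return alpha * orth + (1.0 - alpha) * sem

def agglomerative(words, dist, threshold):
    """Average-linkage agglomerative clustering, stopping at a distance threshold."""
    clusters = [[w] for w in words]

    def linkage(c1, c2):
        return sum(dist(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy data: hypothetical 2-d embeddings, one direction per lemma.
emb = {
    "walk": [1.0, 0.0], "walks": [1.0, 0.0], "walked": [1.0, 0.0],
    "talk": [0.0, 1.0], "talks": [0.0, 1.0],
}
clusters = agglomerative(
    list(emb), lambda x, y: combined_distance(x, y, emb), threshold=0.4
)
```

On this toy input, "walk" and "talk" are orthographically close (one substitution), so it is the semantic term that keeps the two lemmas in separate clusters. With real embeddings the separation is less clean, and `alpha` and `threshold` would need tuning.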

References used

https://aclanthology.org/

Adaptor Grammars for Unsupervised Paradigm Clustering

Association for Computational Linguistics, 2021 (article)

This work describes the Edinburgh submission to the SIGMORPHON 2021 Shared Task 2 on unsupervised morphological paradigm clustering. Given raw text input, the task was to assign each token to a cluster with other tokens from the same paradigm. We use Adaptor Grammar segmentations combined with frequency-based heuristics to predict paradigm clusters. Our system achieved the highest average F1 score across 9 test languages, placing first out of 15 submissions.

unsupervised paradigm clustering, paradigm clustering

Findings of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering

Association for Computational Linguistics, 2021 (article)

We describe the second SIGMORPHON shared task on unsupervised morphology: the goal of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering is to cluster word types from a raw text corpus into paradigms. To this end, we release corpora for 5 development and 9 test languages, as well as gold partial paradigms for evaluation. We receive 14 submissions from 4 teams that follow different strategies, and the best performing system is based on adaptor grammars. Results vary significantly across languages. However, all systems are outperformed by a supervised lemmatizer, implying that there is still room for improvement.

unsupervised morphological paradigm, morphological paradigm clustering, SIGMORPHON shared task

Unsupervised Paradigm Clustering Using Transformation Rules

Association for Computational Linguistics, 2021 (article)

This paper describes the submission of the CU-UBC team for the SIGMORPHON 2021 Shared Task 2: Unsupervised morphological paradigm clustering. Our system generates paradigms using morphological transformation rules which are discovered from raw data. We experiment with two methods for discovering rules. Our first approach generates prefix and suffix transformations between similar strings. Secondly, we experiment with more general rules which can apply transformations inside the input strings in addition to prefix and suffix transformations. We find that the best overall performance is delivered by prefix and suffix rules but more general transformation rules perform better for languages with templatic morphology and very high morpheme-to-word ratios.

morphological paradigm
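As a rough illustration of the suffix-transformation idea in the abstract above, the following Python sketch extracts suffix rewrite rules from pairs of words that share a stem and keeps the rules seen often enough. It is not the CU-UBC system: the function names, the minimum stem length `min_stem`, and the frequency cutoff `min_count` are hypothetical, and prefix rules and word-internal rules are left out.

```python
from collections import Counter
from itertools import combinations

def suffix_rule(a, b, min_stem=3):
    """Return (suffix_a, suffix_b) if a and b share a stem of at least
    min_stem characters, otherwise None. min_stem is an assumed cutoff."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    if k < min_stem:
        return None
    return (a[k:], b[k:])

def discover_rules(words, min_count=2):
    """Count suffix rules over all word pairs and keep the frequent ones."""
    counts = Counter()
    # Sorting fixes the direction of each rule (alphabetically earlier word first).
    for a, b in combinations(sorted(set(words)), 2):
        rule = suffix_rule(a, b)
        if rule is not None:
            counts[rule] += 1
    return {rule for rule, n in counts.items() if n >= min_count}

words = ["walk", "walks", "talk", "talks", "jump", "jumps", "walrus"]
rules = discover_rules(words)
```

Here the rule that appends "s" to a bare stem is supported by three word pairs, while accidental pairings such as "walk" and "walrus" yield rules seen only once and are filtered out by the frequency cutoff.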

Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model

Association for Computational Linguistics, 2021 (article)

With advances in neural language models, the focus of linguistic steganography has shifted from edit-based approaches to generation-based ones. While the latter's payload capacity is impressive, generating genuine-looking texts remains challenging. In this paper, we revisit edit-based linguistic steganography, with the idea that a masked language model offers an off-the-shelf solution. The proposed method eliminates painstaking rule construction and has a high payload capacity for an edit-based model. It is also shown to be more secure against automatic detection than a generation-based method while offering better control of the security/payload capacity trade-off.

frustratingly easy edit-based, easy edit-based linguistic, frustratingly easy

EDITOR: An Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints

Association for Computational Linguistics, 2021 (article)

We introduce an Edit-Based TransfOrmer with Repositioning (EDITOR), which makes sequence generation flexible by seamlessly allowing users to specify preferences in output lexical choice. Building on recent models for non-autoregressive sequence generation (Gu et al., 2019), EDITOR generates new sequences by iteratively editing hypotheses. It relies on a novel reposition operation designed to disentangle lexical choice from word positioning decisions, while enabling efficient oracles for imitation learning and parallel edits at decoding time. Empirically, EDITOR uses soft lexical constraints more effectively than the Levenshtein Transformer (Gu et al., 2019) while speeding up decoding dramatically compared to constrained beam search (Post and Vilar, 2018). EDITOR also achieves comparable or better translation quality with faster decoding speed than the Levenshtein Transformer on standard Romanian-English, English-German, and English-Japanese machine translation tasks.

repositioning for neural, soft lexical constraints
