تحديد القروض المعجمية، ونقل الكلمات بين اللغات، هي ممارسة أساسية لللغويات التاريخية وأداة حيوية في تحليل اتصال اللغة والأحداث الثقافية بشكل عام.نسعى لتحسين الأدوات للكشف التلقائي للقروض المعجمية، مع التركيز هنا على الكشف عن الكلمات المقترضة من نصوص الكلمات أحادية الأحادية.بدءا من نموذج اللغة المعجمية العصبية المتكررة ونهج انتروبيات المنافسة، فإننا ندمج نموذجا أكثر قائما على المحولات القائمة على المحولات.من هناك، نقوم بتجربة العديد من النماذج والنهج المختلفة بما في ذلك نموذج الجهات المانحة المعجمية مع قائمة الكلمات المعززة.يقلل نموذج المحول وقت التنفيذ ويحسن الحد الأدنى للكشف عن الاقتراض.نموذج المانحين المعزز يظهر بعض الوعد.هناك حاجة إلى تغيير موضوعي في النهج أو النموذج لإجراء مكاسب كبيرة في تحديد القروض المعجمية.
Identification of lexical borrowings, transfer of words between languages, is an essential practice of historical linguistics and a vital tool in analysis of language contact and cultural events in general. We seek to improve tools for automatic detection of lexical borrowings, focusing here on detecting borrowed words from monolingual wordlists. Starting with a recurrent neural lexical language model and competing entropies approach, we incorporate a more current Transformer based lexical model. From there we experiment with several different models and approaches including a lexical donor model with augmented wordlist. The Transformer model reduces execution time and minimally improves borrowing detection. The augmented donor model shows some promise. A substantive change in approach or model is needed to make significant gains in identification of lexical borrowings.
References used
https://aclanthology.org/
We present new results for the problem of sequence metaphor labeling, using the recently developed Visibility Embeddings. We show that concatenating such embeddings to the input of a BiLSTM obtains consistent and significant improvements at almost no cost, and we present further improved results when visibility embeddings are combined with BERT.
We study the problem of performing automatic stance classification on social media with neural architectures such as BERT. Although these architectures deliver impressive results, their level is not yet comparable to the one of humans and they might
Neural dialog models are known to suffer from problems such as generating unsafe and inconsistent responses. Even though these problems are crucial and prevalent, they are mostly manually identified by model designers through interactions. Recently,
The development of automated approaches to linguistic acceptability has been greatly fostered by the availability of the English CoLA corpus, which has also been included in the widely used GLUE benchmark. However, this kind of research for languages
Most work in NLP makes the assumption that it is desirable to develop solutions in the native language in question. There is consequently a strong trend towards building native language models even for low-resource languages. This paper questions thi