Contextualized Embeddings Encode Monolingual and Cross-lingual Knowledge of Idiomaticity


Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: identifying expressions; embeddings encode; monolingual knowledge of idiomaticity

Potentially idiomatic expressions (PIEs) are ambiguous between non-compositional idiomatic interpretations and transparent literal interpretations. For example, "hit the road" can have an idiomatic meaning corresponding to 'start a journey' or have a literal interpretation. In this paper we propose a supervised model based on contextualized embeddings for predicting whether usages of PIEs are idiomatic or literal. We consider monolingual experiments for English and Russian, and show that the proposed model outperforms previous approaches, including in the case that the model is tested on instances of PIE types that were not observed during training. We then consider cross-lingual experiments in which the model is trained on PIE instances in one language, English or Russian, and tested on the other language. We find that the model outperforms baselines in this setting. These findings suggest that contextualized embeddings are able to learn representations that encode knowledge of idiomaticity that is not restricted to specific expressions, nor to a specific language.
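The abstract mentions testing on PIE types that were not observed during training. The following is a minimal sketch of how such a split could be set up, not the authors' procedure: the function name split_by_pie_type, the (pie_type, sentence, label) tuple layout, the test fraction, and the toy sentences are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_pie_type(instances, test_fraction=0.25, seed=0):
    """Split labelled PIE instances so that no expression type appears
    in both the training set and the test set.

    `instances` is a list of (pie_type, sentence, label) tuples; this
    layout is an assumption of the example, not taken from the paper.
    """
    # Group instances by the expression type they belong to.
    by_type = defaultdict(list)
    for inst in instances:
        by_type[inst[0]].append(inst)

    # Shuffle the types (not the instances) with a fixed seed.
    types = sorted(by_type)
    random.Random(seed).shuffle(types)

    # Hold out whole types for testing; always hold out at least one.
    n_test = max(1, round(len(types) * test_fraction))
    test_types = set(types[:n_test])

    train = [i for t in types if t not in test_types for i in by_type[t]]
    test = [i for t in types if t in test_types for i in by_type[t]]
    return train, test


# Toy data, invented for illustration only.
data = [
    ("hit the road", "We hit the road at dawn.", "idiomatic"),
    ("hit the road", "The hammer hit the road surface.", "literal"),
    ("break the ice", "A joke helped break the ice.", "idiomatic"),
    ("break the ice", "The ship had to break the ice.", "literal"),
    ("spill the beans", "He spilled the beans about the party.", "idiomatic"),
    ("kick the bucket", "She kicked the bucket over by accident.", "literal"),
]

train, test = split_by_pie_type(data)
```

Splitting on types rather than on individual instances is what makes the test measure generalization to unseen expressions. A cross-lingual setup could follow the same idea by using the language, rather than the PIE type, as the grouping key. Libraries such as scikit-learn offer group-aware splitters (for example GroupShuffleSplit) that serve the same purpose.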

References used

https://aclanthology.org/

On the Cross-lingual Transferability of Contextualized Sense Embeddings

Association for Computational Linguistics, 2021 (article)

In this paper we analyze the extent to which contextualized sense embeddings, i.e., sense embeddings that are computed based on contextualized word embeddings, are transferable across languages. To this end, we compiled a unified cross-lingual benchmark for Word Sense Disambiguation. We then propose two simple strategies to transfer sense-specific knowledge across languages and test them on the benchmark. Experimental results show that this contextualized knowledge can be effectively transferred to similar languages through pre-trained multilingual language models, to the extent that they can outperform monolingual representations learned from existing language-specific data.

Keywords: contextualized sense embeddings; sense embeddings

Detecting Multilingual COVID-19 Misinformation on Social Media via Contextualized Embeddings

Association for Computational Linguistics, 2021 (article)

We present machine learning classifiers to automatically identify COVID-19 misinformation on social media in three languages: English, Bulgarian, and Arabic. We compared 4 multitask learning models for this task and found that a model trained with English BERT achieves the best results for English, and multilingual BERT achieves the best results for Bulgarian and Arabic. We experimented with zero shot, few shot, and target-only conditions to evaluate the impact of target-language training data on classifier performance, and to understand the capabilities of different models to generalize across languages in detecting misinformation online. This work was performed as a submission to the shared task, NLP4IF 2021: Fighting the COVID-19 Infodemic. Our best models achieved the second best evaluation test results for Bulgarian and Arabic among all the participating teams and obtained competitive scores for English.

Keywords: contextualized embeddings; English BERT achieves

Multilingual and Cross-Lingual Intent Detection from Spoken Data

Association for Computational Linguistics, 2021 (article)

We present a systematic study on multilingual and cross-lingual intent detection (ID) from spoken data. The study leverages a new resource put forth in this work, termed MInDS-14, a first training and evaluation resource for the ID task with spoken data. It covers 14 intents extracted from a commercial system in the e-banking domain, associated with spoken examples in 14 diverse language varieties. Our key results indicate that combining machine translation models with state-of-the-art multilingual sentence encoders (e.g., LaBSE) yields strong intent detectors in the majority of target languages covered in MInDS-14, and offer comparative analyses across different axes: e.g., translation direction, impact of speech recognition, data augmentation from a related domain. We see this work as an important step towards more inclusive development and evaluation of multilingual ID from spoken data, hopefully in a much wider spectrum of languages compared to prior work.

Keywords: cross-lingual intent detection; spoken data

Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora

Association for Computational Linguistics, 2021 (article)

We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g. a few hundred sentence pairs). Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence. Through sharing model parameters among different languages, our model jointly trains the word embeddings in a common cross-lingual space. We also propose to combine word and subword embeddings to make use of orthographic similarities across different languages. We base our experiments on real-world data from endangered languages, namely Yongning Na, Shipibo-Konibo, and Griko. Our experiments on bilingual lexicon induction and word alignment tasks show that our model outperforms existing methods by a large margin for most language pairs. These results demonstrate that, contrary to common belief, an encoder-decoder translation model is beneficial for learning cross-lingual representations even in extremely low-resource conditions. Furthermore, our model also works well on high-resource conditions, achieving state-of-the-art performance on a German-English word-alignment task.

Keywords: learning contextualised cross-lingual; contextualised cross-lingual word

Adversarial Attack against Cross-lingual Knowledge Graph Alignment

Association for Computational Linguistics, 2021 (article)

Recent literature has shown that knowledge graph (KG) learning models are highly vulnerable to adversarial attacks. However, there is still a paucity of vulnerability analyses of cross-lingual entity alignment under adversarial attacks. This paper proposes an adversarial attack model with two novel attack techniques to perturb the KG structure and degrade the quality of deep cross-lingual entity alignment. First, an entity density maximization method is employed to hide the attacked entities in dense regions in two KGs, such that the derived perturbations are unnoticeable. Second, an attack signal amplification method is developed to reduce vanishing-gradient issues during adversarial attacks, further improving attack effectiveness.

Keywords: cross-lingual knowledge graph; knowledge graph alignment
