Word translation is an integral part of language translation. In machine translation, each language is considered a domain with its own word embedding. Aligning these word embeddings links semantically equivalent words in multilingual contexts. Moreover, it offers a way to infer cross-lingual meaning for words without a direct translation. Current methods for word embedding alignment are either supervised, i.e., they require known word pairs, or they learn a cross-domain transformation on fixed embeddings in an unsupervised way. Here, we propose an end-to-end approach for word embedding alignment that does not require known word pairs. Our method, termed Word Alignment through MMD (WAM), learns embeddings that become aligned during sentence translation training by applying a localized Maximum Mean Discrepancy (MMD) constraint between the embeddings. We show that our method outperforms not only unsupervised methods but also supervised methods that are trained on known word translations.
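To illustrate the kind of quantity an MMD constraint penalizes, the sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel, MMD² = E[k(x, x')] + E[k(y, y')] − 2·E[k(x, y)], between two sets of embedding vectors. This is the textbook, global MMD, not the localized variant used by WAM, whose details are not given here. The function names, the Gaussian kernel, and the bandwidth `sigma` are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys.

    Averages the kernel over all pairs within xs, all pairs within ys,
    and all cross pairs, then combines them as k_xx + k_yy - 2 * k_xy.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / (m * m)
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / (n * n)
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings" for two languages.
    src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    tgt = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
    print(mmd_squared(src, src))  # identical sets: zero discrepancy
    print(mmd_squared(src, tgt))  # shifted sets: clearly positive
```

In an end-to-end setup such as the one described, a term like this would be added to the translation loss so that gradients pull the two embedding distributions together. That would require an autodiff framework rather than the plain-Python loops shown here, which are kept only to make the estimator explicit.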
In many modern-day systems, such as information extraction and knowledge management agents, ontologies play a vital role in maintaining the concept hierarchies of the selected domain. However, ontology population has become a problematic process due t
Cross-lingual word embeddings aim to capture common linguistic regularities of different languages, which benefit various downstream tasks ranging from machine translation to transfer learning. Recently, it has been shown that these embeddings can be
Word embedding models have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, embeddings trained on human-generated corpora have been demonstrated to inherit strong gender stereotypes that refle
While deep learning succeeds in a wide range of tasks, it depends heavily on large collections of annotated data, which are expensive and time-consuming to obtain. To lower the cost of data annotation, active learning has been proposed to interactively query
Acoustic word embeddings (AWEs) are fixed-dimensional representations of variable-length speech segments. For zero-resource languages where labelled data is not available, one AWE approach is to use unsupervised autoencoder-based recurrent models. An