Embeddings from Language Models (ELMo) have been shown to be effective for improving many natural language processing (NLP) tasks; ELMo composes word representations from character information when training its language models. However, the character is an insufficient and unnatural linguistic unit for word representation. We therefore introduce Embedding from Subword-aware Language Models (ESuLMo), which learns word representations from subwords obtained by unsupervised segmentation over words. We show that ESuLMo enhances four benchmark NLP tasks more effectively than ELMo: syntactic dependency parsing, semantic role labeling, implicit discourse relation recognition, and textual entailment.
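As a rough illustration of the composition step this abstract describes, the sketch below pools subword embeddings into a single word representation. The segmenter, the embedding table, and the choice of max-pooling are simplifying assumptions standing in for the paper's actual components, not its architecture.

```python
import numpy as np

def compose_word(word, segment, subword_vecs, dim=128):
    """Pool subword embeddings into one word representation (illustrative only)."""
    # `segment` is an assumed unsupervised segmenter,
    # e.g. "unhappiness" -> ["un", "happi", "ness"]
    subs = segment(word)
    vecs = [subword_vecs.get(s, np.zeros(dim)) for s in subs]
    # Max-pooling is one arbitrary choice; sum or a CNN would also fit here.
    return np.max(vecs, axis=0) if vecs else np.zeros(dim)
```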
Multilingual pretrained representations generally rely on subword segmentation algorithms to create a shared multilingual vocabulary. However, standard heuristic algorithms often lead to sub-optimal segmentation, especially for languages with limited …
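For context, the heuristic segmentation such abstracts refer to typically works like the greedy BPE-style sketch below. The toy merge list is an assumption for illustration, not a trained vocabulary.

```python
def bpe_segment(word, merges):
    """Greedily apply learned merges in priority order (BPE-style)."""
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]  # merge the adjacent pair
            else:
                i += 1
    return symbols

# With these toy merges, "lower" collapses to one token, while an unseen
# word falls apart into characters -- the kind of sub-optimal split that
# hurts languages underrepresented in the merge statistics.
print(bpe_segment("lower", [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]))
```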
Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This …
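A minimal sketch of the subword idea this line of work uses: represent a word by the sum of vectors for its character n-grams, so morphologically related words share parameters. The boundary markers `<` and `>` follow common practice; the embedding table `ngram_vecs` is assumed.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word, ngram_vecs, dim=100):
    """Sum the vectors of a word's character n-grams (unknown n-grams contribute zero)."""
    vec = np.zeros(dim)
    for g in char_ngrams(word):
        vec += ngram_vecs.get(g, np.zeros(dim))
    return vec
```

Because n-gram vectors are shared across words, rare and even out-of-vocabulary words still receive non-trivial representations.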
Existing Image Captioning (IC) systems model words as atomic units in captions and cannot exploit the structural information within words. This makes representing rare words very difficult and representing out-of-vocabulary words impossible. Moreover, …
Current neural query auto-completion (QAC) systems rely on character-level language models, but they slow down when queries are long. We present how to utilize subword language models for the fast and accurate generation of query completion candidates.
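The sketch below shows one plausible way such candidates could be generated: beam search over a subword language model. `lm_step` is an assumed stand-in for any model that maps a token prefix to (next-subword, log-probability) pairs; it is not the paper's API.

```python
import heapq

def complete(prefix, lm_step, eos="</s>", beam=4, max_len=10):
    """Beam-search completions of a subword-tokenized query prefix."""
    live = [(0.0, list(prefix))]  # (cumulative negative log prob, subword sequence)
    done = []
    for _ in range(max_len):
        nxt = []
        for cost, toks in live:
            for sub, logp in lm_step(toks):  # assumed model interface
                cand = (cost - logp, toks + [sub])
                (done if sub == eos else nxt).append(cand)
        live = heapq.nsmallest(beam, nxt)  # keep the best partial hypotheses
        if not live:
            break
    return sorted(done + live)[:beam]  # lowest cost first
```

Subwords need far fewer decoding steps per query than characters, which is where the speedup for long queries comes from.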
Named entity recognition (NER) is a vital task in spoken language understanding, which aims to identify mentions of named entities in text, e.g., from transcribed speech. Existing neural models for NER rely mostly on dedicated word-level representations …