Named entity recognition (NER) is a critical step in modern search query understanding. In the domain of eCommerce, identifying the key entities, such as brand and product type, can help a search engine retrieve relevant products and therefore offer an engaging shopping experience. Recent research shows promising results on shared benchmark NER tasks using deep learning methods, but there are still unique challenges in the industry regarding domain knowledge, training data, and model production. This paper demonstrates an end-to-end solution to address these challenges. The core of our solution is a novel model training framework, TripleLearn, which iteratively learns from three separate training datasets instead of the single training set that is traditionally used. Using this approach, the best model lifts the F1 score from 69.5 to 93.3 on the holdout test data. In our offline experiments, TripleLearn improved the model performance compared to traditional training approaches that use a single set of training data. Moreover, in the online A/B test, we see significant improvements in user engagement and revenue conversion. The model has been live on homedepot.com for more than 9 months, boosting search
We present ELQ, a fast end-to-end entity linking model for questions, which uses a biencoder to jointly perform mention detection and linking in one pass. Evaluated on WebQSP and GraphQuestions with extended annotations that cover multiple entities p
Language is inherent and compulsory for human communication. Whether expressed in a written or spoken way, it ensures understanding between people of the same and different regions. With the growing awareness and effort to include more low-resourced
Taking word sequences as the input, typical named entity recognition (NER) models neglect errors from pre-processing (e.g., tokenization). However, these errors can influence the model performance greatly, especially for noisy texts like tweets. Here
We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a v
Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs). Most of these systems contain separate components that deal with the acoustic modellin