Product quantization (PQ) is a widely used technique for ad-hoc retrieval. Recent studies propose supervised PQ, in which the embedding and quantization models are trained jointly with supervised learning. However, the joint training objective still lacks an appropriate formulation; as a result, the improvements over earlier unsupervised baselines are limited in practice. In this work, we propose Matching-oriented Product Quantization (MoPQ), built around a novel objective, the Multinoulli Contrastive Loss (MCL). Minimizing MCL maximizes the probability that a query is matched to its ground-truth key, which contributes to optimal retrieval accuracy. Computing MCL exactly is intractable because it requires a vast number of contrastive samples. We therefore further propose Differentiable Cross-device Sampling (DCS), which significantly augments the contrastive samples for a precise approximation of MCL. Extensive experiments on four real-world datasets verify the effectiveness of MoPQ. The code is available at https://github.com/microsoft/MoPQ.
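The sketch below illustrates the ideas in the abstract: PQ encoding and decoding of key embeddings, plus a contrastive loss scored against the quantized keys. It rests on several assumptions that the abstract does not confirm. MCL is taken to be a softmax cross-entropy (InfoNCE-style) over quantized keys. Only keys are quantized, and the query stays full precision. Assignment is hard, so the sketch covers the forward pass only. The helper names (`pq_encode`, `pq_decode`, `mcl_loss`), the toy codebooks, and the two "device" key lists are all hypothetical. Concatenating the lists is a plain stand-in for gathering samples across devices; it is not the paper's differentiable DCS procedure.

```python
import math
from typing import List, Sequence

Vector = List[float]
Codebooks = List[List[Vector]]  # codebooks[m][k] is the k-th codeword of sub-space m


def split(vec: Sequence[float], m: int) -> List[Vector]:
    """Split a D-dimensional vector into m contiguous sub-vectors of size D // m."""
    if len(vec) % m != 0:
        raise ValueError("embedding dimension must be divisible by the number of sub-spaces")
    d = len(vec) // m
    return [list(vec[i * d:(i + 1) * d]) for i in range(m)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pq_encode(vec: Sequence[float], codebooks: Codebooks) -> List[int]:
    """Hard PQ assignment: pick the nearest codeword in every sub-space."""
    subs = split(vec, len(codebooks))
    return [
        min(range(len(cb)), key=lambda j: sq_dist(sub, cb[j]))
        for sub, cb in zip(subs, codebooks)
    ]


def pq_decode(codes: Sequence[int], codebooks: Codebooks) -> Vector:
    """Reconstruct the quantized embedding by concatenating the selected codewords."""
    out: Vector = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out


def mcl_loss(query: Sequence[float], keys: Sequence[Sequence[float]],
             positive: int, codebooks: Codebooks) -> float:
    """-log softmax probability of the ground-truth key, scored on quantized keys.

    Assumption (hypothetical form): the query stays full precision and only
    the keys are replaced by their PQ reconstructions.
    """
    logits = [dot(query, pq_decode(pq_encode(k, codebooks), codebooks)) for k in keys]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[positive]


# Toy setup (illustrative values): D = 4, two sub-spaces, two codewords each.
codebooks: Codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
query = [1.0, 0.0, 1.0, 0.0]
device0_keys = [[0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.2, 0.8]]  # key 0 is the ground truth
device1_keys = [[0.2, 0.7, 0.9, 0.1], [0.0, 1.0, 0.1, 0.9]]

# Contrastive samples from one simulated device vs. keys pooled from both.
local = mcl_loss(query, device0_keys, positive=0, codebooks=codebooks)
global_loss = mcl_loss(query, device0_keys + device1_keys, positive=0, codebooks=codebooks)
print(f"local-sample loss estimate:  {local:.4f}")
print(f"pooled-sample loss estimate: {global_loss:.4f}")
```

Pooling more keys adds terms to the softmax normalizer. The sampled loss therefore rises toward the loss over the full key set, and it never exceeds that loss. This is why enlarging the pool of contrastive samples, which is the goal the abstract states for DCS, gives a tighter approximation. A trainable version would also need a differentiable relaxation of `pq_encode` and a gradient-preserving gather across real devices. Both are outside this sketch.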