
Multi-Vector Attention Models for Deep Re-ranking



Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor


Abstract


Large-scale document retrieval systems often utilize two styles of neural network models which live at two different ends of the joint computation vs. accuracy spectrum. The first style is dual encoder (or two-tower) models, where the query and document representations are computed completely independently and combined with a simple dot product operation. The second style is cross-attention models, where the query and document features are concatenated in the input layer and all computation is based on the joint query-document representation. Dual encoder models are typically used for retrieval and deep re-ranking, while cross-attention models are typically used for shallow re-ranking. In this paper, we present a lightweight architecture that explores this joint cost vs. accuracy trade-off based on multi-vector attention (MVA). We thoroughly evaluate our method on the MS-MARCO passage retrieval dataset and show how to efficiently trade off retrieval accuracy with joint computation and offline document storage cost. We show that a highly compressed document representation and inexpensive joint computation can be achieved through a combination of learned pooling tokens and aggressive downprojection. Our code and model checkpoints are open-source and available on GitHub.
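The contrast between a single dot product and a compressed multi-vector document representation can be illustrated with a minimal sketch. It is not the paper's implementation: the scoring rule in `multi_vector_score`, the dimensions, and the random weights standing in for learned pooling tokens and a learned down-projection matrix are all assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(probe, vectors):
    """Softmax-weighted average of `vectors`, weighted by dot(probe, vector)."""
    weights = softmax([dot(probe, v) for v in vectors])
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(vectors[0]))]


def project(matrix, vec):
    """Apply an (out_dim x in_dim) projection matrix to a vector."""
    return [dot(row, vec) for row in matrix]


def dual_encoder_score(query_vec, doc_vec):
    """Dual encoder: one vector per side, combined by a dot product."""
    return dot(query_vec, doc_vec)


def compress_document(token_vecs, pooling_tokens, down_proj):
    """Offline step: each pooling token attends over the document's token
    vectors, and each pooled vector is then down-projected for storage."""
    return [project(down_proj, attend(p, token_vecs)) for p in pooling_tokens]


def multi_vector_score(query_small, stored_doc_vecs):
    """Online step (hypothetical scoring rule): the query attends over the
    few stored vectors, then scores against the attended summary."""
    return dot(query_small, attend(query_small, stored_doc_vecs))


def random_vec(rng, dim):
    """Random stand-in for a vector that would be learned or encoded."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)
HIDDEN, SMALL, NUM_TOKENS, NUM_POOL = 8, 2, 20, 3  # illustrative sizes only

token_vecs = [random_vec(rng, HIDDEN) for _ in range(NUM_TOKENS)]
pooling_tokens = [random_vec(rng, HIDDEN) for _ in range(NUM_POOL)]  # learned in practice
down_proj = [random_vec(rng, HIDDEN) for _ in range(SMALL)]          # learned in practice
query_small = random_vec(rng, SMALL)

stored = compress_document(token_vecs, pooling_tokens, down_proj)
score = multi_vector_score(query_small, stored)

floats_full = NUM_TOKENS * HIDDEN   # storing every token vector: 160 floats
floats_stored = NUM_POOL * SMALL    # storing the compressed vectors: 6 floats
print(len(stored), len(stored[0]), floats_full, floats_stored)  # prints: 3 2 160 6
```

In this toy setup, the number of pooling tokens and the down-projected dimension are the two knobs that set the offline storage cost per document, while the joint query-document computation touches only the few stored vectors.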

References used

https://aclanthology.org/


Learning about Word Vector Representations and Deep Learning through Implementing Word2vec

Association for Computational Linguistics - 2021 - Article

Word vector representations are an essential part of an NLP curriculum. Here, we describe a homework that has students implement a popular method for learning word vectors, word2vec. Students implement the core parts of the method, including text preprocessing, negative sampling, and gradient descent. Starter code provides guidance and handles basic operations, which allows students to focus on the conceptually challenging aspects. After generating their vectors, students evaluate them using qualitative and quantitative tests.


Enlivening Redundant Heads in Multi-head Self-attention for Machine Translation

Association for Computational Linguistics - 2021 - Article

Multi-head self-attention has recently attracted enormous interest owing to its specialized functions, significant parallelizable computation, and flexible extensibility. However, very recent empirical studies show that some self-attention heads make little contribution and can be pruned as redundant heads. This work takes a novel perspective of identifying and then vitalizing redundant heads. We propose a redundant head enlivening (RHE) method to precisely identify redundant heads, and then vitalize their potential by learning syntactic relations and prior knowledge in the text without sacrificing the roles of important heads. Two novel syntax-enhanced attention (SEA) mechanisms, a dependency mask bias and a relative local-phrasal position bias, are introduced to revise self-attention distributions for syntactic enhancement in machine translation. The importance of individual heads is dynamically evaluated during redundant head identification, and SEA is then applied to vitalize redundant heads while maintaining the strength of important heads. Experimental results on the widely adopted WMT14 and WMT16 English-to-German and English-to-Czech machine translation tasks validate the effectiveness of RHE.


1213Li at SemEval-2021 Task 6: Detection of Propaganda with Multi-modal Attention and Pre-trained Models

Association for Computational Linguistics - 2021 - Article

This paper presents the solution proposed by the 1213Li team for subtask 3 in SemEval-2021 Task 6: identifying the multiple persuasion techniques used in the multi-modal content of the meme. We explored various approaches to feature extraction and the detection of persuasion labels. Our final model employs the pre-trained models RoBERTa and ResNet-50 as feature extractors for texts and images, respectively, and adopts a label embedding layer with a multi-modal attention mechanism to measure the similarity of labels with the multi-modal information and fuse features for label prediction. Our proposed method outperforms the provided baseline method and ranks 3rd out of 16 participants, with Micro/Macro F1 scores of 0.54860/0.22830.


WikiBERT Models: Deep Transfer Learning for Many Languages

Association for Computational Linguistics - 2021 - Article

Deep neural language models such as BERT have enabled substantial recent advances in many natural language processing tasks. However, due to the effort and computational cost involved in their pre-training, such models are typically introduced only for a small number of high-resource languages such as English. While multilingual models covering large numbers of languages are available, recent work suggests monolingual training can produce better models, and our understanding of the tradeoffs between mono- and multilingual training is incomplete. In this paper, we introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data and introduce 42 new such models, most for languages up to now lacking dedicated deep neural language models. We assess the merits of these models using cloze tests and the state-of-the-art UDify parser on Universal Dependencies data, contrasting performance with results using the multilingual BERT (mBERT) model. We find that the newly introduced WikiBERT models outperform mBERT in cloze tests for nearly all languages, and that UDify using WikiBERT models outperforms the parser using mBERT on average, with the language-specific models showing substantially improved performance for some languages, yet limited improvement or a decrease in performance for others. All of the methods and models introduced in this work are available under open licenses from https://github.com/turkunlp/wikibert.


Multi-modal Retrieval of Tables and Texts Using Tri-encoder Models

Association for Computational Linguistics - 2021 - Article

Open-domain extractive question answering works well on textual data by first retrieving candidate texts and then extracting the answer from those candidates. However, some questions cannot be answered by text alone but require information stored in tables. In this paper, we present an approach for retrieving both texts and tables relevant to a question by jointly encoding texts, tables and questions into a single vector space. To this end, we create a new multi-modal dataset based on text and table datasets from related work and compare the retrieval performance of different encoding schemata. We find that dense vector embeddings of transformer models outperform sparse embeddings on four out of six evaluation datasets. Comparing different dense embedding models, tri-encoders with one encoder for each question, text and table increase retrieval performance compared to bi-encoders with one encoder for the question and one for both text and tables. We release the newly created multi-modal dataset to the community so that it can be used for training and evaluation.

