Transformer-based pre-training techniques over text and layout have proven effective in a number of document understanding tasks. Despite this success, multimodal pre-training models suffer from very high computational and memory costs. Motivated by human reading strategies, this paper presents Skim-Attention, a new attention mechanism that takes advantage of the structure of the document and its layout. Skim-Attention attends only to the 2-dimensional positions of the words in a document. Our experiments show that Skim-Attention obtains a lower perplexity than prior work while being more computationally efficient. Skim-Attention can further be combined with long-range Transformers to efficiently process long documents. We also show how Skim-Attention can be used off-the-shelf as a mask for any pre-trained language model, improving its performance while restricting attention. Finally, we show the emergence of a document structure representation in Skim-Attention.
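To make the core idea concrete, here is a minimal, hedged sketch of layout-only attention: attention scores are computed purely from embeddings of each word's 2-D position (its location on the page), not from the token text, and the resulting distribution could then be applied to text representations or used as an attention mask for another model. All names, the embedding scheme (separate x/y lookup tables), and the hyperparameters below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class SkimAttentionSketch(nn.Module):
    """Illustrative layout-only attention: queries and keys are built from
    2-D word positions (e.g. bounding-box coordinates), never from text."""

    def __init__(self, hidden_size: int = 128, max_coord: int = 1024):
        super().__init__()
        # Hypothetical layout embeddings: one table per coordinate axis.
        self.x_embed = nn.Embedding(max_coord, hidden_size)
        self.y_embed = nn.Embedding(max_coord, hidden_size)
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # positions: (batch, seq_len, 2) integer (x, y) location of each word.
        layout = self.x_embed(positions[..., 0]) + self.y_embed(positions[..., 1])
        q, k = self.query(layout), self.key(layout)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        # Softmax over positions yields a layout-driven attention matrix that
        # can weight text value vectors or serve as a mask for a pre-trained LM.
        return torch.softmax(scores, dim=-1)


# Usage sketch: 4 words with made-up (x, y) page coordinates.
attn = SkimAttentionSketch()
boxes = torch.tensor([[[10, 20], [80, 20], [10, 60], [80, 60]]])
print(attn(boxes).shape)  # -> torch.Size([1, 4, 4])
```

Because the attention weights depend only on layout, they can be computed once per document and reused across layers or plugged in front of an existing text model, which is where the claimed efficiency gain comes from.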