In this tutorial, leading researchers and engineers from Yandex share part of their unique industry experience in efficient natural language data annotation via crowdsourcing. We will introduce data labeling via public crowdsourcing marketplaces and present the key components of efficient label collection. This will be followed by a practical session in which participants address a real-world language resource production task, experiment with settings for the labeling process, and launch their own label collection project on one of the largest crowdsourcing marketplaces. The projects will run on real crowds during the tutorial session, and we will present useful quality control techniques and give attendees an opportunity to discuss their own annotation ideas.
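The abstract does not say which quality control techniques the tutorial covers. The Python below is a minimal sketch of one common approach, chosen here as an assumption and not taken from the tutorial. It scores each worker on control items with known answers (a "golden set"), drops workers below an accuracy threshold, and aggregates the remaining overlapping labels by majority vote. The function names, the 0.7 threshold, the data layout, and the example worker and item ids are all hypothetical.

```python
from collections import Counter, defaultdict

# Each raw answer is (worker_id, item_id, label); this layout is a hypothetical choice.
Answer = tuple[str, str, str]


def worker_accuracy(answers: list[Answer], golden: dict[str, str]) -> dict[str, float]:
    """Share of control (golden) items each worker labeled correctly."""
    correct: Counter = Counter()
    seen: Counter = Counter()
    for worker, item, label in answers:
        if item in golden:
            seen[worker] += 1
            correct[worker] += label == golden[item]
    return {w: correct[w] / seen[w] for w in seen}


def aggregate(answers: list[Answer], golden: dict[str, str],
              min_accuracy: float = 0.7) -> dict[str, str]:
    """Majority vote over non-golden items, counting only workers who pass the control check."""
    acc = worker_accuracy(answers, golden)
    votes: dict[str, Counter] = defaultdict(Counter)
    for worker, item, label in answers:
        if item in golden:
            continue  # control items are used only to score workers
        if acc.get(worker, 0.0) < min_accuracy:
            continue  # drop workers who failed (or never saw) control items
        votes[item][label] += 1
    # Most votes wins; ties are broken alphabetically so the result is deterministic.
    return {
        item: min(counter, key=lambda lab: (-counter[lab], lab))
        for item, counter in votes.items()
    }


# Hypothetical usage: w3 and w4 fail the control item g1, so their votes are ignored.
answers = [
    ("w1", "g1", "pos"), ("w2", "g1", "pos"), ("w3", "g1", "neg"), ("w4", "g1", "neg"),
    ("w1", "x1", "neg"), ("w3", "x1", "pos"), ("w4", "x1", "pos"),
    ("w1", "x2", "pos"), ("w2", "x2", "pos"), ("w3", "x2", "neg"),
]
golden = {"g1": "pos"}
print(aggregate(answers, golden))  # {'x1': 'neg', 'x2': 'pos'}
```

Without the control-item filter, item `x1` would be labeled `pos` two votes to one. Majority vote is the simplest aggregation. Common refinements include weighting votes by worker accuracy or fitting a probabilistic model such as Dawid-Skene, and this sketch implements neither.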