We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that uses the model's behavior on individual instances during training as a proxy for finding the most informative instances to label. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). We compare our method on popular text classification tasks against commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive with other common AL methods, showing that training dynamics derived from small seed data can be used successfully for AL. We provide insights into our new AL method by analyzing batch-level statistics with the data maps. Our results further show that CAL yields a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.
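To make the notion of training dynamics concrete, the following is a minimal sketch of the per-instance statistics that data maps are built from (Swayamdipta et al., 2020): confidence (mean probability of the gold label across epochs), variability (standard deviation of that probability), and correctness (fraction of epochs in which the instance was predicted correctly). It is not the CAL acquisition procedure itself, which the abstract does not spell out. The function name `data_map_statistics` and the input layout (one row per epoch, one column per instance) are assumptions made for illustration.

```python
import numpy as np


def data_map_statistics(gold_probs, correct):
    """Compute data-map statistics from per-epoch training dynamics.

    gold_probs: array of shape (epochs, instances); the probability the model
        assigned to each instance's gold label after each epoch (assumed layout).
    correct: boolean array of the same shape; whether the model's prediction
        matched the gold label at that epoch.
    Returns (confidence, variability, correctness), each of shape (instances,).
    """
    gold_probs = np.asarray(gold_probs, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    if gold_probs.shape != correct.shape:
        raise ValueError("gold_probs and correct must have the same shape")

    confidence = gold_probs.mean(axis=0)   # mean gold-label probability
    variability = gold_probs.std(axis=0)   # spread of that probability over epochs
    correctness = correct.mean(axis=0)     # fraction of epochs predicted correctly
    return confidence, variability, correctness


if __name__ == "__main__":
    # Toy dynamics for three instances over three epochs (made-up numbers).
    probs = np.array([
        [0.90, 0.20, 0.50],
        [0.95, 0.30, 0.90],
        [1.00, 0.10, 0.10],
    ])
    # Assumes a binary task, so "correct" means gold probability above 0.5.
    hits = probs > 0.5
    conf, var, corr = data_map_statistics(probs, hits)
    print("confidence: ", conf)
    print("variability:", var)
    print("correctness:", corr)
```

In data-map terms, instances with high confidence and low variability are usually called easy to learn, those with low confidence and low variability hard to learn, and those with high variability ambiguous. These statistics require gold labels, so in an AL setting they can only be computed directly on the labeled seed data.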