While aggregate performance metrics can generate valuable insights at a large scale, their dominance means that more complex and nuanced language phenomena, such as vagueness, may be overlooked. Focusing on vague terms (e.g., sunny, cloudy, young), we inspect the behavior of visually grounded and text-only models, finding systematic divergences from human judgments even when a model's overall performance is high. To help explain this disparity, we identify two assumptions made by the datasets and models examined and, guided by the philosophy of vagueness, isolate cases where they do not hold.