نقترح سهولة، أداة تشخيصية بسيطة للإجابة على السؤال المرئي (VQA) الذي يحدد صعوبة الصورة، عينة السؤال.يعتمد سهولة على نمط الإجابات التي قدمها المعلقون المتعددين على سؤال معين.على وجه الخصوص، تعتبر جوانبين من الإجابات: (1) انتروبيا؛(2) المحتوى الدلالي.أولا، نثبت صحة تشخيصنا لتحديد عينات سهلة / من الصعب لنماذج VQA الحديثة.ثانيا، نعرض أن هذه السهولة يمكن استخدامها بنجاح لتحديد العينات الأكثر إعلانية للتدريب / ضبط الدقيقة.بشكل حاسم، يتم استخدام المعلومات فقط المتوفرة بسهولة في أي مجموعة بيانات VQA لحساب درجاتها.
We propose EASE, a simple diagnostic tool for Visual Question Answering (VQA) which quantifies the difficulty of an image, question sample. EASE is based on the pattern of answers provided by multiple annotators to a given question. In particular, it considers two aspects of the answers: (i) their Entropy; (ii) their Semantic content. First, we prove the validity of our diagnostic to identify samples that are easy/hard for state-of-art VQA models. Second, we show that EASE can be successfully used to select the most-informative samples for training/fine-tuning. Crucially, only information that is readily available in any VQA dataset is used to compute its scores.
References used
https://aclanthology.org/
In education, quiz questions have become an important tool for assessing the knowledge of students. Yet, manually preparing such questions is a tedious task, and thus automatic question generation has been proposed as a possible alternative. So far,
MiniVQA is a Jupyter notebook to build a tailored VQA competition for your students. The resource creates all the needed resources to create a classroom competition that engages and inspires your students on the free, self-service Kaggle platform. InClass competitions make machine learning fun!.
The evaluation of question answering models compares ground-truth annotations with model predictions. However, as of today, this comparison is mostly lexical-based and therefore misses out on answers that have no lexical overlap but are still semanti
Although showing promising values to downstream applications, generating question and answer together is under-explored. In this paper, we introduce a novel task that targets question-answer pair generation from visual images. It requires not only ge
Being able to generate accurate word alignments is useful for a variety of tasks. While statistical word aligners can work well, especially when parallel training data are plentiful, multilingual embedding models have recently been shown to give good