The automatic evaluation of open-domain dialogues remains a largely unsolved challenge. Despite the abundance of work in the field, human judges are still needed to evaluate the quality of dialogues. As a consequence, performing such evaluations at scale is usually expensive. This work investigates using a deep-learning model trained on the General Language Understanding Evaluation (GLUE) benchmark as an indicator of the quality of open-domain dialogues. The aim is to use the various GLUE tasks as different perspectives for judging the quality of a conversation, thus reducing the need for additional training data or for responses that serve as quality references. Because of this design, the method can infer various quality metrics and derive an overall score from these components. We achieve statistically significant correlation coefficients of up to 0.7.
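The abstract does not say how the per-task outputs are combined into an overall score or which correlation coefficient is reported. The following is a minimal sketch under stated assumptions: each GLUE-style task head yields a score in [0, 1] per dialogue, the overall score is a weighted mean of those components, and agreement with human judges is measured with Pearson's r. The task keys (`cola`, `sst2`, `mnli`), the weights, the ratings, and the helper names `overall_score` and `pearson` are hypothetical placeholders, not the paper's configuration.

```python
from math import sqrt
from typing import Mapping, Sequence


def overall_score(components: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of per-task component scores (hypothetical aggregation).

    Components without a positive weight are ignored.
    """
    used = [name for name in components if weights.get(name, 0.0) > 0.0]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0.0:
        raise ValueError("at least one component needs a positive weight")
    return sum(components[name] * weights[name] for name in used) / total_weight


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between automatic scores and human ratings."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical task weights and per-dialogue component scores (illustrative only).
    weights = {"cola": 0.4, "sst2": 0.2, "mnli": 0.4}
    dialogues = [
        {"cola": 0.9, "sst2": 0.7, "mnli": 0.8},
        {"cola": 0.5, "sst2": 0.6, "mnli": 0.3},
        {"cola": 0.2, "sst2": 0.4, "mnli": 0.1},
    ]
    human_ratings = [4.5, 3.0, 1.5]  # made-up ratings on a 1-5 scale
    auto_scores = [overall_score(d, weights) for d in dialogues]
    print(auto_scores, pearson(auto_scores, human_ratings))
```

If the human ratings are ordinal, a rank correlation such as Spearman's rho would be a natural alternative to Pearson's r; which coefficient the authors used is not stated here.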
References used
https://aclanthology.org/
The architectural design process is considered relatively complex because its content differs from user to user; therefore, each design has its own features, which make the process difficult to standardize, as some have seen as architectural design
The great value of dams on the Syrian coast comes from their use for irrigation and, sometimes, as a source of potable water. This study aimed to determine several chemical indicators of water quality in the Lattakia dams over ten years (2002-2011). The conce
We develop a unified system to answer open-domain questions directly from text, where the questions may require a varying number of retrieval steps. We employ a single multi-task transformer model to perform all the necessary subtasks---retrieving supporting facts,
Despite the remarkable performance of large-scale generative models in open-domain conversation, they are known to be less practical for building real-time conversation systems due to high latency. On the other hand, retrieval models could return res
Resolving pronouns to their referents has long been studied as a fundamental natural language understanding problem. Previous works on pronoun coreference resolution (PCR) mostly focus on resolving pronouns to mentions in text while ignoring the exop