Knowledge-grounded conversation models are usually based on a selection/retrieval module and a generation module, trained separately or simultaneously, with or without access to a 'gold' knowledge option. With the introduction of large pre-trained generative models, the selection and generation parts have become increasingly entangled, shifting the focus towards enhancing knowledge incorporation (from multiple sources) instead of trying to pick the best knowledge option. These approaches, however, depend on knowledge labels and/or a separate dense retriever for their best performance. In this work we study the unsupervised selection abilities of pre-trained generative models (e.g. BART) and show that by adding a score-and-aggregate module between encoder and decoder, they are capable of learning to pick the proper knowledge through minimising the language modelling loss alone (i.e. without access to knowledge labels). Trained as such, our model - K-Mine - shows competitive selection and generation performance against models that benefit from knowledge labels and/or a separate dense retriever.
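To make the score-and-aggregate idea concrete, the sketch below shows one way such a module could sit between a BART encoder and decoder: each knowledge candidate is encoded together with the dialogue context, a learned scorer produces a soft selection distribution over candidates, and the encoder memories are aggregated as a weighted sum before decoding, so the scorer is trained only by the language-modelling loss. This is a minimal illustration, not the released K-Mine implementation; the class name, mean pooling, and weighted-sum aggregation are assumptions made for the example.

```python
# Minimal sketch of a score-and-aggregate module between a BART encoder and
# decoder. KMineSketch, the mean pooling, and the weighted-sum aggregation are
# illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn
from transformers import BartForConditionalGeneration


class KMineSketch(nn.Module):
    def __init__(self, model_name: str = "facebook/bart-base"):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained(model_name)
        # Maps each pooled (context + knowledge candidate) encoding to one logit.
        self.scorer = nn.Linear(self.bart.config.d_model, 1)

    def forward(self, input_ids, attention_mask, labels):
        # input_ids / attention_mask: (batch, n_cands, seq_len); each row is the
        # dialogue context concatenated with one knowledge candidate.
        bsz, n_cands, seq_len = input_ids.shape
        enc = self.bart.get_encoder()(
            input_ids=input_ids.view(bsz * n_cands, seq_len),
            attention_mask=attention_mask.view(bsz * n_cands, seq_len),
        ).last_hidden_state.view(bsz, n_cands, seq_len, -1)

        # Mean-pool each candidate encoding and turn the pooled vectors into a
        # soft selection distribution over candidates (no knowledge labels used).
        mask = attention_mask.unsqueeze(-1).to(enc.dtype)
        pooled = (enc * mask).sum(2) / mask.sum(2).clamp(min=1.0)
        weights = torch.softmax(self.scorer(pooled).squeeze(-1), dim=-1)

        # Aggregate encoder states as a weighted sum so the decoder attends to a
        # single "selected" memory; gradients reach the scorer through the
        # language-modelling loss only. Taking the max over candidate masks is a
        # simplification for this sketch.
        agg_states = (enc * weights.view(bsz, n_cands, 1, 1)).sum(1)
        agg_mask = attention_mask.max(dim=1).values

        out = self.bart(
            encoder_outputs=(agg_states,),
            attention_mask=agg_mask,
            labels=labels,
        )
        return out.loss, weights
```

In this setup the only training signal is `out.loss`, the standard token-level cross-entropy on the response, and the candidate weights returned alongside it can be read off at inference time as the model's unsupervised knowledge selection.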