Natural language dialogue systems have attracted considerable attention recently. Because many dialogue models are data-driven, high-quality datasets are essential to these systems. In this paper, we introduce Pchatbot, a large-scale dialogue dataset containing two subsets collected from Weibo and judicial forums, respectively. To adapt the raw data to dialogue systems, we carefully normalize it through anonymization, deduplication, segmentation, and filtering. Pchatbot is significantly larger than existing Chinese dialogue datasets, which may benefit data-driven models. In addition, current dialogue datasets for personalized chatbots usually contain only a few persona sentences or attributes. In contrast, Pchatbot provides anonymized user IDs and timestamps for both posts and responses. This enables the development of personalized dialogue models that learn implicit user personality directly from each user's dialogue history. Our preliminary experimental study benchmarks several state-of-the-art dialogue models to provide a point of comparison for future work. The dataset is publicly available on GitHub.
The problem of verifying whether a textual hypothesis holds based on the given evidence, also known as fact verification, plays an important role in the study of natural language understanding and semantic representation. However, existing studies ar
Personalized dialogue systems are an essential step toward better human-machine interaction. Existing personalized dialogue agents rely on properly designed conversational datasets, which are mostly monolingual (e.g., English), which greatly limits t
Question answering and conversational systems are often baffled and need help clarifying certain ambiguities. However, limitations of existing datasets hinder the development of large-scale models capable of generating and utilising clarification que
Developing conversational agents that interact with patients and provide primary clinical advice has attracted increasing attention because of its huge application potential, especially during the COVID-19 pandemic. However, the training of end-to-end
Dependency parsing is a longstanding natural language processing task, with its outputs crucial to various downstream tasks. Recently, neural network based (NN-based) dependency parsing has achieved significant progress and obtained the state-of-the-