ﻻ يوجد ملخص باللغة العربية
Poetry is one of the most important art forms of human languages. Recently many studies have focused on incorporating some linguistic features of poetry, such as style and sentiment, into its understanding or generation system. However, there is no focus on understanding or evaluating the semantics of poetry. Therefore, we propose a novel task to assess a models semantic understanding of poetry by poem matching. Specifically, this task requires the model to select one line of Chinese classical poetry among four candidates according to the modern Chinese translation of a line of poetry. To construct this dataset, we first obtain a set of parallel data of Chinese classical poetry and modern Chinese translation. Then we retrieve similar lines of poetry with the lines in a poetry corpus as negative choices. We name the dataset Chinese Classical Poetry Matching Dataset (CCPM) and release it at https://github.com/THUNLP-AIPoet/CCPM. We hope this dataset can further enhance the study on incorporating deep semantics into the understanding and generation system of Chinese classical poetry. We also preliminarily run two variants of BERT on this dataset as the baselines for this dataset.
Recent studies in sequence-to-sequence learning demonstrate that RNN encoder-decoder structure can successfully generate Chinese poetry. However, existing methods can only generate poetry with a given first line or users intent theme. In this paper,
Poetry generation has been a difficult task in natural language processing. Unlike plain neural text generation tasks, poetry has a high requirement for novelty, since an easily-understood sentence with too many high frequency words might not be cons
The hierarchy of classical Chinese poetry has been broadly acknowledged by a number of studies in Chinese literature. However, quantitative investigations about the evolutionary linkages of classical Chinese poetry are limited. The primary goal of th
The advancements of neural dialogue generation models show promising results on modeling short-text conversations. However, training such models usually needs a large-scale high-quality dialogue corpus, which is hard to access. In this paper, we pres
This paper presents BSTC (Baidu Speech Translation Corpus), a large-scale Chinese-English speech translation dataset. This dataset is constructed based on a collection of licensed videos of talks or lectures, including about 68 hours of Mandarin data