We analyzed historical and literary documents in Chinese to gain insights into research issues, and in this paper we give an overview of our studies, which drew on four different sources of text material. We investigated the history of concepts and transliterated words in China with the Database for the Study of Modern China Thought and Literature, which contains historical documents about China between 1830 and 1930. We also attempted to disambiguate names that were shared by multiple government officers who served between 618 and 1912 and were recorded in Chinese local gazetteers. To showcase the potential and challenges of computer-assisted analysis of Chinese literature, we explored some interesting yet non-trivial questions about two of the Four Great Classical Novels of China: (1) which monsters attempted to consume the Buddhist monk Xuanzang in the Journey to the West (JTTW), published in the 16th century; (2) which was the most powerful monster in JTTW; and (3) which major character smiled the most in the Dream of the Red Chamber, published in the 18th century. Similar approaches can be applied to the analysis and study of modern documents, such as the newspaper articles published about the 228 incident that occurred in 1947 in Taiwan.
Japan is a unique country with a distinct cultural heritage, which is reflected in billions of historical documents that have been preserved. However, the change in the Japanese writing system in 1900 made these documents inaccessible to the general pub
Content zoning can be understood as a segmentation of textual documents into zones. This is inspired by [6] who initially proposed an approach for the argumentative zoning of textual documents. With the prototypical CoZo+ engine, we focus on content
Legal artificial intelligence (LegalAI) aims to benefit legal systems with the technology of artificial intelligence, especially natural language processing (NLP). Recently, inspired by the success of pre-trained language models (PLMs) in the generic
Machine translation requires large amounts of parallel text. While such datasets are abundant in domains such as newswire, they are less accessible in the biomedical domain. Chinese and English are two of the most widely spoken languages, yet to our
Literary artefacts have generally been indexed and searched by title, metadata and keywords. This searching and indexing works well when the user/reader already knows about that particular creative textual artefact or document. This inde