We show that Zipf's law for Chinese characters holds perfectly for sufficiently short texts (a few thousand different characters). The scenario of its validity is similar to that of Zipf's law for words in short English texts. For long Chinese texts (or for mixtures of short Chinese texts), rank-frequency relations for Chinese characters display a two-layer, hierarchical structure that combines a Zipfian power-law regime for frequent characters (first layer) with an exponential-like regime for less frequent characters (second layer). For these two layers we provide different (though related) theoretical descriptions that include the range of low-frequency characters (hapax legomena). The comparative analysis of rank-frequency relations for Chinese characters versus English words illustrates the extent to which characters play for Chinese writers the same role that words play for those writing within alphabetical systems.
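As an illustration of what a rank-frequency relation is, the following minimal sketch counts characters in a text, sorts their normalized frequencies by rank, and estimates a Zipf exponent from a straight-line fit in log-log coordinates. It is not the theoretical description referred to in the abstract: the function names (`rank_frequency`, `fit_zipf_exponent`, `count_hapax`) and the `max_rank` cutoff are hypothetical, and ordinary least squares on logarithms is a simple stand-in that is known to be a rough estimator of power-law exponents.

```python
import math
from collections import Counter


def rank_frequency(text):
    """Normalized frequencies of non-whitespace characters, highest first.

    The value at index r - 1 is the frequency f_r of the rank-r character.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def fit_zipf_exponent(freqs, max_rank=None):
    """Estimate gamma in f_r ~ r**(-gamma) by least squares on (log r, log f_r).

    max_rank (an assumed, illustrative parameter) restricts the fit to the
    most frequent characters, e.g. to a candidate power-law regime.
    """
    if max_rank is not None:
        freqs = freqs[:max_rank]
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is negative for a decaying relation; gamma is its negation.
    return -sxy / sxx


def count_hapax(text):
    """Number of distinct non-whitespace characters that occur exactly once."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return sum(1 for c in counts.values() if c == 1)
```

Under these assumptions, a text whose character frequencies follow Zipf's law would give an exponent close to 1 when the fit is restricted to the frequent characters, while including the low-frequency tail of a long text would be expected to pull the fitted line away from a pure power law.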
Given the advantage and recent success of English character-level and subword-unit models in several NLP tasks, we consider the equivalent modeling problem for Chinese. Chinese script is logographic and many Chinese logograms are composed of common s
Complex systems often comprise many kinds of components which vary over many orders of magnitude in size: Populations of cities in countries, individual and corporate wealth in economies, species abundance in ecologies, word frequency in natural lang
Named Entity Recognition and Relation Extraction for Chinese literature text is regarded as a highly difficult problem, partially because of the lack of tagging sets. In this paper, we build a discourse-level dataset from hundreds of Chinese litera
Thermodynamic fluctuations in mechanical resonators cause uncertainty in their frequency measurement, fundamentally limiting performance of frequency-based sensors. Recently, integrating nanophotonic motion readout with micro- and nano-mechanical res
Graphical passwords (GPWs) are used in many areas of the current world. Topological graphic passwords (Topsnut-gpws) are a new type of cryptography, and they differ from the existing GPWs. A Topsnut-gpw consists of two parts: one is a topological structur