
Rank-frequency relation for Chinese characters


Published by: Weibing Deng

Publication date: 2013

Research field: Informatics Engineering, Physics

Paper language: English

Authors: W.B. Deng, A.E. Allahverdyan, B. Li

Computation and Language; Data Analysis, Statistics and Probability


We show that Zipf's law for Chinese characters holds perfectly for sufficiently short texts (a few thousand different characters). The scenario of its validity is similar to that of Zipf's law for words in short English texts. For long Chinese texts (or for mixtures of short Chinese texts), rank-frequency relations for Chinese characters display a two-layer, hierarchic structure that combines a Zipfian power-law regime for frequent characters (first layer) with an exponential-like regime for less frequent characters (second layer). For these two layers we provide different (though related) theoretical descriptions that include the range of low-frequency characters (hapax legomena). The comparative analysis of rank-frequency relations for Chinese characters versus English words illustrates the extent to which characters play the same role for Chinese writers as words do for those writing within alphabetical systems.
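As a rough illustration of the quantities the abstract refers to, the sketch below builds a rank-frequency list for the characters of a text, estimates a power-law exponent for the most frequent ranks, and counts hapax legomena. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names are hypothetical, whitespace is the only thing filtered out, and the exponent is taken from an ordinary least-squares fit of log frequency against log rank, which is a simple stand-in for whatever fitting method the paper uses.

```python
import math
from collections import Counter


def char_counts(text):
    """Count each non-whitespace character (assumption: punctuation is not filtered)."""
    return Counter(ch for ch in text if not ch.isspace())


def rank_frequency(text):
    """Normalized character frequencies, sorted from rank 1 (most frequent) downward."""
    counts = char_counts(text)
    total = sum(counts.values())
    return [c / total for _, c in counts.most_common()]


def zipf_exponent(freqs, max_rank=None):
    """Estimate gamma in f_r ~ r**(-gamma) by least squares on log-log axes.

    `max_rank` restricts the fit to the most frequent characters, for
    example the Zipfian first layer of a long text.
    """
    head = freqs[:max_rank] if max_rank else freqs
    if len(head) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(r) for r in range(1, len(head) + 1)]
    ys = [math.log(f) for f in head]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope is negative for a decaying law, so flip the sign


def hapax_count(text):
    """Number of characters that occur exactly once in the text."""
    return sum(1 for c in char_counts(text).values() if c == 1)
```

For a long text, comparing `zipf_exponent(freqs, max_rank=k)` for several values of `k` is one simple way to see where a single power law stops describing the data, although it does not by itself identify the exponential-like second layer.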


Falcon Z. Dai, Zheng Cai, 2017

Given the advantage and recent success of English character-level and subword-unit models in several NLP tasks, we consider the equivalent modeling problem for Chinese. Chinese script is logographic and many Chinese logograms are composed of common substructures that provide semantic, phonetic and syntactic hints. In this work, we propose to explicitly incorporate the visual appearance of a character's glyph in its representation, resulting in a novel glyph-aware embedding of Chinese characters. Inspired by the success of convolutional neural networks in computer vision, we use them to incorporate the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In the context of two basic Chinese NLP tasks, language modeling and word segmentation, the model learns to represent each character's task-relevant semantic and syntactic information in the character-level embedding.

Computation and Language; Machine Learning

Allotaxonometry and rank-turbulence divergence: A universal instrument for comparing complex systems

P. S. Dodds, J. R. Minot, M. V. Arnold, 2020

Complex systems often comprise many kinds of components which vary over many orders of magnitude in size: populations of cities in countries, individual and corporate wealth in economies, species abundance in ecologies, word frequency in natural language, and node degree in complex networks. Comparisons of component size distributions for two complex systems (or for a system with itself at two different time points) generally employ information-theoretic instruments, such as Jensen-Shannon divergence. We argue that these methods lack transparency and adjustability, and should not be applied when component probabilities are non-sensible or are problematic to estimate. Here, we introduce 'allotaxonometry' along with 'rank-turbulence divergence', a tunable instrument for comparing any two (Zipfian) ranked lists of components. We analytically develop our rank-based divergence in a series of steps, and then establish a rank-based allotaxonograph which pairs a map-like histogram for rank-rank pairs with an ordered list of components according to divergence contribution. We explore the performance of rank-turbulence divergence for a series of distinct settings including: language use on Twitter and in books, species abundance, baby name popularity, market capitalization, performance in sports, mortality causes, and job titles. We provide a series of supplementary flipbooks which demonstrate the tunability and storytelling power of rank-based allotaxonometry.

Physics and Society; Data Analysis, Statistics and Probability

A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text

Jingjing Xu, Ji Wen, Xu Sun, 2017

Named Entity Recognition and Relation Extraction for Chinese literature text is regarded as a highly difficult problem, partially because of the lack of tagging sets. In this paper, we build a discourse-level dataset from hundreds of Chinese literature articles for improving this task. To build a high-quality dataset, we propose two tagging methods to solve the problem of data inconsistency: a heuristic tagging method and a machine-auxiliary tagging method. Based on this corpus, we also introduce several widely used models to conduct experiments. Experimental results not only show the usefulness of the proposed dataset, but also provide baselines for further research. The dataset is available at https://github.com/lancopku/Chinese-Literature-NER-RE-Dataset

Computation and Language

Thermodynamic Limit for Linear Harmonic Oscillator Resonance Frequency Measurement

Mingkang Wang, Vladimir Aksyuk, 2019

Thermodynamic fluctuations in mechanical resonators cause uncertainty in their frequency measurement, fundamentally limiting performance of frequency-based sensors. Recently, integrating nanophotonic motion readout with micro- and nano-mechanical resonators allowed practical chip-scale sensors to routinely operate near this limit in high-bandwidth measurements. However, the exact and general expressions for either thermodynamic frequency measurement uncertainty or efficient, real-time frequency estimators are not well established, particularly for fast and weakly-driven resonators. Here, we derive, and numerically validate, the Cramer-Rao lower bound (CRLB) and an efficient maximum-likelihood estimator for the frequency of a classical linear harmonic oscillator subject to thermodynamic fluctuations. For a fluctuating oscillator without external drive, the frequency Allan deviation calculated from simulated resonator motion data agrees with the derived CRLB $\sigma_f = \frac{1}{2\pi}\sqrt{\frac{\Gamma}{2\tau}}$ for averaging times $\tau$ below, as well as above, the relaxation time $1/\Gamma$. The CRLB approach is general and can be extended to driven resonators, non-negligible motion detection imprecision, as well as backaction from a continuous linear quantum measurement.
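The bound quoted in this abstract, sigma_f = (1 / 2π) · sqrt(Γ / (2τ)), is simple enough to evaluate directly. The sketch below does only that arithmetic; the function name is hypothetical, and the units are an assumption (Γ taken as a rate in inverse seconds and τ as an averaging time in seconds, which would give a result in hertz). The abstract does not state its unit conventions.

```python
import math


def crlb_frequency_deviation(gamma, tau):
    """Evaluate sigma_f = (1 / (2*pi)) * sqrt(gamma / (2 * tau)).

    gamma: relaxation rate (assumed to be in 1/s; the relaxation time is 1/gamma)
    tau:   averaging time (assumed to be in s)
    """
    if gamma <= 0 or tau <= 0:
        raise ValueError("gamma and tau must be positive")
    return math.sqrt(gamma / (2.0 * tau)) / (2.0 * math.pi)
```

Because the bound scales as 1/sqrt(τ), averaging four times longer halves it, so `crlb_frequency_deviation(gamma, 4 * tau)` is half of `crlb_frequency_deviation(gamma, tau)`.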

Applied Physics; Data Analysis, Statistics and Probability

Using Chinese Characters To Generate Text-Based Passwords For Information Security

Bing Yao, Yarong Mu, Yirong Sun, 2019

Graphical passwords (GPWs) are in use in many areas of the current world. Topological graphic passwords (Topsnut-gpws) are a new type of cryptography, and they differ from the existing GPWs. A Topsnut-gpw consists of two parts: one is a topological structure (a graph), and the other is a set of discrete elements (a graph labelling, or coloring); the topological structure connects these discrete elements together to form an interesting story. Our idea is to bring Chinese characters into computers and touch-screen electronic devices by speaking, writing, and keyboard input in order to form Hanzi-graphs and Hanzi-gpws. We will use Hanzi-gpws to produce text-based passwords (TB-paws). We will introduce flawed graph labellings on disconnected Hanzi-graphs.

Information Theory; Combinatorics
