A Comparison of Natural (English) and Artificial (Esperanto) Languages: A Multifractal Method Based Analysis

Published by Marcel Ausloos
Publication date: 2008
Paper language: English




We present a comparison of two English texts by Lewis Carroll, Alice in Wonderland and Through the Looking-Glass, the former also translated into Esperanto, in order to observe whether natural and artificial languages differ significantly from each other. We construct one-dimensional, time-series-like signals using either word lengths or word frequencies. We use multifractal ideas to sort out correlations in the writings. To check the robustness of the methods, we also produce the corresponding shuffled texts. We compare characteristic functions and observe, e.g., marked differences in the (far from parabolic) $f(\alpha)$ curves, differences which we attribute to Tsallis non-extensive statistical features in the frequency time series and the length time series. The Esperanto text has more extreme values. A very rough approximation consists in modeling the texts as a random Cantor set, as if resulting from a binomial cascade of long and short words (or of words and blanks). This leads to parameters characterizing the text style and, most likely, ultimately the author's writing.
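
The binomial cascade invoked above has a closed-form multifractal spectrum, which gives a reference shape against which measured $f(\alpha)$ curves can be read. The Python sketch below is an illustration, not the paper's code. It assumes the box-counting convention $\sum_i \mu_i^q \sim \varepsilon^{\tau(q)}$ on dyadic boxes $\varepsilon = 2^{-n}$, with one half of every interval receiving weight $p$ and the other $1-p$, so that $\tau(q) = -\log_2\left(p^q + (1-p)^q\right)$, $\alpha = d\tau/dq$ and $f = q\alpha - \tau$. Randomly swapping which half receives $p$ at each split (a random Cantor-like cascade) only permutes the box masses, so $\tau(q)$ and $f(\alpha)$ are unchanged. The function name and the value $p = 0.3$ are illustrative choices.

```python
import math


def binomial_cascade_spectrum(p, qs):
    """Return (alpha, f) pairs of the f(alpha) spectrum of a binomial cascade.

    Sketch under the stated assumptions: weights p and 1 - p at every dyadic
    split, tau(q) = -log2(p**q + (1 - p)**q), alpha = dtau/dq and
    f = q * alpha - tau (Legendre transform).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    spectrum = []
    for q in qs:
        a, b = p ** q, (1.0 - p) ** q
        s = a + b
        tau = -math.log2(s)
        # Analytic derivative of tau(q) with respect to q.
        alpha = -(a * math.log2(p) + b * math.log2(1.0 - p)) / s
        spectrum.append((alpha, q * alpha - tau))
    return spectrum


if __name__ == "__main__":
    for q in (-5, -1, 0, 1, 5):
        alpha, f = binomial_cascade_spectrum(0.3, [q])[0]
        print(f"q={q:+d}  alpha={alpha:.4f}  f={f:.4f}")
```

The curve peaks at $f = 1$ for $q = 0$ and touches $f = \alpha$ at $q = 1$ (the information dimension). The more asymmetric $p$ is, the wider the $\alpha$ range, which runs from $-\log_2 \max(p, 1-p)$ to $-\log_2 \min(p, 1-p)$.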



Read also

M. Ausloos, 2008
Two English texts by Lewis Carroll, Alice in Wonderland (also translated into Esperanto) and Through the Looking-Glass, are compared in order to observe whether natural and artificial languages differ significantly from each other. One-dimensional, time-series-like signals are constructed using only word frequencies (FTS) or word lengths (LTS). The data are studied through (i) a Zipf method for sorting out correlations in the FTS and (ii) a method based on the Grassberger-Procaccia (GP) technique for finding correlations in the LTS. Features are compared: different power laws are observed, with characteristic exponents for the ranking properties and for the phase space attractor dimensionality. The Zipf exponent can take values much less than unity (ca. 0.50 or 0.30), depending on how a sentence is defined. This non-universality is conjectured to be a measure of the author's style. Moreover, the attractor dimension $r$ is a simple function of the so-called phase space dimension $n$, i.e., $r = n^{\lambda}$, with $\lambda = 0.79$. Such an exponent is also conjectured to be a measure of the author's creativity. However, even though there are quantitative differences between the original English text and its Esperanto translation, the qualitative differences are very minute, indicating in this case a translation that, along our analysis lines, respects the content of the author's writing relatively well.
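
The two signals and the Zipf fit described above can be sketched in Python as follows. Several choices are assumptions not fixed by the abstract: a word is a maximal run of letters (case is ignored, punctuation and digits are dropped), the FTS value of a word is its total count in the text, and the Zipf exponent is the ordinary least-squares slope of $\log f$ against $\log r$ over all ranks. The abstract notes that the exponent depends on how a sentence is defined, which this word-level sketch does not model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case words (assumption: a word is a run of letters)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def length_series(words):
    """LTS: the length of each word, in reading order."""
    return [len(w) for w in words]


def frequency_series(words):
    """FTS: each word replaced by its number of occurrences in the whole text."""
    counts = Counter(words)
    return [counts[w] for w in words]


def zipf_exponent(words):
    """Least-squares slope of log(frequency) against log(rank), returned as a positive number."""
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


if __name__ == "__main__":
    words = tokenize("The cat and the hat and the bat.")
    print(length_series(words))     # [3, 3, 3, 3, 3, 3, 3, 3]
    print(frequency_series(words))  # [3, 1, 2, 3, 1, 2, 3, 1]
```

Because the letter class `[^\W\d_]` is Unicode-aware in Python 3, Esperanto letters such as ĉ or ŝ are kept inside words.
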
Many complex systems generate multifractal time series which are long-range cross-correlated. Numerous methods have been proposed to characterize the multifractal nature of these long-range cross-correlations. However, several important issues about these methods are not well understood, and most methods consider only one moment order. We study the joint multifractal analysis based on a partition function with two moment orders, which was initially invented to investigate fluid fields, and derive analytically several important properties. We apply the method numerically to binomial measures with multifractal cross-correlations and to bivariate fractional Brownian motions without multifractal cross-correlations. For binomial multifractal measures, the explicit expressions of the mass function, singularity strength and multifractal spectrum of the cross-correlations are derived, and they agree excellently with the numerical results. We also apply the method to stock market indexes and unveil intriguing multifractality in the cross-correlations of index volatilities.
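
For the binomial case mentioned above, the joint partition function can be checked against a closed form. The Python sketch below assumes (i) that both measures are produced by the same dyadic cascade, so box $i$ receives $p_1$ in $\mu$ exactly when it receives $p_2$ in $\nu$ (this shared branching is what cross-correlates them), and (ii) the scaling convention $\chi(q_1, q_2; \varepsilon) = \sum_i \mu_i^{q_1} \nu_i^{q_2} \sim \varepsilon^{\tau(q_1, q_2)}$. Under these assumptions $\tau(q_1, q_2) = -\log_2\left(p_1^{q_1} p_2^{q_2} + (1-p_1)^{q_1} (1-p_2)^{q_2}\right)$. Function names and weights are illustrative and may not match the paper's notation.

```python
import math


def binomial_measure(p, levels):
    """Box masses of a deterministic binomial cascade on 2**levels dyadic boxes.

    At every split the left child receives a fraction p and the right child 1 - p.
    """
    masses = [1.0]
    for _ in range(levels):
        masses = [m * w for m in masses for w in (p, 1.0 - p)]
    return masses


def joint_partition(mu, nu, q1, q2):
    """Joint partition function chi(q1, q2) = sum_i mu_i**q1 * nu_i**q2 over the same boxes."""
    return sum(a ** q1 * b ** q2 for a, b in zip(mu, nu))


def joint_tau_numeric(p1, p2, q1, q2, levels):
    """Mass exponent from chi ~ eps**tau with eps = 2**-levels (single-scale estimate)."""
    chi = joint_partition(binomial_measure(p1, levels), binomial_measure(p2, levels), q1, q2)
    return math.log(chi) / math.log(2.0 ** -levels)


def joint_tau_analytic(p1, p2, q1, q2):
    """Closed form for two cascades sharing the same branching sequence."""
    return -math.log2(p1 ** q1 * p2 ** q2 + (1.0 - p1) ** q1 * (1.0 - p2) ** q2)


if __name__ == "__main__":
    for q1, q2 in [(0, 0), (2, 1), (1, -1), (3, 2)]:
        num = joint_tau_numeric(0.3, 0.4, q1, q2, levels=12)
        ana = joint_tau_analytic(0.3, 0.4, q1, q2)
        print(f"q1={q1:+d} q2={q2:+d}  numeric={num:.6f}  analytic={ana:.6f}")
```

Setting $q_2 = 0$ recovers the single-measure $\tau(q)$ of $\mu$. In the usual joint formalism the two singularity strengths are the partial derivatives $\partial\tau/\partial q_1$ and $\partial\tau/\partial q_2$.
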
Multifractal analysis has become a powerful signal processing tool that characterizes signals or images via the fluctuations of their pointwise regularity, quantified theoretically by the so-called multifractal spectrum. The practical estimation of the multifractal spectrum fundamentally relies on exploiting the scale dependence of statistical properties of appropriate multiscale quantities, such as wavelet leaders, that can be robustly computed from discrete data. Despite the successes of multifractal analysis in various real-world applications, current estimation procedures remain essentially limited to providing concave upper-bound estimates, while there is a priori no reason for the multifractal spectrum to be a concave function. This work addresses this severe practical limitation and proposes a novel formalism for multifractal analysis that enables nonconcave multifractal spectra to be estimated in a stable way. The key contributions reside in the development and theoretical study of a generalized multifractal formalism to assess the multiscale statistics of wavelet leaders, and in devising a practical algorithm that permits this formalism to be applied to real-world data, allowing for the estimation of nonconcave multifractal spectra. Numerical experiments are conducted on several synthetic multifractal processes as well as on a real-world remote-sensing image, and they demonstrate the benefits of the proposed multifractal formalism over the state of the art.
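
The wavelet leaders mentioned above can be illustrated with a deliberately simple Python sketch. Assumptions: a 1-D signal whose length is a power of two; L1-normalized Haar coefficients $d_j[k] = (a_{j-1}[2k] - a_{j-1}[2k+1])/2$ with $a_0 = x$ (the literature typically uses smoother Daubechies wavelets); leaders defined as the largest $|d|$ over the dyadic interval $\lambda_{j,k}$, its two neighbours and all finer scales inside them; and scaling exponents read from $\log_2 \operatorname{mean}_k L_j[k]^q \approx \zeta(q)\, j + c$. This is the standard structure-function step whose Legendre transform gives only the concave upper bound discussed in the abstract; the generalized formalism for nonconcave spectra is not reproduced here.

```python
import math
import random


def haar_leaders(x):
    """Wavelet leaders L_j[k] for scales j = 1 (finest) .. log2(len(x)), finest first."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two >= 2")
    approx = [float(v) for v in x]
    finer_sup = None  # sup of |d| inside each dyadic interval at the previous, finer scale
    leaders = []
    while len(approx) > 1:
        half = len(approx) // 2
        detail = [(approx[2 * k] - approx[2 * k + 1]) / 2 for k in range(half)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / 2 for k in range(half)]
        sup = [abs(d) for d in detail]
        if finer_sup is not None:
            # Include every finer coefficient located inside the interval lambda_{j,k}.
            sup = [max(sup[k], finer_sup[2 * k], finer_sup[2 * k + 1]) for k in range(half)]
        # Leader: sup over lambda_{j,k} and its two neighbours (truncated at the edges).
        leaders.append([max(sup[max(k - 1, 0):k + 2]) for k in range(half)])
        finer_sup = sup
    return leaders


def scaling_exponent(x, q, j_min=1, j_max=None):
    """Least-squares slope of log2(mean_k L_j[k]**q) against the scale index j."""
    leaders = haar_leaders(x)
    j_max = j_max or len(leaders)
    js, logs = [], []
    for j in range(j_min, j_max + 1):
        vals = [v for v in leaders[j - 1] if v > 0]  # zero leaders are skipped
        if vals:
            js.append(j)
            logs.append(math.log2(sum(v ** q for v in vals) / len(vals)))
    if len(js) < 2:
        raise ValueError("need at least two scales with nonzero leaders")
    mj, ml = sum(js) / len(js), sum(logs) / len(logs)
    return (sum((a - mj) * (b - ml) for a, b in zip(js, logs))
            / sum((a - mj) ** 2 for a in js))


if __name__ == "__main__":
    random.seed(0)
    walk, s = [], 0.0
    for _ in range(2 ** 12):
        s += random.gauss(0.0, 1.0)
        walk.append(s)
    # For a Brownian-like path, zeta(q) should come out roughly as q / 2.
    for q in (1, 2, 3):
        print(q, round(scaling_exponent(walk, q, j_min=3, j_max=9), 3))
```
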
The Super Characters method addresses sentiment analysis problems by first converting the input text into images and then applying 2D-CNN models to classify the sentiment. It achieves state-of-the-art performance on many benchmark datasets. However, it is not as straightforward to apply to languages written in the Latin alphabet as to Asian languages. Because the 2D-CNN model is designed to recognize two-dimensional images, it works better if the inputs are in the form of glyphs. In this paper, we propose the SEW (Squared English Word) method, which generates a squared glyph for each English word by drawing Super Characters images of each English word at the alphabet level, combines the squared glyphs into a whole Super Characters image at the sentence level, and then applies the CNN model to classify the sentiment within the sentence. We applied the SEW method to a Wikipedia dataset and obtained a 2.1% accuracy gain compared to the original Super Characters method. For multimodal data with both structured tabular data and unstructured natural language text, the modified SEW method integrates the data into a single image and classifies sentiment with one unified CNN model.
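
The abstract does not give the exact SEW geometry, so the following Python sketch only illustrates one plausible reading of "a squared glyph for each word, combined into a sentence-level image": each word's letters are placed row by row on the smallest $m \times m$ grid that holds them ($m = \lceil \sqrt{\text{word length}} \rceil$), and the word glyphs are in turn placed on the smallest square grid that holds all the words. The 224-pixel canvas and all function names are hypothetical. The sketch returns character positions and sizes rather than an image; rendering would additionally need a drawing library.

```python
import math


def square_grid(n_items):
    """Side of the smallest square grid holding n_items, and row-major (row, col) cells."""
    if n_items <= 0:
        return 0, []
    side = math.isqrt(n_items - 1) + 1  # equals ceil(sqrt(n_items))
    return side, [(i // side, i % side) for i in range(n_items)]


def sew_layout(sentence, canvas=224):
    """Hypothetical SEW-style layout: list of (char, x, y, size) in canvas pixels."""
    words = sentence.split()
    if not words:
        return []
    n_side, word_cells = square_grid(len(words))
    cell = canvas / n_side  # side of one word's square glyph
    placements = []
    for word, (wr, wc) in zip(words, word_cells):
        m_side, letter_cells = square_grid(len(word))
        size = cell / m_side  # side of one letter inside the word glyph
        for ch, (lr, lc) in zip(word, letter_cells):
            placements.append((ch, wc * cell + lc * size, wr * cell + lr * size, size))
    return placements


if __name__ == "__main__":
    for ch, x, y, size in sew_layout("this movie was great"):
        print(ch, round(x, 1), round(y, 1), round(size, 1))
```

Short words therefore get larger letters than long words, since every word occupies a cell of the same size.
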
We present the condensation method, which exploits the heterogeneity of the probability distribution functions (PDFs) of event locations to improve the spatial information content of seismic catalogs. The method reduces the size of seismic catalogs while improving access to their spatial information content. The PDFs of events are first ranked by decreasing location errors and then successively condensed onto better-located, lower-variance event PDFs. The resulting condensed catalog assigns different weights to each event, providing an optimal spatial representation with respect to the spatially varying location capability of the seismic network. Synthetic tests on fractal distributions perturbed with realistic location errors show that condensation improves the spatial information content of the original catalog. Applied to Southern California seismicity, the new condensed catalog highlights major mapped fault traces and reveals possible additional structures while reducing the catalog length by ~25%. The condensation method allows us to account for location error information within a point-based spatial analysis. We demonstrate this by comparing the multifractal properties of the condensed catalog locations with those of the original catalog. We find evidence of different spatial scaling regimes, characterized by distinct multifractal spectra and separated by transition scales. We interpret the upper scale as agreeing with the thickness of the brittle crust, while the lower scale (2.5 km) might depend on the relocation procedure. Accounting for these new results, the Epidemic-Type Aftershock Sequence (ETAS) model formulation suggests that, contrary to previous studies, large earthquakes dominate the earthquake triggering process. This implies that the limited capability of detecting small-magnitude events cannot be used to argue that earthquakes are unpredictable in general.
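
The ranking-and-condensing loop described above can be illustrated with a toy Python sketch. Everything beyond the ordering by decreasing location error is an assumption made for illustration and is not the published algorithm: each event is an isotropic 2-D Gaussian $(x, y, \sigma)$ with initial weight 1; events are visited by decreasing $\sigma$; a visited event transfers its whole weight to better-located (smaller-$\sigma$) events lying within a hypothetical `radius_factor` times $\sigma$ of it, in proportion to its own Gaussian density at their locations, and is then dropped; events with no such neighbour are kept.

```python
import math


def condense(events, radius_factor=2.0):
    """Toy condensation of isotropic Gaussian location PDFs (hypothetical simplification).

    events: list of (x, y, sigma). Returns (x, y, sigma, weight) for the surviving events.
    """
    weights = [1.0] * len(events)
    alive = [True] * len(events)
    # Rank events by decreasing location error, as in the abstract.
    order = sorted(range(len(events)), key=lambda i: events[i][2], reverse=True)
    for i in order:
        xi, yi, si = events[i]
        targets = []
        for j, (xj, yj, sj) in enumerate(events):
            if j == i or not alive[j] or sj >= si:
                continue  # only better-located, still-present events can receive weight
            d2 = (xj - xi) ** 2 + (yj - yi) ** 2
            if d2 <= (radius_factor * si) ** 2:
                # Event i's Gaussian density at event j (the normalization cancels below).
                targets.append((j, math.exp(-d2 / (2 * si ** 2))))
        if targets:
            total = sum(v for _, v in targets)
            for j, v in targets:
                weights[j] += weights[i] * v / total
            weights[i] = 0.0
            alive[i] = False
    return [(*events[k], weights[k]) for k in range(len(events)) if alive[k]]


if __name__ == "__main__":
    catalog = [(0.0, 0.0, 1.0), (0.5, 0.0, 0.2), (10.0, 10.0, 0.5)]
    for event in condense(catalog):
        print(event)
```

Total weight is conserved by construction, so the shorter catalog still accounts for every original event, which is the property the weighted multifractal comparison relies on.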