
Spectral Analysis of Word Statistics


Published by Chaim Even-Zohar

Publication date: 2020

Language: English

Authors: Chaim Even-Zohar, Tsviqa Lakrec, Ran J. Tessler

Fields: Probability, Combinatorics, Statistics Theory


Given a random text over a finite alphabet, we study the frequencies at which fixed-length words occur as subsequences. As the data size grows, the joint distribution of word counts exhibits a rich asymptotic structure. We investigate all linear combinations of subword statistics, and fully characterize their different orders of magnitude using diverse algebraic tools. Moreover, we establish the spectral decomposition of the space of word statistics of each order. We provide explicit formulas for the eigenvectors and eigenvalues of the covariance matrix of the multivariate distribution of these statistics. Our techniques include and elaborate on a set of algebraic word operators, recently studied and employed by Dieker and Saliola (Adv Math, 2018). Subword counts find applications in Combinatorics, Statistics, and Computer Science. We revisit special cases from the combinatorial literature, such as intransitive dice, random core partitions, and questions on random walk. Our structural approach describes in a unified framework several classical statistical tests. We propose further potential applications to data analysis and machine learning.
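As a small illustration of the basic quantity in the abstract above, the following sketch counts how often a fixed word occurs in a text as a (not necessarily contiguous) subsequence, and tabulates these counts for all words of a given length. The function names `count_subsequence` and `word_statistics` are hypothetical and chosen for this example only; the dynamic program is a standard technique and is not taken from the paper.

```python
from itertools import product


def count_subsequence(text, word):
    """Count occurrences of `word` in `text` as a subsequence.

    Letters of `word` must appear in order, but not necessarily
    next to each other.
    """
    # dp[j] = number of ways to match the first j letters of `word`
    # using the part of `text` read so far.
    dp = [1] + [0] * len(word)
    for ch in text:
        # Iterate right to left so that the current text letter
        # extends only matches that ended strictly before it.
        for j in range(len(word), 0, -1):
            if word[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(word)]


def word_statistics(text, alphabet, k):
    """Map every length-k word over `alphabet` to its subsequence count."""
    return {
        "".join(w): count_subsequence(text, "".join(w))
        for w in product(alphabet, repeat=k)
    }
```

For the text `abab`, the word `ab` occurs 3 times as a subsequence. Since every choice of k positions in a text of length n spells exactly one word, the counts over all length-k words sum to the binomial coefficient C(n, k), which gives a quick sanity check on the table.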


Gelu M. Nita (2016)

We obtain analytical approximations for the expectation and variance of the Spectral Kurtosis estimator in the case of Gaussian and coherent transient time domain signals mixed with a quasi-stationary Gaussian background, which are suitable for practical estimations of their signal-to-noise ratio and duty-cycle relative to the instrumental integration time. We validate these analytical approximations by means of numerical simulations and demonstrate that such estimates are affected by statistical uncertainties that, for a suitable choice of the integration time, may not exceed a few percent. Based on these analytical results, we suggest a multiscale Spectral Kurtosis spectrometer design optimized for real-time detection of transient signals, automatic discrimination based on their statistical signature, and measurement of their properties.

Fields: Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics, Statistics Theory

Empirical spectral distributions of sparse random graphs

Amir Dembo, Eyal Lubetzky, Yumeng Zhang (2016)

We study the spectrum of a random multigraph with a degree sequence ${\bf D}_n=(D_i)_{i=1}^n$ and average degree $1 \ll \omega_n \ll n$, generated by the configuration model, and also the spectrum of the analogous random simple graph. We show that, when the empirical spectral distribution (ESD) of $\omega_n^{-1} {\bf D}_n$ converges weakly to a limit $\nu$, under mild moment assumptions (e.g., $D_i/\omega_n$ are i.i.d. with a finite second moment), the ESD of the normalized adjacency matrix converges in probability to $\nu \boxtimes \sigma_{\rm sc}$, the free multiplicative convolution of $\nu$ with the semicircle law. Relating this limit with a variant of the Marchenko-Pastur law yields the continuity of its density (away from zero), and an effective procedure for determining its support. Our proof of convergence is based on a coupling between the random simple graph and multigraph with the same degrees, which might be of independent interest. We further construct and rely on a coupling of the multigraph to an inhomogeneous Erdős-Rényi graph with the target ESD, using three intermediate random graphs, with a negligible fraction of edges modified in each step.

Fields: Probability, Combinatorics

Multiple pattern matching: A Markov chain approach

Manuel Lladser, M. D. Betterton, Rob Knight (2007)

RNA motifs typically consist of short, modular patterns that include base pairs formed within and between modules. Estimating the abundance of these patterns is of fundamental importance for assessing the statistical significance of matches in genome wide searches, and for predicting whether a given function has evolved many times in different species or arose from a single common ancestor. In this manuscript, we review in an integrated and self-contained manner some basic concepts of automata theory, generating functions and transfer matrix methods that are relevant to pattern analysis in biological sequences. We formalize, in a general framework, the concept of Markov chain embedding to analyze patterns in random strings produced by a memoryless source. This conceptualization, together with the capability of automata to recognize complicated patterns, allows a systematic analysis of problems related to the occurrence and frequency of patterns in random strings. The applications we present focus on the concept of synchronization of automata, as well as automata used to search for a finite number of keywords (including sets of patterns generated according to base pairing rules) in a general text.
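The Markov chain embedding idea in the abstract above can be sketched for the simplest case: one keyword, searched as a contiguous substring in a string of independent letters. This is a minimal illustration under stated assumptions, not the authors' construction. The function names `build_automaton` and `prob_occurs` are hypothetical, and the automaton is the standard prefix-matching automaton for a single pattern, with the fully matched state made absorbing.

```python
def build_automaton(pattern, alphabet):
    """Transition table of the prefix-matching automaton for `pattern`.

    State s means "the last s letters read equal the first s letters
    of the pattern". State len(pattern) is absorbing: the pattern has
    already occurred.
    """
    m = len(pattern)
    delta = []
    for state in range(m + 1):
        row = {}
        for a in alphabet:
            if state == m:
                row[a] = m
                continue
            s = pattern[:state] + a
            # Longest prefix of the pattern that is a suffix of s.
            k = min(m, len(s))
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def prob_occurs(pattern, probs, n):
    """Probability that `pattern` occurs in a random string of length n.

    `probs` maps each letter to its probability; letters are drawn
    independently (a memoryless source).
    """
    delta = build_automaton(pattern, list(probs))
    m = len(pattern)
    dist = [1.0] + [0.0] * m  # start in state 0 with probability 1
    for _ in range(n):
        new = [0.0] * (m + 1)
        for state, p in enumerate(dist):
            if p == 0.0:
                continue
            for a, pa in probs.items():
                new[delta[state][a]] += p * pa
        dist = new
    return dist[m]
```

With a uniform two-letter source, `prob_occurs("ab", {"a": 0.5, "b": 0.5}, 3)` gives 0.5, matching the four strings out of eight (`aab`, `aba`, `abb`, `bab`) that contain `ab`. Tracking the state distribution step by step is the same computation as repeated multiplication by the transfer matrix of the embedded chain.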

Fields: Probability, Combinatorics, Statistics Theory

Sparse random tensors: Concentration, regularization and applications

Zhixin Zhou, Yizhe Zhu (2019)

We prove a non-asymptotic concentration inequality for the spectral norm of sparse inhomogeneous random tensors with Bernoulli entries. For an order-$k$ inhomogeneous random tensor $T$ with sparsity $p_{\max}\geq \frac{c\log n}{n}$, we show that $\|T-\mathbb{E} T\|=O(\sqrt{n p_{\max}}\log^{k-2}(n))$ with high probability. The optimality of this bound up to polylog factors is provided by an information theoretic lower bound. By tensor unfolding, we extend the range of sparsity to $p_{\max}\geq \frac{c\log n}{n^{m}}$ with $1\leq m\leq k-1$ and obtain concentration inequalities for different sparsity regimes. We also provide a simple way to regularize $T$ such that $O(\sqrt{n^{m}p_{\max}})$ concentration still holds down to sparsity $p_{\max}\geq \frac{c}{n^{m}}$ with $k/2\leq m\leq k-1$. We present our concentration and regularization results with two applications: (i) a randomized construction of hypergraphs of bounded degrees with good expander mixing properties, (ii) concentration of sparsified tensors under uniform sampling.

Fields: Probability, Combinatorics, Statistics Theory

Parameter Estimation for Undirected Graphical Models with Hard Constraints

Bhaswar B. Bhattacharya, Kavita Ramanan (2020)

The hardcore model on a graph $G$ with parameter $\lambda>0$ is a probability measure on the collection of all independent sets of $G$, that assigns to each independent set $I$ a probability proportional to $\lambda^{|I|}$. In this paper we consider the problem of estimating the parameter $\lambda$ given a single sample from the hardcore model on a graph $G$. To bypass the computational intractability of the maximum likelihood method, we use the maximum pseudo-likelihood (MPL) estimator, which for the hardcore model has a surprisingly simple closed form expression. We show that for any sequence of graphs $\{G_N\}_{N\geq 1}$, where $G_N$ is a graph on $N$ vertices, the MPL estimate of $\lambda$ is $\sqrt{N}$-consistent, whenever the graph sequence has uniformly bounded average degree. We then derive sufficient conditions under which the MPL estimate of the activity parameters is $\sqrt{N}$-consistent given a single sample from a general $H$-coloring model, in which restrictions between adjacent colors are encoded by a constraint graph $H$. We verify the sufficient conditions for models where there is at least one unconstrained color as long as the graph sequence has uniformly bounded average degree. This applies to many $H$-coloring examples such as the Widom-Rowlinson and multi-state hard-core models. On the other hand, for the $q$-coloring model, which falls outside this class, we show that consistent estimation may be impossible even for graphs with bounded average degree. Nevertheless, we show that the MPL estimate is $\sqrt{N}$-consistent in the $q$-coloring model when $\{G_N\}_{N\geq 1}$ has bounded average double neighborhood. The presence of hard constraints, as opposed to soft constraints, leads to new challenges, and our proofs entail applications of the method of exchangeable pairs as well as combinatorial arguments that employ the probabilistic method.
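To make the pseudo-likelihood idea concrete, here is a minimal sketch for the hardcore model, derived directly from the definition rather than copied from the paper, so its form may differ from the authors' expression. A vertex with an occupied neighbour is forced to be empty and contributes a factor 1. A vertex with no occupied neighbour is occupied with conditional probability $\lambda/(1+\lambda)$. Writing $U$ for the set of vertices with no occupied neighbour, the pseudo-likelihood is $\lambda^{|I|}/(1+\lambda)^{|U|}$, which is maximized at $\hat\lambda = |I|/(|U|-|I|)$. The function name `mpl_hardcore` and the adjacency-dictionary input are assumptions made for this example.

```python
def mpl_hardcore(adj, independent_set):
    """Maximum pseudo-likelihood estimate of lambda from one sample.

    adj: dict mapping each vertex to an iterable of its neighbours.
    independent_set: the occupied vertices of the observed sample.
    Returns |I| divided by the number of empty vertices that have no
    occupied neighbour, or infinity if there are no such vertices.
    """
    occupied = set(independent_set)
    for v in occupied:
        if occupied & set(adj[v]):
            raise ValueError("sample is not an independent set")
    # Empty vertices that could have been occupied given their neighbours.
    free_empty = sum(
        1 for v in adj
        if v not in occupied and not (occupied & set(adj[v]))
    )
    if free_empty == 0:
        return float("inf")
    return len(occupied) / free_empty
```

On the path with vertices 0 to 4 and sample `{0}`, vertex 1 is blocked and vertices 2, 3, 4 are free and empty, so the estimate is 1/3. The estimate is infinite when every empty vertex is blocked, which reflects that the pseudo-likelihood is then increasing in $\lambda$.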

Fields: Probability, Combinatorics, Statistics Theory
