Do you want to publish a course? Click here

Using Fisher Information In Big Data

103   0   0.0 ( 0 )
 Added by Nasir Ahmad
 Publication date 2015
and research's language is English




Ask ChatGPT about the research

In this era of Big Data, proficient use of data mining is the key to capture useful information from any dataset. As numerous data mining techniques make use of information theory concepts, in this paper, we discuss how Fisher information (FI) can be applied to analyze patterns in Big Data. The main advantage of FI is its ability to combine multiple variables together to inform us on the overall trends and stability of a system. It can therefore detect whether a system is losing dynamic order and stability, which may serve as a signal of an impending regime shift. In this work, we first provide a brief overview of Fisher information theory, followed by a simple step-by-step numerical example on how to compute FI. Finally, as a numerical demonstration, we calculate the evolution of FI for GDP per capita (current US Dollar) and total population of the USA from 1960 to 2013.



rate research

Read More

Two new proofs of the Fisher information inequality (FII) using data processing inequalities for mutual information and conditional variance are presented.
399 - Fionn Murtagh 2016
We develop the theory and practical implementation of p-adic sparse coding of data. Rather than the standard, sparsifying criterion that uses the $L_0$ pseudo-norm, we use the p-adic norm. We require that the hierarchy or tree be node-ranked, as is standard practice in agglomerative and other hierarchical clustering, but not necessarily with decision trees. In order to structure the data, all computational processing operations are direct reading of the data, or are bounded by a constant number of direct readings of the data, implying linear computational time. Through p-adic sparse data coding, efficient storage results, and for bounded p-adic norm stored data, search and retrieval are constant time operations. Examples show the effectiveness of this new approach to content-driven encoding and displaying of data.
The Mutual Information (MI) is an often used measure of dependency between two random variables utilized in information theory, statistics and machine learning. Recently several MI estimators have been proposed that can achieve parametric MSE convergence rate. However, most of the previously proposed estimators have the high computational complexity of at least $O(N^2)$. We propose a unified method for empirical non-parametric estimation of general MI function between random vectors in $mathbb{R}^d$ based on $N$ i.i.d. samples. The reduced complexity MI estimator, called the ensemble dependency graph estimator (EDGE), combines randomized locality sensitive hashing (LSH), dependency graphs, and ensemble bias-reduction methods. We prove that EDGE achieves optimal computational complexity $O(N)$, and can achieve the optimal parametric MSE rate of $O(1/N)$ if the density is $d$ times differentiable. To the best of our knowledge EDGE is the first non-parametric MI estimator that can achieve parametric MSE rates with linear time complexity. We illustrate the utility of EDGE for the analysis of the information plane (IP) in deep learning. Using EDGE we shed light on a controversy on whether or not the compression property of information bottleneck (IB) in fact holds for ReLu and other rectification functions in deep neural networks (DNN).
According to Kolmogorov complexity, every finite binary string is compressible to a shortest code -- its information content -- from which it is effectively recoverable. We investigate the extent to which this holds for infinite binary sequences (streams). We devise a new coding method which uniformly codes every stream $X$ into an algorithmically random stream $Y$, in such a way that the first $n$ bits of $X$ are recoverable from the first $I(Xupharpoonright_n)$ bits of $Y$, where $I$ is any partial computable information content measure which is defined on all prefixes of $X$, and where $Xupharpoonright_n$ is the initial segment of $X$ of length $n$. As a consequence, if $g$ is any computable upper bound on the initial segment prefix-free complexity of $X$, then $X$ is computable from an algorithmically random $Y$ with oracle-use at most $g$. Alternatively (making no use of such a computable bound $g$) one can achieve an oracle-use bounded above by $K(Xupharpoonright_n)+log n$. This provides a strong analogue of Shannons source coding theorem for algorithmic information theory.
In this article we study lossless compression of strings of pure quantum states of indeterminate-length quantum codes which were introduced by Schumacher and Westmoreland. Past work has assumed that the strings of quantum data are prepared to be encoded in an independent and identically distributed way. We introduce the notion of quantum stochastic ensembles, allowing us to consider strings of quantum states prepared in a more general way. For any identically distributed quantum stochastic ensemble we define an associated quantum Markov chain and prove that the optimal average codeword length via lossless coding is equal to the quantum dynamical entropy of the associated quantum Markov chain.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا