ترغب بنشر مسار تعليمي؟ اضغط هنا

Estimation of experimental data redundancy and related statistics

329   0   0.0 ( 0 )
 نشر من قبل Igor Grabec
 تاريخ النشر 2007
  مجال البحث فيزياء
والبحث باللغة English
 تأليف I. Grabec




اسأل ChatGPT حول البحث

Redundancy of experimental data is the basic statistic from which the complexity of a natural phenomenon and the proper number of experiments needed for its exploration can be estimated. The redundancy is expressed by the entropy of information pertaining to the probability density function of experimental variables. Since the calculation of entropy is inconvenient due to integration over a range of variables, an approximate expression for redundancy is derived that includes only a sum over the set of experimental data about these variables. The approximation makes feasible an efficient estimation of the redundancy of data along with the related experimental information and information cost function. From the experimental information the complexity of the phenomenon can be simply estimated, while the proper number of experiments needed for its exploration can be determined from the minimum of the cost function. The performance of the approximate estimation of these statistics is demonstrated on two-dimensional normally distributed random data.

قيم البحث

اقرأ أيضاً

140 - I. Grabec 2007
The extraction of a physical law y=yo(x) from joint experimental data about x and y is treated. The joint, the marginal and the conditional probability density functions (PDF) are expressed by given data over an estimator whose kernel is the instrume nt scattering function. As an optimal estimator of yo(x) the conditional average is proposed. The analysis of its properties is based upon a new definition of prediction quality. The joint experimental information and the redundancy of joint measurements are expressed by the relative entropy. With the number of experiments the redundancy on average increases, while the experimental information converges to a certain limit value. The difference between this limit value and the experimental information at a finite number of data represents the discrepancy between the experimentally determined and the true properties of the phenomenon. The sum of the discrepancy measure and the redundancy is utilized as a cost function. By its minimum a reasonable number of data for the extraction of the law yo(x) is specified. The mutual information is defined by the marginal and the conditional PDFs of the variables. The ratio between mutual information and marginal information is used to indicate which variable is the independent one. The properties of the introduced statistics are demonstrated on deterministically and randomly related variables.
In this paper, we consider a surrogate modeling approach using a data-driven nonparametric likelihood function constructed on a manifold on which the data lie (or to which they are close). The proposed method represents the likelihood function using a spectral expansion formulation known as the kernel embedding of the conditional distribution. To respect the geometry of the data, we employ this spectral expansion using a set of data-driven basis functions obtained from the diffusion maps algorithm. The theoretical error estimate suggests that the error bound of the approximate data-driven likelihood function is independent of the variance of the basis functions, which allows us to determine the amount of training data for accurate likelihood function estimations. Supporting numerical results to demonstrate the robustness of the data-driven likelihood functions for parameter estimation are given on instructive examples involving stochastic and deterministic differential equations. When the dimension of the data manifold is strictly less than the dimension of the ambient space, we found that the proposed approach (which does not require the knowledge of the data manifold) is superior compared to likelihood functions constructed using standard parametric basis functions defined on the ambient coordinates. In an example where the data manifold is not smooth and unknown, the proposed method is more robust compared to an existing polynomial chaos surrogate model which assumes a parametric likelihood, the non-intrusive spectral projection.
We consider the evolution of a network of neurons, focusing on the asymptotic behavior of spikes dynamics instead of membrane potential dynamics. The spike response is not sought as a deterministic response in this context, but as a conditional proba bility : Reading out the code consists of inferring such a probability. This probability is computed from empirical raster plots, by using the framework of thermodynamic formalism in ergodic theory. This gives us a parametric statistical model where the probability has the form of a Gibbs distribution. In this respect, this approach generalizes the seminal and profound work of Schneidman and collaborators. A minimal presentation of the formalism is reviewed here, while a general algorithmic estimation method is proposed yielding fast convergent implementations. It is also made explicit how several spike observables (entropy, rate, synchronizations, correlations) are given in closed-form from the parametric estimation. This paradigm does not only allow us to estimate the spike statistics, given a design choice, but also to compare different models, thus answering comparative questions about the neural code such as : are correlations (or time synchrony or a given set of spike patterns, ..) significant with respect to rate coding only ? A numerical validation of the method is proposed and the perspectives regarding spike-train code analysis are also discussed.
113 - John Harlim 2018
Modern scientific computational methods are undergoing a transformative change; big data and statistical learning methods now have the potential to outperform the classical first-principles modeling paradigm. This book bridges this transition, connec ting the theory of probability, stochastic processes, functional analysis, numerical analysis, and differential geometry. It describes two classes of computational methods to leverage data for modeling dynamical systems. The first is concerned with data fitting algorithms to estimate parameters in parametric models that are postulated on the basis of physical or dynamical laws. The second class is on operator estimation, which uses the data to nonparametrically approximate the operator generated by the transition function of the underlying dynamical systems. This self-contained book is suitable for graduate studies in applied mathematics, statistics, and engineering. Carefully chosen elementary examples with supplementary MATLAB codes and appendices covering the relevant prerequisite materials are provided, making it suitable for self-study.
It is generally known that counting statistics is not correctly described by a Gaussian approximation. Nevertheless, in neutron scattering, it is common practice to apply this approximation to the counting statistics; also at low counting numbers. We show that the application of this approximation leads to skewed results not only for low-count features, such as background level estimation, but also for its estimation at double-digit count numbers. In effect, this approximation is shown to be imprecise on all levels of count. Instead, a Multinomial approach is introduced as well as a more standard Poisson method, which we compare with the Gaussian case. These two methods originate from a proper analysis of a multi-detector setup and a standard triple axis instrument.We devise a simple mathematical procedure to produce unbiased fits using the Multinomial distribution and demonstrate this method on synthetic and actual inelastic scattering data. We find that the Multinomial method provide almost unbiased results, and in some cases outperforms the Poisson statistics. Although significantly biased, the Gaussian approach is in general more robust in cases where the fitted model is not a true representation of reality. For this reason, a proper data analysis toolbox for low-count neutron scattering should therefore contain more than one model for counting statistics.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا