No Arabic abstract
In this paper we describe a general probabilistic framework for modeling waveforms such as heartbeats from ECG data. The model is based on segmental hidden Markov models (as used in speech recognition) with the addition of random effects to the generative model. The random effects component of the model handles shape variability across different waveforms within a general class of waveforms of similar shape. We show that this probabilistic model provides a unified framework for learning these models from sets of waveform data as well as parsing, classification, and prediction of new waveforms. We derive a computationally efficient EM algorithm to fit the model on multiple waveforms, and introduce a scoring method that evaluates a test waveform based on its shape. Results on two real-world data sets demonstrate that the random effects methodology leads to improved accuracy (compared to alternative approaches) on classification and segmentation of real-world waveforms.
Traditional voxel-level multiple testing procedures in neuroimaging, mostly $p$-value based, often ignore the spatial correlations among neighboring voxels and thus suffer from substantial loss of power. We extend the local-significance-index based procedure originally developed for the hidden Markov chain models, which aims to minimize the false nondiscovery rate subject to a constraint on the false discovery rate, to three-dimensional neuroimaging data using a hidden Markov random field model. A generalized expectation-maximization algorithm for maximizing the penalized likelihood is proposed for estimating the model parameters. Extensive simulations show that the proposed approach is more powerful than conventional false discovery rate procedures. We apply the method to the comparison between mild cognitive impairment, a disease status with increased risk of developing Alzheimers or another dementia, and normal controls in the FDG-PET imaging study of the Alzheimers Disease Neuroimaging Initiative.
The equations of a physical constitutive model for material stress within tantalum grains were solved numerically using a tetrahedrally meshed volume. The resulting output included a scalar vonMises stress for each of the more than 94,000 tetrahedra within the finite element discretization. In this paper, we define an intricate statistical model for the spatial field of vonMises stress which uses the given grain geometry in a fundamental way. Our model relates the three-dimensional field to integrals of latent stochastic processes defined on the vertices of the one- and two-dimensional grain boundaries. An intuitive neighborhood structure of said boundary nodes suggested the use of a latent Gaussian Markov random field (GMRF). However, despite the potential for computational gains afforded by GMRFs, the integral nature of our model and the sheer number of data points pose substantial challenges for a full Bayesian analysis. To overcome these problems and encourage efficient exploration of the posterior distribution, a number of techniques are now combined: parallel computing, sparse matrix methods, and a modification of a block update strategy within the sampling routine. In addition, we use an auxiliary variables approach to accommodate the presence of outliers in the data.
We demonstrate the application of pattern recognition algorithms via hidden Markov models (HMM) for qubit readout. This scheme provides a state-path trajectory approach capable of detecting qubit state transitions and makes for a robust classification scheme with higher starting state assignment fidelity than when compared to a multivariate Gaussian (MVG) or a support vector machine (SVM) scheme. Therefore, the method also eliminates the qubit-dependent readout time optimization requirement in current schemes. Using a HMM state discriminator we estimate fidelities reaching the ideal limit. Unsupervised learning gives access to transition matrix, priors, and IQ distributions, providing a toolbox for studying qubit state dynamics during strong projective readout.
In unsupervised classification, Hidden Markov Models (HMM) are used to account for a neighborhood structure between observations. The emission distributions are often supposed to belong to some parametric family. In this paper, a semiparametric modeling where the emission distributions are a mixture of parametric distributions is proposed to get a higher flexibility. We show that the classical EM algorithm can be adapted to infer the model parameters. For the initialisation step, starting from a large number of components, a hierarchical method to combine them into the hidden states is proposed. Three likelihood-based criteria to select the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best combination between the merging criteria and the model selection criteria and to evaluate the accuracy of classification. The proposed method is also illustrated using a biological dataset from the model plant Arabidopsis thaliana. A R package HMMmix is freely available on the CRAN.
Understanding centennial scale climate variability requires data sets that are accurate, long, continuous and of broad spatial coverage. Since instrumental measurements are generally only available after 1850, temperature fields must be reconstructed using paleoclimate archives, known as proxies. Various climate field reconstructions (CFR) methods have been proposed to relate past temperature to such proxy networks. In this work, we propose a new CFR method, called GraphEM, based on Gaussian Markov random fields embedded within an EM algorithm. Gaussian Markov random fields provide a natural and flexible framework for modeling high-dimensional spatial fields. At the same time, they provide the parameter reduction necessary for obtaining precise and well-conditioned estimates of the covariance structure, even in the sample-starved setting common in paleoclimate applications. In this paper, we propose and compare the performance of different methods to estimate the graphical structure of climate fields, and demonstrate how the GraphEM algorithm can be used to reconstruct past climate variations. The performance of GraphEM is compared to the widely used CFR method RegEM with regularization via truncated total least squares, using synthetic data. Our results show that GraphEM can yield significant improvements, with uniform gains over space, and far better risk properties. We demonstrate that the spatial structure of temperature fields can be well estimated by graphs where each neighbor is only connected to a few geographically close neighbors, and that the increase in performance is directly related to recovering the underlying sparsity in the covariance of the spatial field. Our work demonstrates how significant improvements can be made in climate reconstruction methods by better modeling the covariance structure of the climate field.