ترغب بنشر مسار تعليمي؟ اضغط هنا

Latent Theme Dictionary Model for Finding Co-occurrent Patterns in Process Data

103   0   0.0 ( 0 )
 نشر من قبل Guanhua Fang
 تاريخ النشر 2019
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

Process data, temporally ordered categorical observations, are of recent interest due to its increasing abundance and the desire to extract useful information. A process is a collection of time-stamped events of different types, recording how an individual behaves in a given time period. The process data are too complex in terms of size and irregularity for the classical psychometric models to be applicable, at least directly, and, consequently, it is desirable to develop new ways for modeling and analysis. We introduce herein a latent theme dictionary model (LTDM) for processes that identifies co-occurrent event patterns and individuals with similar behavioral patterns. Theoretical properties are established under certain regularity conditions for the likelihood based estimation and inference. A non-parametric Bayes LTDM algorithm using the Markov Chain Monte Carlo method is proposed for computation. Simulation studies show that the proposed approach performs well in a range of situations. The proposed method is applied to an item in the 2012 Programme for International Student Assessment with interpretable findings.



قيم البحث

اقرأ أيضاً

We propose a latent topic model with a Markovian transition for process data, which consist of time-stamped events recorded in a log file. Such data are becoming more widely available in computer-based educational assessment with complex problem solv ing items. The proposed model can be viewed as an extension of the hierarchical Bayesian topic model with a hidden Markov structure to accommodate the underlying evolution of an examinees latent state. Using topic transition probabilities along with response times enables us to capture examinees learning trajectories, making clustering/classification more efficient. A forward-backward variational expectation-maximization (FB-VEM) algorithm is developed to tackle the challenging computational problem. Useful theoretical properties are established under certain asymptotic regimes. The proposed method is applied to a complex problem solving item in 2012 Programme for International Student Assessment (PISA 2012).
Convolutional dictionary learning (CDL), the problem of estimating shift-invariant templates from data, is typically conducted in the absence of a prior/structure on the templates. In data-scarce or low signal-to-noise ratio (SNR) regimes, which have received little attention from the community, learned templates overfit the data and lack smoothness, which can affect the predictive performance of downstream tasks. To address this limitation, we propose GPCDL, a convolutional dictionary learning framework that enforces priors on templates using Gaussian Processes (GPs). With the focus on smoothness, we show theoretically that imposing a GP prior is equivalent to Wiener filtering the learned templates, thereby suppressing high-frequency components and promoting smoothness. We show that the algorithm is a simple extension of the classical iteratively reweighted least squares, which allows the flexibility to experiment with different smoothness assumptions. Through simulation, we show that GPCDL learns smooth dictionaries with better accuracy than the unregularized alternative across a range of SNRs. Through an application to neural spiking data from rats, we show that learning templates by GPCDL results in a more accurate and visually-interpretable smooth dictionary, leading to superior predictive performance compared to non-regularized CDL, as well as parametric alternatives.
This work proposes a nonparametric method to compare the underlying mean functions given two noisy datasets. The motivation for the work stems from an application of comparing wind turbine power curves. Comparing wind turbine data presents new proble ms, namely the need to identify the regions of difference in the input space and to quantify the extent of difference that is statistically significant. Our proposed method, referred to as funGP, estimates the underlying functions for different data samples using Gaussian process models. We build a confidence band using the probability law of the estimated function differences under the null hypothesis. Then, the confidence band is used for the hypothesis test as well as for identifying the regions of difference. This identification of difference regions is a distinct feature, as existing methods tend to conduct an overall hypothesis test stating whether two functions are different. Understanding the difference regions can lead to further practical insights and help devise better control and maintenance strategies for wind turbines. The merit of funGP is demonstrated by using three simulation studies and four real wind turbine datasets.
We consider modeling, inference, and computation for analyzing multivariate binary data. We propose a new model that consists of a low dimensional latent variable component and a sparse graphical component. Our study is motivated by analysis of item response data in cognitive assessment and has applications to many disciplines where item response data are collected. Standard approaches to item response data in cognitive assessment adopt the multidimensional item response theory (IRT) models. However, human cognition is typically a complicated process and thus may not be adequately described by just a few factors. Consequently, a low-dimensional latent factor model, such as the multidimensional IRT models, is often insufficient to capture the structure of the data. The proposed model adds a sparse graphical component that captures the remaining ad hoc dependence. It reduces to a multidimensional IRT model when the graphical component becomes degenerate. Model selection and parameter estimation are carried out simultaneously through construction of a pseudo-likelihood function and properly chosen penalty terms. The convexity of the pseudo-likelihood function allows us to develop an efficient algorithm, while the penalty terms generate a low-dimensional latent component and a sparse graphical structure. Desirable theoretical properties are established under suitable regularity conditions. The method is applied to the revised Eysencks personality questionnaire, revealing its usefulness in item analysis. Simulation results are reported that show the new method works well in practical situations.
We introduce a numerically tractable formulation of Bayesian joint models for longitudinal and survival data. The longitudinal process is modelled using generalised linear mixed models, while the survival process is modelled using a parametric genera l hazard structure. The two processes are linked by sharing fixed and random effects, separating the effects that play a role at the time scale from those that affect the hazard scale. This strategy allows for the inclusion of non-linear and time-dependent effects while avoiding the need for numerical integration, which facilitates the implementation of the proposed joint model. We explore the use of flexible parametric distributions for modelling the baseline hazard function which can capture the basic shapes of interest in practice. We discuss prior elicitation based on the interpretation of the parameters. We present an extensive simulation study, where we analyse the inferential properties of the proposed models, and illustrate the trade-off between flexibility, sample size, and censoring. We also apply our proposal to two real data applications in order to demonstrate the adaptability of our formulation both in univariate time-to-event data and in a competing risks framework. The methodology is implemented in rstan.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا