ﻻ يوجد ملخص باللغة العربية
Finding statistically significant high-order interaction features in predictive modeling is important but challenging task. The difficulty lies in the fact that, for a recent applications with high-dimensional covariates, the number of possible high-order interaction features would be extremely large. Identifying statistically significant features from such a huge pool of candidates would be highly challenging both in computational and statistical senses. To work with this problem, we consider a two stage algorithm where we first select a set of high-order interaction features by marginal screening, and then make statistical inferences on the regression model fitted only with the selected features. Such statistical inferences are called post-selection inference (PSI), and receiving an increasing attention in the literature. One of the seminal recent advancements in PSI literature is the works by Lee et al. where the authors presented an algorithmic framework for computing exact sampling distributions in PSI. A main challenge when applying their approach to our high-order interaction models is to cope with the fact that PSI in general depends not only on the selected features but also on the unselected features, making it hard to apply to our extremely high-dimensional high-order interaction models. The goal of this paper is to overcome this difficulty by introducing a novel efficient method for PSI. Our key idea is to exploit the underlying tree structure among high-order interaction features, and to develop a pruning method of the tree which enables us to quickly identify a group of unselected features that are guaranteed to have no influence on PSI. The experimental results indicate that the proposed method allows us to reliably identify statistically significant high-order interaction features with reasonable computational cost.
Hawkes process provides an effective statistical framework for analyzing the time-dependent interaction of neuronal spiking activities. Although utilized in many real applications, the classic Hawkes process is incapable of modelling inhibitory inter
Models which estimate main effects of individual variables alongside interaction effects have an identifiability challenge: effects can be freely moved between main effects and interaction effects without changing the model prediction. This is a crit
Topic models are Bayesian models that are frequently used to capture the latent structure of certain corpora of documents or images. Each data element in such a corpus (for instance each item in a collection of scientific articles) is regarded as a c
Feature selection is an important tool to deal with high dimensional data. In unsupervised case, many popular algorithms aim at maintaining the structure of the original data. In this paper, we propose a simple and effective feature selection algorit
Applying standard statistical methods after model selection may yield inefficient estimators and hypothesis tests that fail to achieve nominal type-I error rates. The main issue is the fact that the post-selection distribution of the data differs fro