
Density Estimation Trees as fast non-parametric modelling tools

Added by Lucio Anderlini
Publication date: 2016
Language: English





Density Estimation Trees (DETs) are decision trees trained on a multivariate dataset to estimate its probability density function. While not competitive with kernel techniques in terms of accuracy, they are extremely fast, embarrassingly parallel and relatively small when stored to disk. These properties make DETs appealing in the resource-expensive horizon of LHC data analysis. Possible applications include selection optimization, fast simulation and fast detector calibration. In this contribution I describe the algorithm, made available to the HEP community in a RooFit implementation. A set of applications under discussion within the LHCb Collaboration is also briefly illustrated.
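The idea behind a DET can be sketched in a few lines: recursively partition the sample space into hyper-rectangles, and estimate the density in each leaf as the fraction of events it contains divided by its volume. The following is a minimal illustrative sketch, not the RooFit implementation described above; in particular, splitting the widest dimension at the sample median and stopping at a minimum leaf size are simplifying assumptions (the actual algorithm optimizes splits against an integrated-squared-error criterion and prunes the tree).

```python
import numpy as np

class DETNode:
    """Node of a piecewise-constant density tree over a hyper-rectangle."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi      # box bounds, arrays of shape (d,)
        self.dim = None                # split dimension (None => leaf)
        self.cut = None                # split position
        self.left = self.right = None
        self.density = None            # constant density on a leaf

def build(node, data, n_total, min_leaf=100):
    """Grow the tree: split the widest dimension at the sample median.

    Leaf density = (fraction of the full sample in the leaf) / (leaf volume),
    so the estimator integrates to 1 by construction.
    """
    n = len(data)
    volume = float(np.prod(node.hi - node.lo))
    if n <= min_leaf:
        node.density = (n / n_total) / volume
        return node
    d = int(np.argmax(node.hi - node.lo))
    cut = float(np.median(data[:, d]))
    if not node.lo[d] < cut < node.hi[d]:   # degenerate cut: make a leaf
        node.density = (n / n_total) / volume
        return node
    node.dim, node.cut = d, cut
    hi_left = node.hi.copy(); hi_left[d] = cut
    lo_right = node.lo.copy(); lo_right[d] = cut
    mask = data[:, d] < cut
    node.left = build(DETNode(node.lo.copy(), hi_left), data[mask], n_total, min_leaf)
    node.right = build(DETNode(lo_right, node.hi.copy()), data[~mask], n_total, min_leaf)
    return node

def evaluate(node, x):
    """Descend to the leaf containing x; evaluation cost is O(tree depth)."""
    while node.dim is not None:
        node = node.left if x[node.dim] < node.cut else node.right
    return node.density
```

Evaluation only walks one root-to-leaf path, which is what makes DETs so fast compared to kernel estimators that sum over the whole sample.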



Related research

Lucio Anderlini, 2015
Density Estimation Trees can play an important role in exploratory data analysis for multidimensional, multi-modal data models of large samples. I briefly discuss the algorithm, a self-optimization technique based on kernel density estimation, and some applications in High Energy Physics.
Xinyi Xu, Feng Liang, 2010
We consider the problem of estimating the predictive density of future observations from a non-parametric regression model. The density estimators are evaluated under Kullback--Leibler divergence and our focus is on establishing the exact asymptotics of minimax risk in the case of Gaussian errors. We derive the convergence rate and constant for minimax risk among Bayesian predictive densities under Gaussian priors and we show that this minimax risk is asymptotically equivalent to that among all density estimators.
Here we present a new non-parametric approach to density estimation and classification derived from theory in Radon transforms and image reconstruction. We start by constructing a forward problem in which the unknown density is mapped to a set of one-dimensional empirical distribution functions computed from the raw input data. Interpreting this mapping in terms of Radon-type projections provides an analytical connection between the data and the density with many very useful properties, including stable invertibility, fast computation, and significant theoretical grounding. Using results from the literature on geometric inverse problems, we give uniqueness results and stability estimates for our methods. We subsequently extend the ideas to address problems in manifold learning and density estimation on manifolds. We introduce two new algorithms which can be readily applied to implement density estimation using Radon transforms in low dimensions or on low-dimensional manifolds embedded in $\mathbb{R}^d$. We test our algorithms' performance on a range of synthetic 2-D density estimation problems, designed with a mixture of sharp edges and smooth features. We show that our algorithm offers consistently competitive performance when compared to state-of-the-art density estimation methods from the literature.
Conditional density estimation generalizes regression by modeling a full density f(y|x) rather than only the expected value E(y|x). This is important for many tasks, including handling multi-modality and generating prediction intervals. Though fundamental and widely applicable, nonparametric conditional density estimators have received relatively little attention from statisticians and little or none from the machine learning community. None of that work has been applied to greater than bivariate data, presumably due to the computational difficulty of data-driven bandwidth selection. We describe the double kernel conditional density estimator and derive fast dual-tree-based algorithms for bandwidth selection using a maximum likelihood criterion. These techniques give speedups of up to 3.8 million in our experiments, and enable the first applications to previously intractable large multivariate datasets, including a redshift prediction problem from the Sloan Digital Sky Survey.
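The double kernel estimator mentioned above weights each training point by a kernel in x and spreads its mass with a kernel in y: f(y|x0) = Σ_i K_hx(x0 − x_i) K_hy(y − y_i) / Σ_i K_hx(x0 − x_i). A minimal sketch with Gaussian kernels follows; the fixed bandwidths `hx` and `hy` are placeholder assumptions, whereas the paper's contribution is precisely the fast dual-tree selection of those bandwidths, which is not reproduced here.

```python
import numpy as np

def cond_density(x0, y_grid, x, y, hx=0.2, hy=0.1):
    """Double-kernel estimate of f(y | x = x0) on a grid of y values.

    Each sample point (x_i, y_i) contributes a normalized Gaussian in y,
    weighted by how close x_i is to the query point x0.
    """
    wx = np.exp(-0.5 * ((x0 - x) / hx) ** 2)   # kernel weights in x
    wx /= wx.sum()                              # weights sum to 1
    ky = np.exp(-0.5 * ((y_grid[:, None] - y[None, :]) / hy) ** 2)
    ky /= hy * np.sqrt(2.0 * np.pi)             # normalize each y kernel
    return ky @ wx                              # weighted kernel mixture in y
```

Because the y kernels are normalized and the x weights sum to one, the estimate integrates to one over y (up to grid truncation), which makes it a proper conditional density.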
Disease surveillance is essential not only for the early detection of outbreaks but also for monitoring trends of the disease in the long run. In this paper, we aim to build a tactical model for the surveillance of dengue in particular. Most existing models for dengue prediction exploit known relationships between climate and socio-demographic factors and the incidence counts; however, they are not flexible enough to capture the steep and sudden rise and fall of the incidence counts. This has been the motivation for the methodology used in our paper. We build a non-parametric, flexible, Gaussian Process (GP) regression model that relies on past dengue incidence counts and climate covariates, and show that the GP model performs accurately in comparison with the other existing methodologies, thus proving to be a good tactical and robust model for health authorities to plan their course of action.