
Density Estimation Trees as fast non-parametric modelling tools

Added by Lucio Anderlini
Publication date: 2016
Language: English





Density Estimation Trees (DETs) are decision trees trained on a multivariate dataset to estimate its probability density function. While not competitive with kernel techniques in terms of accuracy, they are extremely fast, embarrassingly parallel and relatively small when stored to disk. These properties make DETs appealing in the resource-expensive horizon of LHC data analysis. Possible applications include selection optimization, fast simulation and fast detector calibration. In this contribution I describe the algorithm, made available to the HEP community in a RooFit implementation. A set of applications under discussion within the LHCb Collaboration is also briefly illustrated.
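The idea behind a DET can be sketched in a few lines: recursively partition the sample space into hyper-rectangles, and estimate the density in each leaf as the fraction of events it contains divided by its volume. The following is a minimal illustrative sketch, not the RooFit implementation described above; in particular, splitting the widest dimension at the sample median and stopping at a minimum leaf size are simplifying assumptions (the actual algorithm optimizes splits against an integrated-squared-error criterion and prunes the tree).

```python
import numpy as np

class DETNode:
    """Node of a piecewise-constant density tree over a hyper-rectangle."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi      # box bounds, arrays of shape (d,)
        self.dim = None                # split dimension (None => leaf)
        self.cut = None                # split position
        self.left = self.right = None
        self.density = None            # constant density on a leaf

def build(node, data, n_total, min_leaf=100):
    """Grow the tree: split the widest dimension at the sample median.

    Leaf density = (fraction of the full sample in the leaf) / (leaf volume),
    so the estimator integrates to 1 by construction.
    """
    n = len(data)
    volume = float(np.prod(node.hi - node.lo))
    if n <= min_leaf:
        node.density = (n / n_total) / volume
        return node
    d = int(np.argmax(node.hi - node.lo))
    cut = float(np.median(data[:, d]))
    if not node.lo[d] < cut < node.hi[d]:   # degenerate cut: make a leaf
        node.density = (n / n_total) / volume
        return node
    node.dim, node.cut = d, cut
    hi_left = node.hi.copy(); hi_left[d] = cut
    lo_right = node.lo.copy(); lo_right[d] = cut
    mask = data[:, d] < cut
    node.left = build(DETNode(node.lo.copy(), hi_left), data[mask], n_total, min_leaf)
    node.right = build(DETNode(lo_right, node.hi.copy()), data[~mask], n_total, min_leaf)
    return node

def evaluate(node, x):
    """Descend to the leaf containing x; evaluation cost is O(tree depth)."""
    while node.dim is not None:
        node = node.left if x[node.dim] < node.cut else node.right
    return node.density
```

Evaluation only walks one root-to-leaf path, which is what makes DETs so fast compared to kernel estimators that sum over the whole sample.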



Related research

Lucio Anderlini, 2015
Density Estimation Trees can play an important role in exploratory data analysis for multidimensional, multi-modal data models of large samples. I briefly discuss the algorithm, a self-optimization technique based on kernel density estimation, and some applications in High Energy Physics.
Xinyi Xu, Feng Liang, 2010
We consider the problem of estimating the predictive density of future observations from a non-parametric regression model. The density estimators are evaluated under Kullback--Leibler divergence and our focus is on establishing the exact asymptotics of minimax risk in the case of Gaussian errors. We derive the convergence rate and constant for minimax risk among Bayesian predictive densities under Gaussian priors and we show that this minimax risk is asymptotically equivalent to that among all density estimators.
Here we present a new non-parametric approach to density estimation and classification derived from theory in Radon transforms and image reconstruction. We start by constructing a forward problem in which the unknown density is mapped to a set of one-dimensional empirical distribution functions computed from the raw input data. Interpreting this mapping in terms of Radon-type projections provides an analytical connection between the data and the density with many very useful properties, including stable invertibility, fast computation, and significant theoretical grounding. Using results from the literature on geometric inverse problems, we give uniqueness results and stability estimates for our methods. We subsequently extend the ideas to address problems in manifold learning and density estimation on manifolds. We introduce two new algorithms which can be readily applied to implement density estimation using Radon transforms in low dimensions or on low-dimensional manifolds embedded in $\mathbb{R}^d$. We test our algorithms' performance on a range of synthetic 2-D density estimation problems, designed with a mixture of sharp edges and smooth features. We show that our algorithm offers consistently competitive performance when compared to state-of-the-art density estimation methods from the literature.
Conditional density estimation generalizes regression by modeling a full density f(y|x) rather than only the expected value E(y|x). This is important for many tasks, including handling multi-modality and generating prediction intervals. Though fundamental and widely applicable, nonparametric conditional density estimators have received relatively little attention from statisticians and little or none from the machine learning community. None of that work has been applied to greater than bivariate data, presumably due to the computational difficulty of data-driven bandwidth selection. We describe the double kernel conditional density estimator and derive fast dual-tree-based algorithms for bandwidth selection using a maximum likelihood criterion. These techniques give speedups of up to 3.8 million in our experiments, and enable the first applications to previously intractable large multivariate datasets, including a redshift prediction problem from the Sloan Digital Sky Survey.
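The double kernel estimator mentioned above weights each training point by a kernel in x and spreads its mass with a kernel in y: f(y|x0) = Σ_i K_hx(x0 − x_i) K_hy(y − y_i) / Σ_i K_hx(x0 − x_i). A minimal sketch with Gaussian kernels follows; the fixed bandwidths `hx` and `hy` are placeholder assumptions, whereas the paper's contribution is precisely the fast dual-tree selection of those bandwidths, which is not reproduced here.

```python
import numpy as np

def cond_density(x0, y_grid, x, y, hx=0.2, hy=0.1):
    """Double-kernel estimate of f(y | x = x0) on a grid of y values.

    Each sample point (x_i, y_i) contributes a normalized Gaussian in y,
    weighted by how close x_i is to the query point x0.
    """
    wx = np.exp(-0.5 * ((x0 - x) / hx) ** 2)   # kernel weights in x
    wx /= wx.sum()                              # weights sum to 1
    ky = np.exp(-0.5 * ((y_grid[:, None] - y[None, :]) / hy) ** 2)
    ky /= hy * np.sqrt(2.0 * np.pi)             # normalize each y kernel
    return ky @ wx                              # weighted kernel mixture in y
```

Because the y kernels are normalized and the x weights sum to one, the estimate integrates to one over y (up to grid truncation), which makes it a proper conditional density.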
Disease surveillance is essential not only for the early detection of outbreaks but also for monitoring trends of the disease in the long run. In this paper, we aim to build a tactical model for the surveillance of dengue in particular. Most existing models for dengue prediction exploit known relationships between climate and socio-demographic factors and the incidence counts; however, they are not flexible enough to capture the steep and sudden rise and fall of the incidence counts. This has been the motivation for the methodology used in our paper. We build a non-parametric, flexible, Gaussian Process (GP) regression model that relies on past dengue incidence counts and climate covariates, and show that the GP model performs accurately in comparison with the other existing methodologies, thus proving to be a good tactical and robust model for health authorities to plan their course of action.