ﻻ يوجد ملخص باللغة العربية
We introduce Density sketches (DS): a succinct online summary of the data distribution. DS can accurately estimate point wise probability density. Interestingly, DS also provides a capability to sample unseen novel data from the underlying data distribution. Thus, analogous to popular generative models, DS allows us to succinctly replace the real-data in almost all machine learning pipelines with synthetic examples drawn from the same distribution as the original data. However, unlike generative models, which do not have any statistical guarantees, DS leads to theoretically sound asymptotically converging consistent estimators of the underlying density function. Density sketches also have many appealing properties making them ideal for large-scale distributed applications. DS construction is an online algorithm. The sketches are additive, i.e., the sum of two sketches is the sketch of the combined data. These properties allow data to be collected from distributed sources, compressed into a density sketch, efficiently transmitted in the sketch form to a central server, merged, and re-sampled into a synthetic database for modeling applications. Thus, density sketches can potentially revolutionize how we store, communicate, and distribute data.
In this note we illustrate how common matrix approximation methods, such as random projection and random sampling, yield projection-cost-preserving sketches, as introduced in [FSS13, CEM+15]. A projection-cost-preserving sketch is a matrix approximat
We investigate an algorithm named histogram transform ensembles (HTE) density estimator whose effectiveness is supported by both solid theoretical analysis and significant experimental performance. On the theoretical side, by decomposing the error te
From a high volume stream of weighted items, we want to maintain a generic sample of a certain limited size $k$ that we can later use to estimate the total weight of arbitrary subsets. This is the classic context of on-line reservoir sampling, thinki
Machine learning models have emerged as a very effective strategy to sidestep time-consuming electronic-structure calculations, enabling accurate simulations of greater size, time scale and complexity. Given the interpolative nature of these models,
Given a point set $Psubset mathbb{R}^d$, a kernel density estimation for Gaussian kernel is defined as $overline{mathcal{G}}_P(x) = frac{1}{left|Pright|}sum_{pin P}e^{-leftlVert x-p rightrVert^2}$ for any $xinmathbb{R}^d$. We study how to construct a