ﻻ يوجد ملخص باللغة العربية
The count-min sketch (CMS) is a time and memory efficient randomized data structure that provides estimates of tokens frequencies in a data stream, i.e. point queries, based on random hashed data. Learning-augmented CMSs improve the CMS by learning models that allow to better exploit data properties. In this paper, we focus on the learning-augmented CMS of Cai, Mitzenmacher and Adams (textit{NeurIPS} 2018), which relies on Bayesian nonparametric (BNP) modeling of a data stream via Dirichlet process (DP) priors. This is referred to as the CMS-DP, and it leads to BNP estimates of a point query as posterior means of the point query given the hashed data. While BNPs is proved to be a powerful tool for developing robust learning-augmented CMSs, ideas and methods behind the CMS-DP are tailored to point queries under DP priors, and they can not be used for other priors or more general queries. In this paper, we present an alternative, and more flexible, derivation of the CMS-DP such that: i) it allows to make use of the Pitman-Yor process (PYP) prior, which is arguably the most popular generalization of the DP prior; ii) it can be readily applied to the more general problem of estimating range queries. This leads to develop a novel learning-augmented CMS under power-law data streams, referred to as the CMS-PYP, which relies on BNP modeling of the stream via PYP priors. Applications to synthetic and real data show that the CMS-PYP outperforms the CMS and the CMS-DP in the estimation of low-frequency tokens; this known to be a critical feature in natural language processing, where it is indeed common to encounter power-law data streams.
Bayesian network structure learning algorithms with limited data are being used in domains such as systems biology and neuroscience to gain insight into the underlying processes that produce observed data. Learning reliable networks from limited data
Much of the data being created on the web contains interactions between users and items. Stochastic blockmodels, and other methods for community detection and clustering of bipartite graphs, can infer latent user communities and latent item clusters
We introduce a novel framework for the estimation of the posterior distribution over the weights of a neural network, based on a new probabilistic interpretation of adaptive optimisation algorithms such as AdaGrad and Adam. We demonstrate the effecti
We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian Structural Inference (BSI) relies on a set of candidate unifilar HMM (uHMM) topologies for inference of process structure from
In this paper, we propose the multivariate quantile Bayesian structural time series (MQBSTS) model for the joint quantile time series forecast, which is the first such model for correlated multivariate time series to the authors best knowledge. The M