Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Sequential Estimation of Convex Divergences using Reverse Submartingales and Exchangeable Filtrations

61 0 0.0 ( 0 )

Download Cite

Added by Tudor Manole

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Tudor Manole - Aaditya Ramdas

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present a unified technique for sequential estimation of convex divergences between distributions, including integral probability metrics like the kernel maximum mean discrepancy, $varphi$-divergences like the Kullback-Leibler divergence, and optimal transport costs, such as powers of Wasserstein distances. The technical underpinnings of our approach lie in the observation that empirical convex divergences are (partially ordered) reverse submartingales with respect to the exchangeable filtration, coupled with maximal inequalities for such processes. These techniques appear to be powerful additions to the existing literature on both confidence sequences and convex divergences. We construct an offline-to-sequential device that converts a wide array of existing offline concentration inequalities into time-uniform confidence sequences that can be continuously monitored, providing valid inference at arbitrary stopping times. The resulting sequential bounds pay only an iterated logarithmic price over the corresponding fixed-time bounds, retaining the same dependence on problem parameters (like dimension or alphabet size if applicable).

rate research

Monotonically Decreasing Sequence of Divergences

53 - Tomohiro Nishiyama 2019

Divergences are quantities that measure discrepancy between two probability distributions and play an important role in various fields such as statistics and machine learning. Divergences are non-negative and are equal to zero if and only if two distributions are the same. In addition, some important divergences such as the f-divergence have convexity, which we call convex divergence. In this paper, we show new properties of the convex divergences by using integral and differential operators that we introduce. For the convex divergence, the result applied the integral or differential operator is also a divergence. In particular, the integral operator preserves convexity. Furthermore, the results applied the integral operator multiple times constitute a monotonically decreasing sequence of the convex divergences. We derive new sequences of the convex divergences that include the Kullback-Leibler divergence or the reverse Kullback-Leibler divergence from these properties.

Statistics Theory Information Theory Information Theory

Sequential Estimation of Network Cascades

67 - Anirudh Sridhar , H. Vincent Poor 2019

We consider the problem of locating the source of a network cascade, given a noisy time-series of network data. Initially, the cascade starts with one unknown, affected vertex and spreads deterministically at each time step. The goal is to find an adaptive procedure that outputs an estimate for the source as fast as possible, subject to a bound on the estimation error. For a general class of graphs, we describe a family of matrix sequential probability ratio tests (MSPRTs) that are first-order asymptotically optimal up to a constant factor as the estimation error tends to zero. We apply our results to lattices and regular trees, and show that MSPRTs are asymptotically optimal for regular trees. We support our theoretical results with simulations.

Statistics Theory Information Theory Social and Information Networks

Optimizing Variational Representations of Divergences and Accelerating their Statistical Estimation

65 - Jeremiah Birrell , Markos A. Katsoulakis , Yannis Pantazis 2020

Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) They can be estimated from data as statistical averages. 2) Such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving tightness of such variational formulas, and accordingly accelerate statistical learning and estimation from data, is lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of objective functionals unveils local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural-network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.

Machine Learning Information Theory Information Theory

Sequential Analysis in High Dimensional Multiple Testing and Sparse Recovery

414 - Matthew Malloy , Robert Nowak 2011

This paper studies the problem of high-dimensional multiple testing and sparse recovery from the perspective of sequential analysis. In this setting, the probability of error is a function of the dimension of the problem. A simple sequential testing procedure is proposed. We derive necessary conditions for reliable recovery in the non-sequential setting and contrast them with sufficient conditions for reliable recovery using the proposed sequential testing procedure. Applications of the main results to several commonly encountered models show that sequential testing can be exponentially more sensitive to the difference between the null and alternative distributions (in terms of the dependence on dimension), implying that subtle cases can be much more reliably determined using sequential methods.

Statistics Theory Information Theory Information Theory

A simple randomized algorithm for sequential prediction of ergodic time series

408 - L. Gyorfi , G. Lugosi , G. Morvai 2008

We present a simple randomized procedure for the prediction of a binary sequence. The algorithm uses ideas from recent developments of the theory of the prediction of individual sequences. We show that if the sequence is a realization of a stationary and ergodic random process then the average number of mistakes converges, almost surely, to that of the optimum, given by the Bayes predictor. The desirable finite-sample properties of the predictor are illustrated by its performance for Markov processes. In such cases the predictor exhibits near optimal behavior even without knowing the order of the Markov process. Prediction with side information is also considered.

Statistics Theory Information Theory Information Theory

comments

Fetching comments

Kalamoon Private University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Sequential Estimation of Convex Divergences using Reverse Submartingales and Exchangeable Filtrations

Ask ChatGPT about the research

No Arabic abstract

Read More