ترغب بنشر مسار تعليمي؟ اضغط هنا

Sequential Estimation of Convex Divergences using Reverse Submartingales and Exchangeable Filtrations

61   0   0.0 ( 0 )
 نشر من قبل Tudor Manole
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We present a unified technique for sequential estimation of convex divergences between distributions, including integral probability metrics like the kernel maximum mean discrepancy, $varphi$-divergences like the Kullback-Leibler divergence, and optimal transport costs, such as powers of Wasserstein distances. The technical underpinnings of our approach lie in the observation that empirical convex divergences are (partially ordered) reverse submartingales with respect to the exchangeable filtration, coupled with maximal inequalities for such processes. These techniques appear to be powerful additions to the existing literature on both confidence sequences and convex divergences. We construct an offline-to-sequential device that converts a wide array of existing offline concentration inequalities into time-uniform confidence sequences that can be continuously monitored, providing valid inference at arbitrary stopping times. The resulting sequential bounds pay only an iterated logarithmic price over the corresponding fixed-time bounds, retaining the same dependence on problem parameters (like dimension or alphabet size if applicable).

قيم البحث

اقرأ أيضاً

53 - Tomohiro Nishiyama 2019
Divergences are quantities that measure discrepancy between two probability distributions and play an important role in various fields such as statistics and machine learning. Divergences are non-negative and are equal to zero if and only if two dist ributions are the same. In addition, some important divergences such as the f-divergence have convexity, which we call convex divergence. In this paper, we show new properties of the convex divergences by using integral and differential operators that we introduce. For the convex divergence, the result applied the integral or differential operator is also a divergence. In particular, the integral operator preserves convexity. Furthermore, the results applied the integral operator multiple times constitute a monotonically decreasing sequence of the convex divergences. We derive new sequences of the convex divergences that include the Kullback-Leibler divergence or the reverse Kullback-Leibler divergence from these properties.
We consider the problem of locating the source of a network cascade, given a noisy time-series of network data. Initially, the cascade starts with one unknown, affected vertex and spreads deterministically at each time step. The goal is to find an ad aptive procedure that outputs an estimate for the source as fast as possible, subject to a bound on the estimation error. For a general class of graphs, we describe a family of matrix sequential probability ratio tests (MSPRTs) that are first-order asymptotically optimal up to a constant factor as the estimation error tends to zero. We apply our results to lattices and regular trees, and show that MSPRTs are asymptotically optimal for regular trees. We support our theoretical results with simulations.
Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine l earning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) They can be estimated from data as statistical averages. 2) Such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving tightness of such variational formulas, and accordingly accelerate statistical learning and estimation from data, is lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of objective functionals unveils local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural-network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.
This paper studies the problem of high-dimensional multiple testing and sparse recovery from the perspective of sequential analysis. In this setting, the probability of error is a function of the dimension of the problem. A simple sequential testing procedure is proposed. We derive necessary conditions for reliable recovery in the non-sequential setting and contrast them with sufficient conditions for reliable recovery using the proposed sequential testing procedure. Applications of the main results to several commonly encountered models show that sequential testing can be exponentially more sensitive to the difference between the null and alternative distributions (in terms of the dependence on dimension), implying that subtle cases can be much more reliably determined using sequential methods.
117 - L. Gyorfi , G. Lugosi , G. Morvai 2008
We present a simple randomized procedure for the prediction of a binary sequence. The algorithm uses ideas from recent developments of the theory of the prediction of individual sequences. We show that if the sequence is a realization of a stationary and ergodic random process then the average number of mistakes converges, almost surely, to that of the optimum, given by the Bayes predictor. The desirable finite-sample properties of the predictor are illustrated by its performance for Markov processes. In such cases the predictor exhibits near optimal behavior even without knowing the order of the Markov process. Prediction with side information is also considered.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا