أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Peter Clifford

Pattern Matching under Polynomial Transformation

307 - Ayelet Butman , Peter Clifford , Raphael Clifford 2011

We consider a class of pattern matching problems where a normalising transformation is applied at every alignment. Normalised pattern matching plays a key role in fields as diverse as image processing and musical information processing where applicat ion specific transformations are often applied to the input. By considering the class of polynomial transformations of the input, we provide fast algorithms and the first lower bounds for both new and old problems. Given a pattern of length m and a longer text of length n where both are assumed to contain integer values only, we first show O(n log m) time algorithms for pattern matching under linear transformations even when wildcard symbols can occur in the input. We then show how to extend the technique to polynomial transformations of arbitrary degree. Next we consider the problem of finding the minimum Hamming distance under polynomial transformation. We show that, for any epsilon>0, there cannot exist an O(n m^(1-epsilon)) time algorithm for additive and linear transformations conditional on the hardness of the classic 3SUM problem. Finally, we consider a version of the Hamming distance problem under additive transformations with a bound k on the maximum distance that need be reported. We give a deterministic O(nk log k) time solution which we then improve by careful use of randomisation to O(n sqrt(k log k) log n) time for sufficiently small k. Our randomised solution outputs the correct answer at every position with high probability.

بنى وهياكل البيانات والخوارزميات

A statistical analysis of probabilistic counting algorithms

515 - Peter Clifford , Ioana A. Cosma 2010

This paper considers the problem of cardinality estimation in data stream applications. We present a statistical analysis of probabilistic counting algorithms, focusing on two techniques that use pseudo-random variates to form low-dimensional data sk etches. We apply conventional statistical methods to compare probabilistic algorithms based on storing either selected order statistics, or random projections. We derive estimators of the cardinality in both cases, and show that the maximal-term estimator is recursively computable and has exponentially decreasing error bounds. Furthermore, we show that the estimators have comparable asymptotic efficiency, and explain this result by demonstrating an unexpected connection between the two approaches.

حساب

A simple sketching algorithm for entropy estimation

332 - Peter Clifford 2009

We consider the problem of approximating the empirical Shannon entropy of a high-frequency data stream under the relaxed strict-turnstile model, when space limitations make exact computation infeasible. An equivalent measure of entropy is the Renyi e ntropy that depends on a constant alpha. This quantity can be estimated efficiently and unbiasedly from a low-dimensional synopsis called an alpha-stable data sketch via the method of compressed counting. An approximation to the Shannon entropy can be obtained from the Renyi entropy by taking alpha sufficiently close to 1. However, practical guidelines for parameter calibration with respect to alpha are lacking. We avoid this problem by showing that the random variables used in estimating the Renyi entropy can be transformed to have a proper distributional limit as alpha approaches 1: the maximally skewed, strictly stable distribution with alpha = 1 defined on the entire real line. We propose a family of asymptotically unbiased log-mean estimators of the Shannon entropy, indexed by a constant zeta > 0, that can be computed in a single-pass algorithm to provide an additive approximation. We recommend the log-mean estimator with zeta = 1 that has exponentially decreasing tail bounds on the error probability, asymptotic relative efficiency of 0.932, and near-optimal computational complexity.

حساب نظرية الإحصاء نظرية الإحصاء

Efficient l_{alpha} Distance Approximation for High Dimensional Data Using alpha-Stable Projection

35 - Peter Clifford , Ioana A. Cosma 2008

In recent years, large high-dimensional data sets have become commonplace in a wide range of applications in science and commerce. Techniques for dimension reduction are of primary concern in statistical analysis. Projection methods play an important role. We investigate the use of projection algorithms that exploit properties of the alpha-stable distributions. We show that l_{alpha} distances and quasi-distances can be recovered from random projections with full statistical efficiency by L-estimation. The computational requirements of our algorithm are modest; after a once-and-for-all calculation to determine an array of length k, the algorithm runs in O(k) time for each distance, where k is the reduced dimension of the projection.

حساب

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد