أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Sebastiano Vigna

Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics

307 - Sebastiano Vigna , Paolo Boldi 2016

Minimal-interval semantics associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfyin g the query. Minimal-interval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents. In this paper we provide algorithms for computing conjunction and disjunction that are linear in the number of intervals and logarithmic in the number of operands; for additional operators, such as ordered conjunction and Brouwerian difference, we provide linear algorithms. In all cases, space is linear in the number of operands. More importantly, we define a formal notion of optimal laziness, and either prove it, or prove its impossibility, for each algorithm. We cast our results in a general framework of antichains of intervals on total orders, making our algorithms directly applicable to other domains.

بنى وهياكل البيانات والخوارزميات استرجاع المعلومات

Supremum-Norm Convergence for Step-Asynchronous Successive Overrelaxation on M-matrices

95 - Sebastiano Vigna 2014

Step-asynchronous successive overrelaxation updates the values contained in a single vector using the usual Gauss-Seidel-like weighted rule, but arbitrarily mixing old and new values, the only constraint being temporal coherence: you cannot use a val ue before it has been computed. We show that given a nonnegative real matrix $A$, a $sigmageqrho(A)$ and a vector $boldsymbol w>0$ such that $Aboldsymbol wleqsigmaboldsymbol w$, every iteration of step-asynchronous successive overrelaxation for the problem $(sI- A)boldsymbol x=boldsymbol b$, with $s >sigma$, reduces geometrically the $boldsymbol w$-norm of the current error by a factor that we can compute explicitly. Then, we show that given a $sigma>rho(A)$ it is in principle always possible to compute such a $boldsymbol w$. This property makes it possible to estimate the supremum norm of the absolute error at each iteration without any additional hypothesis on $A$, even when $A$ is so large that computing the product $Aboldsymbol x$ is feasible, but estimating the supremum norm of $(sI-A)^{-1}$ is not.

بنى وهياكل البيانات والخوارزميات التحليل العددي

A Weighted Correlation Index for Rankings with Ties

149 - Sebastiano Vigna 2014

Understanding the correlation between two different scores for the same set of items is a common problem in information retrieval, and the most commonly used statistics that quantifies this correlation is Kendalls $tau$. However, the standard definit ion fails to capture that discordances between items with high rank are more important than those between items with low rank. Recently, a new measure of correlation based on average precision has been proposed to solve this problem, but like many alternative proposals in the literature it assumes that there are no ties in the scores. This is a major deficiency in a number of contexts, and in particular while comparing centrality scores on large graphs, as the obvious baseline, indegree, has a very large number of ties in web and social graphs. We propose to extend Kendalls definition in a natural way to take into account weights in the presence of ties. We prove a number of interesting mathematical properties of our generalization and describe an $O(nlog n)$ algorithm for its computation. We also validate the usefulness of our weighted measure of correlation using experimental data.

الشبكات الاجتماعية والمعلومات استرجاع المعلومات

Further scramblings of Marsaglias xorshift generators

128 - Sebastiano Vigna 2014

xorshift* generators are a variant of Marsaglias xorshift generators that eliminate linear artifacts typical of generators based on $mathbf Z/2mathbf Z$-linear operations using multiplication by a suitable constant. Shortly after high-dimensional xor shift* generators were introduced, Saito and Matsumoto suggested a different way to eliminate linear artifacts based on addition in $mathbf Z/2^{32}mathbf Z$, leading to the XSadd generator. Starting from the observation that the lower bits of XSadd are very weak, as its reverse fails systematically several statistical tests, we explore xorshift+, a variant of XSadd using 64-bit operations, which leads, in small dimension, to extremely fast high-quality generators.

بنى وهياكل البيانات والخوارزميات التشفير والأمن البرمجيات الرياضية

An experimental exploration of Marsaglias xorshift generators, scrambled

57 - Sebastiano Vigna 2014

Marsaglia proposed recently xorshift generators as a class of very fast, good-quality pseudorandom number generators. Subsequent analysis by Panneton and LEcuyer has lowered the expectations raised by Marsaglias paper, showing several weaknesses of s uch generators, verified experimentally using the TestU01 suite. Nonetheless, many of the weaknesses of xorshift generators fade away if their result is scrambled by a non-linear operation (as originally suggested by Marsaglia). In this paper we explore the space of possible generators obtained by multiplying the result of a xorshift generator by a suitable constant. We sample generators at 100 equispaced points of their state space and obtain detailed statistics that lead us to choices of parameters that improve on the current ones. We then explore for the first time the space of high-dimensional xorshift generators, following another suggestion in Marsaglias paper, finding choices of parameters providing periods of length $2^{1024} - 1$ and $2^{4096} - 1$. The resulting generators are of extremely high quality, faster than current similar alternatives, and generate long-period sequences passing strong statistical tests using only eight logical operations, one addition and one multiplication by a constant.

بنى وهياكل البيانات والخوارزميات التشفير والأمن البرمجيات الرياضية

Fibonacci Binning

87 - Sebastiano Vigna 2013

This note argues that when dot-plotting distributions typically found in papers about web and social networks (degree distributions, component-size distributions, etc.), and more generally distributions that have high variability in their tail, an ex ponentially binned version should always be plotted, too, and suggests Fibonacci binning as a visually appealing, easy-to-use and practical choice.

الشبكات الاجتماعية والمعلومات الفيزياء والمجتمع

Stanford Matrix Considered Harmful

85 - Sebastiano Vigna 2007

This note argues about the validity of web-graph data used in the literature.

استرجاع المعلومات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد