Batched Predecessor and Sorting with Size-Priced Information in External Memory

477 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Mayank Goswami

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Michael A. Bender - Mayank Goswami - Dzejla Mededovic

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In the unit-cost comparison model, a black box takes an input two items and outputs the result of the comparison. Problems like sorting and searching have been studied in this model, and it has been generalized to include the concept of priced information, where different pairs of items (say database records) have different comparison costs. These comparison costs can be arbitrary (in which case no algorithm can be close to optimal (Charikar et al. STOC 2000)), structured (for example, the comparison cost may depend on the length of the databases (Gupta et al. FOCS 2001)), or stochastic (Angelov et al. LATIN 2008). Motivated by the database setting where the cost depends on the sizes of the items, we consider the problems of sorting and batched predecessor where two non-uniform sets of items $A$ and $B$ are given as input. (1) In the RAM setting, we consider the scenario where both sets have $n$ keys each. The cost to compare two items in $A$ is $a$, to compare an item of $A$ to an item of $B$ is $b$, and to compare two items in $B$ is $c$. We give upper and lower bounds for the case $a le b le c$. Notice that the case $b=1, a=c=infty$ is the famous ``nuts and bolts problem. (2) In the Disk-Access Model (DAM), where transferring elements between disk and internal memory is the main bottleneck, we consider the scenario where elements in $B$ are larger than elements in $A$. The larger items take more I/Os to be brought into memory, consume more space in internal memory, and are required in their entirety for comparisons. We first give output-sensitive lower and upper bounds on the batched predecessor problem, and use these to derive bounds on the complexity of sorting in the two models. Our bounds are tight in most cases, and require novel generalizations of the classical lower bound techniques in external memory to accommodate the non-uniformity of keys.

قيم البحث

163 - Paolo Ferragina , Travis Gagie , Giovanni Manzini 2009

In this paper we describe algorithms for computing the BWT and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight in the sense that, for an input of size $n$, they use only ${n} $ bits of disk working space while all previous approaches use $Th{n log n}$ bits of disk working space. Moreover, our algorithms access disk data only via sequential scans, thus they take full advantage of modern disk features that make sequential disk accesses much faster than random accesses. We also present a scan-based algorithm for inverting the BWT that uses $Th{n}$ bits of working space, and a lightweight {em internal-memory} algorithm for computing the BWT which is the fastest in the literature when the available working space is $os{n}$ bits. Finally, we prove {em lower} bounds on the complexity of computing and inverting the BWT via sequential scans in terms of the classic product: internal-memory space $times$ number of passes over the disk data.

بنى وهياكل البيانات والخوارزميات

External Memory Planar Point Location with Fast Updates

86 - John Iacono , Ben Karsin , Grigorios Koumoutsos 2019

We study dynamic planar point location in the External Memory Model or Disk Access Model (DAM). Previous work in this model achieves polylog query and polylog amortized update time. We present a data structure with $O( log_B^2 N)$ query time and $O(f rac{1}{ B^{1-epsilon}} log_B N)$ amortized update time, where $N$ is the number of segments, $B$ the block size and $epsilon$ is a small positive constant, under the assumption that all faces have constant size. This is a $B^{1-epsilon}$ factor faster for updates than the fastest previous structure, and brings the cost of insertion and deletion down to subconstant amortized time for reasonable choices of $N$ and $B$. Our structure solves the problem of vertical ray-shooting queries among a dynamic set of interior-disjoint line segments; this is well-known to solve dynamic planar point location for a connected subdivision of the plane with faces of constant size.

بنى وهياكل البيانات والخوارزميات

Validating XML Documents in the Streaming Model with External Memory

120 - Christian Konrad , Frederic Magniez 2010

We study the problem of validating XML documents of size $N$ against general DTDs in the context of streaming algorithms. The starting point of this work is a well-known space lower bound. There are XML documents and DTDs for which $p$-pass streaming algorithms require $Omega(N/p)$ space. We show that when allowing access to external memory, there is a deterministic streaming algorithm that solves this problem with memory space $O(log^2 N)$, a constant number of auxiliary read/write streams, and $O(log N)$ total number of passes on the XML document and auxiliary streams. An important intermediate step of this algorithm is the computation of the First-Child-Next-Sibling (FCNS) encoding of the initial XML document in a streaming fashion. We study this problem independently, and we also provide memory efficient streaming algorithms for decoding an XML document given in its FCNS encoding. Furthermore, validating XML documents encoding binary trees in the usual streaming model without external memory can be done with sublinear memory. There is a one-pass algorithm using $O(sqrt{N log N})$ space, and a bidirectional two-pass algorithm using $O(log^2 N)$ space performing this task.

بنى وهياكل البيانات والخوارزميات قواعد البيانات

Performance evaluation of explicit finite difference algorithms with varying amounts of computational and memory intensity

137 - Satya P. Jammy , Christian T. Jacobs , Neil D. Sandham 2016

Future architectures designed to deliver exascale performance motivate the need for novel algorithmic changes in order to fully exploit their capabilities. In this paper, the performance of several numerical algorithms, characterised by varying degre es of memory and computational intensity, are evaluated in the context of finite difference methods for fluid dynamics problems. It is shown that, by storing some of the evaluated derivatives as single thread- or process-local variables in memory, or recomputing the derivatives on-the-fly, a speed-up of ~2 can be obtained compared to traditional algorithms that store all derivatives in global arrays.

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية البرمجيات الرياضية

Query-Competitive Sorting with Uncertainty

102 - Magnus M. Halldorsson ICE-TCS , n Department of Computer Science 2020

We study the problem of sorting under incomplete information, when queries are used to resolve uncertainties. Each of $n$ data items has an unknown value, which is known to lie in a given interval. We can pay a query cost to learn the actual value, a nd we may allow an error threshold in the sorting. The goal is to find a nearly-sorted permutation by performing a minimum-cost set of queries. We show that an offline optimum query set can be found in poly time, and that both oblivious and adaptive problems have simple competitive algorithms. The competitive ratio for the oblivious problem is $n$ for uniform query costs, and unbounded for arbitrary costs; for the adaptive problem, the ratio is 2. We present a unified adaptive strategy for uniform costs that yields the following improved results: (1) a 3/2-competitive randomized algorithm; (2) a 5/3-competitive deterministic algorithm if the dependency graph has no 2-components after some preprocessing, which has competitive ratio $3/2+mathrm{O}(1/k)$ if the components obtained have size at least $k$; and (3) an exact algorithm for laminar families of intervals. The first two results have matching lower bounds, and we have a lower bound of 7/5 for large components. We also give a randomized adaptive algorithm with competitive ratio $1+frac{4}{3sqrt{3}}approx 1.7698$ for arbitrary query costs, and we show that the 2-competitive deterministic adaptive algorithm can be generalized for queries returning intervals and for a more general vertex cover problem, by using the local ratio technique. Moreover, we prove that the advice complexity of the adaptive problem is $lfloor n/2rfloor$ if no error threshold is allowed, and $lceil n/3cdotlg 3rceil$ for the general case. Finally, we present some graph-theoretical results on co-threshold tolerance graphs, and we discuss uncertainty variants of some classical interval problems.

بنى وهياكل البيانات والخوارزميات

سجل دخول لتتمكن من نشر تعليقات