ﻻ يوجد ملخص باللغة العربية
Sorting database tables before compressing them improves the compression rate. Can we do better than the lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, the best approaches to row-ordering are derived from traveling salesman heuristics, although there is a significant trade-off between running time and compression. A new heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades off compression for a major running-time speedup, is a good option for very large tables. However, for some compression schemes, it is more important to generate long runs rather than few runs. For this case, another novel heuristic, Vortex, is promising. We find that we can improve run-length encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%: these gains are on top of the gains due to lexicographically sorting the table. We prove that the new row reordering is optimal (within 10%) at minimizing the runs of identical values within columns, in a few cases.
We leverage the powerful lossy image compression algorithm BPG to build a lossless image compression system. Specifically, the original image is first decomposed into the lossy reconstruction obtained after compressing it with BPG and the correspondi
We present a record-breaking result and lessons learned in practicing TPCx-IoT benchmarking for a real-world use case. We find that more system characteristics need to be benchmarked for its application to real-world use cases. We introduce an extens
Relational datasets are being generated at an alarmingly rapid rate across organizations and industries. Compressing these datasets could significantly reduce storage and archival costs. Traditional compression algorithms, e.g., gzip, are suboptimal
Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use much memory, and thus we might prefer compressed bitmap indexes. Following Oracles
System logs record detailed runtime information of software systems and are used as the main data source for many tasks around software engineering. As modern software systems are evolving into large scale and complex structures, logs have become one