Subscribe to the gold package and get unlimited access to Shamra Academy

Enhancing the Interactivity of Dataframe Queries by Leveraging Think Time

157 0 0.0 ( 0 )

Download Cite

Added by Doris Xin

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Doris Xin - Devin Petersohn - Dixin Tang

Databases

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We propose opportunistic evaluation, a framework for accelerating interactions with dataframes. Interactive latency is critical for iterative, human-in-the-loop dataframe workloads for supporting exploratory data analysis. Opportunistic evaluation significantly reduces interactive latency by 1) prioritizing computation directly relevant to the interactions and 2) leveraging think time for asynchronous background computation for non-critical operators that might be relevant to future interactions. We show, through empirical analysis, that current user behavior presents ample opportunities for optimization, and the solutions we propose effectively harness such opportunities.

rate research

Towards Scalable Dataframe Systems

132 - Devin Petersohn , Stephen Macke , Doris Xin 2020

Dataframes are a popular abstraction to represent, prepare, and analyze data. Despite the remarkable success of dataframe libraries in Rand Python, dataframes face performance issues even on moderately large datasets. Moreover, there is significant ambiguity regarding dataframe semantics. In this paper we lay out a vision and roadmap for scalable dataframe systems. To demonstrate the potential in this area, we report on our experience building MODIN, a scaled-up implementation of the most widely-used and complex dataframe API today, Pythons pandas. With pandas as a reference, we propose a simple data model and algebra for dataframes to ground discussion in the field. Given this foundation, we lay out an agenda of open research opportunities where the distinct features of dataframes will require extending the state of the art in many dimensions of data management. We discuss the implications of signature data-frame features including flexible schemas, ordering, row/column equivalence, and data/metadata fluidity, as well as the piecemeal, trial-and-error-based approach to interacting with dataframes.

Databases

Enhancing XML Data Warehouse Query Performance by Fragmentation

627 - Hadj Mahboubi 2009

XML data warehouses form an interesting basis for decision-support applications that exploit heterogeneous data from multiple sources. However, XML-native database systems currently suffer from limited performances in terms of manageable data volume and response time for complex analytical queries. Fragmenting and distributing XML data warehouses (e.g., on data grids) allow to address both these issues. In this paper, we work on XML warehouse fragmentation. In relational data warehouses, several studies recommend the use of derived horizontal fragmentation. Hence, we propose to adapt it to the XML context. We particularly focus on the initial horizontal fragmentation of dimensions XML documents and exploit two alternative algorithms. We experimentally validate our proposal and compare these alternatives with respect to a unified XML warehouse model we advocate for.

Databases

Efficient Processing of Reachability and Time-Based Path Queries in a Temporal Graph

80 - Huanhuan Wu , Yuzhen Huang , James Cheng 2016

A temporal graph is a graph in which vertices communicate with each other at specific time, e.g., $A$ calls $B$ at 11 a.m. and talks for 7 minutes, which is modeled by an edge from $A$ to $B$ with starting time 11 a.m. and duration 7 mins. Temporal graphs can be used to model many networks with time-related activities, but efficient algorithms for analyzing temporal graphs are severely inadequate. We study fundamental problems such as answering reachability and time-based path queries in a temporal graph, and propose an efficient indexing technique specifically designed for processing these queries in a temporal graph. Our results show that our method is efficient and scalable in both index construction and query processing.

Databases

Assessing Achievability of Queries and Constraints

204 - Rada Chirkova , Jon Doyle , Juan L. Reutter 2017

Assessing and improving the quality of data in data-intensive systems are fundamental challenges that have given rise to numerous applications targeting transformation and cleaning of data. However, while schema design, data cleaning, and data migration are nowadays reasonably well understood in isolation, not much attention has been given to the interplay between the tools that address issues in these areas. Our focus is on the problem of determining whether there exist sequences of data-transforming procedures that, when applied to the (untransformed) input data, would yield data satisfying the conditions required for performing the task in question. Our goal is to develop a framework that would address this problem, starting with the relational setting. In this paper we abstract data-processing tools as black-box procedures. This abstraction describes procedures by a specification of which parts of the database might be modified by the procedure, as well as by the constraints that specify the required states of the database before and after applying the procedure. We then proceed to study fundamental algorithmic questions arising in this context, such as understanding when one can guarantee that sequences of procedures apply to original or transformed data, when they succeed at improving the data, and when knowledge bases can represent the outcomes of procedures. Finally, we turn to the problem of determining whether the application of a sequence of procedures to a database results in the satisfaction of properties specified by either queries or constraints. We show that this problem is decidable for some broad and realistic classes of procedures and properties, even when procedures are allowed to alter the schema of instances.

Databases

Efficient Approximation of Well-Designed SPARQL Queries

162 - Zhenyu Song , Zhiyong Feng , Xiaowang Zhang 2016

Query response time often influences user experience in the real world. However, it possibly takes more time to answer a query with its all exact solutions, especially when it contains the OPT operations since the OPT operation is the least conventional operator in SPARQL. So it becomes essential to make a trade-off between the query response time and the accuracy of their solutions. In this paper, based on the depth of the OPT operation occurring in a query, we propose an approach to obtain its all approximate queries with less depth of the OPT operation. This paper mainly discusses those queries with well-designed patterns since the OPT operation in a well-designed pattern is really optional. Firstly, we transform a well-designed pattern in OPT normal form into a well-designed tree, whose inner nodes are labeled by OPT operation and leaf nodes are labeled by patterns containing other operations such as the AND operation and the FILTER operation. Secondly, based on this well-designed tree, we remove optional well-designed subtrees with less depth of the OPT operation and then obtain approximate queries with different depths of the OPT operation. Finally, we evaluate the approximate query efficiency with the degree of approximation.

Databases

comments

Fetching comments

Peninsula Private University

Additional details More universities

Enhancing the Interactivity of Dataframe Queries by Leveraging Think Time

Ask ChatGPT about the research

No Arabic abstract

Read More