ﻻ يوجد ملخص باللغة العربية
Exploratory data science largely happens in computational notebooks with dataframe API, such as pandas, that support flexible means to transform, clean, and analyze data. Yet, visually exploring data in dataframes remains tedious, requiring substantial programming effort for visualization and mental effort to determine what analysis to perform next. We propose Lux, an always-on framework for accelerating visual insight discovery in data science workflows. When users print a dataframe in their notebooks, Lux recommends visualizations to provide a quick overview of the patterns and trends and suggests promising analysis directions. Lux features a high-level language for generating visualizations on-demand to encourage rapid visual experimentation with data. We demonstrate that through the use of a careful design and three system optimizations, Lux adds no more than two seconds of overhead on top of pandas for over 98% of datasets in the UCI repository. We evaluate Lux in terms of usability via a controlled first-use study and interviews with early adopters, finding that Lux helps fulfill the needs of data scientists for visualization support within their dataframe workflows. Lux has already been embraced by data science practitioners, with over 1.9k stars on Github within its first 15 months.
As the rate of data collection continues to grow rapidly, developing visualization tools that scale to immense data sets is a serious and ever-increasing challenge. Existing approaches generally seek to decouple storage and visualization systems, per
Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of
Visual query systems (VQSs) empower users to interactively search for line charts with desired visual patterns, typically specified using intuitive sketch-based interfaces. Despite decades of past work on VQSs, these efforts have not translated to ad
This paper proposes a composable Just in Time Architecture for Data Science (DS) Pipelines named JITA-4DS and associated resource management techniques for configuring disaggregated data centers (DCs). DCs under our approach are composable based on v
Data scientists across disciplines are increasingly in need of exploratory analysis tools for data sets with a high volume of features. We expand upon graph mining approaches for exploratory analysis of high-dimensional data to introduce Sirius, a vi