Developers in data science and other domains frequently use computational notebooks to create exploratory analyses and prototype models. However, they often struggle to incorporate existing software engineering tooling into these notebook-based workflows, leading to fragile development processes. We introduce Assemblé, a new development environment for collaborative data science projects, in which promising code fragments of data science pipelines can be contributed as pull requests to an upstream repository entirely from within JupyterLab, abstracting away low-level version control tool usage. We describe the design and implementation of Assemblé and report on a user study of 23 data scientists.
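The abstract does not expose Assemblé's API, but the steps it abstracts away are the familiar low-level version control ones. The following hypothetical Python sketch (the function, branch, and repository names are illustrative, not Assemblé's actual interface) shows roughly what a "contribute as pull request" action must do under the hood, using GitPython and the GitHub REST API:

    # Hypothetical sketch of the version-control steps a tool like Assemble
    # might hide behind a single "contribute" action in JupyterLab.
    # Assumes GitPython and requests are installed; the function, branch,
    # and repository names are illustrative, not Assemble's actual API.
    import git
    import requests

    def contribute_fragment(repo_path, fragment_file, token, upstream="origin"):
        repo = git.Repo(repo_path)
        branch = repo.create_head("fragment-contribution")  # new feature branch
        branch.checkout()
        repo.index.add([fragment_file])                     # stage the promising cell's code
        repo.index.commit("Add data science pipeline fragment")
        repo.remote(upstream).push(branch.name)             # push the branch upstream
        # Open a pull request via the GitHub REST API.
        owner_repo = "example-org/example-pipeline"         # placeholder repository
        resp = requests.post(
            f"https://api.github.com/repos/{owner_repo}/pulls",
            headers={"Authorization": f"token {token}"},
            json={"title": "New pipeline fragment",
                  "head": branch.name, "base": "main"},
        )
        resp.raise_for_status()
        return resp.json()["html_url"]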
Open Science, Reproducible Research, and the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles are long-term goals for scientific dissemination. However, implementing these principles calls for a re-examination of our means of dissemination. In this viewpoint, we discuss and advocate, in the context of nonlinear science, how notebook articles represent an essential step toward this objective by fully embracing cloud computing solutions. As scholarly articles, notebook articles offer an alternative, efficient, and more ethical way to disseminate research through their versatile environment. The format invites readers to delve deeper into the reported research. Through the interactivity of notebook articles, research results such as equations and figures are reproducible even for non-expert readers. The code and methods are transparently available to interested readers, and the methods can be reused and adapted to answer additional questions in related topics. The code runs on cloud computing services, which provide easy access even to low-income countries and research groups. The versatility of this environment provides stakeholders, from researchers to publishers, with opportunities to disseminate research results in innovative ways.
Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into so-called cells, notebooks allow users to execute their workflows interactively and enjoy a particularly tight feedback loop. However, as cells are added, removed, reordered, and rerun, this hidden intermediate state accumulates in a way that is not necessarily correlated with the notebook's visible code, making execution behavior difficult to reason about and leading to errors and a lack of reproducibility. We present NBSafety, a custom Jupyter kernel that uses runtime tracing and static analysis to automatically manage lineage associated with cell execution and global notebook state. NBSafety detects and prevents errors that users make during unaided notebook interactions, all while preserving the flexibility of existing notebook semantics. We evaluate NBSafety's ability to prevent erroneous interactions by replaying and analyzing 666 real notebook sessions. Of these, NBSafety identified 117 sessions with potential safety errors, and in the remaining 549 sessions, the cells that NBSafety identified as resolving safety issues were more than $7\times$ more likely to be selected by users for re-execution than a random baseline, even though the users were not using NBSafety and were therefore not influenced by its suggestions.
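NBSafety itself relies on runtime tracing inside a custom kernel; as a simplified illustration of the underlying staleness idea only, the sketch below uses purely static analysis with Python's ast module to extract each cell's reads and writes and flag cells that read a symbol redefined after their last run (all names are illustrative):

    # Simplified illustration of the staleness idea behind NBSafety, not its
    # actual implementation (NBSafety uses runtime tracing in a custom kernel).
    # We statically extract each cell's reads/writes and flag cells that read
    # a symbol redefined after the cell's own last execution.
    import ast

    def reads_writes(cell_source):
        tree = ast.parse(cell_source)
        reads, writes = set(), set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    writes.add(node.id)
                elif isinstance(node.ctx, ast.Load):
                    reads.add(node.id)
        return reads, writes

    def stale_cells(cells, run_order):
        """cells: {cell_id: source}; run_order: list of (timestamp, cell_id)."""
        last_write = {}   # symbol -> time of its most recent redefinition
        last_run = {}     # cell_id -> time the cell last executed
        for t, cid in run_order:
            _, writes = reads_writes(cells[cid])
            last_run[cid] = t
            for sym in writes:
                last_write[sym] = t
        stale = []
        for cid, src in cells.items():
            reads, _ = reads_writes(src)
            if any(last_write.get(s, -1) > last_run.get(cid, -1) for s in reads):
                stale.append(cid)  # reads a value newer than its own last run
        return stale

    # Example: cell "b" becomes stale after cell "a" redefines x.
    cells = {"a": "x = 1", "b": "y = x + 1"}
    print(stale_cells(cells, [(0, "a"), (1, "b"), (2, "a")]))  # -> ['b']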
The notebook validation tool nbval loads and executes Python code from a Jupyter notebook file. As it computes the output of each cell, it compares that output against the output saved in the notebook file, treating each cell as a test. Deviations are reported as test failures, with various configuration options available to control this behaviour. Application use cases include the validation of notebook-based documentation, tutorials, and textbooks, as well as the use of notebooks as additional unit, integration, and system tests for the libraries used in the notebook. Nbval is implemented as a plugin for the pytest testing framework.
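Because nbval is a pytest plugin, its documented command-line flags can also be driven programmatically through pytest.main. A minimal usage sketch, assuming nbval is installed and with a placeholder notebook path:

    # Minimal usage sketch, assuming nbval is installed (pip install nbval).
    # The notebook path is a placeholder. Each cell's fresh output is
    # compared against the output stored in the .ipynb file.
    import pytest

    # Strict mode: every cell's stored output must match on re-execution.
    pytest.main(["--nbval", "docs/tutorial.ipynb"])

    # Lax mode: only cells explicitly marked with "# NBVAL_CHECK_OUTPUT"
    # are compared, which suits notebooks with volatile output.
    pytest.main(["--nbval-lax", "docs/tutorial.ipynb"])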
Short-form digital storytelling has become a popular medium for millions of people to express themselves. Traditionally, this medium primarily uses 2D media such as text (e.g., memes), images (e.g., Instagram), GIFs (e.g., Giphy), and videos (e.g., TikTok, Snapchat). To expand the modalities from 2D to 3D media, we present SceneAR, a smartphone application for creating sequential scene-based micro narratives in augmented reality (AR). What sets SceneAR apart from prior work is the ability to share the scene-based stories as AR content: no longer limited to sharing images or videos, these narratives can now be experienced in people's own physical environments. Additionally, SceneAR affords users the ability to remix AR, empowering them to build upon others' creations collectively. We asked 18 people to use SceneAR in a 3-day study. Based on user interviews, analysis of screen recordings, and the stories they created, we extracted three themes. From those themes and the study overall, we derived six strategies for designers interested in supporting short-form AR narratives.
The rapid advancement of artificial intelligence (AI) is changing our lives in many ways. One application domain is data science. New techniques for automating the creation of AI, known as AutoAI or AutoML, aim to automate the work practices of data scientists. AutoAI systems are capable of autonomously ingesting and pre-processing data, engineering new features, and creating and scoring models based on target objectives (e.g., accuracy or run-time efficiency). Though these systems are not yet widely adopted, we are interested in understanding how AutoAI will impact the practice of data science. We conducted interviews with 20 data scientists who work at a large, multinational technology company and practice data science in various business settings. Our goal was to understand their current work practices and how these practices might change with AutoAI. Reactions were mixed: while informants expressed concerns about the trend of automating their jobs, they also strongly felt it was inevitable. Despite these concerns, they remained optimistic about their future job security, owing to a view that the future of data science work will be a collaboration between humans and AI systems in which both automation and human expertise are indispensable.
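As a concrete illustration of the "create and score models against a target objective" loop described above, here is a hedged, generic sketch using scikit-learn; it is not any particular AutoAI product's API, just a minimal candidate-ranking loop:

    # Generic sketch of the AutoML-style loop the abstract describes: try
    # several candidate pipelines and rank them by a target objective
    # (here, cross-validated accuracy). Dataset and candidates are
    # illustrative, not any specific AutoAI product's behavior.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)  # stand-in for automatically ingested data

    candidates = {
        "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "forest": make_pipeline(RandomForestClassifier(n_estimators=100)),
    }

    # Score every candidate on the objective and keep the leaderboard sorted.
    leaderboard = sorted(
        ((cross_val_score(p, X, y, cv=5).mean(), name) for name, p in candidates.items()),
        reverse=True,
    )
    for score, name in leaderboard:
        print(f"{name}: {score:.3f}")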