
Themisto: Towards Automated Documentation Generation in Computational Notebooks

Published by April Wang
Publication date: 2021
Research field: Informatics Engineering
Language: English





Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code and neglect creating or updating their documentation during quick iterations, which makes it hard to share their notebooks with others and with their future selves. Inspired by human documentation practices observed in an analysis of 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system, to explore the Human-AI Collaboration opportunity in the code documentation scenario. Themisto facilitates the creation of different types of documentation via three approaches: a deep-learning-based approach to generate documentation for source code (fully automated), a query-based approach to retrieve the online API documentation for source code (fully automated), and a user prompt approach to motivate users to write more documentation (semi-automated). We evaluated Themisto in a within-subjects experiment with 24 data science practitioners, and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would have ignored, and improved participants' satisfaction with their computational notebook.
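To make the query-based approach more concrete, the following is a minimal sketch of how API documentation could be looked up for the calls in a code cell. It is not Themisto's implementation: the `API_DOCS` table, its placeholder summaries, and the helper names `dotted_name`, `called_apis`, and `retrieve_docs` are all assumptions introduced for illustration, and a local dictionary stands in for the online API documentation.

```python
import ast

# Hypothetical stand-in for an online API reference (assumption, placeholder text).
API_DOCS = {
    "pandas.read_csv": "Read a CSV file into a DataFrame (placeholder summary).",
    "pandas.concat": "Concatenate pandas objects along an axis (placeholder summary).",
}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def called_apis(source):
    """List the dotted names called in `source`, with import aliases expanded."""
    tree = ast.parse(source)

    # Map local aliases to module names, e.g. {'pd': 'pandas'}.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                aliases[alias.asname or alias.name] = alias.name

    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is None:
                continue
            head, _, rest = name.partition(".")
            full = aliases.get(head, head) + ("." + rest if rest else "")
            if full not in names:
                names.append(full)
    return names


def retrieve_docs(source):
    """Return documentation entries for the calls that the lookup table knows."""
    return {name: API_DOCS[name] for name in called_apis(source) if name in API_DOCS}


cell = "import pandas as pd\ndf = pd.read_csv('train.csv')\nprint(df.head())"
docs = retrieve_docs(cell)
```

Here `docs` holds an entry only for `pandas.read_csv`; calls such as `print` and `df.head` are detected but have no entry in the hypothetical table, which is the case where a system might fall back to a generated description or a user prompt.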




Read also

Researchers and practitioners across many disciplines have recently adopted computational notebooks to develop, document, and share their scientific workflows, and the GIS community is no exception. This chapter introduces computational notebooks in the geographical context. It begins by explaining the computational paradigm and philosophy that underlie notebooks. Next it unpacks their architecture to illustrate a notebook user's typical workflow. Then it discusses the main benefits notebooks offer GIS researchers and practitioners, including better integration with modern software, more natural access to new forms of data, and better alignment with the principles and benefits of open science. In this context, it identifies notebooks as the glue that binds together a broader ecosystem of open source packages and transferable platforms for computational geography. The chapter concludes with a brief illustration of using notebooks for a set of basic GIS operations. Compared to traditional desktop GIS, notebooks can make spatial analysis more nimble, extensible, and reproducible and have thus evolved into an important component of the geospatial science toolkit.
Xuye Liu, Dakuo Wang, April Wang (2021)
Jupyter notebook allows data scientists to write machine learning code together with its documentation in cells. In this paper, we propose a new task of code documentation generation (CDG) for computational notebooks. In contrast to previous CDG tasks, which focus on generating documentation for single code snippets, in a computational notebook a single piece of documentation in a markdown cell often corresponds to multiple code cells, and these code cells have an inherent structure. We propose a new model (HAConvGNN) that uses a hierarchical attention mechanism to consider the relevant code cells and the relevant code-token information when generating the documentation. Tested on a new corpus constructed from well-documented Kaggle notebooks, our model outperforms the baseline models.
Yang Bai, Yu Guan, Jian Qing Shi (2021)
Fatigue is a broad, multifactorial concept that includes the subjective perception of reduced physical and mental energy levels. It is also one of the key factors that strongly affect patients' health-related quality of life. To date, most fatigue assessment methods have been based on self-reporting, which may suffer from factors such as recall bias. To address this issue, in this work we recorded multi-modal physiological data (including ECG, accelerometer, skin temperature and respiratory rate, as well as demographic information such as age and BMI) in free-living environments and developed automated fatigue assessment models. Specifically, we extracted features from each modality and employed random forest-based mixed-effects models, which can take advantage of the demographic information for improved performance. We conducted experiments on our collected dataset, and very promising preliminary results were achieved. Our results suggest that ECG plays an important role in the fatigue assessment tasks.
Will Crichton (2020)
Automatic documentation generation tools, or auto docs, are widely used to visualize information about APIs. However, each auto doc tool comes with its own unique representation of API information. In this paper, I use an information visualization analysis of auto docs to generate potential design principles for improving their usability. Developers use auto docs as a reference by looking up relevant API primitives given partial information, or leads, about their name, type, or behavior. I discuss how auto docs can better support searching and scanning on these leads, e.g. by providing more information-dense visualizations of method signatures.
Advertisements (ads) often include strongly emotional content to leave a lasting impression on the viewer. This work (i) compiles an affective ad dataset capable of evoking coherent emotions across users, as determined from the affective opinions of five experts and 14 annotators; (ii) explores the efficacy of convolutional neural network (CNN) features for encoding emotions, and observes that CNN features outperform low-level audio-visual emotion descriptors upon extensive experimentation; and (iii) demonstrates how enhanced affect prediction facilitates computational advertising, and leads to better viewing experience while watching an online video stream embedded with ads based on a study involving 17 users. We model ad emotions based on subjective human opinions as well as objective multimodal features, and show how effectively modeling ad emotions can positively impact a real-life application.