Researchers and practitioners across many disciplines have recently adopted computational notebooks to develop, document, and share their scientific workflows, and the GIS community is no exception. This chapter introduces computational notebooks in the geographical context. It begins by explaining the computational paradigm and philosophy that underlie notebooks. Next, it unpacks their architecture to illustrate a notebook user's typical workflow. Then it discusses the main benefits notebooks offer GIS researchers and practitioners, including better integration with modern software, more natural access to new forms of data, and better alignment with the principles and benefits of open science. In this context, it identifies notebooks as the glue that binds together a broader ecosystem of open-source packages and transferable platforms for computational geography. The chapter concludes with a brief illustration of using notebooks for a set of basic GIS operations. Compared to traditional desktop GIS, notebooks can make spatial analysis more nimble, extensible, and reproducible, and they have thus evolved into an important component of the geospatial science toolkit.
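The chapter's closing illustration is not reproduced here. As a flavor of the kind of basic GIS operation a notebook cell might contain, below is a minimal, dependency-free sketch of a point-in-polygon test using the ray-casting algorithm. This is a hypothetical example, not code from the chapter, and real notebooks would typically use a geospatial library instead.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of vertices)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray cast to the right of the point
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Unit square as a toy "study area" polygon
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(point_in_polygon((0.5, 0.5), square))  # True
print(point_in_polygon((1.5, 0.5), square))  # False
```

In a notebook, a cell like this sits next to the prose that motivates it, which is exactly the code-plus-documentation workflow the chapter describes.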
Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code and neglect creating or updating their documentation during quick iterations, which leads to challenges in sharing their notebooks with others and with their future selves. Inspired by human documentation practices drawn from analyzing 80 highly voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system that explores the opportunity for human-AI collaboration in the code documentation scenario. Themisto facilitates the creation of different types of documentation via three approaches: a deep-learning-based approach to generate documentation for source code (fully automated), a query-based approach to retrieve the online API documentation for source code (fully automated), and a user prompt approach to motivate users to write more documentation (semi-automated). We evaluated Themisto in a within-subjects experiment with 24 data science practitioners and found that automated documentation generation techniques reduced the time spent writing documentation, reminded participants to document code they would otherwise have ignored, and improved participants' satisfaction with their computational notebooks.
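The abstract does not show Themisto's implementation. As a loose, stdlib-only illustration of the query-based idea (looking up existing API documentation for a piece of code), here is a sketch; the function name and behavior are assumptions for illustration, not Themisto's actual API.

```python
import inspect
import pydoc

def lookup_api_doc(dotted_name):
    """Resolve a dotted name (e.g. 'json.dumps') and return the first
    line of its docstring, mimicking a query-based doc-retrieval step."""
    obj = pydoc.locate(dotted_name)
    if obj is None:
        return None
    doc = inspect.getdoc(obj) or ""
    return doc.splitlines()[0] if doc else None

print(lookup_api_doc("json.dumps"))
```

A system like Themisto would query hosted documentation for the libraries a cell imports rather than local docstrings, but the retrieval step is conceptually similar.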
Desktop GIS applications, such as ArcGIS and QGIS, provide tools essential for conducting suitability analysis, an activity that is central to formulating a land-use plan. However, when it comes to building complicated land-use suitability models, these applications have several limitations, including operating-system dependence, lack of dedicated modules, insufficient reproducibility, and difficult, if not impossible, deployment on a computing cluster. To address these challenges, this paper introduces PyLUSAT: Python for Land Use Suitability Analysis Tools. PyLUSAT is an open-source software package that provides a series of tools (functions) to conduct various tasks in a suitability modeling workflow. These tools were evaluated against comparable tools in ArcMap 10.4 with respect to both accuracy and computational efficiency. Results showed that PyLUSAT functions were two to ten times more efficient depending on the job's complexity, while generating outputs with accuracy similar to the ArcMap tools. PyLUSAT also features extensibility and cross-platform compatibility. It has been used to develop fourteen QGIS Processing Algorithms and has been implemented on a high-performance computing cluster (HiPerGator at the University of Florida) to expedite the process of suitability analysis. All these properties make PyLUSAT a competitive alternative for urban planners and researchers to customize and automate suitability analysis, as well as to integrate the technique into a larger analytical framework.
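PyLUSAT's own API is not shown in the abstract. To make the core of a suitability modeling workflow concrete, here is a generic weighted-linear-combination sketch in plain Python; the function name and data layout are illustrative assumptions, not PyLUSAT code.

```python
def weighted_suitability(criteria_scores, weights):
    """Weighted linear combination, the core of many suitability models.

    criteria_scores: dict mapping criterion name -> list of per-cell
                     scores on a common 0-1 scale.
    weights:         dict mapping criterion name -> weight (sum to 1).
    Returns a list of composite suitability scores, one per cell.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_cells = len(next(iter(criteria_scores.values())))
    return [
        sum(weights[c] * criteria_scores[c][i] for c in criteria_scores)
        for i in range(n_cells)
    ]

# Toy example: three cells scored on slope and road access
scores = {"slope": [0.9, 0.4, 0.1], "road_access": [0.5, 0.8, 0.2]}
weights = {"slope": 0.6, "road_access": 0.4}
print(weighted_suitability(scores, weights))  # ~ [0.74, 0.56, 0.14]
```

A real package would operate on raster or vector layers rather than plain lists, and would supply the rescaling and distance tools needed to produce the 0-1 criterion scores in the first place.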
This article sets out our perspective on how to begin the journey of decolonising computational fields, such as the data and cognitive sciences. We see this struggle as requiring two basic steps: a) realisation that the present-day system has inherited, and still enacts, hostile, conservative, and oppressive behaviours and principles towards women of colour (WoC); and b) rejection of the idea that centering individual people is a solution to system-level problems. The longer we ignore these two steps, the more our academic system maintains its toxic structure, excludes, and harms Black women and other minoritised groups. This also keeps the door open to discredited pseudoscience, like eugenics and physiognomy. We propose that grappling with our fields' histories and heritage holds the key to avoiding the mistakes of the past. For example, initiatives such as diversity boards can still be harmful because they superficially appear reformatory but nonetheless center whiteness and maintain the status quo. Building on the work of many WoC, who have been paving the way, we hope to advance the dialogue required to build both a grass-roots and a top-down re-imagining of the computational sciences, including but not limited to psychology, neuroscience, cognitive science, computer science, data science, statistics, machine learning, and artificial intelligence. We aspire for these fields to progress away from their stagnant, sexist, and racist shared past into carving out and maintaining an ecosystem where both a diverse demographic of researchers and scientific ideas that critically challenge the status quo are welcomed.
Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem, this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.
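The paper defines a semantic proximity metric over a large Wikipedia-derived knowledge graph; as a simplified stand-in, the following sketch uses plain breadth-first-search hop count on a toy undirected graph and converts distance into a "truth support" score that decays with path length. The scoring function and the tiny graph are illustrative assumptions, not the paper's actual metric or data.

```python
from collections import deque

def shortest_path_length(graph, source, target):
    """BFS hop count between two concept nodes; None if disconnected."""
    if source == target:
        return 0
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return None

def truth_support(graph, subj, obj):
    """Toy proxy for semantic proximity: closer concepts get higher
    support (1 / (1 + hops)); unreachable pairs get 0."""
    d = shortest_path_length(graph, subj, obj)
    return 0.0 if d is None else 1.0 / (1.0 + d)

# Tiny undirected "knowledge graph" (adjacency lists)
kg = {
    "Barack Obama": ["Honolulu", "United States"],
    "Honolulu": ["Barack Obama", "Hawaii"],
    "Hawaii": ["Honolulu", "United States"],
    "United States": ["Barack Obama", "Hawaii"],
    "Canada": ["Ottawa"],
    "Ottawa": ["Canada"],
}
print(truth_support(kg, "Barack Obama", "Honolulu"))  # 0.5 (direct edge)
print(truth_support(kg, "Barack Obama", "Ottawa"))    # 0.0 (disconnected)
```

The intuition carries over to the full method: claims whose subject and object are close in the knowledge graph, under the paper's properly weighted proximity metric, receive higher support than claims connecting distant or unrelated concepts.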
Since the very first detection of gravitational waves from the coalescence of two black holes in 2015, Bayesian statistical methods have been routinely applied by LIGO and Virgo to extract the signal out of noisy interferometric measurements, obtain point estimates of the physical parameters responsible for producing the signal, and rigorously quantify their uncertainties. Different computational techniques have been devised depending on the source of the gravitational radiation and the gravitational waveform model used. Prominent sources of gravitational waves are binary black hole or neutron star mergers, the only objects that have been observed by detectors to date. However, gravitational waves from core-collapse supernovae, rapidly rotating neutron stars, and the stochastic gravitational-wave background also lie within the sensitivity band of the ground-based interferometers and are expected to be observable in future observing runs. As nonlinearities of the complex waveforms and the high-dimensional parameter spaces preclude analytic evaluation of the posterior distribution, posterior inference for all these sources relies on computer-intensive simulation techniques such as Markov chain Monte Carlo methods. This article reviews state-of-the-art Bayesian parameter estimation methods for researchers in this cross-disciplinary area of gravitational-wave data analysis.
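Production gravitational-wave inference uses far more sophisticated samplers (e.g., parallel-tempered MCMC or nested sampling) over physically motivated waveform models. Purely to illustrate the MCMC idea the review builds on, here is a minimal random-walk Metropolis sketch targeting a toy one-dimensional standard-normal posterior; nothing here reflects the collaborations' actual pipelines.

```python
import math
import random

def metropolis(log_post, x0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis: the simplest MCMC sampler of the kind
    used (in far more elaborate form) for GW parameter estimation."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio)
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Toy target: standard normal posterior, log p(x) = -x^2/2 + const
draws = metropolis(lambda x: -0.5 * x * x, x0=3.0, n_steps=20000)
burned = draws[5000:]          # discard burn-in
mean = sum(burned) / len(burned)
print(round(mean, 2))          # should be near 0
```

The same accept/reject logic, generalized to fifteen or more correlated waveform parameters with expensive likelihood evaluations, is what makes GW posterior inference so computationally intensive.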