Accessible epidemiological data are of great value for emergency preparedness and response, understanding disease progression through a population, and building statistical and mechanistic disease models that enable forecasting. The status quo, however, renders acquiring and using such data difficult in practice. In many cases, a primary way of obtaining epidemiological data is through the internet, but the methods by which the data are presented to the public often differ drastically among institutions. As a result, there is a strong need for better data sharing practices. This paper identifies, in detail and with examples, the three key challenges one encounters when attempting to acquire and use epidemiological data: 1) interfaces, 2) data formatting, and 3) reporting. From these challenges, we derive suggestions and guidance for improving these systems as they evolve. If these data and interface recommendations were adhered to, epidemiological and public health analysis, modeling, and informatics work would be significantly streamlined, which would in turn yield better public health decision-making capabilities.
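To make the data formatting challenge concrete, the following minimal sketch (with hypothetical file names and column schemas, not drawn from the paper) illustrates how two sources reporting the same case counts can require entirely separate parsing logic before their data can be combined.

```python
# Hypothetical example: two jurisdictions publish the same daily case counts
# with different column names, date formats, and geographic identifiers,
# so each source needs its own loader before the data can be merged.
import pandas as pd

def load_source_a(path):
    # Source A: ISO dates, column "new_cases", county FIPS codes
    df = pd.read_csv(path, parse_dates=["report_date"])
    return df.rename(columns={"report_date": "date",
                              "new_cases": "cases",
                              "fips": "region_id"})

def load_source_b(path):
    # Source B: US-style dates, column "Daily Count", free-text county names
    df = pd.read_csv(path)
    df["date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
    df["region_id"] = df["County"].str.upper().str.strip()  # still not a FIPS code
    return df[["date", "region_id"]].assign(cases=df["Daily Count"])

combined = pd.concat([load_source_a("source_a.csv"),
                      load_source_b("source_b.csv")],
                     ignore_index=True)
```

Even after this per-source handling, the geographic identifiers remain incompatible, which is exactly the kind of friction the paper's recommendations aim to remove.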
There are many normative and technical questions involved in evaluating the quality of software used in epidemiological simulations. In this paper we answer some of these questions and offer practical guidance to practitioners, funders, scientific journals, and consumers of epidemiological research. The heart of our paper is a case study of the Imperial College London (ICL) COVID-19 simulator. We contend that epidemiological simulators should be engineered and evaluated within the framework of safety-critical standards developed by the consensus of the software engineering community for applications such as automotive and aircraft control.
Recent advances in artificial intelligence (AI) have led to an explosion of multimedia applications (e.g., computer vision (CV) and natural language processing (NLP)) across domains such as commercial, industrial, and intelligence. In particular, the use of AI applications in a national security environment is often problematic because the opaque nature of the systems leaves humans unable to understand how the results came about. A reliance on black boxes to generate predictions and inform decisions is potentially disastrous. This paper explores how applying standards during each stage of the development of an AI system deployed and used in a national security environment would help enable trust. Specifically, we focus on the standards outlined in Intelligence Community Directive 203 (Analytic Standards) to subject machine outputs to the same rigorous standards as analysis performed by humans.
As buildings are central to the social and environmental sustainability of human settlements, high-quality geospatial data are necessary to support their management and planning. Authorities around the world are increasingly collecting and releasing such data openly, but these are mostly disconnected initiatives, making it challenging for users to fully leverage their potential for urban sustainability. We conduct a global study of 2D geospatial data on buildings that are released by governments for free access, ranging from individual cities to whole countries. We identify and benchmark more than 140 releases from 28 countries containing more than 100 million buildings, based on five dimensions: accessibility, richness, data quality, harmonisation, and relationships with other actors. We find that much of the building data released by governments is valuable for spatial analyses, but there are large disparities among releases, and not all are of high quality, harmonised, or rich in descriptive information. Our study also compares authoritative data to OpenStreetMap, a crowdsourced counterpart, suggesting a mutually beneficial and complementary relationship.
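As an illustration of the kind of richness and quality checks implied by these benchmarking dimensions, the sketch below (hypothetical file name and attribute column; geopandas is assumed and not mentioned in the abstract) computes the share of footprints carrying a height attribute and counts invalid geometries in a single release.

```python
# Minimal sketch of attribute-richness and geometric-quality checks
# on one hypothetical open government building dataset.
import geopandas as gpd

buildings = gpd.read_file("city_buildings.gpkg")       # hypothetical release
has_height = buildings["height"].notna().mean()         # attribute richness
invalid = (~buildings.geometry.is_valid).sum()          # basic geometric quality

print(f"{has_height:.1%} of footprints report a height; "
      f"{invalid} invalid geometries found")
```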
Aims: Our Gulf War Illness (GWI) study conducts combinatorial screening of many interactive neural and humoral biomarkers in order to establish predictive, diagnostic, and therapeutic targets. We encounter obstacles at every stage of the biomarker discovery process, from sample acquisition and biomarker extraction to multi-aspect, multi-way interaction analysis, due to the study's complexity and the lack of support for solving complex data problems. We introduce a novel data platform, named ROSALIND, to overcome these challenges, foster healthy and vital collaborations, and advance scientific inquiry. Main methods: ROSALIND is a researcher-centered, study-specific data platform. It provides vital support for individual creativity and effort in collaborative research. We follow the principles etched in the platform name: ROSALIND stands for resource organisms with self-governed accessibility, linkability, integrability, neutrality, and dependability. We translate, encode, and implement these principles in the platform with novel uses of advanced concepts and techniques to ensure and protect data integrity and research integrity. From a researcher's vantage point, ROSALIND embodies nuanced utilities and advanced functionalities in one system, going beyond conventional storage, archiving, and data management. Key findings: The deployment of ROSALIND in our GWI study over the past 12 months has accelerated the pace of data experiments and analyses, removed numerous error sources, and increased research quality and productivity. Significance: ROSALIND appears to be the first platform to address data integrity and research integrity in tandem through digital measures and means. It also promises a new type of distributed research network, with individualized data platforms connected in various self-organized collaboration configurations.
Optimal transport (OT) is a widely used technique for distribution alignment, with applications throughout the machine learning, graphics, and vision communities. Without any additional structural assumptions on transport, however, OT can be fragile to outliers or noise, especially in high dimensions. Here, we introduce a new form of structured OT that simultaneously learns low-dimensional structure in data while leveraging this structure to solve the alignment task. Compared with OT, the resulting transport plan has better structural interpretability, highlighting the connections between individual data points and local geometry, and is more robust to noise and sampling. We apply the method to synthetic as well as real datasets, where we show that our method can facilitate alignment in noisy settings and can be used to both correct and interpret domain shift.
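For readers unfamiliar with the baseline being extended, the sketch below shows plain entropically regularised OT between two synthetic point clouds using the POT library; it illustrates the standard transport plan the abstract contrasts against, not the structured method the paper proposes.

```python
# Minimal sketch of vanilla entropic OT with the POT library (the unstructured
# baseline), using synthetic Gaussian point clouds as source and target.
import numpy as np
import ot  # Python Optimal Transport (POT)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))            # source samples
Y = rng.normal(size=(120, 10)) + 0.5      # shifted target samples

a, b = ot.unif(len(X)), ot.unif(len(Y))   # uniform marginal weights
M = ot.dist(X, Y)                          # pairwise squared Euclidean costs
plan = ot.sinkhorn(a, b, M, reg=0.1)       # entropically regularised coupling

print(plan.shape, plan.sum())              # coupling matrix; mass sums to ~1.0
```

In high dimensions with noisy samples, couplings like this one can spread mass onto outliers, which is the failure mode the structured variant is designed to mitigate.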