Developing nations are particularly susceptible to the adverse effects of global warming. By 2040, 14 percent of global emissions will come from data centers. This paper presents early findings on the use of AI and digital twins to model and optimize data center operations.
As the amount of scientific data continues to grow at ever faster rates, the research community is increasingly in need of flexible computational infrastructure that can support the entirety of the data science lifecycle, including long-term data storage, data exploration and discovery services, and compute capabilities to support data analysis and re-analysis as new data are added and as scientific pipelines are refined. We describe our experience developing data commons (interoperable infrastructure that co-locates data, storage, and compute with common analysis tools) and present several case studies. Across these case studies, several common requirements emerge, including the need for persistent digital identifier and metadata services, APIs, data portability, pay-for-compute capabilities, and data peering agreements between data commons. Though many challenges remain, including sustainability and the development of appropriate standards, interoperable data commons bring us one step closer to effective Data Science as a Service for the scientific research community.
As machine learning and data science applications grow ever more prevalent, there is an increased focus on data sharing and open data initiatives, particularly in the context of the African continent. Many argue that data sharing can support research and policy design to alleviate poverty, inequality, and derivative effects in Africa. Despite the fact that the datasets in question are often extracted from African communities, conversations around the challenges of accessing and sharing African data are too often driven by non-African stakeholders. These perspectives frequently employ deficit narratives, often focusing on lack of education, training, and technological resources in the continent as the leading causes of friction in the data ecosystem. We argue that these narratives obfuscate and distort the full complexity of the African data sharing landscape. In particular, we use storytelling via fictional personas built from a series of interviews with African data experts to complicate dominant narratives and to provide counternarratives. Coupling these personas with research on data practices within the continent, we identify recurring barriers to data sharing as well as inequities in the distribution of data sharing benefits. Specifically, we discuss issues arising from power imbalances resulting from the legacies of colonialism, ethnocentrism, and slavery, disinvestment in building trust, lack of acknowledgement of historical and present-day extractive practices, and Western-centric policies that are ill-suited to the African context. After outlining these problems, we discuss avenues for addressing them when sharing data generated in the continent.
Building Management Systems (BMS) are crucial in the drive towards smart sustainable cities because they have proven effective in significantly reducing the energy consumption of buildings. A typical BMS is composed of smart devices that communicate with one another in order to achieve their purpose. However, the heterogeneity of these devices and their associated metadata impedes the deployment of solutions that depend on the interactions among these devices. Automatically inferring the semantics of these devices using data-driven methods provides an ideal solution to the problems brought about by this heterogeneity. In this paper, we undertake a multi-dimensional study to address the problem of inferring the semantics of IoT devices using machine learning models. Using two datasets with over 67 million data points collected from IoT devices, we developed discriminative models that produced competitive results. In particular, our study highlights the potential of Image Encoded Time Series (IETS) as a robust alternative to statistical feature-based inference methods. Leveraging just a fraction of the data required by feature-based methods, our evaluations show that this encoding competes with and even outperforms traditional methods in many cases.
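To make the idea of encoding a time series as an image concrete, the following is a minimal sketch. The abstract does not say which encoding the study uses; the Gramian angular summation field shown here is one well-known option, swapped in purely for illustration. The function name `gramian_angular_field` is hypothetical.

```python
import math


def gramian_angular_field(series):
    """Encode a 1-D series as a 2-D matrix (an "image").

    Illustrative assumption: Gramian angular summation field, which may
    differ from the encoding used in the study.
    """
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        # A constant series carries no shape information; map it to zeros.
        scaled = [0.0] * len(series)
    else:
        # Rescale values into [-1, 1] so that arccos is defined.
        scaled = [2.0 * (v - lo) / span - 1.0 for v in series]
    # Clamp to guard against floating-point drift, then convert to angles.
    phi = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    # Entry (i, j) is cos(phi_i + phi_j), giving a symmetric n x n matrix.
    return [[math.cos(a + b) for b in phi] for a in phi]
```

A series of n readings yields an n x n matrix that an image classifier could consume in place of hand-crafted statistical features.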
As buildings are central to the social and environmental sustainability of human settlements, high-quality geospatial data are necessary to support their management and planning. Authorities around the world are increasingly collecting and releasing such data openly, but these are mostly disconnected initiatives, making it challenging for users to fully leverage their potential for urban sustainability. We conduct a global study of 2D geospatial data on buildings that are released by governments for free access, ranging from individual cities to whole countries. We identify and benchmark more than 140 releases from 28 countries containing more than 100 million buildings, based on five dimensions: accessibility, richness, data quality, harmonisation, and relationships with other actors. We find that much building data released by governments is valuable for spatial analyses, but there are large disparities among releases, and not all are of high quality, harmonised, and rich in descriptive information. Our study also compares authoritative data to OpenStreetMap, a crowdsourced counterpart, suggesting a mutually beneficial and complementary relationship.
Autonomous driving is a promising future for transportation. As one basis for autonomous driving, the High Definition map (HD map) provides high-precision descriptions of the environment and therefore enables more accurate perception and localization while improving the efficiency of path planning. However, an extremely large amount of map data needs to be transmitted during driving, which poses a great challenge to the real-time and safety requirements of autonomous driving. To this end, we first demonstrate how the existing data distribution mechanism can support HD map services. Furthermore, considering the constraints of vehicle power, vehicle speed, base station bandwidth, etc., we propose an HD map data distribution mechanism on top of Vehicle-to-Infrastructure (V2I) data transmission. Under this mechanism, the map provision task is allocated to selected roadside unit (RSU) nodes, which cooperatively transmit proportionate shares of the HD map data. Their cooperation on map data loading aims to provide timely HD map data service with optimized in-vehicle energy consumption. Finally, we model the selection of RSU nodes as a partial knapsack problem and propose a greedy strategy-based data transmission algorithm. Experimental results confirm that, within limited energy consumption, the proposed mechanism can ensure HD map data service by coordinating multiple RSUs with the shortest data transmission time.
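The greedy treatment of a partial (fractional) knapsack problem can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the `RSU` fields, the `allocate` function, and the choice of "data delivered per unit of vehicle energy" as the ranking ratio are all hypothetical, since the abstract does not give the actual cost model.

```python
from dataclasses import dataclass


@dataclass
class RSU:
    name: str
    data_mb: float   # map data the RSU could deliver while in range (assumed)
    energy_j: float  # vehicle energy needed to receive all of data_mb (assumed, > 0)


def allocate(rsus, energy_budget_j):
    """Greedy fractional knapsack over RSUs.

    Ranks RSUs by data delivered per joule and takes each one fully until
    the energy budget runs out; the last RSU chosen may be used partially.
    Returns a list of (rsu_name, data_mb_assigned) pairs.
    """
    plan = []
    remaining = energy_budget_j
    ranked = sorted(rsus, key=lambda r: r.data_mb / r.energy_j, reverse=True)
    for rsu in ranked:
        if remaining <= 0:
            break
        # Fraction of this RSU's share that still fits in the budget.
        fraction = min(1.0, remaining / rsu.energy_j)
        plan.append((rsu.name, fraction * rsu.data_mb))
        remaining -= fraction * rsu.energy_j
    return plan
```

Because items may be split, ranking by ratio and filling greedily is optimal for the fractional knapsack problem, which is what makes a simple greedy strategy a natural fit here.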