No Arabic abstract
Accurate understanding and forecasting of traffic is a key contemporary problem for policymakers. Road networks are increasingly congested, yet traffic data is often expensive to obtain, making informed policy-making harder. This paper explores the extent to which traffic disruption can be estimated from static features from the volunteered geographic information site OpenStreetMap (OSM). We use OSM features as predictors for linear regressions of counts of traffic disruptions and traffic volume at 6,500 points in the road network within 112 regions of Oxfordshire, UK. We show that more than half the variation in traffic volume and disruptions can be explained with static features alone, and use cross-validation and recursive feature elimination to evaluate the predictive power and importance of different land use categories. Finally, we show that using OSMs granular point of interest data allows for better predictions than the aggregate categories typically used in studies of transportation and land use.
Representations of geographic entities captured in popular knowledge graphs such as Wikidata and DBpedia are often incomplete. OpenStreetMap (OSM) is a rich source of openly available, volunteered geographic information that has a high potential to complement these representations. However, identity links between the knowledge graph entities and OSM nodes are still rare. The problem of link discovery in these settings is particularly challenging due to the lack of a strict schema and heterogeneity of the user-defined node representations in OSM. In this article, we propose OSM2KG - a novel link discovery approach to predict identity links between OSM nodes and geographic entities in a knowledge graph. The core of the OSM2KG approach is a novel latent, compact representation of OSM nodes that captures semantic node similarity in an embedding. OSM2KG adopts this latent representation to train a supervised model for link prediction and utilises existing links between OSM and knowledge graphs for training. Our experiments conducted on several OSM datasets, as well as the Wikidata and DBpedia knowledge graphs, demonstrate that OSM2KG can reliably discover identity links. OSM2KG achieves an F1 score of 92.05% on Wikidata and of 94.17% on DBpedia on average, which corresponds to a 21.82 percentage points increase in F1 score on Wikidata compared to the best performing baselines.
Accurate modelling of local population movement patterns is a core contemporary concern for urban policymakers, affecting both the short term deployment of public transport resources and the longer term planning of transport infrastructure. Yet, while macro-level population movement models (such as the gravity and radiation models) are well developed, micro-level alternatives are in much shorter supply, with most macro-models known to perform badly in smaller geographic confines. In this paper we take a first step to remedying this deficit, by leveraging two novel datasets to analyse where and why macro-level models of human mobility break down at small scales. In particular, we use an anonymised aggregate dataset from a major mobility app and combine this with freely available data from OpenStreetMap concerning land-use composition of different areas around the county of Oxfordshire in the United Kingdom. We show where different models fail, and make the case for a new modelling strategy which moves beyond rough heuristics such as distance and population size towards a detailed, granular understanding of the opportunities presented in different areas of the city.
The emergence of large stores of transactional data generated by increasing use of digital devices presents a huge opportunity for policymakers to improve their knowledge of the local environment and thus make more informed and better decisions. A research frontier is hence emerging which involves exploring the type of measures that can be drawn from data stores such as mobile phone logs, Internet searches and contributions to social media platforms, and the extent to which these measures are accurate reflections of the wider population. This paper contributes to this research frontier, by exploring the extent to which local commuting patterns can be estimated from data drawn from Twitter. It makes three contributions in particular. First, it shows that simple heuristics drawn from geolocated Twitter data offer a good proxy for local commuting patterns; one which outperforms the major existing method for estimating these patterns (the radiation model). Second, it investigates sources of error in the proxy measure, showing that the model performs better on short trips with higher volumes of commuters; it also looks at demographic biases but finds that, surprisingly, measurements are not significantly affected by the fact that the demographic makeup of Twitter users differs significantly from the population as a whole. Finally, it looks at potential ways of going beyond simple heuristics by incorporating temporal information into models.
One of the elements that have popularized and facilitated the use of geographical information on a variety of computational applications has been the use of Web maps; this has opened new research challenges on different subjects, from locating places and people, the study of social behavior or the analyzing of the hidden structures of the terms used in a natural language query used for locating a place. However, the use of geographic information under technological features is not new, instead it has been part of a development and technological integration process. This paper presents a state of the art review about the application of geographic information under different approaches: its use on location based services, the collaborative user participation on it, its contextual-awareness, its use in the Semantic Web and the challenges of its use in natural languge queries. Finally, a prototype that integrates most of these areas is presented.
In an era of heterogeneous data, novel methods and volunteered geographic information provide opportunities to understand how people interact with a place. However, it is not enough to simply have such heterogeneous data, instead an understanding of its usability and reliability needs to be undertaken. Here, we draw upon the case study of Rakiura, Stewart Island where manifested passenger numbers across the Foveaux Strait are known. We have built a population model to ground truth such novel indicators. In our preliminary study, we find that a number of indicators offer the opportunity to understand fluctuations in populations. Some indicators (such as wastewater volumes) can suggest relative changes in populations in a raw form. While other indicators (such as TripAdvisor reviews or Instagram posts) require further data enrichment to get insights into population fluctuations. This research forms part of a larger research project looking to test and apply such novel indicators to inform disaster risk assessments.