Selecting an appropriate clustering method and an optimal number of clusters for road accident data is often confusing and difficult. This paper analyzes the shortcomings of existing techniques for clustering accident-prone areas and recommends Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Ordering Points To Identify the Clustering Structure (OPTICS) to overcome them. A comparative performance analysis on real-life records of road accidents in North Carolina also shows that these algorithms are more effective and efficient.
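As an illustration of why a density-based method avoids fixing the number of clusters in advance, the following is a minimal pure-Python sketch of the textbook DBSCAN procedure. It is not the paper's implementation: the coordinates are made-up planar positions (not North Carolina records), and the values of `eps` and `min_pts` are arbitrary assumptions.

```python
import math

def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels

# Hypothetical accident locations (arbitrary planar units), for illustration only.
accidents = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),
             (5, 5), (5, 5.1), (5.1, 5), (5.1, 5.1),
             (10, 0)]
labels = dbscan(accidents, eps=0.2, min_pts=3)
```

The number of clusters is an output of the procedure rather than an input, and isolated points are reported as noise instead of being forced into a cluster.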
Due to urbanization and increasing individual mobility, congestion and inefficient traffic management occur in most metropolitan areas around the world. Intelligent traffic control systems that can reduce congestion are urgently needed, and they rely on measurements of the traffic situation in urban road networks and on freeways. Unfortunately, instrumentation for accurate traffic measurement is expensive and not widely deployed. This thesis addresses the problem by feeding data from relatively inexpensive, easy-to-install loop detectors to a geometric deep learning algorithm, which uses the data in the spatial context of the road network to estimate queue lengths in front of signalized intersections; these estimates can then be used in subsequent traffic control tasks. In the first part of this work, a conventional queue length estimation method (one that does not use machine learning) based on second-by-second loop-detector data is implemented; it uses shockwaves detected in queues to estimate the length and the time of the maximum queue. The method later serves both as a reference and as additional input to the geometric deep learning approach. In the second part, the geometric deep learning algorithm is developed. It exploits spatial correlations in the road network as well as temporal correlations in detector time series by means of new attention mechanisms, in order to overcome the limitations that conventional methods face under excess traffic demand, lane changing, and stop-and-go traffic. This requires abstracting the topology of the road network as a graph. Both approaches are compared with respect to performance, reliability, and limitations, and are validated using the traffic simulation software SUMO (Simulation of Urban MObility). Finally, the results are discussed in the conclusions and further investigations are suggested.
Recently, due to an increasing interest in transparency in artificial intelligence, several methods of explainable machine learning have been developed with the simultaneous goal of accuracy and interpretability by humans. In this paper, we study a recent framework of explainable clustering first suggested by Dasgupta et al.~\cite{dasgupta2020explainable}. Specifically, we focus on the $k$-means and $k$-medians problems and provide nearly tight upper and lower bounds. First, we provide an $O(\log k \log \log k)$-approximation algorithm for explainable $k$-medians, improving on the best known algorithm of $O(k)$~\cite{dasgupta2020explainable} and nearly matching the known $\Omega(\log k)$ lower bound~\cite{dasgupta2020explainable}. In addition, in low-dimensional spaces $d \ll \log k$, we show that our algorithm also provides an $O(d \log^2 d)$-approximate solution for explainable $k$-medians. This improves over the best known bound of $O(d \log k)$ for low dimensions~\cite{laber2021explainable}, and is a constant for constant dimensional spaces. To complement this, we show a nearly matching $\Omega(d)$ lower bound. Next, we study the $k$-means problem in this context and provide an $O(k \log k)$-approximation algorithm for explainable $k$-means, improving over the $O(k^2)$ bound of Dasgupta et al. and the $O(d k \log k)$ bound of~\cite{laber2021explainable}. To complement this, we provide an almost tight $\Omega(k)$ lower bound, improving over the $\Omega(\log k)$ lower bound of Dasgupta et al. Given an approximate solution to the classic $k$-means and $k$-medians, our algorithm for $k$-medians runs in time $O(kd \log^2 k)$ and our algorithm for $k$-means runs in time $O(k^2 d)$.
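For readers unfamiliar with the setting, the sketch below illustrates what an explainable clustering looks like in the framework of Dasgupta et al., where clusters are the leaves of a decision tree whose internal nodes are axis-aligned threshold cuts. It only evaluates the $k$-medians cost of a given tree; it is not the approximation algorithm of this paper. The tuple-based tree encoding, the hand-built tree, and the toy points are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical encoding: a node is ("leaf", cluster_id)
# or ("cut", feature_index, threshold, left_subtree, right_subtree).
def assign(tree, x):
    """Follow threshold cuts from the root until a leaf is reached."""
    while tree[0] == "cut":
        _, feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree[1]

def k_medians_cost(points, labels):
    """Sum of l1 distances from each point to its cluster's coordinate-wise median."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    cost = 0.0
    for members in clusters.values():
        # The coordinate-wise median minimises the l1 cost of a cluster.
        center = [median(column) for column in zip(*members)]
        cost += sum(abs(a - b) for p in members for a, b in zip(p, center))
    return cost

# Toy data and a hand-built tree with k = 3 leaves (illustration only).
points = [(0, 0), (1, 0), (0, 5), (1, 5), (9, 2), (10, 3)]
tree = ("cut", 0, 5.0,
        ("cut", 1, 2.5, ("leaf", 0), ("leaf", 1)),
        ("leaf", 2))
labels = [assign(tree, p) for p in points]
cost = k_medians_cost(points, labels)
```

Each cluster is described by at most a few threshold comparisons on single features, which is what makes the result interpretable. The approximation factors quoted above compare the cost of the best such tree with the cost of an unconstrained clustering.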
Early risk diagnosis and driving anomaly detection from vehicle streams are of great benefit to a range of advanced solutions for Smart Road and crash prevention, although there are intrinsic challenges, especially the lack of ground truth and the definition of multiple risk exposures. This study proposes a domain-specific automatic clustering method (termed Autocluster) that self-learns the optimal models for unsupervised risk assessment. It integrates the key steps of risk clustering into an auto-optimisable pipeline, including feature selection, algorithm selection, and hyperparameter auto-tuning. Firstly, based on surrogate conflict measures, indicator-guided feature extraction is conducted to construct temporal-spatial and kinematical risk features, and an elimination-based model reliance importance (EMRI) method is developed to select useful features without supervision. Secondly, we propose a balanced Silhouette Index (bSI) to evaluate the internal quality of imbalanced clustering, and design a loss function that considers clustering performance in terms of internal quality, inter-cluster variation, and model stability. Thirdly, based on Bayesian optimisation, algorithm selection and hyperparameter auto-tuning are self-learned to generate the best clustering partitions. Various algorithms are comprehensively investigated, and NGSIM vehicle trajectory data is used as a test bed. Findings show that Autocluster is reliable and promising for diagnosing multiple distinct risk exposures inherent to generalised driving behaviour. We also examine aspects of risk clustering such as algorithm heterogeneity, Silhouette analysis, and hierarchical clustering flows. In addition, Autocluster serves as a method for unsupervised multi-risk data labelling and indicator threshold calibration, and it is useful for tackling the challenges of imbalanced clustering without ground truth or a priori knowledge.
Automotive traffic is a classical example of a complex system. The simplest case is homogeneous traffic, in which all vehicles are of the same kind; mixing different means of transportation increases complexity because of the different driving rules and the interactions between vehicle types. This is particularly the case when motorcyclists ride between the lanes of stopped or slow-moving vehicles. This latter driving mode is a pervasive practice in Venezuela that clearly jeopardizes road safety. We developed a minimalist agent-based model to analyze the interaction of road users with and without motorcyclists on the road. The presence of motorcyclists significantly reduces the frequency of motorists' lane changes while increasing the frequency of their acceleration-deceleration maneuvers, without significantly affecting their average speed. That is, motorcyclists corral motorists in their lanes, limiting their ability to maneuver and increasing their acceleration noise. Comparison of the simulations with real traffic videos shows good agreement between model and observation. The implications of these results for road safety concerns about the interaction between motorists and motorcyclists are discussed.
Regionalization is the task of dividing a landscape into homogeneous patches with similar properties. Although this task has a wide range of applications, it poses two notable challenges. First, the resulting regions are expected to be both homogeneous and spatially contiguous. Second, it is well recognized that landscapes are hierarchical, such that fine-scale regions are nested wholly within broader-scale regions. To address the first challenge, we develop a spatially constrained spectral clustering framework for region delineation that incorporates the tradeoff between region homogeneity and spatial contiguity. The framework uses a flexible, truncated exponential kernel to represent the spatial contiguity constraints, which is integrated with the landscape feature similarity matrix for region delineation. To address the second challenge, we extend the framework to create fine-scale regions that are nested within broader-scale regions using a greedy, recursive bisection approach. We present a case study of a terrestrial ecology data set in the United States that compares the proposed framework with several baseline methods for regionalization. Experimental results suggest that the proposed framework outperforms the baseline methods, especially in balancing region contiguity and homogeneity and in creating regions of more similar size, which is often a desired trait of regions.
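The idea of integrating a truncated exponential spatial kernel with a feature similarity matrix can be sketched as follows. The abstract does not give the kernel's exact form or the integration rule, so everything specific here is an assumption: the kernel is taken to be exp(-d / scale) for pairs within a cutoff distance and zero beyond it, the feature similarity is a Gaussian of feature distance, and the two are combined by an element-wise product. The spectral step itself (eigenvectors of the resulting affinity matrix) is omitted.

```python
import math

def truncated_exp_kernel(coords, scale, cutoff):
    """Assumed form: weight exp(-d / scale) for pairs within `cutoff`, else 0."""
    n = len(coords)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                kernel[i][j] = math.exp(-d / scale)
    return kernel

def gaussian_similarity(features, bandwidth):
    """Assumed feature similarity: Gaussian of the Euclidean feature distance."""
    n = len(features)
    return [[math.exp(-math.dist(features[i], features[j]) ** 2
                      / (2 * bandwidth ** 2))
             for j in range(n)] for i in range(n)]

def combine(feature_sim, spatial_kernel):
    """Assumed integration rule: element-wise product of the two matrices."""
    return [[f * s for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(feature_sim, spatial_kernel)]

# Three hypothetical landscape units: two adjacent, one far away but with
# the same feature value as the first.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
features = [(0.2,), (0.25,), (0.2,)]
affinity = combine(gaussian_similarity(features, bandwidth=0.1),
                   truncated_exp_kernel(coords, scale=2.0, cutoff=3.0))
```

Under these assumptions, the distant unit receives zero affinity with the first unit even though their features are identical, which is how truncation discourages non-contiguous regions. A larger cutoff or scale relaxes the contiguity constraint in favor of homogeneity.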