Continuous estimation of the driver's take-over readiness is critical for safe and timely transfer of control during the failure modes of autonomous vehicles. In this paper, we propose a data-driven approach for estimating the driver's take-over readiness based purely on observable cues from in-vehicle vision sensors. We present an extensive naturalistic drive dataset of drivers in a conditionally autonomous vehicle running on Californian freeways. We collect subjective ratings of the driver's take-over readiness from multiple human observers viewing the sensor feed. Analysis of the ratings in terms of intra-class correlation coefficients (ICCs) shows a high degree of consistency across raters. Based on the ratings, we define a metric for the driver's take-over readiness, termed the Observable Readiness Index (ORI). Finally, we propose an LSTM model for continuous estimation of the driver's ORI based on a holistic representation of the driver's state, capturing gaze, hand, pose, and foot activity. Our model estimates the ORI with a mean absolute error of 0.449 on a 5-point scale.
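The rater-consistency analysis above relies on intra-class correlation coefficients. As an illustration of the kind of computation involved, the sketch below implements ICC(2,1) (two-way random effects, absolute agreement, single rater) on a subjects-by-raters matrix; the abstract does not specify which ICC variant was used, so this particular form is an assumption.

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_subjects, k_raters), e.g. each row holds the
    take-over-readiness scores all raters gave to one video segment.
    This variant is an assumption; the paper only says "ICCs".
    """
    r = np.asarray(ratings, dtype=float)
    n, k = r.shape
    grand_mean = r.mean()
    row_means = r.mean(axis=1)   # per-subject means
    col_means = r.mean(axis=0)   # per-rater means

    # Two-way ANOVA sums of squares
    ssr = k * ((row_means - grand_mean) ** 2).sum()   # between subjects
    ssc = n * ((col_means - grand_mean) ** 2).sum()   # between raters
    sst = ((r - grand_mean) ** 2).sum()               # total
    sse = sst - ssr - ssc                             # residual

    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

For perfectly agreeing raters, e.g. `icc2_1([[1, 1], [2, 2], [3, 3]])`, the value is 1.0; disagreement between raters drives it toward 0.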
With increasing automation in passenger vehicles, the study of safe and smooth occupant-vehicle interaction and control transitions is key. In this study, we focus on the development of contextual, semantically meaningful representations of the drive…
Understanding occupant-vehicle interactions by modeling control transitions is important to ensure safe approaches to passenger vehicle automation. Models which contain contextual, semantically meaningful representations of driver states can be used…
Pedestrians are arguably one of the most safety-critical road users to consider for autonomous vehicles in urban areas. In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes from a single image…
The task of visual grounding requires locating the most relevant region or object in an image, given a natural language query. So far, progress on this task was mostly measured on curated datasets, which are not always representative of human spoken…
Current vision systems are trained on huge datasets, and these datasets come with costs: curation is expensive, they inherit human biases, and there are concerns over privacy and usage rights. To counter these costs, interest has surged in learning f…