Current technology for autonomous cars primarily focuses on getting the passenger from point A to point B. Nevertheless, it has been shown that passengers are afraid of riding in self-driving cars. One way to alleviate this problem is to allow the passenger to give natural language commands to the car. However, the car can misunderstand the issued command or the visual surroundings, which could lead to uncertain situations. It is desirable that the self-driving car detects these situations and interacts with the passenger to resolve them. This paper proposes a model that detects uncertain situations when a command is given and finds the visual objects causing the uncertainty. Optionally, the system also generates a question describing the uncertain objects. We argue that if the car could describe the objects in a human-like way, passengers could gain more confidence in the car's abilities. Thus, we investigate how to (1) detect uncertain situations and their underlying causes, and (2) generate clarifying questions for the passenger. When evaluating on the Talk2Car dataset, we show that the proposed model, \acrfull{pipeline}, yields an improvement of \gls{m:ambiguous-absolute-increase} in terms of $IoU_{.5}$ compared to not using \gls{pipeline}. Furthermore, we designed a referring expression generator (REG), \acrfull{reg_model}, tailored to a self-driving car setting, which yields a relative improvement of \gls{m:meteor-relative} METEOR and \gls{m:rouge-relative} ROUGE-L compared with state-of-the-art REG models, and is three times faster.