Systems for the automatic recognition and detection of automotive parts are crucial in several emerging research areas in the development of intelligent vehicles. They enable, for example, the detection and modelling of interactions between humans and the vehicle. In this paper, we quantitatively and qualitatively explore the efficacy of deep learning architectures for the classification and localisation of 29 interior and exterior vehicle regions on three novel datasets. Furthermore, we experiment with joint and transfer learning approaches across datasets and point out potential applications of our systems. Our best network architecture achieves an F1 score of 93.67 % for recognition, while our best localisation approach, utilising state-of-the-art backbone networks, achieves a mean average precision (mAP) of 63.01 % for detection. The MuSe-CAR-Part dataset, which is based on a large variety of human-car interactions in videos, the weights of the best models, and the code are publicly available to academic parties for benchmarking and future research.