Nighttime Stereo Depth Estimation using Joint Translation-Stereo Learning: Light Effects and Uninformative Regions


Abstract in English

Nighttime stereo depth estimation is still challenging, as assumptions associated with daytime lighting conditions do not hold any longer. Nighttime is not only about low-light and dense noise, but also about glow/glare, flares, non-uniform distribution of light, etc. One of the possible solutions is to train a network on night stereo images in a fully supervised manner. However, to obtain proper disparity ground-truths that are dense, independent from glare/glow, and have sufficiently far depth ranges is extremely intractable. To address the problem, we introduce a network joining day/night translation and stereo. In training the network, our method does not require ground-truth disparities of the night images, or paired day/night images. We utilize a translation network that can render realistic night stereo images from day stereo images. We then train a stereo network on the rendered night stereo images using the available disparity supervision from the corresponding day stereo images, and simultaneously also train the day/night translation network. We handle the fake depth problem, which occurs due to the unsupervised/unpaired translation, for light effects (e.g., glow/glare) and uninformative regions (e.g., low-light and saturated regions), by adding structure-preservation and weighted-smoothness constraints. Our experiments show that our method outperforms the baseline methods on night images.

Download