We search for Baryon Acoustic Oscillations in the projected cross-correlation function, binned in transverse comoving radius, between the SDSS-IV DR16 eBOSS quasars and a dense photometric sample of galaxies selected from the DESI Legacy Imaging Surveys. We estimate the density of the photometric sample of galaxies in this redshift range to be about 2900 deg$^{-2}$, which is deeper than the official DESI ELG selection, and the density of the spectroscopic sample to be about 20 deg$^{-2}$. To mitigate the systematics related to the use of different imaging surveys close to the detection limit, we use a neural network approach that accounts for complex dependencies between the imaging attributes and the observed galaxy density. We find that we are limited by the depth of the imaging surveys, which affects the density and purity of the photometric sample and its overlap in redshift with the quasar sample, and hence the performance of the method. When cross-correlating the photometric galaxies with quasars in $0.6 \leq z \leq 1.2$, the cross-correlation function can provide a better constraint on the comoving angular distance, $D_{\rm M}$ (6% precision), than the constraint on the spherically-averaged distance, $D_{\rm V}$ (9% precision), obtained from the auto-correlation. Although not yet competitive, this technique will benefit from the arrival of deeper photometric data from upcoming surveys, which will enable it to go beyond the current limitations identified in this work.