We present novel 3D multi-scale SPH simulations of gas-rich galaxy mergers between the most massive galaxies at $z \sim 8 - 10$, designed to scrutinize the direct collapse formation scenario for massive black hole seeds proposed in \citet{mayer+10}. The simulations achieve a resolution of 0.1 pc, and include both metallicity-dependent optically thin cooling and a model for thermal balance at high optical depth. We consider different formulations of the SPH hydrodynamical equations, including thermal and metal diffusion. When the two merging galaxy cores collide, gas infall produces a compact, optically thick nuclear disk with densities exceeding $10^{-10}$ g cm$^{-3}$. The disk rapidly accretes higher angular momentum gas from its surroundings, reaching a size of $\sim 5$ pc and a mass of $\gtrsim 10^9$ $M_{\odot}$ in only a few times $10^4$ yr. Beyond $\gtrsim 2$ pc it fragments into massive clumps. In the inner parsec region, by contrast, supersonic turbulence prevents fragmentation; this region remains warm ($\sim 3000-6000$ K) and develops strong non-axisymmetric modes that cause prominent radial gas inflows ($> 10^4$ $M_{\odot}$ yr$^{-1}$), forming an ultra-dense massive disky core. Angular momentum transport by non-axisymmetric modes should continue below our spatial resolution limit, quickly turning the disky core into a supermassive protostar, which can collapse directly into a massive black hole of mass $10^8-10^9$ $M_{\odot}$ via the relativistic radial instability. Such a cold direct collapse naturally explains the early emergence of high-$z$ QSOs. Its telltale signature would be a burst of gravitational waves in the frequency range $10^{-4} - 10^{-1}$ Hz, possibly detectable by the planned eLISA interferometer.