We present a detailed comparison between the well-known SPH code GADGET and the new moving-mesh code AREPO on a number of hydrodynamical test problems. Through a variety of numerical experiments we establish a clear link between test problems and systematic numerical effects seen in cosmological simulations of galaxy formation. Our tests demonstrate deficiencies of the SPH method in several respects. These accuracy problems not only manifest themselves in idealized hydrodynamical tests, but also propagate to more realistic simulation setups of galaxy formation, ultimately affecting gas properties in the full cosmological framework, as highlighted in papers by Vogelsberger et al. (2011) and Keres et al. (2011). We find that an inadequate treatment of fluid instabilities in GADGET suppresses entropy generation by mixing, leads to underestimated vorticity generation in curved shocks, and prevents efficient gas stripping from infalling substructures. In idealized tests of inside-out disk formation, gas disk sizes converge much more slowly in GADGET because of spurious angular momentum transport. In simulations where we follow the interaction between a forming central disk and orbiting substructures in a halo, the final disk morphology differs strikingly between the two codes. In AREPO, gas from infalling substructures is readily depleted and incorporated into the host halo atmosphere, facilitating the formation of an extended central disk. In GADGET, by contrast, gaseous sub-clumps remain more coherent and morphologically transform the disk as they impact it. The numerical artefacts of the SPH solver are particularly severe for poorly resolved flows, and thus inevitably affect cosmological simulations owing to the hierarchical nature of these simulations. Our numerical experiments clearly demonstrate that AREPO delivers a physically more reliable solution.