Machine learning is becoming a popular tool for quantifying galaxy morphologies and identifying mergers. However, this technique relies on an appropriate set of training data to be successful. By combining hydrodynamical simulations, synthetic observations and convolutional neural networks (CNNs), we quantitatively assess how realistic simulated galaxy images must be in order to reliably classify mergers. Specifically, we compare the performance of CNNs trained on two types of galaxy images, stellar maps and dust-inclusive radiatively transferred images, each with three levels of observational realism: (1) no observational effects (idealized images), (2) realistic sky and point spread function (semi-realistic images), and (3) insertion into a real sky image (fully realistic images). We find that networks trained on either idealized or semi-realistic images perform poorly when applied to survey-realistic images. In contrast, networks trained on fully realistic images achieve 87.1% classification accuracy. Importantly, the level of realism of the training images is much more important than whether the images include radiative transfer or simply use the stellar maps (87.1% compared to 79.6% accuracy, respectively). One can therefore avoid the large computational and storage cost of radiative transfer with a relatively modest compromise in classification performance. Making photometry-based networks insensitive to colour incurs only a mild performance penalty on survey-realistic data (86.0% with r-band images alone compared to 87.1% with gri). This result demonstrates that, while colour can be exploited by colour-sensitive networks, it is not necessary to achieve high accuracy and so can be avoided if desired. As a companion to this paper, we publicly release our statistical observational realism suite, RealSim.
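To make the classification setup concrete, the sketch below shows a minimal binary merger classifier for multi-band galaxy image cutouts. This is not the network used in the paper: the architecture, layer sizes, image dimensions and all identifiers are illustrative assumptions. The n_bands parameter illustrates the gri versus r-only comparison described above; setting n_bands=1 corresponds to a colour-insensitive, single-band network.

```python
# A minimal, hypothetical sketch (not the paper's architecture) of a
# CNN that classifies galaxy cutouts as merger vs non-merger.
# All sizes and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class MergerCNN(nn.Module):
    def __init__(self, n_bands=3):  # n_bands=3 for gri; n_bands=1 for r-only
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_bands, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling over spatial dimensions
        )
        self.classifier = nn.Linear(128, 1)  # single logit: merger vs non-merger

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)  # raw logit; apply a sigmoid for a probability

# Toy usage: a batch of eight 128x128 three-band cutouts with random labels.
model = MergerCNN(n_bands=3)
x = torch.randn(8, 3, 128, 128)
labels = torch.randint(0, 2, (8,)).float()
logits = model(x).squeeze(1)
loss = nn.BCEWithLogitsLoss()(logits, labels)
loss.backward()  # one gradient step of a standard supervised training loop
```

In this framing, the realism experiments above amount to changing only the training inputs x (idealized, semi-realistic, or fully realistic cutouts, e.g. as produced by a tool such as RealSim) while holding the network and labels fixed.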