The first multi-messenger gravitational wave event has had a transformative effect on the space of modified gravity models. In this paper we study the enhanced tests of gravity that are possible with a future set of gravitational wave standard siren events. We perform MCMC constraint forecasts for parameters in Horndeski scalar-tensor theories. In particular, we focus on the complementarity of gravitational waves with electromagnetic large-scale structure data from galaxy surveys. We find that the addition of fifty low-redshift ($z \lesssim 0.2$) standard sirens from the Advanced LIGO network offers only a modest improvement (a factor of 1.1--1.3, where 1.0 is no improvement) over existing constraints from electromagnetic observations of large-scale structure. In contrast, high-redshift (up to $z \sim 10$) standard sirens from the future LISA satellite will improve constraints on the time evolution of the Planck mass in Horndeski theories by a factor of $\sim 5$. By simulating different scenarios, we find this improvement to be robust to marginalisation over unknown merger inclination angles and to variation between three plausible models for the merger source population.