Gravitational wave observations of compact binary coalescences provide precision probes of strong-field gravity. A standard set of null tests of general relativity (GR) is now routinely applied to LIGO-Virgo detections, and many more such tests have been proposed. However, the relations among these tests are not yet well understood. We begin to investigate them by applying a set of standard tests to simulated observations of binary black holes, both in GR and with phenomenological deviations from GR. The phenomenological deviations include self-consistent modifications to the energy flux in an effective-one-body (EOB) model, the deviations used in the second post-Newtonian (2PN) TIGER and FTA parameterized tests, and the dispersive propagation due to a massive graviton. We consider four types of tests: residuals, inspiral-merger-ringdown consistency, parameterized (TIGER and FTA), and modified dispersion relation. We also check the consistency of the unmodeled reconstruction of the waveforms with the waveforms recovered using GR templates. These tests are applied to simulated observations similar to GW150914, with both large and small deviations from GR, and to simulated observations similar to GW170608, with only small deviations from GR. We find that while very large deviations from GR are picked up with high significance by almost all tests, more moderate deviations are picked up by only a few tests, and some deviations are not recognized as GR violations by any test at the moderate signal-to-noise ratios we consider. Moreover, the tests that identify various deviations with high significance are not necessarily the ones we would expect. We also find that the 2PN (1PN) TIGER and FTA tests recover much smaller deviations than the true values in the modified EOB (massive graviton) case. Additionally, among the GR deviations we consider, the residuals test detects only the most extreme ones. (Abridged)