Scalable and fault-tolerant quantum computation will require error correction. This in turn demands constant measurement of many-qubit observables, implemented using a vast number of CNOT gates. Indeed, practically all operations performed by a fault-tolerant device will be these CNOTs, or equivalent two-qubit controlled operations. It is therefore important to devise benchmarks for these gates that explicitly quantify their effectiveness at this task. Here we develop such benchmarks and demonstrate their use by applying them to a range of differently implemented controlled gates and a particular quantum error-correcting code. Specifically, we consider the minimal 17-qubit instance of the surface code, implemented with spin qubits confined to quantum dots that are coupled either directly or via floating gates. Our results show that small differences in gate fidelity can lead to large differences in the performance of the surface code. Gate fidelity is therefore not, in general, a good predictor of code performance.