We combine numerical optimization techniques [Uskov et al., Phys. Rev. A 79, 042326 (2009)] with symmetries of the Weyl chamber to obtain optimal implementations of generic linear-optical KLM-type two-qubit entangling gates. We find that while any two-qubit controlled-U gate, including CNOT and CS, can be implemented using only two ancilla resources with success probability S > 0.05, a generic SU(4) operation requires three unentangled ancilla photons, with success probability S > 0.0063. Specifically, we obtain a maximal success probability close to 0.0072 for the B gate. We show that a single-shot implementation of a generic SU(4) gate offers more than an order of magnitude increase in the success probability and a two-fold reduction in overhead ancilla resources compared to standard triple-CNOT and double-B gate decompositions.
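The comparison with sequential decompositions can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes that the gates in a decomposition succeed independently, that the interleaved single-qubit operations are deterministic, and that ancilla photons are not reused between gates. The per-gate CNOT success probability of 2/27 is an assumption taken from the commonly quoted two-ancilla linear-optical result, not from the text above, and the helper name `sequential` is hypothetical.

```python
# Values quoted in the abstract.
S_GENERIC = 0.0063      # lower bound, single-shot generic SU(4) gate
S_B = 0.0072            # approximate maximum for the B gate
ANCILLAS_GENERIC = 3    # ancilla photons for a generic SU(4) gate (and the B gate)
ANCILLAS_CNOT = 2       # ancilla resources for a controlled-U gate such as CNOT

# Assumption (not stated in the abstract): per-gate CNOT success of 2/27,
# consistent with the abstract's bound S > 0.05 for controlled-U gates.
S_CNOT = 2 / 27


def sequential(success, ancillas, n):
    """Success probability and ancilla count for n independent gates in series."""
    return success ** n, ancillas * n


triple_cnot_success, triple_cnot_ancillas = sequential(S_CNOT, ANCILLAS_CNOT, 3)
double_b_success, double_b_ancillas = sequential(S_B, ANCILLAS_GENERIC, 2)

# Gain of the single-shot gate over each decomposition.
gain_vs_triple_cnot = S_GENERIC / triple_cnot_success
gain_vs_double_b = S_GENERIC / double_b_success

print(f"triple-CNOT: S = {triple_cnot_success:.2e}, ancillas = {triple_cnot_ancillas}")
print(f"double-B:    S = {double_b_success:.2e}, ancillas = {double_b_ancillas}")
print(f"single-shot gain: {gain_vs_triple_cnot:.1f}x and {gain_vs_double_b:.1f}x")
```

Under these assumptions both decompositions consume six ancilla photons against three for the single-shot gate, and the single-shot success probability exceeds that of either decomposition by more than a factor of ten.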