We present modular and optimal architectures for implementing arbitrary discrete unitary transformations on light. These architectures systematically combine smaller M-mode linear optical interferometers to implement a larger N-mode transformation. This work thus enables large linear optical transformations to be implemented using smaller modules that act on the spatial or the internal degrees of freedom of light, such as polarization, time, or orbital angular momentum. The architectures lead to a rectangular gate structure, which is optimal in the sense that realizing arbitrary transformations on these architectures requires a minimal number of optical elements and minimal circuit depth. Moreover, the rectangular structure ensures that the different optical modes incur balanced optical losses, so the architectures promise substantially enhanced process fidelities compared to existing schemes.
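
As an illustration of the counting behind the optimality claim, the sketch below lays out a rectangular mesh for the simplest special case of two-mode modules (M = 2), that is, a mesh of beam splitters acting on neighbouring modes. It is a minimal sketch under stated assumptions and not the construction of this work: the function names are hypothetical, each two-mode element is assumed to carry two tunable real parameters, and N single-mode output phases are assumed to complete the transformation. Under these assumptions the mesh has N(N-1)/2 elements arranged in N layers, and its parameter count equals the N^2 real parameters of a general N x N unitary matrix.

```python
# Illustrative sketch only: a rectangular mesh of two-mode elements (M = 2).
# All names here are hypothetical and are not taken from the paper.

def rectangular_mesh(n_modes):
    """Return the mesh as a list of layers.

    Each layer is a list of (i, i + 1) pairs naming the neighbouring modes
    coupled by one two-mode element. Even layers start at mode 0 and odd
    layers start at mode 1, which gives the rectangular (brick-wall) shape.
    """
    layers = []
    for layer in range(n_modes):
        start = layer % 2
        layers.append([(i, i + 1) for i in range(start, n_modes - 1, 2)])
    return layers


def count_elements(layers):
    """Total number of two-mode elements in the mesh."""
    return sum(len(layer) for layer in layers)


n = 6
layers = rectangular_mesh(n)
n_elements = count_elements(layers)

# Assumption: two tunable real parameters per element, plus n output phases.
n_params = 2 * n_elements + n

print(n_elements, len(layers), n_params)  # prints "15 6 36"
```

For N = 6 this gives 15 elements in 6 layers and 36 parameters, which matches N(N-1)/2, N, and N^2 respectively.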