Inverse design of large-area metasurfaces can potentially exploit the full parameter space that such devices offer and achieve highly efficient multifunctional flat optical elements. However, since practically useful flat optics elements are large in the linear dimension, an accurate simulation of their scattering properties is challenging. Here, we demonstrate a method to compute accurate simulations and gradients of large-area metasurfaces. Our approach relies on two key ingredients - a simulation distribution strategy that allows a linear reduction in the simulation time with number of compute (GPU) nodes and an efficient single-node computation using the Transition-matrix (T-matrix) method. We demonstrate ability to perform a distributed simulation of large-area, while accurately accounting for scatterer-scatterer interactions significantly beyond the locally periodic approximation, and efficiently compute gradients with respect to the metasurface design parameters. This scalable and accurate metasurface simulation method opens the door to gradient-based optimization of full large-area metasurfaces.