We use more than 9400 quiescent and star-forming galaxies with $\log(m/M_{\odot})>10$ at $z\lesssim2$ in COSMOS/UltraVISTA to study the average size evolution of these systems, with a focus on the rare, ultra-massive population at $\log(m/M_{\odot})>11.4$. The large 2-square-degree survey area delivers a sample of $\sim400$ such ultra-massive systems. Accurate sizes are derived using a calibration based on high-resolution images from the Hubble Space Telescope. We find that, at these very high masses, the size evolution of star-forming and quiescent galaxies is almost indistinguishable in both normalization and power-law slope. We use this result to investigate possible pathways for quenching massive ($m>M^*$) galaxies at $z<2$. We consistently model the size evolution of quiescent galaxies from the star-forming population by assuming different simple models for the suppression of star formation. These models include instantaneous and delayed quenching without altering the structure of galaxies, and a central starburst followed by compaction. We find that instantaneous quenching reproduces the observed mass-size relation of massive galaxies at $z>1$ well. Our starburst$+$compaction model, followed by individual growth of the galaxies through minor mergers, is preferred over the models without structural change for $\log(m/M_{\odot})>11.0$ galaxies at $z>0.5$. None of our models is able to match the observations at $m>M^*$ and $z<1$ without a significant contribution from post-quenching growth of individual galaxies via mergers. We conclude that quenching is a fast process in galaxies with $m\geq10^{11}\,M_{\odot}$, and that major mergers likely play an important role in the final steps of their evolution.
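The comparison of normalization and power-law slope above can be illustrated with a small fitting sketch. It assumes the common parametrization $r_e = A\,(1+z)^{\beta}$ for the median effective radius at fixed stellar mass. That form is an assumption here and not necessarily the exact one used in this work. The function `fit_size_evolution` is hypothetical. It performs an ordinary least-squares fit in log space and ignores measurement errors and scatter, which a real analysis would need to model.

```python
import math


def fit_size_evolution(z, r_e):
    """Fit r_e = A * (1 + z)**beta by least squares in log10 space.

    Hypothetical helper for illustration. Returns (A, beta).
    Assumes all r_e > 0 and z > -1; measurement errors are ignored.
    """
    if len(z) != len(r_e) or len(z) < 2:
        raise ValueError("need at least two matching (z, r_e) pairs")
    # Linearize: log10(r_e) = log10(A) + beta * log10(1 + z)
    x = [math.log10(1.0 + zi) for zi in z]
    y = [math.log10(ri) for ri in r_e]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("redshifts must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                    # power-law slope
    log_a = mean_y - beta * mean_x      # log10 of the normalization
    return 10.0 ** log_a, beta


if __name__ == "__main__":
    # Synthetic, noise-free medians (illustrative values, not survey data)
    z = [0.2, 0.6, 1.0, 1.5, 2.0]
    r_e = [8.0 * (1.0 + zi) ** -1.2 for zi in z]
    A, beta = fit_size_evolution(z, r_e)
    print(f"A = {A:.3f}, beta = {beta:.3f}")
```

Fitting the star-forming and quiescent subsamples separately this way gives one $(A, \beta)$ pair per population, and those pairs can then be compared directly. With real data, uncertainties on both parameters would be needed before calling the two relations indistinguishable.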