Video snapshot compressive imaging (SCI) is a promising system for capturing high-speed videos with a two-dimensional detector: the video frames are coded by different masks and then compressed into a single snapshot measurement. Efficient algorithms are then needed to reconstruct the high-speed frames from this measurement, and the state-of-the-art results are achieved by deep learning networks. However, these networks are usually trained for specific small-scale masks and often demand substantial training time and GPU memory; they are hence {\bf\em not flexible} to $i$) a new mask of the same size and $ii$) a larger-scale mask. We address these challenges by developing a Meta Modulated Convolutional Network for SCI reconstruction, dubbed MetaSCI. MetaSCI is composed of a backbone shared across different masks and lightweight meta-modulation parameters that evolve into different modulation parameters for each mask. It therefore offers {\bf\em fast adaptation} to new masks (or systems) and is ready to {\bf\em scale to large data}. Extensive simulation and real data results demonstrate the superior performance of our proposed approach. Our code is available at {\small\url{https://github.com/xyvirtualgroup/MetaSCI-CVPR2021}}.
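The coding-and-compression step described above can be illustrated with a minimal sketch. It assumes the common SCI measurement model in which each frame is multiplied element-wise by its mask and the coded frames are summed into one snapshot, with noise ignored. The function name `sci_measure` and the toy frames and masks are hypothetical and are not taken from the MetaSCI code base.

```python
def sci_measure(frames, masks):
    """Collapse T coded frames into one 2-D snapshot: Y = sum_t C_t * X_t.

    Assumed noise-free model; `frames` and `masks` are lists of T
    equally sized 2-D lists (hypothetical toy representation).
    """
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    snapshot = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Element-wise coding, then accumulation over time.
                snapshot[i][j] += mask[i][j] * frame[i][j]
    return snapshot


# Toy example: T = 2 frames of size 2 x 2 with binary masks.
frames = [[[1.0, 2.0], [3.0, 4.0]],
          [[5.0, 6.0], [7.0, 8.0]]]
masks = [[[1, 0], [0, 1]],
         [[0, 1], [1, 1]]]

snapshot = sci_measure(frames, masks)
print(snapshot)  # [[1.0, 6.0], [7.0, 12.0]]
```

Reconstruction is the inverse problem: recovering all T frames from the single snapshot and the known masks. This is why a network trained for one set of masks does not automatically transfer to another.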