Scene text removal (STR) contains two processes: text localization and background reconstruction. Through integrating both processes into a single network, previous methods provide an implicit erasure guidance by modifying all pixels in the entire image. However, there exists two problems: 1) the implicit erasure guidance causes the excessive erasure to non-text areas; 2) the one-stage erasure lacks the exhaustive removal of text region. In this paper, we propose a ProgrEssively Region-based scene Text eraser (PERT), introducing an explicit erasure guidance and performing balanced multi-stage erasure for accurate and exhaustive text removal. Firstly, we introduce a new region-based modification strategy (RegionMS) to explicitly guide the erasure process. Different from previous implicitly guided methods, RegionMS performs targeted and regional erasure on only text region, and adaptively perceives stroke-level information to improve the integrity of non-text areas with only bounding box level annotations. Secondly, PERT performs balanced multi-stage erasure with several progressive erasing stages. Each erasing stage takes an equal step toward the text-erased image to ensure the exhaustive erasure of text regions. Compared with previous methods, PERT outperforms them by a large margin without the need of adversarial loss, obtaining SOTA results with high speed (71 FPS) and at least 25% lower parameter complexity. Code is available at https://github.com/wangyuxin87/PERT.