Simple models of infectious diseases tend to assume random mixing of individuals, but real interactions are not random pairwise encounters: they occur within various types of gatherings such as workplaces, households, schools, and concerts, best described by a higher-order network structure. We model contagions on higher-order networks using group-based approximate master equations, in which we track all states and interactions within a group of nodes and assume a mean-field coupling between them. Using the Susceptible-Infected-Susceptible dynamics, our approach reveals the existence of a mesoscopic localization regime, where a disease can concentrate and self-sustain only around large groups in the network overall organization. In this regime, the phase transition is smeared, characterized by an inhomogeneous activation of the groups. At the mesoscopic level, we observe that the distribution of infected nodes within groups of a same size can be very dispersed, even bimodal. When considering heterogeneous networks, both at the level of nodes and groups, we characterize analytically the region associated with mesoscopic localization in the structural parameter space. We put in perspective this phenomenon with eigenvector localization and discuss how a focus on higher-order structures is needed to discern the more subtle localization at the mesoscopic level. Finally, we discuss how mesoscopic localization affects the response to structural interventions and how this framework could provide important insights for a broad range of dynamics.