We compute adiabatic waveforms for extreme mass-ratio inspirals (EMRIs) by stitching together a long inspiral waveform from a sequence of waveform snapshots, each of which corresponds to a particular geodesic orbit. We show that the complicated total waveform can be regarded as a sum of voices. Each voice evolves in a simple way on long timescales, a property which can be exploited to efficiently produce waveform models that faithfully encode the properties of EMRI systems. We look at examples for a range of different orbital geometries: spherical orbits, equatorial eccentric orbits, and one example of generic (inclined and eccentric) orbits. To our knowledge, this is the first calculation of a generic EMRI waveform that uses strong-field radiation reaction. We examine waveforms in both the time and frequency domains. Although EMRIs evolve slowly enough that the stationary phase approximation (SPA) to the Fourier transform is valid, the SPA calculation must be done to higher order for some voices, since their instantaneous frequency can change from chirping forward ($\dot f > 0$) to chirping backward ($\dot f < 0$). The approach we develop can eventually be extended to more complete EMRI waveform models, for example to include effects neglected by the adiabatic approximation such as the conservative self force and spin-curvature coupling.