As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We propose Mauve, a comparison measure for open-ended text generation, which directly compares a generation model's distribution to that of human-written text. Mauve measures the mean area under a divergence curve between the two distributions, exploring the trade-off between two types of errors: those arising when the model assigns high probability to text that is unlikely under the human distribution, and those arising when the model fails to cover parts of the human distribution. Mauve extends a family of information divergence metrics, introducing a tractable approximation based on computing the KL divergence in a quantized embedding space. This yields an efficient implementation that scales up to modern text generation models. Through an extensive empirical study on three open-ended generation tasks, we find that Mauve identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.
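
To make the construction concrete, the following is a minimal, illustrative sketch (in Python, assuming precomputed text embeddings): quantize the embeddings of human and model samples onto a shared histogram support, trace a divergence curve from KL divergences against mixtures of the two histograms, and report the area under it. The function and parameter names are hypothetical, and this simplification omits details of the official implementation such as feature extraction and its exact discretization and smoothing choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def mauve_like_score(p_feats, q_feats, num_bins=50, num_mixtures=25, scale=5.0):
    """Illustrative Mauve-style score from embedded text samples.

    p_feats, q_feats: arrays of shape (n, d) with embeddings of
    human-written and model-generated text, respectively.
    """
    # Quantize both sets of embeddings jointly into discrete bins,
    # giving histograms P (human) and Q (model) over the same support.
    km = KMeans(n_clusters=num_bins, n_init=10).fit(np.vstack([p_feats, q_feats]))
    p_hist = np.bincount(km.predict(p_feats), minlength=num_bins) / len(p_feats)
    q_hist = np.bincount(km.predict(q_feats), minlength=num_bins) / len(q_feats)

    def kl(a, b):
        # KL(a || b) over the quantized support; b > 0 wherever a > 0
        # because b is a strict mixture of the two histograms.
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    # Trace the divergence curve: for each mixture R = lam*P + (1-lam)*Q,
    # one coordinate penalizes mass the model puts on non-human-like text,
    # the other penalizes human-like text the model fails to cover.
    lambdas = np.linspace(1e-3, 1 - 1e-3, num_mixtures)
    xs = np.array([np.exp(-scale * kl(q_hist, lam * p_hist + (1 - lam) * q_hist))
                   for lam in lambdas])
    ys = np.array([np.exp(-scale * kl(p_hist, lam * p_hist + (1 - lam) * q_hist))
                   for lam in lambdas])

    # Anchor the curve at its extreme points and integrate with the
    # trapezoid rule; x decreases monotonically along the mixture sweep.
    xs = np.concatenate([[1.0], xs, [0.0]])
    ys = np.concatenate([[0.0], ys, [1.0]])
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.abs(np.diff(xs))))
```

Hypothetical usage: `mauve_like_score(embed(human_texts), embed(model_texts))`, where `embed` maps each text to a fixed-dimensional vector (e.g., from a pretrained language model); scores near 1 indicate that the two quantized distributions are close, scores near 0 that they are far apart.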