When training end-to-end learned models for lossy compression, one has to balance the rate and distortion losses. This is typically done by manually setting a tradeoff parameter $\beta$, an approach called $\beta$-VAE. With this approach it is difficult to target a specific rate or distortion value, because the result can be very sensitive to $\beta$, and the appropriate value of $\beta$ depends on the model and problem setup. As a result, comparing models requires extensive per-model $\beta$-tuning and a full rate-distortion curve (obtained by varying $\beta$) for each model being compared. We argue that the constrained optimization method of Rezende and Viola (2018) is better suited to training lossy compression models, because it allows us to obtain the best possible rate subject to a distortion constraint. This enables pointwise model comparisons: two models are trained with the same distortion target and their rates are compared. We show that the method satisfies the constraint on a realistic image compression task, outperforms a constrained optimization method based on a hinge loss, and is more practical to use for model selection than a $\beta$-VAE.
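As a rough sketch of the contrast drawn above, the two training objectives can be written side by side. The symbols $R(\theta)$ (rate), $D(\theta)$ (distortion), $c_D$ (distortion target) and $\lambda$ (Lagrange multiplier) are notation assumed here for illustration, and the side on which $\beta$ appears is one common convention rather than something the text fixes. The $\beta$-VAE minimizes a weighted sum with a hand-chosen weight:

$$\mathcal{L}_{\beta}(\theta) = D(\theta) + \beta\, R(\theta),$$

whereas the constrained formulation specifies the distortion target directly and can be approached through a Lagrangian whose multiplier is adapted during training:

$$\min_{\theta} R(\theta) \quad \text{subject to} \quad D(\theta) \le c_D, \qquad \mathcal{L}(\theta, \lambda) = R(\theta) + \lambda \left( D(\theta) - c_D \right), \quad \lambda \ge 0.$$

Under these assumptions, $\lambda$ plays the role that a hand-tuned tradeoff weight plays in the $\beta$-VAE, but it is driven by the constraint violation $D(\theta) - c_D$ instead of being set manually, which is what would let two models be compared at the same distortion target.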