We observe that the development cross-entropy loss of supervised neural machine translation models scales like a power law with the amount of training data and the number of non-embedding parameters in the model. We discuss some practical implications of these results, such as predicting the BLEU score achieved by large-scale models and predicting the ROI of labeling data in low-resource language pairs.
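As a rough sketch of the claimed scaling behavior (the exponents and constants below are assumed fit parameters for illustration, not values reported in this abstract), the loss as a function of data size and parameter count might be written as:

% Assumed power-law form; L_inf, D_0, N_0, alpha_D, alpha_N are hypothetical
% fit parameters, not quantities taken from the original text.
\[
  L(D) \approx L_{\infty} + \left(\tfrac{D_0}{D}\right)^{\alpha_D},
  \qquad
  L(N) \approx L_{\infty} + \left(\tfrac{N_0}{N}\right)^{\alpha_N},
\]
where $D$ is the amount of training data, $N$ is the number of non-embedding parameters, and $L$ is the development cross-entropy loss.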