We observe that the development cross-entropy loss of supervised neural machine translation models scales like a power law with the amount of training data and the number of non-embedding parameters in the model. We discuss some practical implications of these results, such as predicting the BLEU achieved by large-scale models and predicting the ROI of labeling data in low-resource language pairs.
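To make the power-law claim concrete, the sketch below fits a saturating power law to hypothetical development-loss measurements at several training-set sizes and extrapolates it to a larger dataset. The functional form L(D) = L_inf + a·D^(-p), the data points, and the initial parameters are illustrative assumptions, not the paper's actual fit.

```python
# Minimal sketch (not the authors' code): fit a power law of the form
# L(D) = L_inf + a * D**(-p) to hypothetical dev cross-entropy measurements
# at several training-set sizes. All numbers are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def power_law(D, L_inf, a, p):
    # Irreducible loss plus a term that decays as a power of dataset size D.
    return L_inf + a * np.power(D, -p)

# Hypothetical (dataset size in sentence pairs, dev cross-entropy) points.
D = np.array([1e5, 3e5, 1e6, 3e6, 1e7])
L = np.array([4.1, 3.6, 3.2, 2.9, 2.7])

params, _ = curve_fit(power_law, D, L, p0=[2.0, 50.0, 0.3], maxfev=10000)
L_inf, a, p = params
print(f"L_inf={L_inf:.2f}, a={a:.2f}, p={p:.3f}")

# Extrapolating the fitted curve predicts the dev loss at a larger dataset
# size, which (via a separate loss-to-BLEU mapping) is the kind of estimate
# one could use to judge the return on labeling more data.
print("predicted dev loss at 3e7 pairs:", power_law(3e7, *params))
```

The same form with model size N in place of D (or a joint fit over both) would capture the parameter-scaling side of the claim.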