From Data Quality to Model Quality: an Exploratory Study on Deep Learning


Abstract in English

Much current work strives to improve the accuracy of deep learning models, yet very little of it has focused on the quality of the data sets those models are trained on. In fact, data quality determines model quality, so it is important to study how data quality affects model quality. In this paper, we consider four aspects of data quality: Dataset Equilibrium, Dataset Size, Quality of Label, and Dataset Contamination. We design experiments on MNIST and CIFAR-10 to find out how each of the four aspects influences model quality. The experimental results show that all four aspects have a decisive impact on model quality: a decrease in data quality in any of these aspects reduces the accuracy of the model.
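
The four aspects can be illustrated with a minimal NumPy sketch that degrades a labelled image data set along each one. The abstract does not say how the degradations were carried out, so everything here is an assumption made for illustration: the function names (`reduce_size`, `unbalance`, `flip_labels`, `contaminate`), their parameters, the use of uniform label flipping, and the use of Gaussian pixel noise as the form of contamination. Images are assumed to be arrays scaled to [0, 1] with integer class labels.

```python
import numpy as np


def reduce_size(x, y, fraction, rng):
    """Dataset Size: keep a random fraction of the samples."""
    n_keep = int(len(x) * fraction)
    idx = rng.choice(len(x), size=n_keep, replace=False)
    return x[idx], y[idx]


def unbalance(x, y, target_class, keep_fraction, rng):
    """Dataset Equilibrium: keep only a fraction of one class."""
    cls_idx = np.flatnonzero(y == target_class)
    other_idx = np.flatnonzero(y != target_class)
    n_keep = int(len(cls_idx) * keep_fraction)
    kept = rng.choice(cls_idx, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([other_idx, kept]))
    return x[idx], y[idx]


def flip_labels(y, noise_rate, num_classes, rng):
    """Quality of Label: give a fraction of samples a wrong label."""
    y_noisy = y.copy()
    n_flip = int(len(y) * noise_rate)
    idx = rng.choice(len(y), size=n_flip, replace=False)
    # An offset in [1, num_classes - 1] taken modulo num_classes
    # always yields a label different from the original one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    y_noisy[idx] = (y_noisy[idx] + offsets) % num_classes
    return y_noisy


def contaminate(x, fraction, noise_std, rng):
    """Dataset Contamination (assumed form): add Gaussian pixel noise
    to a fraction of the images and clip back to [0, 1]."""
    x_noisy = x.astype(np.float32)  # astype returns a copy
    n_bad = int(len(x) * fraction)
    idx = rng.choice(len(x), size=n_bad, replace=False)
    noise = rng.normal(0.0, noise_std, size=x_noisy[idx].shape)
    x_noisy[idx] = np.clip(x_noisy[idx] + noise, 0.0, 1.0)
    return x_noisy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in with MNIST-like shapes; not real data.
    x = rng.random((1000, 28, 28)).astype(np.float32)
    y = rng.integers(0, 10, size=1000)

    x_small, y_small = reduce_size(x, y, 0.5, rng)
    x_unbal, y_unbal = unbalance(x, y, target_class=3, keep_fraction=0.1, rng=rng)
    y_noisy = flip_labels(y, noise_rate=0.2, num_classes=10, rng=rng)
    x_dirty = contaminate(x, fraction=0.3, noise_std=0.5, rng=rng)
    print(len(x_small), len(x_unbal), int((y_noisy != y).sum()), x_dirty.shape)
```

In an experiment of the kind the abstract describes, one would train the same model on the original and on each degraded data set, then compare test accuracy while varying a single parameter such as `fraction` or `noise_rate`.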
