A data driven trimming procedure for robust classification


الملخص بالإنكليزية

Classification rules can be severely affected by the presence of disturbing observations in the training sample. Looking for an optimal classifier with such data may lead to unnecessarily complex rules. So, simpler effective classification rules could be achieved if we relax the goal of fitting a good rule for the whole training sample but only consider a fraction of the data. In this paper we introduce a new method based on trimming to produce classification rules with guaranteed performance on a significant fraction of the data. In particular, we provide an automatic way of determining the right trimming proportion and obtain in this setting oracle bounds for the classification error on the new data set.

تحميل البحث