Tree ensembles distribute feature importance evenly among groups of correlated features, which suppresses the average ranking of each feature in the group; this reduces interpretability and complicates feature selection. In this paper we present ControlBurn, a feature selection algorithm that uses a weighted LASSO-based optimization to prune unnecessary features from tree ensembles, just as low-intensity fire reduces overgrown vegetation. Like the linear LASSO, ControlBurn assigns all the feature importance of a correlated group of features to a single feature. Moreover, the algorithm is efficient and requires only a single training iteration, unlike iterative wrapper-based feature selection methods. We show that ControlBurn performs substantially better than feature selection methods of comparable computational cost on datasets with correlated features.
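To make the pruning step concrete, here is a minimal sketch of the idea: grow a bagged ensemble, then run a single LASSO solve over per-tree predictions, with each tree penalized by the number of features it uses, so that redundant trees (and the correlated features they carry) drop out. This is an illustration under assumptions (shallow bagged trees, the weighted L1 penalty emulated by rescaling design-matrix columns), not the authors' implementation, and names such as `costs` are invented for the example.

```python
# Sketch of a weighted-LASSO prune of a bagged tree ensemble (illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import Lasso
from sklearn.utils import resample

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=10, random_state=0)

# 1. Grow a bag of shallow trees on bootstrap samples.
trees = []
for seed in range(100):
    Xb, yb = resample(X, y, random_state=seed)
    trees.append(DecisionTreeClassifier(max_depth=3, random_state=seed).fit(Xb, yb))

# 2. Per-tree predictions become the columns of the LASSO design matrix.
P = np.column_stack([t.predict_proba(X)[:, 1] for t in trees])

# 3. Weight each tree by the number of distinct features it splits on, so the
#    penalty discourages trees that touch many (possibly redundant) features.
costs = np.array([max(1, len(set(t.tree_.feature[t.tree_.feature >= 0])))
                  for t in trees])

# 4. A single LASSO solve; rescaling columns by cost emulates a weighted L1
#    penalty. Trees assigned zero weight are pruned from the ensemble.
lasso = Lasso(alpha=0.005, positive=True).fit(P / costs, y)
kept = [t for t, w in zip(trees, lasso.coef_) if w > 0]

# 5. The selected features are those used by any surviving tree.
selected = sorted({f for t in kept for f in t.tree_.feature if f >= 0})
print(len(kept), "trees kept; selected features:", selected)
```

Because a tree that uses many features pays a larger penalty, the solve tends to retain a small set of trees, and a correlated group ends up represented by whichever single feature the surviving trees use.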
Uplift modeling is a predictive modeling technique that estimates the user-level incremental effect of a treatment using machine learning models. It is often used for targeting promotions and advertisements, as well as for the personalization of product offerings.
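As a concrete illustration of the uplift setup (not the feature selection method of the paper above), a simple two-model "T-learner" estimates the incremental effect as the difference between outcome models fit separately on treated and control users; the synthetic data and all names below are assumptions for the example.

```python
# Minimal T-learner sketch of uplift estimation (illustrative only).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
treated = rng.integers(0, 2, size=n)            # random treatment assignment
# Outcome probability: a base effect plus extra lift for treated users.
logit = X[:, 0] + 0.8 * treated * (X[:, 1] > 0)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Fit separate outcome models on the treated and control groups.
m_t = GradientBoostingClassifier(random_state=0).fit(X[treated == 1], y[treated == 1])
m_c = GradientBoostingClassifier(random_state=0).fit(X[treated == 0], y[treated == 0])

# Estimated uplift: predicted outcome if treated minus predicted outcome if not.
uplift = m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]
print("mean estimated uplift:", round(float(uplift.mean()), 3))
```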
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation. When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
Feature selection in learning to rank has recently emerged as a crucial issue. Whereas several preprocessing approaches have been proposed, only a few works have focused on integrating feature selection into the learning process itself.
In this paper, we propose a new wrapper feature selection approach for partially labeled training data, in which unlabeled observations are pseudo-labeled using the predictions of an initial classifier trained on the labeled training set; a sketch of this two-step pattern follows.
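Here is a minimal sketch of the pseudo-labeling plus wrapper pattern, assuming scikit-learn; the particular wrapper (greedy forward selection via SequentialFeatureSelector) and all parameter values are illustrative choices, not the paper's exact procedure.

```python
# Pseudo-label unlabeled data, then run a wrapper feature selector (sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=15, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:100] = True                  # pretend only 100 examples carry labels

# Step 1: train an initial classifier on the labeled subset.
init = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])

# Step 2: pseudo-label the unlabeled observations with its predictions.
y_full = y.copy()
y_full[~labeled] = init.predict(X[~labeled])

# Step 3: run the wrapper (greedy forward selection) on the enlarged set.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=5, cv=3).fit(X, y_full)
print("selected feature indices:", np.flatnonzero(sfs.get_support()))
```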
Major complications arise from the recent increase in the amount of high-dimensional data, including high computational costs and memory requirements. Feature selection, which identifies the most relevant and informative attributes of a dataset, helps address these complications.