Generalization is a central problem in Machine Learning. Indeed, most prediction methods require careful calibration of hyperparameters, usually carried out on a hold-out \textit{validation} dataset, to achieve generalization. The main goal of this paper is to introduce a novel approach that achieves generalization without any data splitting, based on a new risk measure which directly quantifies a model's tendency to overfit. To fully understand the intuition and advantages of this new approach, we illustrate it in the simple linear regression model ($Y = X\beta + \xi$), where we develop a new criterion. We highlight how this criterion is a good proxy for the true generalization risk. Next, we derive different procedures which tackle several structures simultaneously (correlation, sparsity, ...). Notably, these procedures \textbf{concomitantly} train the model and calibrate the hyperparameters. In addition, these procedures can be implemented via classical gradient descent methods when the criterion is differentiable w.r.t. the hyperparameters. Our numerical experiments reveal that our procedures are computationally feasible and compare favorably to the popular approach (Ridge, LASSO and Elastic-Net combined with grid-search cross-validation) in terms of generalization. They also outperform the baseline on two additional tasks: estimation and support recovery of $\beta$. Moreover, our procedures do not require any expertise for the calibration of the initial parameters, which remain the same for all the datasets we experimented on.
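To make the "concomitant training and calibration" idea concrete, here is a minimal sketch in PyTorch. The abstract does not specify the paper's risk measure, so as a stand-in we use the closed-form leave-one-out error of ridge regression, which is differentiable w.r.t. the (log-)regularization hyperparameter; the point is only to illustrate calibrating a hyperparameter by gradient descent without any data splitting.

```python
# Hedged sketch: jointly fitting a linear model and calibrating its
# regularization strength by gradient descent on a differentiable criterion.
# The paper's actual risk measure is not given in the abstract; the
# leave-one-out criterion below is a stand-in assumption.
import torch

def ridge_loo_criterion(X, y, log_lam):
    """Closed-form leave-one-out MSE of ridge, differentiable in log_lam."""
    n, p = X.shape
    lam = torch.exp(log_lam)  # log-parameterization keeps lambda positive
    H = X @ torch.linalg.solve(X.T @ X + lam * torch.eye(p), X.T)  # hat matrix
    resid = (y - H @ y) / (1.0 - torch.diagonal(H))  # exact LOO residuals
    return (resid ** 2).mean()

# Synthetic data (illustrative only).
torch.manual_seed(0)
n, p = 100, 20
X = torch.randn(n, p)
beta_true = torch.zeros(p)
beta_true[:3] = 1.0
y = X @ beta_true + 0.5 * torch.randn(n)

log_lam = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([log_lam], lr=0.1)
for _ in range(200):  # calibrate lambda by gradient descent, no validation set
    opt.zero_grad()
    loss = ridge_loo_criterion(X, y, log_lam)
    loss.backward()
    opt.step()

# Final estimator uses the calibrated hyperparameter on the full dataset.
lam = torch.exp(log_lam.detach())
beta_hat = torch.linalg.solve(X.T @ X + lam * torch.eye(p), X.T @ y)
```

Parameterizing the hyperparameter on the log scale avoids constrained optimization, and the same few lines of gradient descent serve both training (here in closed form) and calibration, which is the structure the abstract describes.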
Deep Learning (DL) is considered the state-of-the-art in computer vision, speech recognition, and natural language processing. Until recently, it was also widely accepted that DL is irrelevant for learning tasks on tabular data, especially in the small
Generalization in deep learning has been the topic of much recent theoretical and empirical research. Here we introduce desiderata for techniques that predict generalization errors for deep learning models in supervised learning. Such predictions should
The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests preferring an expressive inference model so that the variational approximation is accurate. However
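For context, the standard identity behind this variational argument (generic VAE notation, not taken from this abstract): the marginal log-likelihood decomposes into the evidence lower bound (ELBO) plus a nonnegative KL gap, so a more expressive inference model $q_\phi$ can shrink that gap and tighten the approximation.

```latex
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log \frac{p_\theta(x, z)}{q_\phi(z \mid x)} \right]}_{\mathrm{ELBO}(\theta, \phi;\, x)}
  + \mathrm{KL}\!\left( q_\phi(z \mid x) \,\middle\|\, p_\theta(z \mid x) \right)
```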
The insights revealed by process mining heavily rely on the quality of event logs. Activities extracted from healthcare information systems, given their free-text nature, may carry inconsistent labels. Such inconsistency would then lead to redundancy
This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the
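Since the abstract is cut off, here is one plausible, hedged reading of a random-walk spatial graph convolution (the class name and exact operator below are illustrative assumptions, not the paper's definition): node features are aggregated along k-step random walks via powers of the transition matrix P = D^{-1}A, with a separate learned linear map per walk length.

```python
# Hedged sketch of a random-walk spatial graph convolution; the paper's
# actual operator is not shown in the truncated abstract.
import torch
import torch.nn as nn

class RandomWalkGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, num_steps=3):
        super().__init__()
        # One linear map per walk length (hop 0 is the node itself).
        self.weights = nn.ModuleList(
            nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_steps + 1)
        )

    def forward(self, A, X):
        # A: (n, n) adjacency matrix, X: (n, in_dim) node features.
        deg = A.sum(dim=1).clamp(min=1.0)
        P = A / deg.unsqueeze(1)      # row-stochastic random-walk matrix
        out, H = self.weights[0](X), X
        for W in self.weights[1:]:    # accumulate k-step walk aggregates
            H = P @ H
            out = out + W(H)
        return torch.relu(out)

# Illustrative usage on a toy 4-node cycle graph.
A = torch.tensor([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=torch.float32)
X = torch.randn(4, 8)
conv = RandomWalkGraphConv(in_dim=8, out_dim=16, num_steps=2)
print(conv(A, X).shape)  # torch.Size([4, 16])
```

Using the row-stochastic P rather than the raw adjacency makes each step a weighted average over walk endpoints, so the feature scale does not grow with node degree.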