Gaussian processes are often considered a gold standard in uncertainty estimation with low dimensional data, but they have difficulty scaling to high dimensional inputs. Deep Kernel Learning (DKL) was introduced as a solution to this problem: a deep feature extractor is used to transform the inputs over which a Gaussian process kernel is defined. However, DKL has been shown to provide unreliable uncertainty estimates in practice. We study why, and show that for certain feature extractors, far-away data points are mapped to the same features as those of training-set points. With this insight we propose to constrain DKL's feature extractor to approximately preserve distances through a bi-Lipschitz constraint, resulting in a feature space favorable to DKL. We obtain a model, DUE, which demonstrates uncertainty quality outperforming previous DKL and single forward pass uncertainty methods, while maintaining the speed and accuracy of softmax neural networks.
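A minimal NumPy sketch of the deep-kernel construction described above: a standard GP kernel is evaluated on learned features phi(x), and the feature map's Lipschitz constant is capped by spectral normalization, one common way to approximate the bi-Lipschitz constraint. The extractor here (a single normalized linear layer with a ReLU) and the names `phi`, `spectral_normalize`, and `deep_kernel` are illustrative assumptions, not the paper's residual architecture or its inducing-point approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_normalize(W, coeff=0.95):
    """Rescale W so its largest singular value is at most coeff,
    which upper-bounds the Lipschitz constant of the linear map."""
    s = np.linalg.svd(W, compute_uv=False)[0]
    return W * min(1.0, coeff / s)

# Hypothetical feature extractor: one spectrally normalized layer + ReLU.
W = spectral_normalize(rng.standard_normal((16, 2)))
def phi(X):
    return np.maximum(X @ W.T, 0.0)

def rbf(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def deep_kernel(X1, X2):
    # DKL: an ordinary GP kernel applied to extracted features.
    return rbf(phi(X1), phi(X2))

# Exact GP regression with observation noise sigma^2.
X = rng.uniform(-2, 2, size=(30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
Xs = rng.uniform(-2, 2, size=(5, 2))

sigma2 = 0.01
K = deep_kernel(X, X) + sigma2 * np.eye(len(X))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
Ks = deep_kernel(Xs, X)
mean = Ks @ alpha                                # predictive mean
v = np.linalg.solve(L, Ks.T)
var = np.diag(deep_kernel(Xs, Xs) - v.T @ v)     # predictive variance
```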
We propose a deep learning approach for discovering kernels tailored to identifying clusters over sample data. Our neural network produces sample embeddings that are motivated by--and are at least as expressive as--spectral clustering. Our training o
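The abstract above is truncated, but the spectral-clustering baseline it claims to match is concrete enough to sketch. Below is a small NumPy version of that classical algorithm: an RBF affinity matrix, the symmetrically normalized graph Laplacian, and its bottom eigenvectors as the cluster-revealing embedding. This illustrates the reference method, not the paper's own network.

```python
import numpy as np

def spectral_embedding(X, n_components=2, lengthscale=1.0):
    """Eigenvectors of the symmetrically normalized graph Laplacian
    of an RBF affinity matrix: the classic spectral-clustering embedding."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-0.5 * d2 / lengthscale**2)
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L_sym = np.eye(len(X)) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)     # ascending eigenvalues
    return eigvecs[:, :n_components]             # smallest ones first

# Two well-separated blobs: their rows in the embedding separate cleanly,
# so k-means on the embedding recovers the clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(3, 0.2, (20, 2))])
Z = spectral_embedding(X)
```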
Recurrent neural network-based solutions are increasingly being used in the analysis of longitudinal Electronic Health Record data. However, most works focus on prediction accuracy and neglect prediction uncertainty. We propose Deep Kernel Accelerated Failure Time models.
Deep neural networks are increasingly being used for the analysis of medical images. However, most works neglect the uncertainty in the model's prediction. We propose an uncertainty-aware deep kernel learning model which permits the estimation of the uncertainty in its prediction.
In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at initialization. S
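The "linear weight expansion" mentioned above is the first-order Taylor expansion of the network in its weights around initialization, f_lin(x; w) = f(x; w0) + grad_w f(x; w0) . (w - w0); in the NTK regime, training the network is well-approximated by training this linear model. A self-contained sketch for a tiny one-hidden-layer network with analytic gradients (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny scalar network f(x) = v . tanh(W x), weights at initialization.
W0 = rng.standard_normal((8, 2))
v0 = rng.standard_normal(8)

def f(W, v, x):
    return v @ np.tanh(W @ x)

def grads(W, v, x):
    """Analytic gradients of f with respect to W and v."""
    h = np.tanh(W @ x)
    dW = np.outer(v * (1.0 - h**2), x)   # chain rule through tanh
    return dW, h                         # df/dW, df/dv

def f_lin(W, v, x):
    """First-order expansion of f in the weights around (W0, v0)."""
    dW, dv = grads(W0, v0, x)
    return f(W0, v0, x) + np.sum(dW * (W - W0)) + dv @ (v - v0)

# Near initialization the linearization tracks the true network closely.
x = rng.standard_normal(2)
eps = 1e-3
W1 = W0 + eps * rng.standard_normal(W0.shape)
v1 = v0 + eps * rng.standard_normal(v0.shape)
print(f(W1, v1, x), f_lin(W1, v1, x))   # nearly identical values
```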
It has been empirically observed that, in deep neural networks, the solutions found by stochastic gradient descent from different random initializations can often be connected by a path with low loss. Recent works have shed light on this intriguing phenomenon.
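A minimal sketch of the measurement behind this observation: evaluate the loss along a parametric path between two solutions and look for a barrier. The straight-line probe below (with a toy loss and a hypothetical `path_loss` helper) generally shows a barrier between distinct minima; the low-loss connections reported in this line of work are found along curved paths, e.g. Bezier curves, fitted so that the barrier essentially vanishes.

```python
import numpy as np

def path_loss(loss_fn, w_a, w_b, n_points=11):
    """Loss along the straight segment w(t) = (1 - t) * w_a + t * w_b,
    the simplest probe for whether two solutions are loss-connected."""
    ts = np.linspace(0.0, 1.0, n_points)
    return ts, np.array([loss_fn((1 - t) * w_a + t * w_b) for t in ts])

# Toy loss with several equally good minima at coordinates +/- 1.
loss = lambda w: float(np.sum((w**2 - 1.0) ** 2))
w_a = np.array([1.0, 1.0])    # one "solution"
w_b = np.array([-1.0, 1.0])   # another, equally good one
ts, losses = path_loss(loss, w_a, w_b)
print(losses.max())           # barrier height along the straight path (1.0)
```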