Estimating the kernel mean in a reproducing kernel Hilbert space is a critical component in many kernel learning algorithms. Given a finite sample, the standard estimate of the target kernel mean is the empirical average. Previous work has shown that better estimators can be constructed by shrinkage methods. In this work, we propose to corrupt data examples with noise from known distributions and present a new kernel mean estimator, called the marginalized kernel mean estimator, which estimates the kernel mean under the corrupted distribution. Theoretically, we show that the marginalized kernel mean estimator introduces implicit regularization in kernel mean estimation. Empirically, we show on a variety of datasets that the marginalized kernel mean estimator obtains much lower estimation error than the existing estimators.
Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used to define a distance between probability measures, known as the maximum mean discrepancy (MMD). A well-known advantage of mean embeddings and MMD is their low computational cost and low sample complexity. However, kernel mean embeddings have had limited applications to problems that consist in optimizing distributions, due to the difficulty of characterizing which Hilbert space vectors correspond to a probability distribution. In this note, we propose to leverage the kernel sums-of-squares parameterization of positive functions of Marteau-Ferey et al. [2020] to fit distributions in the MMD geometry. First, we show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense. Then, we provide algorithms to optimize such distributions in the finite-sample setting, which we illustrate in a density fitting numerical experiment.
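As an illustration of the MMD mentioned above, the following is a minimal sketch of the standard biased (V-statistic) estimate of the squared MMD between two samples. It assumes a Gaussian kernel, which is characteristic; the function names, the bandwidth value, and the synthetic data are illustrative choices and are not taken from the abstract.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(X, Y, bandwidth=1.0):
    """Biased estimate of MMD^2: the squared RKHS distance between the
    empirical mean embeddings of the samples X and Y."""
    k_xx = gaussian_kernel(X, X, bandwidth)
    k_yy = gaussian_kernel(Y, Y, bandwidth)
    k_xy = gaussian_kernel(X, Y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

# Synthetic data (an assumption for the example): two Gaussians with shifted means.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(2.0, 1.0, size=(200, 2))

same = mmd_squared(X, X)  # a sample compared with itself
diff = mmd_squared(X, Y)  # samples from two different distributions
```

Each call costs quadratic time in the sample size because it forms full kernel matrices. The estimate is zero up to round-off when a sample is compared with itself and clearly positive when the two samples come from well-separated distributions.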
Given a publicly available pool of machine learning models constructed for various tasks, when a user plans to build a model for her own machine learning application, is it possible to build upon models in the pool such that the previous efforts on these existing models can be reused rather than starting from scratch? Here, a grand challenge is how to find models that are helpful for the current application, without accessing the raw training data for the models in the pool. In this paper, we present a two-phase framework. In the upload phase, when a model is uploaded to the pool, we construct a reduced kernel mean embedding (RKME) as a specification for the model. Then, in the deployment phase, the relatedness of the current task and the pre-trained models is measured based on the value of the RKME specification. Theoretical results and extensive experiments validate the effectiveness of our approach.
We propose a method for estimating channel parameters from RSSI measurements and the lost packet count, which can work in the presence of losses due to both interference and signal attenuation below the noise floor. This is especially important in wireless networks, such as vehicular networks, where the propagation model changes with the density of nodes. The method is based on Stochastic Expectation Maximization, in which the received data is modeled as a mixture of distributions (no/low interference and strong interference) that is incomplete (censored) due to packet losses. The PDFs in the mixture are Gamma, according to the commonly accepted model for wireless signal and interference power. This approach leverages the loss count as additional information, hence outperforming maximum likelihood estimation, which does not use this information (ML-), for a small number of received RSSI samples. Hence, it allows inexpensive on-line channel estimation from ad-hoc collected data. The method also outperforms ML- on uncensored data mixtures, as ML- assumes that samples are from a single-mode PDF.
Training deep neural models in the presence of corrupted supervision is challenging, as the corrupted data points may significantly impact the generalization performance. To alleviate this problem, we present an efficient robust algorithm that achieves strong guarantees without any assumption on the type of corruption and provides a unified framework for both classification and regression problems. Unlike many existing approaches that quantify the quality of the data points (e.g., based on their individual loss values) and filter them accordingly, the proposed algorithm focuses on controlling the collective impact of data points on the average gradient. Even when a corrupted data point fails to be excluded by our algorithm, the data point has a very limited impact on the overall loss, as compared with state-of-the-art filtering methods based on loss values. Extensive experiments on multiple benchmark datasets have demonstrated the robustness of our algorithm under different types of corruption.
We study the expressive power of kernel methods and the algorithmic feasibility of multiple kernel learning for a special rich class of kernels. Specifically, we define Euclidean kernels, a diverse class that includes most, if not all, families of kernels studied in the literature, such as polynomial kernels and radial basis functions. We then describe the geometric and spectral structure of this family of kernels over the hypercube (and to some extent for any compact domain). Our structural results allow us to prove meaningful limitations on the expressive power of the class as well as derive several efficient algorithms for learning kernels over different domains.