We introduce a general framework for approximating regular conditional distributions (RCDs). Our approximations of these RCDs are implemented by a new class of geometric deep learning models with inputs in $\mathbb{R}^d$ and outputs in the Wasserstein-$1$ space $\mathcal{P}_1(\mathbb{R}^D)$. We find that the models built using our framework can approximate any continuous function from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^D)$ uniformly on compacts, and quantitative rates are obtained. We identify two methods for avoiding the curse of dimensionality, i.e., for ensuring that the number of parameters determining the approximating neural network depends only polynomially on the involved dimension and the approximation error. The first solution describes functions in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$ which can be efficiently approximated on any compact subset of $\mathbb{R}^d$. Conversely, the second approach describes sets in $\mathbb{R}^d$ on which any function in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$ can be efficiently approximated. Our framework is used to obtain an affirmative answer to the open conjecture of Bishop (1994), namely that mixture density networks are universal approximators of regular conditional distributions. The predictive performance of the proposed models is evaluated against comparable learning models on various probabilistic prediction tasks in the context of ELMs, model uncertainty, and heteroscedastic regression. All the results are obtained for more general input and output spaces and thus apply to geometric deep learning contexts.
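To make the mixture-density-network claim concrete, the following is a minimal sketch of an MDN that maps an input in $\mathbb{R}^d$ to a Gaussian mixture on $\mathbb{R}^D$, i.e., a model of a conditional distribution, trained by conditional negative log-likelihood. It is an illustrative PyTorch example only; the layer sizes, component count, and function names are assumptions and are not taken from the paper's architecture.

```python
# Illustrative mixture density network (MDN) sketch; hyperparameters are assumptions.
import torch
import torch.nn as nn

class MDN(nn.Module):
    def __init__(self, d: int, D: int, n_components: int = 5, hidden: int = 64):
        super().__init__()
        self.D, self.K = D, n_components
        self.trunk = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.logits = nn.Linear(hidden, n_components)          # mixture weights
        self.means = nn.Linear(hidden, n_components * D)       # component means
        self.log_scales = nn.Linear(hidden, n_components * D)  # diagonal scales

    def forward(self, x):
        h = self.trunk(x)
        pi = torch.softmax(self.logits(h), dim=-1)              # (batch, K)
        mu = self.means(h).view(-1, self.K, self.D)             # (batch, K, D)
        sigma = torch.exp(self.log_scales(h)).view(-1, self.K, self.D)
        return pi, mu, sigma

def mdn_nll(pi, mu, sigma, y):
    """Negative log-likelihood of targets y under the predicted Gaussian mixture."""
    comp = torch.distributions.Independent(
        torch.distributions.Normal(mu, sigma), 1)               # K diagonal Gaussians
    log_prob = comp.log_prob(y.unsqueeze(1))                    # (batch, K)
    return -torch.logsumexp(torch.log(pi) + log_prob, dim=-1).mean()
```

Minimizing `mdn_nll` over samples $(x, y)$ fits the predicted mixture to the conditional law of $y$ given $x$; the universality question is whether such mixtures can approximate any continuous map into $\mathcal{P}_1(\mathbb{R}^D)$.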
Modeling distributions of covariates, or density estimation, is a core challenge in unsupervised learning. However, the majority of work only considers the joint distribution, which has limited utility in practical situations. A more general and usef
Modifications to a neural network's input and output layers are often required to accommodate the specificities of most practical learning tasks. However, the impact of such changes on an architecture's approximation capabilities is largely not understood
We give a universal kernel that renders all the regular languages linearly separable. We are not able to compute this kernel efficiently and conjecture that it is intractable, but we do have an efficient $\epsilon$-approximation.
Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample $x$ itself is associated with a
The rapidly growing parameter volume of deep neural networks (DNNs) hinders artificial intelligence applications on resource-constrained devices, such as mobile and wearable devices. Neural network pruning, as one of the mainstream model compress
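As a deliberately generic illustration of pruning as model compression (not the specific method of this work), the sketch below zeroes out the smallest-magnitude weights in each linear or convolutional layer; the sparsity level and function name are illustrative assumptions.

```python
# Generic magnitude-based pruning sketch; sparsity level is an assumption.
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the `sparsity` fraction of smallest-magnitude weights per layer."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                w = module.weight
                k = int(sparsity * w.numel())
                if k == 0:
                    continue
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).float())  # mask small weights to zero
```

In practice such masking is usually followed by fine-tuning, and the resulting sparse weights can be stored or executed more cheaply on constrained hardware.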