No Arabic abstract
Characterizing the appearance of real-world surfaces is a fundamental problem in multidimensional reflectometry, computer vision and computer graphics. For many applications, appearance is sufficiently well characterized by the bidirectional reflectance distribution function (BRDF). We treat BRDF measurements as samples of points from high-dimensional non-linear non-convex manifolds. BRDF manifolds form an infinite-dimensional space, but typically the available measurements are very scarce for complicated problems such as BRDF estimation. Therefore, an efficient learning strategy is crucial when performing the measurements. In this paper, we build the foundation of a mathematical framework that allows to develop and apply new techniques within statistical design of experiments and generalized proactive learning, in order to establish more efficient sampling and measurement strategies for BRDF data manifolds.
Normalizing flows are invertible neural networks with tractable change-of-volume terms, which allows optimization of their parameters to be efficiently performed via maximum likelihood. However, data of interest is typically assumed to live in some (often unknown) low-dimensional manifold embedded in high-dimensional ambient space. The result is a modelling mismatch since -- by construction -- the invertibility requirement implies high-dimensional support of the learned distribution. Injective flows, mapping from low- to high-dimensional space, aim to fix this discrepancy by learning distributions on manifolds, but the resulting volume-change term becomes more challenging to evaluate. Current approaches either avoid computing this term entirely using various heuristics, or assume the manifold is known beforehand and therefore are not widely applicable. Instead, we propose two methods to tractably calculate the gradient of this term with respect to the parameters of the model, relying on careful use of automatic differentiation and techniques from numerical linear algebra. Both approaches perform end-to-end nonlinear manifold learning and density estimation for data projected onto this manifold. We study the trade-offs between our proposed methods, empirically verify that we outperform approaches ignoring the volume-change term by more accurately learning manifolds and the corresponding distributions on them, and show promising results on out-of-distribution detection.
We introduce manifold-learning flows (M-flows), a new class of generative models that simultaneously learn the data manifold as well as a tractable probability density on that manifold. Combining aspects of normalizing flows, GANs, autoencoders, and energy-based models, they have the potential to represent datasets with a manifold structure more faithfully and provide handles on dimensionality reduction, denoising, and out-of-distribution detection. We argue why such models should not be trained by maximum likelihood alone and present a new training algorithm that separates manifold and density updates. In a range of experiments we demonstrate how M-flows learn the data manifold and allow for better inference than standard flows in the ambient data space.
Active learning is a powerful tool when labelling data is expensive, but it introduces a bias because the training data no longer follows the population distribution. We formalize this bias and investigate the situations in which it can be harmful and sometimes even helpful. We further introduce novel corrective weights to remove bias when doing so is beneficial. Through this, our work not only provides a useful mechanism that can improve the active learning approach, but also an explanation of the empirical successes of various existing approaches which ignore this bias. In particular, we show that this bias can be actively helpful when training overparameterized models -- like neural networks -- with relatively little data.
We present a study of kernel MMD two-sample test statistics in the manifold setting, assuming the high-dimensional observations are close to a low-dimensional manifold. We characterize the property of the test (level and power) in relation to the kernel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold. Specifically, we show that when data densities are supported on a $d$-dimensional sub-manifold $mathcal{M}$ embedded in an $m$-dimensional space, the kernel MMD two-sample test for data sampled from a pair of distributions $(p, q)$ that are Holder with order $beta$ is consistent and powerful when the number of samples $n$ is greater than $delta_2(p,q)^{-2-d/beta}$ up to certain constant, where $delta_2$ is the squared $ell_2$-divergence between two distributions on manifold. Moreover, to achieve testing consistency under this scaling of $n$, our theory suggests that the kernel bandwidth $gamma$ scales with $n^{-1/(d+2beta)}$. These results indicate that the kernel MMD two-sample test does not have a curse-of-dimensionality when the data lie on the low-dimensional manifold. We demonstrate the validity of our theory and the property of the MMD test for manifold data using several numerical experiments.
This paper is part of an ongoing program to develop a theory of generalized differential geometry. We consider the space $mathcal{G}[X,Y]$ of Colombeau generalized functions defined on a manifold $X$ and taking values in a manifold $Y$. This space is essential in order to study concepts such as flows of generalized vector fields or geodesics of generalized metrics. We introduce an embedding of the space of continuous mappings $mathcal{C}(X,Y)$ into $mathcal{G}[X,Y]$ and study the sheaf properties of $mathcal{G}[X,Y]$. Similar results are obtained for spaces of generalized vector bundle homomorphisms. Based on these constructions we propose the definition of a space $mathcal{D}[X,Y]$ of distributions on $X$ taking values in $Y$. $mathcal{D}[X,Y]$ is realized as a quotient of a certain subspace of $mathcal{G}[X,Y]$.