We consider learning a sparse pairwise Markov Random Field (MRF) with continuous-valued variables from i.i.d. samples. We adapt the algorithm of Vuffray et al. (2019) to this setting and provide a finite-sample analysis revealing that the sample complexity scales logarithmically with the number of variables, as in the discrete and Gaussian settings. Our approach applies to a large class of pairwise MRFs with continuous variables and also has desirable asymptotic properties, including consistency and asymptotic normality under mild conditions. Further, we establish that the population version of the optimization criterion employed in Vuffray et al. (2019) can be interpreted as local maximum likelihood estimation (MLE). As part of our analysis, we introduce a robust variation of sparse linear regression à la Lasso, which may be of interest in its own right.
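To make the robust regression component concrete, here is a minimal sketch of what a Lasso-style estimator with an outlier-resistant loss might look like: a Huber loss with an l1 penalty, minimized by proximal gradient descent (ISTA). The choice of loss, solver, and all names below are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the l1 norm: shrink each coordinate toward zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def huber_grad(r, delta):
    # Gradient of the Huber loss w.r.t. the residuals r: linear tails bound
    # the influence of gross outliers, unlike the squared loss.
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def robust_lasso(X, y, lam=0.1, delta=1.0, n_iter=2000):
    n, p = X.shape
    # Step size from the Lipschitz constant of the smooth (Huber) part.
    lr = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ huber_grad(X @ beta - y, delta) / n
        beta = soft_threshold(beta - lr * grad, lr * lam)
    return beta

# Toy usage: sparse ground truth with a few grossly corrupted responses.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50); beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(200)
y[:5] += 20.0  # outliers that would badly bias a plain Lasso
print(np.nonzero(np.abs(robust_lasso(X, y)) > 0.1)[0])  # recovers [0 1 2]
```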
Despite the availability of many Markov Random Field (MRF) optimization algorithms, their widespread use is currently limited by imperfect MRF modelling arising from hand-crafted model parameters and the selection of inferior inference algorithms. In addition to differentiability, the two main aspects that enable learning these model parameters are the forward and backward propagation time of the MRF optimization algorithm and its inference capabilities. In this work, we introduce two fast and differentiable message passing algorithms, namely Iterative Semi-Global Matching Revised (ISGMR) and Parallel Tree-Reweighted Message Passing (TRWP), which are greatly sped up on a GPU by exploiting massive parallelism. Specifically, ISGMR is an iterative and revised version of the standard SGM for general pairwise MRFs with improved optimization effectiveness, and TRWP is a highly parallel version of Sequential TRW (TRWS) for faster optimization. Our experiments on the standard stereo and denoising benchmarks demonstrate that ISGMR and TRWP achieve much lower energies than SGM and Mean-Field (MF), and that TRWP is two orders of magnitude faster than TRWS without losing optimization effectiveness. We further demonstrate the effectiveness of our algorithms for end-to-end learning on semantic segmentation. Notably, our CUDA implementations are at least $7$ and $700$ times faster than PyTorch GPU implementations for forward and backward propagation, respectively, enabling efficient end-to-end learning with message passing.
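As background for how such message passing operates, the sketch below implements one generic min-sum message update for a pairwise MRF. This is the textbook update, not ISGMR or TRWP themselves; the vectorized min over label pairs is the kind of computation that parallelizes across labels and edges on a GPU. The label count and costs are illustrative.

```python
import numpy as np

def min_sum_message(unary_s, pairwise, incoming):
    # Message from node s to node t, for each label l_t of t:
    #   m_{s->t}(l_t) = min_{l_s} [ unary_s(l_s) + incoming(l_s)
    #                               + pairwise(l_s, l_t) ]
    # where `incoming` sums the other messages arriving at s. The min
    # reduction below is over all (l_s, l_t) pairs at once.
    belief = unary_s + incoming                         # shape (L,)
    return np.min(belief[:, None] + pairwise, axis=0)   # shape (L,)

# Toy usage: 3 labels with a truncated-linear pairwise cost.
L = 3
unary = np.array([0.5, 2.0, 1.0])
labels = np.arange(L)
pairwise = np.minimum(np.abs(labels[:, None] - labels[None, :]), 2.0)
print(min_sum_message(unary, pairwise, incoming=np.zeros(L)))  # [0.5 1.5 1. ]
```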
We develop low-regret algorithms for learning episodic Markov decision processes using kernel approximation techniques. The algorithms follow both the Upper Confidence Bound (UCB) and the Posterior (Thompson) Sampling (PSRL) philosophies, and work in the general setting of continuous state and action spaces when the true unknown transition dynamics are assumed to have smoothness induced by an appropriate Reproducing Kernel Hilbert Space (RKHS).
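For intuition about the UCB side, here is a toy sketch of kernel-based optimism: kernel ridge regression gives a mean prediction and an uncertainty at candidate (state, action) pairs, and an exploration bonus proportional to that uncertainty is added before maximizing. This is a generic kernel-UCB acquisition under an assumed RBF kernel, not the paper's algorithm or its regret analysis; all names are illustrative.

```python
import numpy as np

def rbf(A, B, ls=0.5):
    # Squared-exponential kernel encoding the RKHS smoothness assumption.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def kernel_ucb_scores(X, y, X_query, lam=1e-2, beta=2.0):
    # Kernel ridge mean plus an exploration bonus proportional to the
    # predictive uncertainty: optimism in the face of uncertainty.
    K = rbf(X, X) + lam * np.eye(len(X))
    Kq = rbf(X_query, X)                       # (m, n)
    mean = Kq @ np.linalg.solve(K, y)
    V = np.linalg.solve(K, Kq.T)               # (n, m)
    var = 1.0 - np.einsum('ij,ji->i', Kq, V)   # k(x, x) = 1 for this kernel
    return mean + beta * np.sqrt(np.maximum(var, 0.0))

# Toy usage: score candidate (state, action) pairs, pick the most optimistic.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (20, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(20)
cand = rng.uniform(-1, 1, (100, 2))
print(cand[np.argmax(kernel_ucb_scores(X, y, cand))])
```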
We derive two sufficient conditions for a function of a Markov random field (MRF) on a given graph to be an MRF on the same graph. The first condition is information-theoretic and parallels a recent information-theoretic characterization of lumpability of Markov chains. The second condition, which is easier to check, is based on the potential functions of the corresponding Gibbs field. We illustrate our sufficient conditions with several examples and discuss implications for practical applications of MRFs. As a side result, we give a partial characterization of functions of MRFs that are information-preserving.
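As background for the second condition, recall the standard Gibbs factorization guaranteed by the Hammersley-Clifford theorem for a positive MRF on a graph $G$ with clique set $\mathcal{C}$:

```latex
P(x) \;=\; \frac{1}{Z}\,\exp\!\Big(-\sum_{C \in \mathcal{C}} V_C(x_C)\Big)
```

A condition stated on the potentials $V_C$ is thus typically easier to verify than an information-theoretic one; the exact form of the paper's condition is not reproduced here.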
We propose a novel framework, called Markov-Lipschitz deep learning (MLDL), to tackle the geometric deterioration caused by collapse, twisting, or crossing in vector-based neural network transformations for manifold-based representation learning and manifold data generation. A prior constraint, called locally isometric smoothness (LIS), is imposed across layers and encoded into a Markov random field (MRF)-Gibbs distribution. This leads to the best possible solutions for local geometry preservation and robustness, as measured by local geometric distortion and local bi-Lipschitz continuity. Consequently, the layer-wise vector transformations are enhanced into well-behaved, LIS-constrained metric homeomorphisms. Extensive experiments, comparisons, and ablation studies demonstrate significant advantages of MLDL for manifold learning and manifold data generation. MLDL is general enough to enhance any vector transformation-based network. The code is available at https://github.com/westlake-cairi/Markov-Lipschitz-Deep-Learning.
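For readers unfamiliar with the second of these measures, local bi-Lipschitz continuity of a layer map $f$ with constant $C \ge 1$ on a neighborhood $U$ is the standard two-sided bound:

```latex
\frac{1}{C}\, d_X(x, y) \;\le\; d_Y\big(f(x), f(y)\big) \;\le\; C\, d_X(x, y)
\qquad \text{for all } x, y \in U,
```

with $C = 1$ recovering a local isometry, the regime the LIS prior pushes each layer toward.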
Estimating global pairwise interaction effects, i.e., the difference between the joint effect and the sum of the marginal effects of two input features, with properly quantified uncertainty, is centrally important in science applications. We propose a non-parametric probabilistic method for detecting interaction effects of unknown form. First, the relationship between the features and the output is modelled using a Bayesian neural network (BNN), capable of representing complex interactions with principled uncertainty. Second, interaction effects and their uncertainty are estimated from the trained model. For the second step, we propose an intuitive global interaction measure, the Bayesian Group Expected Hessian (GEH), which aggregates information about local interactions as captured by the Hessian. GEH provides a natural trade-off between type I and type II errors and, moreover, comes with theoretical guarantees ensuring that the estimated interaction effects and their uncertainty can be improved by training a more accurate BNN. The method empirically outperforms available non-probabilistic alternatives on simulated and real-world data. Finally, we demonstrate its ability to detect interpretable interactions between higher-level features (at deeper layers of the neural network).
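To illustrate the core idea of a Hessian-based interaction score (without the Bayesian machinery), the sketch below estimates the mixed partial derivative by central finite differences and averages it over the data. It is a deterministic toy stand-in for GEH, omitting the BNN and its uncertainty; all function names are hypothetical.

```python
import numpy as np

def mixed_partial(f, x, i, j, h=1e-4):
    # Central finite-difference estimate of d^2 f / (dx_i dx_j): the local
    # interaction between features i and j at the point x.
    def fp(di, dj):
        z = x.copy(); z[i] += di * h; z[j] += dj * h
        return f(z)
    return (fp(1, 1) - fp(1, -1) - fp(-1, 1) + fp(-1, -1)) / (4 * h * h)

def expected_hessian_interaction(f, X, i, j):
    # Aggregate local interactions over the data distribution, in the spirit
    # of an expected-Hessian global interaction measure.
    return np.mean([mixed_partial(f, x, i, j) for x in X])

# Toy usage: f has a genuine x0*x1 interaction but none between x0 and x2.
f = lambda x: 2.0 * x[0] * x[1] + x[2] ** 2
X = np.random.default_rng(2).standard_normal((100, 3))
print(expected_hessian_interaction(f, X, 0, 1))  # ~2.0
print(expected_hessian_interaction(f, X, 0, 2))  # ~0.0
```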