Wasserstein geometry and information geometry are two important structures to be introduced in a manifold of probability distributions. Wasserstein geometry is defined by using the transportation cost between two distributions, so it reflects the metric of the base manifold on which the distributions are defined. Information geometry is defined to be invariant under reversible transformations of the base space. Both have their own merits for applications. In particular, statistical inference is based upon information geometry, where the Fisher metric plays a fundamental role, whereas Wasserstein geometry is useful in computer vision and AI applications. In this study, we analyze statistical inference based on the Wasserstein geometry in the case where the base space is one-dimensional. Using the location-scale model, we derive the W-estimator, which explicitly minimizes the transportation cost from the empirical distribution to a statistical model, and study its asymptotic behavior. We show that the W-estimator is consistent and explicitly give its asymptotic distribution by using the functional delta method. The W-estimator is Fisher efficient in the Gaussian case.
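For a one-dimensional location-scale model, the squared $L^2$-Wasserstein distance from the empirical distribution is an integral of squared quantile differences, so minimizing it amounts to regressing the order statistics on the quantiles of the standard member of the family. The following is a minimal sketch of this idea, assuming a Gaussian base distribution and a midpoint discretization of the quantile integral; it is an illustration, not the paper's exact construction.

```python
import numpy as np
from scipy.stats import norm

def w_estimator(x):
    """Sketch of the W-estimator for a Gaussian location-scale model.

    Minimizes a midpoint discretization of the squared L2-Wasserstein
    cost  int_0^1 (F_n^{-1}(u) - mu - sigma * Phi^{-1}(u))^2 du,
    which reduces to least-squares regression of the order statistics
    on the standard normal quantiles.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    q = norm.ppf((np.arange(1, n + 1) - 0.5) / n)  # standard quantiles
    qc = q - q.mean()
    sigma = np.sum(qc * (x - x.mean())) / np.sum(qc * qc)  # OLS slope
    mu = x.mean() - sigma * q.mean()
    return mu, sigma

# Example: recover location and scale from a Gaussian sample
rng = np.random.default_rng(0)
print(w_estimator(rng.normal(loc=2.0, scale=3.0, size=1000)))
```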
Matrix scaling is a classical problem with a wide range of applications. It is known that the Sinkhorn algorithm for matrix scaling can be interpreted as alternating e-projections from the viewpoint of classical information geometry. Recently, a generalization of matrix scaling to completely positive maps, called operator scaling, has been found to appear in various fields of mathematics and computer science, and the Sinkhorn algorithm has been extended to operator scaling. In this study, we analyze the operator Sinkhorn algorithm from the viewpoint of quantum information geometry through the Choi representation of completely positive maps. The operator Sinkhorn algorithm is shown to coincide with alternating e-projections with respect to the symmetric logarithmic derivative metric, a Riemannian metric on the space of quantum states relevant to quantum estimation theory. Other types of alternating e-projection algorithms are also provided by using different information geometric structures on the positive definite cone.
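A minimal sketch of the operator Sinkhorn iteration on the Kraus operators of a completely positive map: alternately normalize on the left and right so that the map and its adjoint both send the identity to the identity. Function names and the fixed iteration count are illustrative.

```python
import numpy as np

def inv_sqrtm(S):
    """Inverse square root of a Hermitian positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * (1.0 / np.sqrt(w))) @ V.conj().T

def operator_sinkhorn(kraus, n_iter=200):
    """Operator Sinkhorn iteration on the Kraus operators A_i of a CP map.

    Alternately rescales the A_i so that Phi(I) = sum_i A_i A_i^† and
    Phi^*(I) = sum_i A_i^† A_i both approach the identity, i.e. the
    completely positive map becomes doubly stochastic.
    """
    A = [np.array(a, dtype=complex) for a in kraus]
    for _ in range(n_iter):
        L = inv_sqrtm(sum(a @ a.conj().T for a in A))  # left normalization
        A = [L @ a for a in A]
        R = inv_sqrtm(sum(a.conj().T @ a for a in A))  # right normalization
        A = [a @ R for a in A]
    return A

# Example: scale a random CP map on 3x3 matrices
rng = np.random.default_rng(1)
A = operator_sinkhorn([rng.normal(size=(3, 3)) for _ in range(4)])
print(np.round(sum(a @ a.conj().T for a in A).real, 3))  # approx. identity
```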
We consider parameter estimation of ordinary differential equation (ODE) models from noisy observations. For this problem, one conventional approach is to fit numerical solutions (e.g., Euler, Runge--Kutta) of the ODE to data. However, such a method does not account for the discretization error in numerical solutions and has limited estimation accuracy. In this study, we develop an estimation method that quantifies the discretization error based on data. The key idea is to model the discretization error as random variables and estimate their variance simultaneously with the ODE parameter. The proposed method has the form of iteratively reweighted least squares, where the discretization error variance is updated with the isotonic regression algorithm and the ODE parameter is updated by solving a weighted least squares problem using the adjoint system. Experimental results demonstrate that the proposed method attains robust estimation with accuracy at least comparable to the conventional method by successfully quantifying the reliability of the numerical solutions.
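A minimal sketch of the alternating scheme on a toy exponential-decay ODE, assuming the error variance is monotone along time and substituting SciPy's generic least-squares solver for the adjoint-based weighted least-squares step; the toy model, names, and variance floor are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares
from sklearn.isotonic import IsotonicRegression

def euler(theta, x0, ts):
    """Explicit Euler solution of the toy ODE x' = -theta * x on grid ts."""
    xs = [x0]
    for h in np.diff(ts):
        xs.append(xs[-1] + h * (-theta * xs[-1]))
    return np.array(xs)

def fit(ts, ys, theta0=1.0, n_outer=10):
    """Alternate weighted least squares for theta with isotonic variance updates."""
    theta, w = theta0, np.ones_like(ys)
    iso = IsotonicRegression(increasing=True, y_min=1e-6)
    for _ in range(n_outer):
        # 1) weighted least squares for the ODE parameter
        res = least_squares(
            lambda t: np.sqrt(w) * (euler(t[0], ys[0], ts) - ys), x0=[theta])
        theta = res.x[0]
        # 2) isotonic regression of squared residuals -> error variances
        v = iso.fit_transform(ts, (euler(theta, ys[0], ts) - ys) ** 2)
        w = 1.0 / v
    return theta

ts = np.linspace(0.0, 2.0, 21)
rng = np.random.default_rng(2)
ys = np.exp(-1.5 * ts) + 0.01 * rng.normal(size=ts.size)
print(fit(ts, ys))  # close to the true decay rate 1.5
```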
Many statistical models are given in the form of non-normalized densities with an intractable normalization constant. Since maximum likelihood estimation is computationally intensive for these models, several estimation methods have been developed that do not require explicit computation of the normalization constant, such as noise contrastive estimation (NCE) and score matching. However, model selection methods for general non-normalized models have not been proposed so far. In this study, we develop information criteria for non-normalized models estimated by NCE or score matching. They are approximately unbiased estimators of discrepancy measures for non-normalized models. Simulation results and applications to real data demonstrate that the proposed criteria enable selection of the appropriate non-normalized model in a data-driven manner.
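For intuition, here is a minimal NCE sketch for a one-dimensional non-normalized model: the log-normalizer is treated as a free parameter and estimated jointly by logistic discrimination of data from noise. The model, noise choice, and parameterization are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import log_expit
from scipy.stats import norm

# Non-normalized Gaussian-shaped model:
#   log p~(x; theta) = -0.5 * ((x - mu) / sigma)**2 + c,
# where c plays the role of the negative log-normalizer and is
# estimated jointly, as NCE allows. Noise is a standard normal.

def log_model(x, theta):
    mu, sigma, c = theta
    return -0.5 * ((x - mu) / sigma) ** 2 + c

def nce_loss(theta, x_data, x_noise):
    """Logistic loss for discriminating data from noise (noise ratio 1)."""
    g_data = log_model(x_data, theta) - norm.logpdf(x_data)
    g_noise = log_model(x_noise, theta) - norm.logpdf(x_noise)
    return -(np.mean(log_expit(g_data)) + np.mean(log_expit(-g_noise)))

rng = np.random.default_rng(3)
x = rng.normal(0.5, 1.5, size=2000)  # observed data
y = rng.normal(0.0, 1.0, size=2000)  # noise sample
theta = minimize(nce_loss, x0=[0.0, 1.0, 0.0], args=(x, y)).x
print(theta)  # approx. [0.5, 1.5, -log(1.5 * sqrt(2 * pi))]
```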
We investigate predictive density estimation under the $L^2$ Wasserstein loss for location families and location-scale families. We show that plug-in densities form a complete class and that the Bayesian predictive density is given by the plug-in density with the posterior mean of the location and scale parameters. We provide Bayesian predictive densities that dominate the best equivariant one in normal models.
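A one-dimensional illustration of why the posterior mean appears, assuming a location family: quantile functions shift with the location parameter, so the Wasserstein loss between two members reduces to squared error in the parameter, whose Bayes rule is the posterior mean.

```latex
% In a location family, F_\theta^{-1}(u) = F_0^{-1}(u) + \theta, hence
W_2^2\bigl(p(\cdot-\theta),\, p(\cdot-\theta')\bigr)
  = \int_0^1 \bigl(F_0^{-1}(u) + \theta - F_0^{-1}(u) - \theta'\bigr)^2\,du
  = (\theta - \theta')^2,
\qquad
\hat\theta = \arg\min_a \mathbb{E}\bigl[(\theta - a)^2 \mid x\bigr]
  = \mathbb{E}[\theta \mid x].
```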
We investigate upper and lower hedging prices of multivariate contingent claims from the viewpoint of game-theoretic probability and submodularity. By considering a game between Market and Investor in discrete time, the pricing problem is reduced to a backward induction of an optimization over simplexes. For European options with payoff functions satisfying a combinatorial property called submodularity or supermodularity, this optimization is solved in closed form by using the Lovász extension, and the upper and lower hedging prices can be calculated efficiently. This class includes options on the maximum or the minimum of several assets. We also study the asymptotic behavior as the number of game rounds goes to infinity. The upper and lower hedging prices of European options converge to the solutions of the Black-Scholes-Barenblatt equations. For European options with submodular or supermodular payoff functions, the Black-Scholes-Barenblatt equation reduces to the linear Black-Scholes equation and is solved in closed form. Numerical results confirm the validity of the theoretical results.
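For reference, the Lovász extension itself is a short computation: sort the coordinates and telescope the set function over the resulting chain of level sets. A minimal sketch follows; the example set function is illustrative.

```python
import numpy as np

def lovasz_extension(F, z):
    """Lovász extension of a set function F at a point z in [0, 1]^n.

    Sort the coordinates in decreasing order and telescope F over the
    resulting chain of level sets. For submodular F the extension is
    convex (concave for supermodular F), which is what lets the simplex
    optimization in the backward induction be solved in closed form.
    """
    order = np.argsort(-np.asarray(z, dtype=float))
    value, prev, S = 0.0, F(frozenset()), set()
    for k in order:
        S.add(int(k))
        cur = F(frozenset(S))
        value += z[k] * (cur - prev)
        prev = cur
    return value

# Example: F(S) = max of fixed weights over S (a submodular payoff)
w = np.array([3.0, 1.0, 2.0])
F = lambda S: max((w[i] for i in S), default=0.0)
print(lovasz_extension(F, [0.2, 0.7, 0.5]))  # 0.7*1 + 0.5*1 + 0.2*1 = 1.4
```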
We develop a general method for estimating a finite mixture of non-normalized models. Here, a non-normalized model is defined to be a parametric distribution with an intractable normalization constant. Existing methods for estimating non-normalized models without computing the normalization constant are not applicable to mixture models because they contain more than one intractable normalization constant. The proposed method is derived by extending noise contrastive estimation (NCE), which estimates non-normalized models by discriminating between the observed data and some artificially generated noise. We also propose an extension of NCE with multiple noise distributions. Then, based on the observation that conventional classification learning with neural networks implicitly assumes an exponential family as a generative model, we introduce a method for clustering unlabeled data by estimating a finite mixture of distributions in an exponential family. Estimation of this mixture model is attained by the proposed extensions of NCE where the training data of neural networks are used as noise. Thus, the proposed method provides a probabilistically principled clustering method that is able to utilize a deep representation. Application to image clustering using a deep neural network gives promising results.
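A minimal sketch of a mixture-NCE objective for two non-normalized components, assuming each component carries a free constant that absorbs both its mixing weight and its unknown normalizer, so the mixture is estimable without computing any normalization constant. The toy model and noise distribution are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import log_expit, logsumexp
from scipy.stats import norm

def log_mixture(x, theta):
    """Two-component non-normalized mixture; c1, c2 absorb weights and normalizers."""
    mu1, mu2, c1, c2 = theta
    comp1 = -0.5 * (x - mu1) ** 2 + c1
    comp2 = -0.5 * (x - mu2) ** 2 + c2
    return logsumexp(np.stack([comp1, comp2]), axis=0)

def mixture_nce_loss(theta, x_data, x_noise):
    """Logistic discrimination of data from noise, with the mixture as model."""
    g_data = log_mixture(x_data, theta) - norm.logpdf(x_data, scale=3.0)
    g_noise = log_mixture(x_noise, theta) - norm.logpdf(x_noise, scale=3.0)
    return -(np.mean(log_expit(g_data)) + np.mean(log_expit(-g_noise)))

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 1, 1000), rng.normal(2, 1, 1000)])
y = rng.normal(0.0, 3.0, size=2000)  # noise spread over both modes
theta = minimize(mixture_nce_loss, x0=[-1.0, 1.0, 0.0, 0.0], args=(x, y)).x
print(theta)  # component means approx. -2 and 2 (up to label swap)
```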
We develop an empirical Bayes (EB) algorithm for the matrix completion problem. The EB algorithm is motivated by the singular value shrinkage estimator for matrix means of Efron and Morris (1972). Since the EB algorithm is essentially the EM algorithm applied to a simple model, it does not require heuristic parameter tuning other than tolerance. Numerical results demonstrate that the EB algorithm achieves a good trade-off between accuracy and efficiency compared to existing algorithms, and that it works particularly well when the difference between the numbers of rows and columns is large. Application to real data also shows the practical utility of the EB algorithm.
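For context, the Efron and Morris (1972) estimator that motivates the algorithm can be written in a few lines: it shrinks each singular value $s_i$ of the observation as $s_i \to s_i - (p - q - 1)/s_i$, so smaller singular values are shrunk more strongly. A minimal NumPy sketch:

```python
import numpy as np

def efron_morris(Y):
    """Efron-Morris (1972) estimator of a p x q mean matrix (p > q + 1).

    Equivalent to shrinking each singular value s_i of Y as
    s_i -> s_i - (p - q - 1) / s_i, which shrinks small singular
    values most strongly.
    """
    p, q = Y.shape
    return Y @ (np.eye(q) - (p - q - 1) * np.linalg.inv(Y.T @ Y))

# Example: rank-2 mean plus Gaussian noise; shrinkage typically reduces error
rng = np.random.default_rng(5)
M = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5))
Y = M + rng.normal(size=(50, 5))
print(np.linalg.norm(efron_morris(Y) - M), np.linalg.norm(Y - M))
```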
We develop singular value shrinkage priors for the mean matrix parameters in the matrix-variate normal model with known covariance matrices. Our priors are superharmonic and put more weight on matrices with smaller singular values. They are a natural generalization of the Stein prior. Bayes estimators and Bayesian predictive densities based on our priors are minimax and dominate those based on the uniform prior in finite samples. In particular, our priors work well when the true value of the parameter has low rank.
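As a sketch of the analogy with the Stein prior, assuming the prior takes the determinant form below for a $p \times q$ mean matrix $M$ with $p \ge q$: the prior decreases in each singular value, favoring matrices near low rank, and for $q = 1$ it reduces to the Stein prior.

```latex
% Assumed determinant form, written via the singular values
% sigma_1 >= ... >= sigma_q of the p x q matrix M (p >= q):
\pi_{\mathrm{SVS}}(M) = \det(M^{\top}M)^{-(p-q-1)/2}
                      = \prod_{i=1}^{q} \sigma_i^{-(p-q-1)},
\qquad
\pi_{\mathrm{SVS}}(\mu) = \|\mu\|^{2-p} \quad (q = 1,\ \text{the Stein prior}).
```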