No Arabic abstract
This paper proposes a generalized framework with joint normalization which learns lower-dimensional subspaces with maximum discriminative power by making use of the Riemannian geometry. In particular, we model the similarity/dissimilarity between subspaces using various metrics defined on Grassmannian and formulate dimen-sionality reduction as a non-linear constraint optimization problem considering the orthogonalization. To obtain the linear mapping, we derive the components required to per-form Riemannian optimization (e.g., Riemannian conju-gate gradient) from the original Grassmannian through an orthonormal projection. We respect the Riemannian ge-ometry of the Grassmann manifold and search for this projection directly from one Grassmann manifold to an-other face-to-face without any additional transformations. In this natural geometry-aware way, any metric on the Grassmann manifold can be resided in our model theoreti-cally. We have combined five metrics with our model and the learning process can be treated as an unconstrained optimization problem on a Grassmann manifold. Exper-iments on several datasets demonstrate that our approach leads to a significant accuracy gain over state-of-the-art methods.
We propose a novel second-order ODE as the continuous-time limit of a Riemannian accelerated gradient-based method on a manifold with curvature bounded from below. This ODE can be seen as a generalization of the ODE derived for Euclidean spaces, and can also serve as an analysis tool. We study the convergence behavior of this ODE for different classes of functions, such as geodesically convex, strongly-convex and weakly-quasi-convex. We demonstrate how such an ODE can be discretized using a semi-implicit and Nesterov-inspired numerical integrator, that empirically yields stable algorithms which are faithful to the continuous-time analysis and exhibit accelerated convergence.
The vast majority of Dimensionality Reduction (DR) techniques rely on second-order statistics to define their optimization objective. Even though this provides adequate results in most cases, it comes with several shortcomings. The methods require carefully designed regularizers and they are usually prone to outliers. In this work, a new DR framework, that can directly model the target distribution using the notion of similarity instead of distance, is introduced. The proposed framework, called Similarity Embedding Framework, can overcome the aforementioned limitations and provides a conceptually simpler way to express optimization targets similar to existing DR techniques. Deriving a new DR technique using the Similarity Embedding Framework becomes simply a matter of choosing an appropriate target similarity matrix. A variety of classical tasks, such as performing supervised dimensionality reduction and providing out-of-of-sample extensions, as well as, new novel techniques, such as providing fast linear embeddings for complex techniques, are demonstrated in this paper using the proposed framework. Six datasets from a diverse range of domains are used to evaluate the proposed method and it is demonstrated that it can outperform many existing DR techniques.
We explore the application of linear discriminant analysis (LDA) to the features obtained in different layers of pretrained deep convolutional neural networks (CNNs). The advantage of LDA compared to other techniques in dimensionality reduction is that it reduces dimensions while preserving the global structure of data, so distances in the low-dimensional structure found are meaningful. The LDA applied to the CNN features finds that the centroids of classes corresponding to the similar data lay closer than classes corresponding to different data. We applied the method to a modification of the MNIST dataset with ten additional classes, each new class with half of the images from one of the standard ten classes. The method finds the new classes close to the corresponding standard classes we took the data form. We also applied the method to a dataset of images of butterflies to find that related subspecies are found to be close. For both datasets, we find a performance similar to state-of-the-art methods.
There are two widely used models for the Grassmannian $operatorname{Gr}(k,n)$, as the set of equivalence classes of orthogonal matrices $operatorname{O}(n)/(operatorname{O}(k) times operatorname{O}(n-k))$, and as the set of trace-$k$ projection matrices ${P in mathbb{R}^{n times n} : P^{mathsf{T}} = P = P^2,; operatorname{tr}(P) = k}$. The former, standard in manifold optimization, has the advantage of giving numerically stable algorithms but the disadvantage of having to work with equivalence classes of matrices. The latter, widely used in coding theory and probability, has the advantage of using actual matrices (as opposed to equivalence classes) but working with projection matrices is numerically unstable. We present an alternative that has both advantages and suffers from neither of the disadvantages; by representing $k$-dimensional subspaces as symmetric orthogonal matrices of trace $2k-n$, we obtain [ operatorname{Gr}(k,n) cong {Q in operatorname{O}(n) : Q^{mathsf{T}} = Q, ; operatorname{tr}(Q) = 2k -n}. ] As with the other two models, we show that differential geometric objects and operations -- tangent vector, metric, normal vector, exponential map, geodesic, parallel transport, gradient, Hessian, etc -- have closed-form analytic expressions that are computable with standard numerical linear algebra. In the proposed model, these expressions are considerably simpler, a result of representing $operatorname{Gr}(k,n)$ as a linear section of a compact matrix Lie group $operatorname{O}(n)$, and can be computed with at most one QR decomposition and one exponential of a special skew-symmetric matrix that takes only $O(nk(n-k))$ time. In particular, we completely avoid eigen- and singular value decompositions in our steepest descent, conjugate gradient, quasi-Newton, and Newton methods for the Grassmannian.
We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete example, we specialize Rfw to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian linear oracle required by RFW admits a closed-form solution; this result may be of independent interest. We further specialize RFW to the special orthogonal group and show that here too, the Riemannian linear oracle can be solved in closed form. Here, we describe an application to the synchronization of data matrices (Procrustes problem). We complement our theoretical results with an empirical comparison of Rfw against state-of-the-art Riemannian optimization methods and observe that RFW performs competitively on the task of computing Riemannian centroids.