Dimensionality reduction is a classical technique widely used for data analysis. One foundational instantiation is Principal Component Analysis (PCA), which minimizes the average reconstruction error. In this paper, we introduce the multi-criteria dimensionality reduction problem, in which we are given multiple objectives that need to be optimized simultaneously. As an application, our model captures several fairness criteria for dimensionality reduction, such as our novel Fair-PCA problem and the Nash Social Welfare (NSW) problem. In Fair-PCA, the input data is divided into $k$ groups, and the goal is to find a single $d$-dimensional representation for all groups for which the minimum variance of any one group is maximized. In NSW, the goal is to maximize the product of the individual variances of the groups achieved by the common low-dimensional space. Our main result is an exact polynomial-time algorithm for the two-criterion dimensionality reduction problem when the two criteria are increasing concave functions. As an application of this result, we obtain a polynomial-time algorithm for Fair-PCA for $k=2$ groups and a polynomial-time algorithm for the NSW objective for $k=2$ groups. We also give approximation algorithms for $k>2$. Our technical contribution in the above results is to prove new low-rank properties of extreme point solutions to semi-definite programs. We conclude with experiments indicating the effectiveness of algorithms based on extreme point solutions of semi-definite programs on several real-world data sets.
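The two objectives described above can be made concrete with a small sketch. It only evaluates the Fair-PCA and NSW objectives for a given subspace; it does not implement the paper's SDP-based algorithm. The function names, the toy data, and the choice to measure a group's variance as the average squared norm of its projected rows are assumptions made for illustration.

```python
import numpy as np

def group_variances(groups, V):
    # groups: list of (m_i x n) data matrices, one per group.
    # V: (n x d) matrix with orthonormal columns spanning the shared subspace.
    # Assumption: "variance" = average squared norm of the projected rows.
    return [float(np.sum((A @ V) ** 2)) / A.shape[0] for A in groups]

def fair_pca_objective(groups, V):
    # Fair-PCA maximizes the smallest variance retained by any group.
    return min(group_variances(groups, V))

def nsw_objective(groups, V):
    # NSW maximizes the product of the per-group variances.
    return float(np.prod(group_variances(groups, V)))

def pca_basis(X, d):
    # Standard PCA direction(s): top-d right singular vectors of X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:d].T

# Toy data (hypothetical): group A varies along axis 0, group B along axis 1.
A = np.array([[2.0, 0.0], [-2.0, 0.0]])
B = np.array([[0.0, 1.0], [0.0, -1.0]])
groups = [A, B]

V_pca = pca_basis(np.vstack(groups), 1)             # follows the dominant group
V_diag = np.array([[1.0], [1.0]]) / np.sqrt(2.0)    # a direction shared by both

print(fair_pca_objective(groups, V_pca), fair_pca_objective(groups, V_diag))
```

On this toy input, standard PCA picks the axis of the higher-variance group and leaves the other group with essentially zero retained variance, while the diagonal direction retains some variance for both. The diagonal is used only for comparison and is not claimed to be the optimum.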
Grassmann manifolds have been widely used to represent the geometry of feature spaces in a variety of problems in medical imaging and computer vision including but not limited to shape analysis, action recognition, subspace clustering and motion segmentation. For these problems, the features usually lie in a very high-dimensional Grassmann manifold and hence an appropriate dimensionality reduction technique is called for in order to curtail the computational burden. To this end, the Principal Geodesic Analysis (PGA), a nonlinear extension of the well-known principal component analysis, is applicable as a general tool to many Riemannian manifolds. In this paper, we propose a novel framework for dimensionality reduction of data in Riemannian homogeneous spaces and then focus on the Grassmann manifold, which is an example of a homogeneous space. Our framework explicitly exploits the geometry of the homogeneous space, yielding reduced-dimensional nested sub-manifolds that need not be geodesic submanifolds and thus are more expressive. Specifically, we project points in a Grassmann manifold to an embedded lower-dimensional Grassmann manifold. A salient feature of our method is that it leads to higher expressed variance than PGA, which we demonstrate via synthetic and real data experiments.
We show how to sketch semidefinite programs (SDPs) using positive maps in order to reduce their dimension. More precisely, we use Johnson-Lindenstrauss transforms to produce a smaller SDP whose solution preserves feasibility or approximates the value of the original problem with high probability. These techniques allow us to improve both complexity and storage space requirements. They apply to problems in which the Schatten 1-norm of the matrices specifying the SDP and also of a solution to the problem is constant in the problem size. Furthermore, we provide some results which clarify the limitations of positive, linear sketches in this setting.
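As a rough illustration of sketching with a positive map, the snippet below compresses an $n \times n$ matrix $X$ to $S X S^T$, where $S$ is a scaled Gaussian Johnson-Lindenstrauss matrix. Using this particular conjugation map and a Gaussian $S$ is an assumption made here; the abstract does not specify the construction. The names `jl_matrix` and `sketch` are hypothetical.

```python
import numpy as np

def jl_matrix(k, n, rng):
    # Gaussian Johnson-Lindenstrauss transform from R^n to R^k (assumed choice).
    return rng.standard_normal((k, n)) / np.sqrt(k)

def sketch(X, S):
    # Conjugation X -> S X S^T is a positive map: a PSD input gives a PSD output.
    return S @ X @ S.T

rng = np.random.default_rng(0)
n, k = 50, 10
S = jl_matrix(k, n, rng)

# Two rank-one PSD matrices, standing in for the data of a small SDP.
u = rng.standard_normal(n)
v = rng.standard_normal(n)
A = np.outer(u, u)
B = np.outer(v, v)

SA, SB = sketch(A, S), sketch(B, S)

# For rank-one inputs, tr(SA @ SB) = ((S u) . (S v))**2, which a JL transform
# keeps close to (u . v)**2 = tr(A @ B) with high probability.
print(np.trace(A @ B), np.trace(SA @ SB))
```

The sketched matrices are $k \times k$ rather than $n \times n$, which is where the savings in storage come from. How closely the two printed traces agree depends on $k$ and on the random draw.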
Following the groundbreaking algorithm of Moser and Tardos for the Lovász Local Lemma (LLL), there has been a plethora of results analyzing local search algorithms for various constraint satisfaction problems. The algorithms considered fall into two broad categories: resampling algorithms, analyzed via different algorithmic LLL conditions; and backtracking algorithms, analyzed via entropy compression arguments. This paper introduces a new convergence condition that seamlessly handles resampling, backtracking, and hybrid algorithms, i.e., algorithms that perform both resampling and backtracking steps. Unlike all past LLL work, our condition replaces the notion of a dependency or causality graph by quantifying point-to-set correlations between bad events. As a result, our condition simultaneously: (i) captures the most general algorithmic LLL condition known as a special case; (ii) significantly simplifies the analysis of entropy compression applications; (iii) relates backtracking algorithms, which are conceptually very different from resampling algorithms, to the LLL; and most importantly (iv) allows for the analysis of hybrid algorithms, which were outside the scope of previous techniques. We give several applications of our condition, including a new hybrid vertex coloring algorithm that extends the recent breakthrough result of Molloy for coloring triangle-free graphs to arbitrary graphs.
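For readers unfamiliar with the resampling category mentioned above, here is a minimal sketch of a Moser-Tardos style resampling loop for CNF formulas. It illustrates the earlier resampling approach only, not the convergence condition or the hybrid algorithms of this paper. The clause encoding (signed integers for literals), the step limit, and the example formula are assumptions.

```python
import random

def moser_tardos(num_vars, clauses, rng, max_steps=10_000):
    # Clauses are tuples of nonzero ints: +i is variable i, -i is its negation.
    # Index 0 of `assign` is unused so that variables are 1-based.
    assign = [None] + [rng.random() < 0.5 for _ in range(num_vars)]

    def violated(clause):
        # A clause is violated when every one of its literals is false.
        return all(assign[abs(lit)] != (lit > 0) for lit in clause)

    for _ in range(max_steps):
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return assign[1:]
        # Resample only the variables of one violated clause.
        for lit in bad[0]:
            assign[abs(lit)] = rng.random() < 0.5
    raise RuntimeError("step limit reached without a satisfying assignment")

# Hypothetical small 3-CNF instance with few shared variables.
clauses = [(1, 2, 3), (-1, 4, 5), (-2, -4, 6)]
result = moser_tardos(6, clauses, random.Random(0))
print(result)
```

The step limit is a safeguard added for this sketch; the LLL-type conditions in the literature are what guarantee fast termination on suitable instances.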
Many, if not most, network analysis algorithms have been designed specifically for single-relational networks; that is, networks in which all edges are of the same type. For example, edges may represent friendship, kinship, or collaboration, but not all of them together. In contrast, a multi-relational network is a network with a heterogeneous set of edge labels, which can represent relationships of various types in a single data structure. While multi-relational networks are more expressive in terms of the variety of relationships they can capture, there is a need for a general framework for transferring the many single-relational network analysis algorithms to the multi-relational domain. It is not sufficient to execute a single-relational network analysis algorithm on a multi-relational network by simply ignoring edge labels. This article presents an algebra for mapping multi-relational networks to single-relational networks, thereby exposing them to single-relational network analysis algorithms.
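One way to picture such a mapping is to store one adjacency matrix per edge label and to derive a single-relational network by composing them. The sketch below derives a coauthorship network from an "authored" relation via a matrix product. The representation, the labels, and the vertex names are illustrative assumptions and are not taken from the article's algebra.

```python
import numpy as np

# Vertices (hypothetical): 0 = alice, 1 = bob, 2 = a paper.
n = 3
relations = {
    "authored": np.zeros((n, n), dtype=int),
    "friend": np.zeros((n, n), dtype=int),
}
relations["authored"][0, 2] = 1   # alice authored the paper
relations["authored"][1, 2] = 1   # bob authored the paper
relations["friend"][0, 1] = 1     # alice lists bob as a friend

# Ignoring labels: every edge is merged, so authorship and friendship
# become indistinguishable.
flattened = np.clip(sum(relations.values()), 0, 1)

# Label-aware derivation: person -authored-> paper <-authored- person.
authored = relations["authored"]
coauthor = authored @ authored.T
np.fill_diagonal(coauthor, 0)     # drop trivial self-coauthorship
coauthor = np.clip(coauthor, 0, 1)

print(coauthor)
```

The resulting `coauthor` matrix is an ordinary single-relational adjacency matrix, so any standard algorithm (centrality, clustering, and so on) can be run on it with a clear interpretation, which the flattened matrix does not offer.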
We discuss the theory of certain partially ordered sets that capture the structure of commutation classes of words in monoids. As a first application, it follows readily that counting words in commutation classes is #P-complete. We then apply the partially ordered sets to Coxeter groups. Our results include a proof that enumerating the reduced words of elements of Coxeter groups is #P-complete, a recursive formula for computing the number of commutation classes of reduced words, and stronger bounds on the maximum number of commutation classes than were previously known. This also allows us to improve the known bounds on the number of primitive sorting networks.
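To make the objects concrete, the following brute-force sketch enumerates the reduced words of a permutation (an element of the symmetric group, the type A Coxeter group) and counts their commutation classes, where two words are equivalent if one is obtained from the other by swapping adjacent generators $s_i s_j$ with $|i-j| \ge 2$. This is exhaustive enumeration for tiny cases only, not the recursive formula from the paper; the function names are hypothetical.

```python
def reduced_words(w):
    # w: permutation in one-line notation, e.g. (3, 2, 1).
    # Returns all reduced words as tuples of generator indices (1-based).
    w = tuple(w)
    if all(w[i] < w[i + 1] for i in range(len(w) - 1)):
        return [()]  # identity: only the empty word
    words = []
    for i in range(len(w) - 1):
        if w[i] > w[i + 1]:  # descent: w * s_{i+1} is one step shorter
            v = list(w)
            v[i], v[i + 1] = v[i + 1], v[i]
            words.extend(word + (i + 1,) for word in reduced_words(v))
    return words

def count_commutation_classes(words):
    # Count connected components under swaps of adjacent commuting letters.
    seen = set()
    classes = 0
    for start in set(words):
        if start in seen:
            continue
        classes += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for j in range(len(u) - 1):
                if abs(u[j] - u[j + 1]) >= 2:  # s_a and s_b commute
                    v = u[:j] + (u[j + 1], u[j]) + u[j + 2:]
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
    return classes

# Longest element of S_4.
words_s4 = reduced_words((4, 3, 2, 1))
print(len(words_s4), count_commutation_classes(words_s4))
```

For the longest element of $S_4$ this yields 16 reduced words in 8 commutation classes. The running time grows very quickly with the size of the permutation, in line with the hardness results stated above.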