The Quadratic Assignment Problem (QAP) is a well-known permutation-based combinatorial optimization problem with real applications in industrial and logistics environments. The challenge posed by this NP-hard problem has captured the attention of the optimization community for decades, and a large number of algorithms have consequently been proposed to tackle it. Among these, exact methods are only able to solve instances of size $n<40$. To overcome this limitation, many metaheuristic methods have been applied to the QAP. In this work, we follow this direction by approaching the QAP through Estimation of Distribution Algorithms (EDAs). In particular, a non-parametric distance-based exponential probabilistic model is used. Based on an analysis of the characteristics of the QAP and on previous work in the area, we introduce kernels of the Mallows model under the Hamming distance to the context of EDAs. The experiments conducted indicate that the performance of the proposed algorithm on the QAP is superior to both (i) the classical EDAs adapted to deal with the QAP and (ii) the EDAs proposed in the literature specifically for permutation problems.
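To make the approach concrete, below is a minimal Python sketch of a kernel-based EDA for the QAP. It is illustrative only: the `sample_hamming_mallows` perturbation uses a simple geometric heuristic for the spread rather than the exact Mallows sampler under the Hamming distance (which weights each distance by derangement counts), and all function names and parameter choices are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def qap_cost(perm, flow, dist):
    """QAP objective: sum_ij flow[i,j] * dist[perm[i], perm[j]]."""
    return np.sum(flow * dist[np.ix_(perm, perm)])

def sample_hamming_mallows(center, theta, rng):
    """Rough sampler: perturb `center` at m positions, where a larger
    theta concentrates mass near the center (smaller Hamming distance).
    This only approximates a Mallows kernel under the Hamming distance;
    the exact sampler weights distances by derangement counts."""
    n = len(center)
    m = min(n, rng.geometric(1 - np.exp(-theta)) + 1)  # heuristic spread
    idx = rng.choice(n, size=m, replace=False)
    perm = center.copy()
    perm[idx] = perm[rng.permutation(idx)]   # shuffle the chosen positions
    return perm

def kernel_eda_qap(flow, dist, pop=100, iters=200, theta=1.0, k=10, rng=rng):
    n = flow.shape[0]
    population = [rng.permutation(n) for _ in range(pop)]
    for _ in range(iters):
        population.sort(key=lambda p: qap_cost(p, flow, dist))
        centers = population[:k]             # one kernel per elite solution
        population = centers + [
            sample_hamming_mallows(centers[i % k], theta, rng)
            for i in range(pop - k)
        ]
    return min(population, key=lambda p: qap_cost(p, flow, dist))

# toy usage on random flow/distance matrices
F, Dm = rng.random((8, 8)), rng.random((8, 8))
best = kernel_eda_qap(F, Dm, pop=50, iters=50)
print(best, qap_cost(best, F, Dm))
```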
Efficiently computing a robust measure of similarity or dissimilarity between graphs is a major challenge in pattern recognition. The Graph Edit Distance (GED) is a flexible measure of dissimilarity between graphs which arises in error-tolerant graph matching. It is defined from an optimal sequence of edit operations (an edit path) transforming one graph into another. Unfortunately, the exact computation of this measure is NP-hard. In the last decade, several approaches have been proposed to approximate the GED in polynomial time, mainly by solving linear programming problems. Among them, the bipartite GED has received much attention. It is deduced from a linear sum assignment of the nodes of the two graphs, which can be computed efficiently by Hungarian-type algorithms. However, edit operations on nodes and edges are not handled simultaneously, which limits the accuracy of the approximation. To overcome this limitation, we propose to extend the linear assignment model to a quadratic one, for directed or undirected graphs with labeled nodes and edges. This is realized through the definition of a family of edit paths induced by assignments between nodes. We formally show that the GED, restricted to the paths in this family, is equivalent to a quadratic assignment problem. Since this problem is NP-hard, we propose to compute an approximate solution by an adaptation of the Integer Projected Fixed Point method. Experiments show that the proposed approach generally reaches a more accurate approximation of the optimal GED than the bipartite GED, at a computational cost that remains affordable for graphs of nontrivial size.
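For reference, the bipartite approximation described above can be sketched in a few lines with SciPy's Hungarian solver, using the standard square cost matrix over node substitutions, deletions, and insertions. The cost functions and the unit-cost example are assumptions for illustration, and edge costs (which the quadratic extension handles jointly with node costs) are omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def bipartite_ged(labels1, labels2, c_sub, c_del, c_ins):
    """Approximate GED from a linear sum assignment over node edit
    operations, Riesen & Bunke style. Edge costs are ignored here."""
    n, m = len(labels1), len(labels2)
    big = 1e9                                 # forbidden cells
    C = np.full((n + m, n + m), big)
    for i in range(n):
        for j in range(m):
            C[i, j] = c_sub(labels1[i], labels2[j])   # substitutions
    for i in range(n):
        C[i, m + i] = c_del(labels1[i])               # deletions
    for j in range(m):
        C[n + j, j] = c_ins(labels2[j])               # insertions
    C[n:, m:] = 0.0                                   # dummy-to-dummy
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()

# toy usage: unit costs, node labels as strings
cost = bipartite_ged(["a", "b", "c"], ["a", "c"],
                     c_sub=lambda x, y: 0.0 if x == y else 1.0,
                     c_del=lambda x: 1.0, c_ins=lambda y: 1.0)
print(cost)  # 1.0: delete "b", keep "a" and "c"
```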
In this paper, we show that the quadratic assignment problem (QAP) can be reformulated as an equivalent rank-constrained doubly nonnegative (DNN) problem. Within the framework of the difference-of-convex-functions (DC) approach, a semi-proximal DC algorithm (DCA) is proposed for solving a relaxation of the rank-constrained DNN problem, whose subproblems can be solved by the semi-proximal augmented Lagrangian method (sPALM). We show that the generated sequence converges to a stationary point of the corresponding DC problem, which is feasible for the rank-constrained DNN problem. Moreover, numerical experiments demonstrate that for most QAP instances the proposed approach finds global optimal solutions efficiently, and for the others it provides good feasible solutions in reasonable time.
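A quick numerical sanity check of the lifting behind such reformulations: for a permutation matrix $X$, the QAP objective $\mathrm{trace}(AXBX^{\top})$ equals $y^{\top}(B^{\top}\otimes A)\,y$ with $y=\mathrm{vec}(X)$, and the DNN relaxation replaces the rank-one matrix $yy^{\top}$ by a doubly nonnegative variable. The snippet below uses our own conventions for vec and the Kronecker order, and verifies the identity on random data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A, B = rng.random((n, n)), rng.random((n, n))

perm = rng.permutation(n)
X = np.eye(n)[perm]                 # permutation matrix: X[i, perm[i]] = 1

# QAP objective in matrix form
matrix_form = np.trace(A @ X @ B @ X.T)

# lifted form: trace(A X B X^T) = vec(X)^T (B^T kron A) vec(X); the DNN
# relaxation optimizes over Y ~ vec(X) vec(X)^T, dropping rank(Y) = 1 and
# keeping Y doubly nonnegative (positive semidefinite, entrywise >= 0)
y = X.flatten(order="F")            # column-stacking vec(X)
lifted_form = y @ np.kron(B.T, A) @ y
print(np.isclose(matrix_form, lifted_form))  # True
```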
Let $D$ denote the distance matrix of an $(n+1)$-point metric space $(X,d)$. When $X$ is an unweighted metric tree, the sum of the entries of $D^{-1}$ is always equal to $2/n$. Such trees can be regarded as affinely independent subsets of the Hamming cube $H_n$, and it was conjectured that the value $2/n$ is minimal among all such subsets. In this paper we confirm this conjecture and give a geometric interpretation of our result which applies to any subset of $H_n$.
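The identity is easy to check numerically. The sketch below (helper names are ours) builds the distance matrices of two unweighted trees on four points, a path and a star, and confirms that the entries of $D^{-1}$ sum to $2/n$ with $n=3$ edges in both cases.

```python
import numpy as np

def tree_distance_matrix(edges, n_points):
    """All-pairs shortest-path distances in an unweighted tree
    (Floyd-Warshall, fine at this size)."""
    D = np.full((n_points, n_points), np.inf)
    np.fill_diagonal(D, 0.0)
    for u, v in edges:
        D[u, v] = D[v, u] = 1.0
    for k in range(n_points):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

for name, edges in {
    "path on 4 points": [(0, 1), (1, 2), (2, 3)],
    "star on 4 points": [(0, 1), (0, 2), (0, 3)],
}.items():
    D = tree_distance_matrix(edges, 4)
    n = 3  # number of edges, so the space has n + 1 = 4 points
    print(name, np.linalg.inv(D).sum(), "vs 2/n =", 2 / n)
```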
In this paper, we consider mixtures of two Mallows models for top-$k$ rankings that share the same location parameter but have different scale parameters, i.e., a mixture of concentric Mallows models. This situation arises when a heterogeneous population of voters is formed by two homogeneous subpopulations, one of expert voters and the other of non-expert voters. We propose efficient sampling algorithms for Mallows top-$k$ rankings. We show the identifiability of both components and the learnability of their respective parameters in this setting by, first, bounding the sample complexity of the Borda algorithm with top-$k$ rankings and, second, proposing a polynomial-time algorithm for separating the rankings of the two components. Finally, since rank aggregation suffers from the large amount of noise introduced by the non-expert voters, we adapt the Borda algorithm to recover a ground-truth consensus ranking that is especially consistent with the expert rankings.
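As an illustration of the aggregation step, here is a small sketch of a Borda count adapted to top-$k$ ballots. Giving unranked items the average score of the tail positions is one common convention, not necessarily the exact estimator analyzed here.

```python
import numpy as np

def borda_topk(rankings, n_items, k):
    """Borda count for top-k ballots: the item at position p scores
    n_items - 1 - p; items missing from a ballot share the average
    score of the unranked tail positions."""
    scores = np.zeros(n_items)
    tail_avg = np.mean([n_items - 1 - p for p in range(k, n_items)])
    for ballot in rankings:            # each ballot lists its top-k items
        seen = set(ballot)
        for p, item in enumerate(ballot):
            scores[item] += n_items - 1 - p
        for item in range(n_items):
            if item not in seen:
                scores[item] += tail_avg
    return np.argsort(-scores)         # consensus ranking, best item first

# toy usage: 5 items, ballots of length k = 3
ballots = [[0, 1, 2], [0, 2, 1], [1, 0, 3]]
print(borda_topk(ballots, n_items=5, k=3))  # [0 1 2 3 4]
```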
Domain adaptation (DA) arises as an important problem in statistical machine learning when the source data used to train a model is different from the target data used to test the model. Recent advances in DA have mainly been application-driven and have largely relied on the idea of a common subspace for source and target data. To understand the empirical successes and failures of DA methods, we propose a theoretical framework via structural causal models that enables analysis and comparison of the prediction performance of DA methods. This framework also allows us to itemize the assumptions needed for the DA methods to have a low target error. Additionally, with insights from our theory, we propose a new DA method called CIRM that outperforms existing DA methods when both the covariates and label distributions are perturbed in the target data. We complement the theoretical analysis with extensive simulations to show the necessity of the devised assumptions. Reproducible synthetic and real data experiments are also provided to illustrate the strengths and weaknesses of DA methods when parts of the assumptions of our theory are violated.
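To illustrate the setting, the toy simulation below (our own construction, not the paper's SCMs or the CIRM method) generates source and target domains from a simple structural causal model in which interventions perturb both the covariate and the label distributions.

```python
import numpy as np

rng = np.random.default_rng(2)

def scm_sample(n, mean_shift=0.0, label_flip=0.0, rng=rng):
    """Toy structural causal model: a latent cause C drives both the
    covariate X and the label Y; intervening on C (mean shift) and on
    the Y-mechanism (label noise) perturbs the target domain."""
    c = rng.normal(mean_shift, 1.0, size=n)            # latent cause
    x = np.c_[c + rng.normal(0, 0.5, n), rng.normal(0, 1, n)]
    y = (c > 0).astype(int)
    flip = rng.random(n) < label_flip
    y[flip] = 1 - y[flip]                              # perturbed labels
    return x, y

x_src, y_src = scm_sample(2000)                        # source domain
x_tgt, y_tgt = scm_sample(2000, mean_shift=1.0, label_flip=0.1)

# a model fit on the source faces both covariate and label shift:
print("mean X1:", x_src[:, 0].mean().round(2), "->", x_tgt[:, 0].mean().round(2))
print("P(Y=1): ", y_src.mean().round(2), "->", y_tgt.mean().round(2))
```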