This article presents a set of tools for modeling a spatial allocation problem in a large geographic market and gives examples of applications. In our setting, the market is described by a network that maps the cost of travel between each pair of adjacent locations. Two types of agents are located at the nodes of this network. Buyers choose the most competitive sellers depending on their prices and the cost of reaching them; their utility is assumed to be additive in these two quantities. Each seller, taking other sellers' prices as given, sets her own price so that her demand equals the observed demand. We give a linear programming formulation of the equilibrium conditions. After formally introducing our model, we apply it to two examples: prices offered by petrol stations and quality of services provided by maternity wards. These examples illustrate how our model can be used to aggregate demand, rank prices, and estimate cost structure over the network. We emphasize the possibility of applications to large-scale data sets using modern linear programming solvers such as Gurobi. Alongside this paper, we released an R toolbox implementing our results and an online tutorial (http://optimalnetwork.github.io).
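To make the buyers' side of the model concrete, the sketch below computes demand for given prices: travel costs between nodes are taken to be shortest-path costs over the network (Dijkstra), and each buyer's mass goes to the seller minimizing price plus travel cost, matching the additive utility described above. It is a minimal illustration under stated assumptions, not the paper's LP formulation or its R toolbox: the function names, the dictionary-based graph, and the "first-listed seller wins ties" rule are hypothetical choices. Recovering the prices that reproduce observed demand is the inverse problem the paper solves by linear programming, which this sketch does not attempt.

```python
import heapq

def undirected(edge_list):
    """Build an adjacency dict from (u, v, travel_cost) triples, in both directions."""
    adj = {}
    for u, v, c in edge_list:
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))
    return adj

def shortest_costs(adj, source):
    """Dijkstra: cheapest travel cost from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for nbr, c in adj.get(node, []):
            nd = d + c
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def demand(buyers, sellers, prices, adj):
    """Assign each buyer's mass to the seller minimizing price + travel cost.

    buyers: node -> mass; sellers: list of seller nodes; prices: seller -> price.
    Ties go to the seller listed first (a hypothetical convention)."""
    load = {s: 0.0 for s in sellers}
    for node, mass in buyers.items():
        dist = shortest_costs(adj, node)
        reachable = [s for s in sellers if s in dist]
        if reachable:
            best = min(reachable, key=lambda s: prices[s] + dist[s])
            load[best] += mass
    return load
```

For instance, on a three-node line A-B-C with unit edge costs and sellers at A and C, lowering the price at C below the price at A pulls the buyer located at B toward C.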
Despite decades of research on approximate query processing (AQP), our understanding of sample-based joins has remained limited and, to some extent, even superficial. The common belief in the community is that joining random samples is futile. This belief is largely based on an early result showing that the join of two uniform samples is not an independent sample of the original join, and that it yields quadratically fewer output tuples. Unfortunately, this result has little bearing on the key questions practitioners face. For example, the success metric is often the accuracy of the final approximation rather than the output cardinality. Moreover, there are many non-uniform sampling strategies one can employ. Is sampling for joins still futile in all of these settings? If not, what is the best sampling strategy in each case? To the best of our knowledge, no formal study answers these questions. This paper aims to improve our understanding of sample-based joins and to offer guidelines for practitioners building and using real-world AQP systems. We study the limitations of offline samples in approximating join queries: given an offline sampling budget, how well can one approximate the join of two tables? We answer this question for two success metrics: output size and estimator variance. We show that maximizing output size is easy, whereas there is an information-theoretic lower bound on the variance achievable by any sampling strategy. We then define a hybrid sampling scheme that captures all combinations of stratified, universe, and Bernoulli sampling, and show that this scheme, with our optimal parameters, achieves the theoretical lower bound within a constant factor. Since computing these optimal parameters requires shuffling statistics across the network, we also propose a decentralized variant in which each node acts autonomously using minimal statistics.
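The difference between output size and estimation accuracy can be illustrated with a small simulation. The sketch below compares two classical offline samplers on an equi-join of key columns: independent Bernoulli sampling of each table at rate p, where an output pair survives with probability p^2, and universe sampling, which keeps a key on both sides or on neither, so a pair survives with probability p. Each sampled join size is rescaled by the inverse survival probability (a Horvitz-Thompson estimate). The helper names, the SHA-256-based key hash (assumed to behave uniformly), and the synthetic data are assumptions for illustration; this is not the paper's hybrid scheme or its optimal parameters.

```python
import hashlib
import random
from collections import Counter

def join_size(left, right):
    """Exact equi-join cardinality of two lists of join keys."""
    cl, cr = Counter(left), Counter(right)
    return sum(n * cr[k] for k, n in cl.items() if k in cr)

def bernoulli_sample(keys, p, rng):
    """Keep each tuple independently with probability p."""
    return [k for k in keys if rng.random() < p]

def key_hash(key, salt):
    """Map a key to a pseudo-uniform number in [0, 1) (uniformity is assumed)."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).hexdigest()
    return int(digest[:8], 16) / 16**8

def universe_sample(keys, p, salt):
    """Keep every tuple whose key hashes below p; using the same salt on both
    tables keeps or drops each key on both sides together."""
    return [k for k in keys if key_hash(k, salt) < p]

def mean_estimates(left, right, p, trials, seed=0):
    """Average (sampled output size, rescaled join-size estimate) per sampler."""
    rng = random.Random(seed)
    sums = {"bernoulli": [0.0, 0.0], "universe": [0.0, 0.0]}
    for t in range(trials):
        out_b = join_size(bernoulli_sample(left, p, rng), bernoulli_sample(right, p, rng))
        sums["bernoulli"][0] += out_b
        sums["bernoulli"][1] += out_b / p**2  # each output pair survives w.p. p * p
        out_u = join_size(universe_sample(left, p, t), universe_sample(right, p, t))
        sums["universe"][0] += out_u
        sums["universe"][1] += out_u / p  # each output pair survives w.p. p
    return {name: (s[0] / trials, s[1] / trials) for name, s in sums.items()}

left = right = [k for k in range(100) for _ in range(5)]  # 100 keys, 5 tuples each
print(join_size(left, right))  # 2500
print(mean_estimates(left, right, p=0.3, trials=300))
```

Both rescaled estimates are unbiased for the true join size under these assumptions, yet universe sampling returns roughly 1/p times more output tuples; which one has lower variance depends on the key-frequency distribution, which is the kind of trade-off the hybrid scheme is designed to navigate.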
Weighting methods are a common tool for de-biasing estimates of causal effects. Although there is an increasing number of seemingly disparate methods, many of them can be folded into one unifying framework: causal optimal transport. This new method directly targets distributional balance by minimizing optimal transport distances between treatment and control groups or, more generally, between a source and a target population. Our approach is model-free but can also incorporate moments or any other functions of the covariates that the researcher wishes to balance. We find that causal optimal transport outperforms competing methods when both the propensity score and outcome models are misspecified, indicating that it is a robust alternative to common weighting methods. Finally, we demonstrate the utility of our method in an external control study examining the effect of misoprostol versus oxytocin for the treatment of postpartum hemorrhage.
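As a rough illustration of transport-based weighting, the sketch below computes control weights from a semi-relaxed entropic transport problem: each treated unit ships mass 1/n_t to the controls, the control-side marginal is left free, and the resulting column sums become the weights. This variant has a closed form (a per-row softmax of -C/eta) and was chosen only for brevity; it is a simplified stand-in, not the authors' causal optimal transport estimator, which can additionally balance moments or other covariate functions. The function names, the squared Euclidean cost, and the parameter `eta` are hypothetical choices.

```python
import math

def sq_dist(x, y):
    """Squared Euclidean distance between two covariate vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def semi_relaxed_ot_weights(x_control, x_treated, eta=0.1):
    """Control weights from the semi-relaxed entropic OT problem
        min_P  sum_ij C_ij P_ij + eta * sum_ij P_ij (log P_ij - 1)
        s.t.   sum_j P_ij = 1/n_t for every treated unit i (control side free).
    Closed form: P_ij = (1/n_t) * softmax_j(-C_ij / eta); weights are column sums."""
    n_t = len(x_treated)
    w = [0.0] * len(x_control)
    for xt in x_treated:
        costs = [sq_dist(xt, xc) for xc in x_control]
        c_min = min(costs)  # shift for numerical stability; cancels in the ratio
        kern = [math.exp(-(c - c_min) / eta) for c in costs]
        z = sum(kern)
        for j, k in enumerate(kern):
            w[j] += k / (z * n_t)
    return w

def att_estimate(y_treated, y_control, weights):
    """Treated mean minus the OT-weighted control mean."""
    treated_mean = sum(y_treated) / len(y_treated)
    return treated_mean - sum(w * y for w, y in zip(weights, y_control))
```

As `eta` shrinks, each treated unit's mass concentrates on its nearest controls, so this reduces to nearest-neighbor matching; larger `eta` spreads weight more smoothly across controls.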
Optimal transport has become part of the standard quantitative economics toolbox. It is the framework of choice for describing models of matching with transfers, but beyond that, it allows one to extend quantile regression, identify discrete choice models, provide new algorithms for computing the random coefficient logit model, and generalize the gravity model of trade. This paper offers a brief review of the basics of the theory, its applications to economics, and some extensions.
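The link between optimal transport and matching with transfers can be seen on a tiny market. With equal numbers of agents on each side and unit masses, the discrete problem of maximizing total surplus has an optimal solution that is a permutation (by the Birkhoff-von Neumann theorem), so very small instances can be solved by enumeration. The sketch below is illustrative only: realistic applications use a linear programming or assignment solver, the equilibrium transfers (which correspond to the dual variables) are not computed here, and the function name and the surplus x*y are assumptions.

```python
from itertools import permutations

def optimal_matching(types_a, types_b, surplus):
    """Maximize total surplus over one-to-one matchings by enumeration.

    Assumes len(types_a) == len(types_b). With unit masses, an optimal transport
    plan can be taken to be a permutation (Birkhoff-von Neumann), so enumerating
    permutations solves the discrete problem exactly; cost grows as n!, tiny n only."""
    n = len(types_a)
    best_perm, best_val = None, float("-inf")
    for perm in permutations(range(n)):
        val = sum(surplus(types_a[i], types_b[perm[i]]) for i in range(n))
        if val > best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val

print(optimal_matching([1, 2, 3], [1, 2, 3], lambda x, y: x * y))  # ((0, 1, 2), 14)
```

With a supermodular surplus such as x*y, the optimal matching is positively assortative: the k-th lowest type on one side is paired with the k-th lowest type on the other.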
We provide a computational complexity analysis of the Sinkhorn algorithm for solving the entropic regularized Unbalanced Optimal Transport (UOT) problem between two measures of possibly different masses with at most $n$ components. We show that the complexity of the Sinkhorn algorithm for finding an $\varepsilon$-approximate solution to the UOT problem is of order $\widetilde{\mathcal{O}}(n^2/\varepsilon)$, which is near-linear in the input size (the cost matrix has $n^2$ entries). To the best of our knowledge, this complexity is better than that of the Sinkhorn algorithm for solving the Optimal Transport (OT) problem, which is of order $\widetilde{\mathcal{O}}(n^2/\varepsilon^2)$. Our proof technique is based on the geometric convergence of the Sinkhorn updates to the optimal dual solution of the entropic regularized UOT problem and on some properties of the primal solution. It also differs from the proof for the complexity of the Sinkhorn algorithm for approximating the OT problem, since the UOT solution does not have to meet the marginal constraints.
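For reference, the scaling form of Sinkhorn iterations for entropic UOT with KL marginal penalties can be sketched as follows. Assuming the objective <C, P> + eta * sum P (log P - 1) + tau * KL(P1 | a) + tau * KL(P^T 1 | b), the first-order conditions give P = diag(u) K diag(v) with K = exp(-C/eta), and the updates raise the usual Sinkhorn ratios to the power tau/(tau + eta); letting tau grow recovers balanced Sinkhorn. The names `eta` and `tau`, the fixed iteration count, and the absence of a stopping rule or log-domain stabilization are simplifying assumptions, and this is not the paper's code. Here `eta` is the regularization strength, distinct from the target accuracy epsilon in the complexity bound.

```python
import math

def sinkhorn_uot(a, b, C, eta=0.1, tau=1.0, n_iter=1000):
    """Scaling iterations for entropic unbalanced OT, assuming the objective
        <C, P> + eta * sum P (log P - 1) + tau * KL(P 1 | a) + tau * KL(P^T 1 | b),
    with KL(r | a) = sum r log(r / a) - r + a. Optimality gives P = diag(u) K diag(v)
    with K = exp(-C / eta); tau -> infinity recovers balanced Sinkhorn."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eta) for j in range(m)] for i in range(n)]
    power = tau / (tau + eta)  # exponent < 1 because the marginals are soft
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / Kv[i]) ** power for i in range(n)]
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / Ktu[j]) ** power for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

For example, with a = (1, 1), b = (0.5, 0.5), C = [[0, 1], [1, 0]], eta = 0.1 and tau = 1, the returned plan's total mass lands strictly between 1 and 2, since neither marginal is enforced exactly.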
In this paper, we first revisit the Koenker and Bassett variational approach to (univariate) quantile regression, emphasizing its link with latent factor representations and correlation maximization problems. We then review the multivariate extension due to Carlier et al. (2016, 2017), which relates vector quantile regression to an optimal transport problem with mean-independence constraints. We introduce an entropic regularization of this problem, implement a gradient descent method, and illustrate its feasibility on univariate and bivariate examples.
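The Koenker and Bassett formulation can be made concrete with a single regressor. Fitting y ~ a + b x under the check loss rho_tau(u) = u (tau - 1{u < 0}) is a linear program, and some optimal solution interpolates as many observations as there are parameters, here two (assuming at least two distinct x values). The sketch below exploits this by enumerating lines through pairs of points, which is exact but O(n^3) and only meant for tiny examples; it is not the entropic vector quantile regression or the gradient descent method introduced in the paper, and the function names are hypothetical.

```python
from itertools import combinations

def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_regression_line(xs, ys, tau):
    """Exact fit of y ~ a + b*x under the check loss, by enumerating lines through
    pairs of observations (some optimal LP vertex interpolates two of them).
    Returns (loss, intercept, slope); the first minimizer found wins ties."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical pair does not define a regression line
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(check_loss(y - a - b * x, tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best
```

With tau = 0.5 this is least-absolute-deviation (median) regression: on points lying on y = 1 + 2x plus one large outlier, the fitted line stays on y = 1 + 2x, illustrating the robustness of quantile fits.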