
Online Linear Programming: Dual Convergence, New Algorithms, and Regret Bounds

Added by Xiaocheng Li
Publication date: 2019
Language: English





We study an online linear programming (OLP) problem under a random input model in which the columns of the constraint matrix, along with the corresponding coefficients in the objective function, are generated i.i.d. from an unknown distribution and revealed sequentially over time. Virtually all pre-existing online algorithms were based on learning the dual optimal solutions/prices of the linear programs (LP), and their analyses focused on the aggregate objective value and on solving the packing LP, where all coefficients in the constraint matrix and objective are nonnegative. However, two major open questions remained: (i) do the LP optimal dual prices learned by the pre-existing algorithms converge to those of the offline LP, and (ii) can the results be extended to general LP problems where the coefficients can be either positive or negative? We resolve these two questions by establishing convergence results for the dual prices under moderate regularity conditions for general LP problems. Specifically, we identify an equivalent form of the dual problem which relates the dual LP to a sample average approximation of a stochastic program. Furthermore, we propose a new type of OLP algorithm, the Action-History-Dependent Learning Algorithm, which improves on the performance of previous algorithms by taking into account the past input data as well as the decisions/actions already made. We derive an $O(\log n \log \log n)$ regret bound (under a locally strong convexity and smoothness condition) for the proposed algorithm, against the $O(\sqrt{n})$ bound for typical dual-price learning algorithms, where $n$ is the number of decision variables. Numerical experiments demonstrate the effectiveness of the proposed algorithm and the action-history-dependent design.
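
The abstract does not spell out the equivalent form of the dual, so the following is only a sketch in assumed notation: $r_j$ is the objective coefficient and $a_j$ the constraint column of the $j$-th decision variable, $b$ is the right-hand side, $p$ is the vector of dual prices, and $d = b/n$. Eliminating the auxiliary dual variables $y_j$ and scaling the objective by $1/n$ gives an average over the $n$ observed columns, which can be read as a sample average approximation of a stochastic program of the form $\min_{p \ge 0} d^\top p + \mathbb{E}[(r - a^\top p)^{+}]$.

```latex
\begin{align*}
\text{(primal)}\quad & \max_{x}\ \sum_{j=1}^{n} r_j x_j
  \quad \text{s.t.}\quad \sum_{j=1}^{n} a_j x_j \le b,\quad 0 \le x_j \le 1,\\
\text{(dual)}\quad & \min_{p,\,y}\ b^\top p + \sum_{j=1}^{n} y_j
  \quad \text{s.t.}\quad a_j^\top p + y_j \ge r_j,\quad p \ge 0,\ y \ge 0,\\
\text{(equivalent form)}\quad & \min_{p \ge 0}\ d^\top p
  + \frac{1}{n}\sum_{j=1}^{n} \left(r_j - a_j^\top p\right)^{+},
  \qquad d = b/n .
\end{align*}
```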




Read More

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and find the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.
This paper considers a variant of the online paging problem, where the online algorithm has access to multiple predictors, each producing a sequence of predictions for the page arrival times. The predictors may have occasional prediction errors and it is assumed that at least one of them makes a sublinear number of prediction errors in total. Our main result states that this assumption suffices for the design of a randomized online algorithm whose time-average regret with respect to the optimal offline algorithm tends to zero as the time tends to infinity. This holds (with different regret bounds) for both the full information access model, where in each round, the online algorithm gets the predictions of all predictors, and the bandit access model, where in each round, the online algorithm queries a single predictor. While online algorithms that exploit inaccurate predictions have been a topic of growing interest in the last few years, to the best of our knowledge, this is the first paper that studies this topic in the context of multiple predictors for an online problem with unbounded request sequences. Moreover, to the best of our knowledge, this is also the first paper that aims for (and achieves) online algorithms with a vanishing regret for a classic online problem under reasonable assumptions.
In this paper, we develop a simple and fast online algorithm for solving a class of binary integer linear programs (LPs) arising in general resource allocation problems. The algorithm requires only a single pass through the input data and does not perform any matrix inversion. It can be viewed both as an approximate algorithm for solving binary integer LPs and as a fast algorithm for solving online LP problems. The algorithm is inspired by an equivalent form of the dual problem of the relaxed LP, and it essentially performs (one-pass) projected stochastic subgradient descent in the dual space. We analyze the algorithm in two different models, stochastic input and random permutation, with minimal technical assumptions on the input data. The algorithm achieves $O\left(m \sqrt{n}\right)$ expected regret under the stochastic input model and $O\left((m+\log n)\sqrt{n}\right)$ expected regret under the random permutation model, and it achieves $O(m \sqrt{n})$ expected constraint violation under both models, where $n$ is the number of decision variables and $m$ is the number of constraints. The algorithm enjoys the same performance guarantee when generalized to a multi-dimensional LP setting, which covers a wider range of applications. In addition, we employ the notion of permutational Rademacher complexity and derive regret bounds for two earlier online LP algorithms for comparison. Both algorithms improve the regret bound by a factor of $\sqrt{m}$ at a higher computational cost. Furthermore, we demonstrate how to convert a possibly infeasible solution into a feasible one through a randomized procedure. Numerical experiments illustrate the general applicability and effectiveness of the algorithms.
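
A one-pass projected subgradient step in the dual space, as described in the abstract above, might look like the following minimal Python sketch. The function name, the argument layout, and the constant step size $1/\sqrt{n}$ are assumptions made for illustration and are not taken from the paper. The sketch accepts an item when its reward exceeds its current dual price and then moves the price along a subgradient, projecting back onto $p \ge 0$.

```python
import numpy as np

def online_dual_subgradient(r, A, b):
    """One-pass dual subgradient sketch for max r^T x s.t. A x <= b, x in {0,1}^n.

    r: (n,) rewards, A: (m, n) constraint matrix, b: (m,) budgets.
    Hypothetical helper; the step size is an illustrative assumption.
    """
    r = np.asarray(r, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    d = b / n                      # average budget per arriving column
    step = 1.0 / np.sqrt(n)        # assumed constant step size
    p = np.zeros(m)                # dual prices, kept nonnegative
    x = np.zeros(n)
    for t in range(n):
        a_t = A[:, t]
        # Accept the item only if its reward beats its current dual price.
        x[t] = 1.0 if r[t] > a_t @ p else 0.0
        # Subgradient step on the dual objective, projected onto p >= 0.
        p = np.maximum(p + step * (a_t * x[t] - d), 0.0)
    return x, p
```

Nothing in this sketch enforces the constraints exactly, so the returned decisions may violate them; this is consistent with the abstract bounding expected constraint violation separately from regret.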
Consider an online facility assignment problem where a set of facilities $F = \{ f_1, f_2, f_3, \cdots, f_{|F|} \}$ of equal capacity $l$ is situated on a metric space and customers arrive one by one in an online manner on that space. We assign a customer $c_i$ to a facility $f_j$ before a new customer $c_{i+1}$ arrives. The cost of this assignment is the distance between $c_i$ and $f_j$. The objective of this problem is to minimize the sum of all assignment costs. Recently, Ahmed et al. (TCS, 806, pp. 455-467, 2020) studied the problem where the facilities are situated on a line and computed the competitive ratio of Algorithm Greedy, which assigns the customer to the nearest available facility. They also computed the competitive ratio of Algorithm Optimal-Fill, which assigns the new customer considering the optimal assignment of all previous customers, and they studied the problem where the facilities are situated on a connected unweighted graph. In this paper, we first consider the case where $F$ is situated on the vertices of a connected unweighted grid graph $G$ of size $r \times c$ and customers arrive one by one at positions on the vertices of $G$. We show that Algorithm Greedy has competitive ratio $r \times c + r + c$ and Algorithm Optimal-Fill has competitive ratio $O(r \times c)$. We later show that the competitive ratio of Algorithm Optimal-Fill is $2|F|$ for any arbitrary graph. Our bound is tight and better than the previous result. We also consider the case where the facilities are distributed arbitrarily on a plane and provide an algorithm for this scenario. We also provide an algorithm that has competitive ratio $(2n-1)$. Finally, we consider a straight line metric space and show that no algorithm for the online facility assignment problem has competitive ratio less than $9.001$.
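
To make the greedy rule above concrete, here is a minimal Python sketch of Algorithm Greedy for facilities on a line: each arriving customer goes to the nearest facility that still has free capacity. The function name, the argument names, and the tie-breaking rule (lowest facility index wins) are assumptions for illustration only.

```python
def greedy_assign(facilities, capacity, customers):
    """Assign each arriving customer to the nearest facility with free capacity.

    facilities: positions on a line; capacity: equal capacity per facility;
    customers: positions in arrival order. Hypothetical helper for illustration.
    Returns (list of chosen facility indices, total assignment cost).
    """
    remaining = [capacity] * len(facilities)
    assignment = []
    total_cost = 0.0
    for c in customers:
        # Only facilities that are not yet full are candidates.
        available = [j for j, cap in enumerate(remaining) if cap > 0]
        if not available:
            raise ValueError("no facility has remaining capacity")
        # Nearest available facility; ties go to the lowest index (assumption).
        j = min(available, key=lambda k: abs(facilities[k] - c))
        remaining[j] -= 1
        assignment.append(j)
        total_cost += abs(facilities[j] - c)
    return assignment, total_cost
```

For example, with facilities at positions 0 and 10, capacity 1, and customers arriving at 4 and then 3, the first customer takes the facility at 0 and the second is forced to the facility at 10, for a total cost of 11. Assigning the first customer to 10 and the second to 0 would have cost 9, which illustrates why the competitive ratio of the greedy rule is of interest.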
A dominant approach to solving large imperfect-information games is Counterfactual Regret Minimization (CFR). In CFR, many regret minimization problems are combined to solve the game. For very large games, abstraction is typically needed to render CFR tractable. Abstractions are often manually tuned, possibly removing important strategic differences in the full game and harming performance. Function approximation provides a natural solution to finding good abstractions to approximate the full game. A common approach to incorporating function approximation is to learn the inputs needed for a regret minimizing algorithm, allowing for generalization across many regret minimization problems. This paper gives regret bounds when a regret minimizing algorithm uses estimates instead of true values. This form of analysis is the first to generalize to a larger class of $(\Phi, f)$-regret matching algorithms, and includes different forms of regret such as swap, internal, and external regret. We demonstrate how these results give a slightly tighter bound for Regression Regret-Matching (RRM), and present a novel bound for combining regression with Hedge.
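
For orientation, the basic external-regret version of regret matching that such analyses start from can be sketched in Python as follows. This is the plain textbook rule with exact regrets, not the $(\Phi, f)$ generalization or the regression-based variant studied in the abstract above; the function names are assumptions for illustration.

```python
import numpy as np

def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(np.asarray(cum_regret, dtype=float), 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No action has positive regret: fall back to the uniform strategy.
    return np.full(len(positive), 1.0 / len(positive))

def update_regret(cum_regret, strategy, utilities):
    """Add this round's regret: each action's utility minus the expected utility."""
    utilities = np.asarray(utilities, dtype=float)
    expected = float(np.dot(strategy, utilities))
    return np.asarray(cum_regret, dtype=float) + (utilities - expected)
```

In a setting with function approximation, the exact `cum_regret` vector would be replaced by an estimate, which is the situation the regret bounds in the abstract address.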
