ﻻ يوجد ملخص باللغة العربية
We study an online linear programming (OLP) problem under a random input model in which the columns of the constraint matrix along with the corresponding coefficients in the objective function are generated i.i.d. from an unknown distribution and revealed sequentially over time. Virtually all pre-existing online algorithms were based on learning the dual optimal solutions/prices of the linear programs (LP), and their analyses were focused on the aggregate objective value and solving the packing LP where all coefficients in the constraint matrix and objective are nonnegative. However, two major open questions were: (i) Does the set of LP optimal dual prices learned in the pre-existing algorithms converge to those of the offline LP, and (ii) Could the results be extended to general LP problems where the coefficients can be either positive or negative. We resolve these two questions by establishing convergence results for the dual prices under moderate regularity conditions for general LP problems. Specifically, we identify an equivalent form of the dual problem which relates the dual LP with a sample average approximation to a stochastic program. Furthermore, we propose a new type of OLP algorithm, Action-History-Dependent Learning Algorithm, which improves the previous algorithm performances by taking into account the past input data as well as decisions/actions already made. We derive an $O(log n log log n)$ regret bound (under a locally strong convexity and smoothness condition) for the proposed algorithm, against the $O(sqrt{n})$ bound for typical dual-price learning algorithms, where $n$ is the number of decision variables. Numerical experiments demonstrate the effectiveness of the proposed algorithm and the action-history-dependent design.
We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number of batche
This paper considers a variant of the online paging problem, where the online algorithm has access to multiple predictors, each producing a sequence of predictions for the page arrival times. The predictors may have occasional prediction errors and i
In this paper, we develop a simple and fast online algorithm for solving a class of binary integer linear programs (LPs) arisen in general resource allocation problem. The algorithm requires only one single pass through the input data and is free of
Consider an online facility assignment problem where a set of facilities $F = { f_1, f_2, f_3, cdots, f_{|F|} }$ of equal capacity $l$ is situated on a metric space and customers arrive one by one in an online manner on that space. We assign a custom
A dominant approach to solving large imperfect-information games is Counterfactural Regret Minimization (CFR). In CFR, many regret minimization problems are combined to solve the game. For very large games, abstraction is typically needed to render C