We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption does not exceed the budget for each resource. The objective is once again to maximize the total reward. This problem turns out to be a common generalization of classic linear contextual bandits (linContextual), bandits with knapsacks (BwK), and the online stochastic packing problem (OSPP). We present algorithms with near-optimal regret bounds for this problem. Our bounds compare favorably to results on the unstructured version of the problem, where the relation between the contexts and the outcomes could be arbitrary, but the algorithm only competes against a fixed set of policies accessible through an optimization oracle. We combine techniques from the work on linContextual, BwK, and OSPP in a nontrivial manner while also tackling new difficulties that are not present in any of these special cases.
We consider a contextual version of the multi-armed bandit problem with global knapsack constraints. In each round, the outcome of pulling an arm is a scalar reward and a resource consumption vector, both dependent on the context, and the global knapsack constraints require the total consumption for each resource to be below some pre-fixed budget. The learning agent competes with an arbitrary set of context-dependent policies. This problem was introduced by Badanidiyuru et al. (2014), who gave a computationally inefficient algorithm with near-optimal regret bounds for it. We give a computationally efficient algorithm for this problem with slightly better regret bounds, by generalizing the approach of Agarwal et al. (2014) for the non-constrained version of the problem. The computational time of our algorithm scales logarithmically in the size of the policy space. This answers the main open question of Badanidiyuru et al. (2014). We also extend our results to a variant where there are no knapsack constraints but the objective is an arbitrary Lipschitz concave function of the sum of outcome vectors.
We introduce the online stochastic Convex Programming (CP) problem, a very general version of stochastic online problems which allows arbitrary concave objectives and convex feasibility constraints. Many well-studied problems, such as online stochastic packing and covering and online stochastic matching with concave returns, are special cases of online stochastic CP. We present fast algorithms for these problems, which achieve near-optimal regret guarantees for both the i.i.d. and the random permutation models of stochastic inputs. When applied to the special case of online packing, our ideas yield a simpler and faster primal-dual algorithm for this well-studied problem, which achieves the optimal competitive ratio. Our techniques make explicit the connection of the primal-dual paradigm and online learning to online stochastic CP.
In this paper, we consider a very general model for the exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon. This model subsumes the classic multi-armed bandit (MAB) model and the Bandits with Knapsacks (BwK) model of Badanidiyuru et al. [2013]. We also consider an extension of this model to allow linear contexts, similar to the linear contextual extension of the MAB model. We demonstrate that a natural and simple extension of the UCB family of algorithms for MAB provides a polynomial-time algorithm that has near-optimal regret guarantees for this substantially more general model, and matches the bounds provided by Badanidiyuru et al. [2013] for the special case of BwK, which is quite surprising. We also provide computationally more efficient algorithms by establishing interesting connections between this problem and other well-studied problems and algorithms such as the Blackwell approachability problem, online convex optimization, and the Frank-Wolfe technique for convex optimization. We give examples of several concrete applications where this more general model of bandits allows for richer and/or more efficient formulations of the problem.
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-the-art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that simultaneously proves both the optimal problem-dependent bound of $(1+\epsilon)\sum_i \frac{\ln T}{\Delta_i}+O\left(\frac{N}{\epsilon^2}\right)$ and the first near-optimal problem-independent bound of $O(\sqrt{NT\ln T})$ on the expected regret of this algorithm. Our near-optimal problem-independent bound solves a COLT 2012 open problem of Chapelle and Li. The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [ALT 2012]. Our novel martingale-based analysis techniques are conceptually simple, easily extend to distributions other than the Beta distribution, and also extend to the more general contextual bandits setting [Manuscript, Agrawal and Goyal, 2012].
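As a rough illustration of the heuristic this abstract analyzes, the following is a minimal sketch of Thompson Sampling for Bernoulli rewards with uniform Beta(1, 1) priors. The function name `thompson_sampling`, the arm means, the horizon, and the seed are assumptions made for the example and are not taken from the paper.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson Sampling; return (pulls per arm, total reward).

    true_means is a hypothetical list of Bernoulli success probabilities,
    hidden from the learner and used only to simulate rewards.
    """
    rng = random.Random(seed)
    n = len(true_means)
    successes = [0] * n  # observed rewards of 1 per arm
    failures = [0] * n   # observed rewards of 0 per arm
    total_reward = 0

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior (prior is Beta(1, 1)).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(n)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n), key=lambda i: samples[i])
        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward

    pulls = [successes[i] + failures[i] for i in range(n)]
    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.1, 0.9], horizon=2000)
    print("pulls per arm:", pulls, "total reward:", total)
```

With a large gap between the arm means, almost all pulls should go to the better arm, which is the qualitative behavior behind the logarithmic problem-dependent bound.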
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we design and analyze a generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. This is among the most important and widely studi
Recently, several new pari-mutuel mechanisms have been introduced to organize markets for contingent claims. Hanson introduced a market maker derived from the logarithmic scoring rule, and later Chen and Pennock developed a cost function formulation for the market maker. On the other hand, the SCPM model of Peters et al. is based on ideas from a call auction setting using a convex optimization model. In this work, we develop a unified framework that bridges these seemingly unrelated models for centrally organizing contingent claim markets. The framework, developed as a generalization of the SCPM, will support many desirable properties such as proper scoring, truthful bidding (in a myopic sense), efficient computation, and guarantees on worst-case loss. In fact, our unified framework will allow us to express various proper scoring rules, existing or new, from classical utility functions in a convex optimization problem representing the market organizer. Additionally, we utilize concepts from duality to show that the market model is equivalent to a risk minimization problem where a convex risk measure is employed. This will allow us to more clearly understand the differences in the risk attitudes adopted by various mechanisms, and particularly deepen our intuition about popular mechanisms like Hanson's market maker. In aggregate, we believe this work advances our understanding of the objectives that the market organizer is optimizing in popular pari-mutuel mechanisms by recasting them into one unified framework.
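For orientation, here is a minimal sketch of the cost-function form of the logarithmic market scoring rule mentioned in the abstract above, C(q) = b log(sum_i exp(q_i / b)), where q_i is the number of outstanding shares on outcome i and b is a liquidity parameter. The function names and the sample numbers are hypothetical; the sketch covers only this standard market maker, not the unified framework the abstract proposes.

```python
import math


def lmsr_cost(q, b):
    """Cost function C(q) = b * log(sum_i exp(q_i / b)), computed stably."""
    m = max(x / b for x in q)  # shift by the max exponent to avoid overflow
    return b * (m + math.log(sum(math.exp(x / b - m) for x in q)))


def lmsr_prices(q, b):
    """Instantaneous prices: the gradient of C, i.e. a softmax of q / b."""
    m = max(x / b for x in q)
    weights = [math.exp(x / b - m) for x in q]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(q, delta, b):
    """Amount a trader pays to move outstanding shares from q to q + delta."""
    new_q = [x + d for x, d in zip(q, delta)]
    return lmsr_cost(new_q, b) - lmsr_cost(q, b)


if __name__ == "__main__":
    b = 10.0
    q = [0.0, 0.0]
    print("initial prices:", lmsr_prices(q, b))
    print("cost of 5 shares on outcome 0:", trade_cost(q, [5.0, 0.0], b))
```

The prices always sum to one, so they can be read as probabilities, and starting from q = 0 the organizer's loss is at most b ln(n) for n outcomes, which is the kind of worst-case loss guarantee the abstract refers to.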
We consider a robust model proposed by Scarf (1958) for stochastic optimization when only the marginal probabilities of (binary) random variables are given, and the correlation between the random variables is unknown. In the robust model, the objective is to minimize expected cost against the worst possible joint distribution with those marginals. We introduce the concept of correlation gap to compare this model to the stochastic optimization model that ignores correlations and minimizes expected cost under an independent Bernoulli distribution. We identify a class of functions, using concepts of summable cost sharing schemes from game theory, for which the correlation gap is well-bounded and the robust model can be approximated closely by the independent distribution model. As a result, we derive efficient approximation factors for many popular cost functions, like submodular functions, facility location, and Steiner tree. As a byproduct, our analysis also yields some new results in the areas of social welfare maximization and existence of Walrasian equilibria, which may be of independent interest.
In this paper, we present approximation algorithms for combinatorial optimization problems under probabilistic constraints. Specifically, we focus on stochastic variants of two important combinatorial optimization problems: the k-center problem and the set cover problem, with uncertainty characterized by a probability distribution over the set of points or elements to be covered. We consider these problems under adaptive and non-adaptive settings, and present efficient approximation algorithms for the case when the underlying distribution is a product distribution. In contrast to the expected cost model prevalent in the stochastic optimization literature, our problem definitions support restrictions on the probability distributions of the total costs, by incorporating constraints that bound the probability with which the incurred costs may exceed a given threshold.
Nanoparticles of superconducting YBa2Cu3O7-δ (YBCO) (Tc = 91 K) exhibit ferromagnetism at room temperature, while bulk YBCO, obtained by heating the nanoparticles at high temperature (940 °C), shows a linear magnetization curve. Across the superconducting transition temperature, the magnetization curve changes from that of a soft ferromagnet to that of a superconductor. Furthermore, our experiments reveal that not only nanoparticles of metal oxides but also those of metal nitrides such as NbN (Tc = 6-12 K) and δ-MoN (Tc ~ 6 K) exhibit room-temperature ferromagnetism.