ترغب بنشر مسار تعليمي؟ اضغط هنا

Individual decision-makers consume information revealed by the previous decision makers, and produce information that may help in future decisions. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as in other do mains such as medical decisions. Each decision-maker would individually prefer to exploit: select an action with the highest expected reward given her current information. At the same time, each decision-maker would prefer previous decision-makers to explore, producing information about the rewards of various actions. A social planner, by means of carefully designed information disclosure, can incentivize the agents to balance the exploration and exploitation so as to maximize social welfare. We formulate this problem as a multi-armed bandit problem (and various generalizations thereof) under incentive-compatibility constraints induced by the agents Bayesian priors. We design an incentive-compatible bandit algorithm for the social planner whose regret is asymptotically optimal among all bandit algorithms (incentive-compatible or not). Further, we provide a black-box reduction from an arbitrary multi-arm bandit algorithm to an incentive-compatible one, with only a constant multiplicative increase in regret. This reduction works for very general bandit setting that incorporate contexts and arbitrary auxiliary feedback.
We consider classification and regression tasks where we have missing data and assume that the (clean) data resides in a low rank subspace. Finding a hidden subspace is known to be computationally hard. Nevertheless, using a non-proper formulation we give an efficient agnostic algorithm that classifies as good as the best linear classifier coupled with the best low-dimensional subspace in which the data resides. A direct implication is that our algorithm can linearly (and non-linearly through kernels) classify provably as well as the best classifier that has access to the full data.
We consider a setting where $n$ buyers, with combinatorial preferences over $m$ items, and a seller, running a priority-based allocation mechanism, repeatedly interact. Our goal, from observing limited information about the results of these interacti ons, is to reconstruct both the preferences of the buyers and the mechanism of the seller. More specifically, we consider an online setting where at each stage, a subset of the buyers arrive and are allocated items, according to some unknown priority that the seller has among the buyers. Our learning algorithm observes only which buyers arrive and the allocation produced (or some function of the allocation, such as just which buyers received positive utility and which did not), and its goal is to predict the outcome for future subsets of buyers. For this task, the learning algorithm needs to reconstruct both the priority among the buyers and the preferences of each buyer. We derive mistake bound algorithms for additive, unit-demand and single minded buyers. We also consider the case where buyers utilities for a fixed bundle can change between stages due to different (observed) prices. Our algorithms are efficient both in computation time and in the maximum number of mistakes (both polynomial in the number of buyers and items).
Auction theory traditionally assumes that bidders valuation distributions are known to the auctioneer, such as in the celebrated, revenue-optimal Myerson auction. However, this theory does not describe how the auctioneer comes to possess this informa tion. Recently, Cole and Roughgarden [2014] showed that an approximation based on a finite sample of independent draws from each bidders distribution is sufficient to produce a near-optimal auction. In this work, we consider the problem of learning bidders valuation distributions from much weaker forms of observations. Specifically, we consider a setting where there is a repeated, sealed-bid auction with $n$ bidders, but all we observe for each round is who won, but not how much they bid or paid. We can also participate (i.e., submit a bid) ourselves, and observe when we win. From this information, our goal is to (approximately) recover the inherently recoverable part of the underlying bid distributions. We also consider extensions where different subsets of bidders participate in each round, and where bidders valuations have a common-value component added to their independent private values.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا