No Arabic abstract
We consider a multi-round auction setting motivated by pay-per-click auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of clicks on the advertisements. The auctioneers goal is to design a (dominant strategies) truthful mechanism that (approximately) maximizes the social welfare. If the advertisers bid their true private values, our problem is equivalent to the multi-armed bandit problem, and thus can be viewed as a strategic version of the latter. In particular, for both problems the quality of an algorithm can be characterized by regret, the difference in social welfare between the algorithm and the benchmark which always selects the same best advertisement. We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful. We find that truthful mechanisms have certain strong structural properties -- essentially, they must separate exploration from exploitation -- and they incur much higher regret than the optimal multi-armed bandit algorithms. Moreover, we provide a truthful mechanism which (essentially) matches our lower bound on regret.
In this paper, we consider several finite-horizon Bayesian multi-armed bandit problems with side constraints which are computationally intractable (NP-Hard) and for which no optimal (or near optimal) algorithms are known to exist with sub-exponential running time. All of these problems violate the standard exchange property, which assumes that the reward from the play of an arm is not contingent upon when the arm is played. Not only are index policies suboptimal in these contexts, there has been little analysis of such policies in these problem settings. We show that if we consider near-optimal policies, in the sense of approximation algorithms, then there exists (near) index policies. Conceptually, if we can find policies that satisfy an approximate version of the exchange property, namely, that the reward from the play of an arm depends on when the arm is played to within a constant factor, then we have an avenue towards solving these problems. However such an approximate version of the idling bandit property does not hold on a per-play basis and are shown to hold in a global sense. Clearly, such a property is not necessarily true of arbitrary single arm policies and finding such single arm policies is nontrivial. We show that by restricting the state spaces of arms we can find single arm policies and that these single arm policies can be combined into global (near) index policies where the approximate version of the exchange property is true in expectation. The number of different bandit problems that can be addressed by this technique already demonstrate its wide applicability.
We analyze statistical discrimination in hiring markets using a multi-armed bandit model. Myopic firms face workers arriving with heterogeneous observable characteristics. The association between the workers skill and characteristics is unknown ex ante; thus, firms need to learn it. Laissez-faire causes perpetual underestimation: minority workers are rarely hired, and therefore, underestimation towards them tends to persist. Even a slight population-ratio imbalance frequently produces perpetual underestimation. We propose two policy solutions: a novel subsidy rule (the hybrid mechanism) and the Rooney Rule. Our results indicate that temporary affirmative actions effectively mitigate discrimination caused by insufficient data.
The early sections of this paper present an analysis of a Markov decision model that is known as the multi-armed bandit under the assumption that the utility function of the decision maker is either linear or exponential. The analysis includes efficient procedures for computing the expected utility associated with the use of a priority policy and for identifying a priority policy that is optimal. The methodology in these sections is novel, building on the use of elementary row operations. In the later sections of this paper, the analysis is adapted to accommodate constraints that link the bandits.
Crowd sensing is a new paradigm which leverages the pervasive smartphones to efficiently collect and upload sensing data, enabling numerous novel applications. To achieve good service quality for a crowd sensing application, incentive mechanisms are necessary for attracting more user participation. Most of existing mechanisms apply only for the budget-constraint scenario where the platform (the crowd sensing organizer) has a budget limit. On the contrary, we focus on a different scenario where the platform has a service limit. Based on the offline and online auction model, we consider a general problem: users submit their private profiles to the platform, and the platform aims at selecting a subset of users before a specified deadline for minimizing the total payment while a specific service can be completed. Specially, we design offline and online service-constraint incentive mechanisms for the case where the value function of selected users is monotone submodular. The mechanisms are individual rationality, task feasibility, computational efficiency, truthfulness, consumer sovereignty, constant frugality, and also performs well in practice. Finally, we use extensive simulations to demonstrate the theoretical properties of our mechanisms.
Crowdsourcing markets have emerged as a popular platform for matching available workers with tasks to complete. The payment for a particular task is typically set by the tasks requester, and may be adjusted based on the quality of the completed work, for example, through the use of bonus payments. In this paper, we study the requesters problem of dynamically adjusting quality-contingent payments for tasks. We consider a multi-round version of the well-known principal-agent model, whereby in each round a worker makes a strategic choice of the effort level which is not directly observable by the requester. In particular, our formulation significantly generalizes the budget-free online task pricing problems studied in prior work. We treat this problem as a multi-armed bandit problem, with each arm representing a potential contract. To cope with the large (and in fact, infinite) number of arms, we propose a new algorithm, AgnosticZooming, which discretizes the contract space into a finite number of regions, effectively treating each region as a single arm. This discretization is adaptively refined, so that more promising regions of the contract space are eventually discretized more finely. We analyze this algorithm, showing that it achieves regret sublinear in the time horizon and substantially improves over non-adaptive discretization (which is the only competing approach in the literature). Our results advance the state of art on several different topics: the theory of crowdsourcing markets, principal-agent problems, multi-armed bandits, and dynamic pricing.