Consider a multi-agent network composed of risk-averse social sensors and a controller that jointly seek to estimate an unknown state of nature, given noisy measurements. The network of social sensors performs Bayesian social learning - each sensor fuses the information revealed by previous social sensors along with its private valuation using Bayes' rule - to optimize a local cost function. The controller sequentially modifies the cost functions of the sensors by discriminatory pricing (control inputs) to realize long-term global objectives. We formulate the stochastic control problem faced by the controller as a Partially Observed Markov Decision Process (POMDP) and derive structural results for the optimal control policy as a function of the risk-aversion factor in the Conditional Value-at-Risk (CVaR) cost function of the sensors. We show that the optimal price sequence when the sensors are risk-averse is a super-martingale; i.e., it decreases on average over time.
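For concreteness, the risk measure and the martingale claim above can be written as follows (a minimal sketch, assuming the standard Rockafellar-Uryasev representation of CVaR at level \alpha and writing p_k for the price posted at time k; the notation is illustrative, not necessarily that of the paper):

  CVaR_\alpha(X) = \min_{z \in \mathbb{R}} \{ z + \tfrac{1}{\alpha} \mathbb{E}[(X - z)^+] \},  0 < \alpha \le 1,

and the super-martingale property of the optimal price sequence reads \mathbb{E}[p_{k+1} \mid \mathcal{F}_k] \le p_k, where \mathcal{F}_k denotes the information available to the controller at time k.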
A sequence of social sensors estimates an unknown parameter (modeled as a state of nature) by performing Bayesian social learning, with each sensor myopically optimizing its individual reward function. The decisions of the social sensors contain quantized information …
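The social learning step described above admits a compact sketch (stated under generic assumptions, with x the unknown state, \pi_{k-1} the public belief, y_k the private observation of sensor k with likelihood B_{y_k}(x), a_k its announced decision, and c(x,a) its local cost; the symbols are illustrative):

  private posterior:  \eta_k(x) \propto B_{y_k}(x) \, \pi_{k-1}(x),
  myopic decision:    a_k = \arg\min_a \sum_x c(x,a) \, \eta_k(x),
  public belief:      \pi_k(x) \propto \mathbb{P}(a_k \mid x, \pi_{k-1}) \, \pi_{k-1}(x).

Because only the decision a_k (not the private observation y_k) is revealed, later sensors learn from a coarse, quantized summary of earlier private information.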
Multistage risk-averse optimal control problems with nested conditional risk mappings are gaining popularity in various application domains. Risk-averse formulations interpolate between the classical expectation-based stochastic and minimax optimal control problems …
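A generic way to write such a nested risk-averse objective over a horizon T (a sketch under standard assumptions, with \rho_t one-step conditional risk mappings and c_t stage costs; not necessarily the exact formulation of the paper):

  \min_\pi \; \rho_1\big( c_1 + \rho_2\big( c_2 + \cdots + \rho_{T-1}( c_{T-1} + \rho_T(c_T) ) \cdots \big) \big).

Choosing every \rho_t as the conditional expectation recovers the classical risk-neutral problem, while choosing the conditional essential supremum recovers the minimax (worst-case) problem; intermediate coherent mappings such as CVaR interpolate between the two.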
The multi-armed bandit (MAB) is a classical online optimization model for the trade-off between exploration and exploitation. The traditional MAB is concerned with finding the arm that minimizes the mean cost. However, minimizing the mean does not take into account …
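As a toy illustration of replacing the mean with a tail-risk criterion (a hedged sketch, assuming CVaR over per-arm cost samples; the function empirical_cvar and the Gaussian arms below are hypothetical, not taken from the paper):

  import numpy as np

  def empirical_cvar(samples, alpha=0.1):
      # Mean of the worst alpha-fraction of observed costs (empirical CVaR of a cost).
      samples = np.sort(np.asarray(samples))          # ascending costs
      k = max(1, int(np.ceil(alpha * len(samples))))  # size of the upper tail
      return samples[-k:].mean()

  # Toy usage: the risk-averse choice can differ from the mean-optimal arm.
  rng = np.random.default_rng(0)
  arm_costs = [rng.normal(1.0, 0.2, 500), rng.normal(0.9, 1.5, 500)]
  best_arm = min(range(len(arm_costs)), key=lambda i: empirical_cvar(arm_costs[i]))
  print("risk-averse choice:", best_arm)

Here arm 1 has the smaller mean cost but a much heavier tail, so the CVaR criterion prefers arm 0.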
This paper considers a statistical signal processing problem involving agent-based models of financial markets which, at a micro-level, are driven by socially aware and risk-averse trading agents. These agents trade (buy or sell) stocks by exploiting …
We consider the problem of designing policies for partially observable Markov decision processes (POMDPs) with dynamic coherent risk objectives. Synthesizing risk-averse optimal policies for POMDPs requires infinite memory and is thus undecidable. To overcome …