
Controlled Information Fusion with Risk-Averse CVaR Social Sensors

Published by: Vikram Krishnamurthy
Publication date: 2017
Research field: Informatics Engineering
Paper language: English





Consider a multi-agent network comprised of risk-averse social sensors and a controller that jointly seek to estimate an unknown state of nature, given noisy measurements. The network of social sensors performs Bayesian social learning - each sensor fuses the information revealed by previous social sensors along with its private valuation using Bayes' rule - to optimize a local cost function. The controller sequentially modifies the cost function of the sensors by discriminatory pricing (control inputs) to realize long-term global objectives. We formulate the stochastic control problem faced by the controller as a Partially Observed Markov Decision Process (POMDP) and derive structural results for the optimal control policy as a function of the risk-aversion factor in the Conditional Value-at-Risk (CVaR) cost function of the sensors. We show that the optimal price sequence when the sensors are risk-averse is a super-martingale; i.e., it decreases on average over time.
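To make the CVaR cost mentioned in the abstract concrete, here is a minimal Python sketch for a discrete cost distribution. It uses the standard Rockafellar-Uryasev form CVaR_alpha(X) = min_z { z + E[max(X - z, 0)] / alpha }. The function name `cvar`, the convention that `alpha` is the tail probability (so `alpha = 1` is risk neutral and small `alpha` is highly risk-averse), and the example numbers are assumptions made for illustration; the paper's exact parametrization may differ.

```python
def cvar(costs, probs, alpha):
    """CVaR of a discrete cost distribution (illustrative sketch).

    Assumption: alpha in (0, 1] is the tail probability, so alpha = 1
    recovers the mean and alpha -> 0 approaches the worst-case cost.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    if len(costs) != len(probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("costs and probs must match and probs must sum to 1")

    def objective(z):
        # z + E[max(X - z, 0)] / alpha
        excess = sum(p * max(c - z, 0.0) for c, p in zip(costs, probs))
        return z + excess / alpha

    # The objective is convex and piecewise linear with breakpoints at the
    # support points, so the minimum is attained at one of the costs.
    return min(objective(z) for z in costs)


if __name__ == "__main__":
    costs, probs = [0.0, 10.0], [0.9, 0.1]
    for alpha in (1.0, 0.5, 0.1):
        print(alpha, cvar(costs, probs, alpha))
```

With a 10% chance of a cost of 10, the risk-neutral value (`alpha = 1`) is the mean, while smaller `alpha` weights the bad outcome more heavily until the value reaches the worst case.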




Read also

A sequence of social sensors estimate an unknown parameter (modeled as a state of nature) by performing Bayesian social learning, and myopically optimize individual reward functions. The decisions of the social sensors contain quantized information about the underlying state. How should a fusion center dynamically incentivize the social sensors for acquiring information about the underlying state? This paper presents five results. First, sufficient conditions on the model parameters are provided under which the optimal policy for the fusion center has a threshold structure. The optimal policy is determined in closed form, and is such that it switches between two exactly specified incentive policies at the threshold. Second, it is shown that the optimal incentive sequence is a sub-martingale, i.e., the optimal incentives increase on average over time. Third, it is shown that it is possible for the fusion center to learn the true state asymptotically by employing a sub-optimal policy; in other words, controlled information fusion with social sensors can be consistent. Fourth, uniform bounds on the average additional cost incurred by the fusion center for employing a sub-optimal policy are provided. This characterizes the trade-off between the cost of information acquisition and consistency for the fusion center. Finally, when it is sufficient to estimate the state with a degree of confidence, uniform bounds on the budget saved by employing policies that guarantee state estimation in finite time are provided.
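The Bayesian social learning step described above can be sketched as follows. This is a generic textbook-style model rather than the paper's exact formulation: the two-state setup, the observation matrix `B`, the cost table `cost`, and all function names are assumptions chosen for illustration, and no incentive or pricing term is included.

```python
def private_posterior(prior, B, y):
    """Bayes update of the public belief with a private observation y.
    B[x][y] is the (assumed) probability of observing y in state x."""
    unnorm = [B[x][y] * prior[x] for x in range(len(prior))]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def myopic_action(prior, B, cost, y):
    """Action minimizing expected cost under the private posterior.
    cost[a][x] is the (assumed) cost of action a in state x."""
    post = private_posterior(prior, B, y)
    expected = [sum(cost[a][x] * post[x] for x in range(len(post)))
                for a in range(len(cost))]
    return min(range(len(cost)), key=lambda a: expected[a])


def public_update(prior, B, cost, action):
    """Update the public belief after observing only the sensor's action."""
    n_obs = len(B[0])
    # Probability of the observed action in each state: sum over the
    # private observations that would have led to that action.
    likelihood = [
        sum(B[x][y] for y in range(n_obs)
            if myopic_action(prior, B, cost, y) == action)
        for x in range(len(prior))
    ]
    unnorm = [likelihood[x] * prior[x] for x in range(len(prior))]
    total = sum(unnorm)
    if total == 0:
        raise ValueError("action has zero probability under this belief")
    return [u / total for u in unnorm]


if __name__ == "__main__":
    B = [[0.8, 0.2], [0.2, 0.8]]   # observation likelihoods
    cost = [[0, 1], [1, 0]]        # unit cost for a wrong guess
    print(public_update([0.5, 0.5], B, cost, 1))    # action is informative
    print(public_update([0.95, 0.05], B, cost, 0))  # belief does not move
```

In the second call the sensor picks action 0 regardless of its private observation, so the action carries no information and the public belief stays put. This is the herding behavior that makes the actions only a quantized, sometimes uninformative, summary of the private signals.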
Multistage risk-averse optimal control problems with nested conditional risk mappings are gaining popularity in various application domains. Risk-averse formulations interpolate between the classical expectation-based stochastic and minimax optimal control. This way, risk-averse problems aim at hedging against extreme low-probability events without being overly conservative. At the same time, risk-based constraints may be employed either as surrogates for chance (probabilistic) constraints or as a robustification of expectation-based constraints. Such multistage problems, however, have been identified as particularly hard to solve. We propose a decomposition method for such nested problems that allows us to solve them via efficient numerical optimization methods. Alongside, we propose a new form of risk constraints which accounts for the propagation of uncertainty in time.
The multi-armed bandit (MAB) is a classical online optimization model for the trade-off between exploration and exploitation. The traditional MAB is concerned with finding the arm that minimizes the mean cost. However, minimizing the mean does not take the risk of the problem into account. We now want to accommodate risk-averse decision makers. In this work, we introduce a coherent risk measure as the criterion to form a risk-averse MAB. In particular, we derive an index-based online sampling framework for the risk-averse MAB. We develop this framework in detail for three specific risk measures, i.e., the conditional value-at-risk, the mean-deviation and the shortfall risk measures. Under each risk measure, the convergence rate for the upper bound on the pseudo regret, defined as the difference between the expectation of the empirical risk based on the observation sequence and the true risk of the optimal arm, is established.
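As a rough illustration of why a risk-averse criterion can change which arm looks best, the sketch below compares arms by a simple empirical CVaR of their observed costs. It is not the index-based sampling framework of the paper: it is a plain greedy comparison with no exploration term, the estimator (average of the worst `ceil(alpha * n)` samples) is one common choice among several, and the names `empirical_cvar` and `pick_arm` are hypothetical.

```python
import math


def empirical_cvar(samples, alpha):
    """Average of the worst ceil(alpha * n) observed costs.
    Assumption: alpha in (0, 1] is the tail fraction; alpha = 1 is the mean."""
    if not samples or not 0 < alpha <= 1:
        raise ValueError("need at least one sample and alpha in (0, 1]")
    k = math.ceil(alpha * len(samples))
    worst = sorted(samples, reverse=True)[:k]
    return sum(worst) / k


def pick_arm(history, alpha):
    """Greedy choice: the arm whose observed costs have the lowest
    empirical CVaR (illustrative only, no exploration bonus)."""
    return min(history, key=lambda arm: empirical_cvar(history[arm], alpha))


if __name__ == "__main__":
    history = {
        "a": [0.0, 0.0, 0.0, 8.0],  # low mean cost, occasional large cost
        "b": [3.0, 3.0, 3.0, 3.0],  # higher mean cost, no variability
    }
    print(pick_arm(history, alpha=1.0))   # mean-cost criterion
    print(pick_arm(history, alpha=0.25))  # tail-focused criterion
```

Under the mean criterion arm `a` is preferred, while focusing on the worst quarter of outcomes favors the steadier arm `b`.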
This paper considers a statistical signal processing problem involving agent-based models of financial markets which at a micro-level are driven by socially aware and risk-averse trading agents. These agents trade (buy or sell) stocks by exploiting information about the decisions of previous agents (social learning) via an order book in addition to a private (noisy) signal they receive on the value of the stock. We are interested in the following: (1) modelling the dynamics of these risk-averse agents, and (2) sequential detection of a market shock based on the behaviour of these agents. Structural results which characterize social learning under a risk measure, CVaR (Conditional Value-at-Risk), are presented and a formulation of the Bayesian change point detection problem is provided. The structural results exhibit two interesting properties: (i) risk-averse agents herd more often than risk-neutral agents; (ii) the stopping set in the sequential detection problem is non-convex. The framework is validated on data from the Yahoo! Tech Buzz game dataset.
We consider the problem of designing policies for partially observable Markov decision processes (POMDPs) with dynamic coherent risk objectives. Synthesizing risk-averse optimal policies for POMDPs requires infinite memory and is thus undecidable. To overcome this difficulty, we propose a method based on bounded policy iteration for designing stochastic but finite state (memory) controllers, which takes advantage of standard convex optimization methods. Given a memory budget and optimality criterion, the proposed method modifies the stochastic finite state controller leading to sub-optimal solutions with lower coherent risk.