
The problem of portfolio management represents an important and challenging class of dynamic decision-making problems, where rebalancing decisions need to be made over time while accounting for many factors such as investors' preferences, trading environments, and market conditions. In this paper, we present a new portfolio policy network architecture for deep reinforcement learning (DRL) that can exploit cross-asset dependency information more effectively and achieve better performance than state-of-the-art architectures. In particular, we introduce a new property, referred to as asset permutation invariance, for portfolio policy networks that exploit multi-asset time series data, and design the first portfolio policy network, named WaveCorr, that preserves this invariance property when treating asset correlation information. At the core of our design is an innovative permutation-invariant correlation processing layer. An extensive set of experiments is conducted using data from both the Canadian (TSX) and American (S&P 500) stock markets, and WaveCorr consistently outperforms other architectures with an impressive 3%-25% absolute improvement in average annual return, and up to more than 200% relative improvement in average Sharpe ratio. We also measured an improvement of a factor of up to 5 in the stability of performance under random choices of initial asset ordering and weights. This stability was found to be particularly valuable by our industrial partner.
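The abstract does not detail the layer's internals, but the invariance property it names can be illustrated with a minimal NumPy sketch (the function below is hypothetical, not WaveCorr's actual layer): per-asset features are derived from the correlation matrix using only symmetric aggregations, so reordering the assets merely reorders the output rows, leaving the resulting portfolio unchanged.

```python
import numpy as np

def corr_features(X):
    """Per-asset features from cross-asset correlations.

    X: (n_assets, n_steps) window of returns.
    Returns (n_assets, 2): each asset's volatility and its mean
    correlation with the other assets. Because the mean is a
    symmetric aggregation, permuting the asset order permutes the
    output rows identically (permutation equivariance), which is
    what makes downstream portfolio weights invariant to the
    initial asset ordering.
    """
    C = np.corrcoef(X)                             # (n, n) correlation matrix
    n = C.shape[0]
    mean_corr = (C.sum(axis=1) - 1.0) / (n - 1)    # exclude self-correlation
    vol = X.std(axis=1)
    return np.stack([vol, mean_corr], axis=1)

# Equivariance check under a random permutation of the assets.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 60))
perm = rng.permutation(5)
assert np.allclose(corr_features(X[perm]), corr_features(X)[perm])
```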
Recently, equal risk pricing, a framework for fair derivative pricing, was extended to consider dynamic risk measures. However, all current implementations either employ a static risk measure that violates time consistency, or are based on traditional dynamic programming solution schemes that are impracticable in problems with a large number of underlying assets (due to the curse of dimensionality) or with incomplete information about the asset dynamics. In this paper, we extend for the first time a well-known off-policy deterministic actor-critic deep reinforcement learning (ACRL) algorithm to the problem of solving a risk-averse Markov decision process that models risk using a time-consistent recursive expectile risk measure. This new ACRL algorithm allows us to identify high-quality time-consistent hedging policies (and equal risk prices) for options, such as basket options, that cannot be handled using traditional methods, or in contexts where only historical trajectories of the underlying assets are available. Our numerical experiments, which involve both a simple vanilla option and a more exotic basket option, confirm that the new ACRL algorithm can produce: 1) in simple environments, nearly optimal hedging policies and highly accurate prices, simultaneously for a range of maturities; 2) in complex environments, good quality policies and prices using a reasonable amount of computing resources; and 3) overall, hedging strategies that actually outperform the strategies produced using static risk measures when the risk is evaluated at later points in time.
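As a point of reference, the one-step statistic underlying the recursive risk measure named above is the expectile. A minimal sketch of a sample expectile computed by bisection (an illustration of the statistic itself, not the paper's ACRL algorithm):

```python
import numpy as np

def expectile(x, tau=0.9, tol=1e-8):
    """Sample expectile: the unique m solving
       tau * E[(x - m)_+] = (1 - tau) * E[(m - x)_+],
    found by bisection (the left side minus the right is decreasing
    in m). For tau = 0.5 this is the mean; on losses, tau > 0.5
    gives a risk-averse statistic, the one-step building block of a
    recursive (time-consistent) expectile risk measure.
    """
    lo, hi = x.min(), x.max()
    while hi - lo > tol:
        m = 0.5 * (lo + hi)
        gap = tau * np.mean(np.maximum(x - m, 0.0)) \
              - (1 - tau) * np.mean(np.maximum(m - x, 0.0))
        lo, hi = (m, hi) if gap > 0 else (lo, m)
    return 0.5 * (lo + hi)

losses = np.random.default_rng(1).normal(size=10_000)
print(expectile(losses, tau=0.9))   # exceeds the mean for tau > 0.5
```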
Optimal stopping is the problem of deciding the right time at which to take a particular action in a stochastic system in order to maximize an expected reward. It has many applications in areas such as finance, healthcare, and statistics. In this paper, we employ deep reinforcement learning (RL) to learn optimal stopping policies in two financial engineering applications, namely option pricing and optimal option exercise. We present for the first time a comprehensive empirical evaluation of the quality of optimal stopping policies identified by three state-of-the-art deep RL algorithms: double deep Q-learning (DDQN), categorical distributional RL (C51), and Implicit Quantile Networks (IQN). In the case of option pricing, our findings indicate that in a theoretical Black-Scholes environment, IQN successfully identifies nearly optimal prices. On the other hand, it is slightly outperformed by C51 when confronted with real stock data movements in a put option exercise problem that involves assets from the S&P 500 index. More importantly, the C51 algorithm is able to identify an optimal stopping policy that achieves 8% more out-of-sample returns than the best of four natural benchmark policies. We conclude with a discussion of our findings, which should pave the way for relevant future research.
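For context, the continue-or-stop decision these agents learn obeys the classical optimal stopping Bellman recursion V_t(s) = max(payoff(s), E[discounted V_{t+1}(s')]). A minimal sketch of that recursion on a binomial tree for an American put, a standard dynamic-programming baseline (with illustrative parameters, not the paper's RL setup):

```python
import numpy as np

def american_put_binomial(S0=100.0, K=100.0, r=0.03, sigma=0.2, T=1.0, n=500):
    """Optimal-stopping Bellman recursion on a binomial tree:
       V_t(s) = max(exercise payoff, discounted continuation value).
    Deep RL agents (DDQN, C51, IQN) learn the same continue/stop
    decision from simulated trajectories instead of the full tree.
    """
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt)); d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)             # risk-neutral up-probability
    disc = np.exp(-r * dt)
    S = S0 * u ** np.arange(n, -1, -1) * d ** np.arange(0, n + 1)
    V = np.maximum(K - S, 0.0)                     # payoff at maturity
    for t in range(n - 1, -1, -1):
        S = S[1:] * u                              # stock prices at step t
        cont = disc * (p * V[:-1] + (1 - p) * V[1:])
        V = np.maximum(K - S, cont)                # stop vs. continue
    return V[0]

print(american_put_binomial())                     # approx. 6.7 for these inputs
```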
Utsav Sadana, Erick Delage (2020)
Conditional Value at Risk (CVaR) is widely used to account for the preferences of a risk-averse agent in extreme loss scenarios. To study the effectiveness of randomization in interdiction games with an interdictor that is both risk- and ambiguity-averse, we introduce a distributionally robust network interdiction game where the interdictor randomizes over the feasible interdiction plans in order to minimize the worst-case CVaR of the flow, with respect to both the unknown distribution of the arc capacities and his mixed strategy over interdicted arcs. The flow player, on the contrary, maximizes the total flow in the network. By using a budgeted uncertainty set, we control the degree of conservatism in the model and reformulate the interdictor's non-linear problem as a bi-convex optimization problem. To solve this problem to any given optimality level, we devise a spatial branch-and-bound algorithm that uses the McCormick inequalities and the reduced reformulation-linearization technique (RRLT) to obtain a convex relaxation of the problem. We also develop a column generation algorithm to identify the optimal support of the convex relaxation, which is then used in a coordinate descent algorithm to determine upper bounds. The efficiency and convergence of the spatial branch-and-bound algorithm are established in numerical experiments. Furthermore, our numerical experiments show that randomized strategies can have significantly better in-sample and out-of-sample performance than optimal deterministic ones.
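For reference, the McCormick inequalities named above are the standard convex envelope of a bilinear term: each product $w = xy$ with bounds $x \in [x^L, x^U]$ and $y \in [y^L, y^U]$ is relaxed to
\[
\begin{aligned}
w &\ge x^L y + x y^L - x^L y^L, &\qquad w &\ge x^U y + x y^U - x^U y^U,\\
w &\le x^U y + x y^L - x^U y^L, &\qquad w &\le x^L y + x y^U - x^L y^U,
\end{aligned}
\]
a relaxation that is exact whenever one of the two variables sits at a bound, and that tightens as the branch-and-bound algorithm shrinks the boxes.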
In this paper, we consider the problem of equal risk pricing and hedging, in which the fair price of an option is the price that exposes both sides of the contract to the same level of risk. Focusing for the first time on the context where risk is measured according to convex risk measures, we establish that the problem reduces to independently solving the writer's and the buyer's hedging problems with zero initial capital. By further imposing that the risk measures decompose in a way that satisfies a Markovian property, we provide dynamic programming equations that can be used to solve the hedging problems for both European and American options. All of our results are general enough to accommodate situations where the risk is measured according to a worst-case risk measure, as is typically done in robust optimization. Our numerical study illustrates the advantages of equal risk pricing over schemes that only account for a single party, pricing based on quadratic hedging (i.e., $\epsilon$-arbitrage pricing), or pricing based on a fixed equivalent martingale measure (i.e., Black-Scholes pricing). In particular, the numerical results confirm that when employing an equal risk price, both the writer and the buyer end up being exposed to risks that are more similar to each other and on average smaller than what they would experience with the other approaches.
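One step of reasoning worth making explicit: convex risk measures are cash-invariant ($\rho(X - c) = \rho(X) - c$), so, working in discounted terms, charging a price $p$ shifts the writer's residual risk to $\varrho^w - p$ and the buyer's to $\varrho^b + p$, where $\varrho^w$ and $\varrho^b$ denote the optimal values of the writer's and buyer's hedging problems with zero initial capital. Equating the two exposures gives the equal risk price
\[
p^\star = \tfrac{1}{2}\left(\varrho^w - \varrho^b\right),
\]
which is why the two hedging problems can be solved independently.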
Recently, several new pari-mutuel mechanisms have been introduced to organize markets for contingent claims. Hanson introduced a market maker derived from the logarithmic scoring rule, and later Chen and Pennock developed a cost function formulation for the market maker. On the other hand, the SCPM model of Peters et al. is based on ideas from a call auction setting, using a convex optimization model. In this work, we develop a unified framework that bridges these seemingly unrelated models for centrally organizing contingent claim markets. The framework, developed as a generalization of the SCPM, supports many desirable properties such as proper scoring, truthful bidding (in a myopic sense), efficient computation, and guarantees on worst-case loss. In fact, our unified framework allows us to express various proper scoring rules, existing or new, from classical utility functions in a convex optimization problem representing the market organizer. Additionally, we utilize concepts from duality to show that the market model is equivalent to a risk minimization problem in which a convex risk measure is employed. This allows us to more clearly understand the differences in the risk attitudes adopted by various mechanisms, and particularly deepens our intuition about popular mechanisms like Hanson's market maker. In aggregate, we believe this work advances our understanding of the objectives that the market organizer optimizes in popular pari-mutuel mechanisms by recasting them into one unified framework.
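For concreteness, Hanson's logarithmic market scoring rule market maker, in the cost-function form of Chen and Pennock, can be sketched as follows (the liquidity parameter b and function names are illustrative):

```python
import numpy as np

def lmsr_cost(q, b=100.0):
    """Hanson's LMSR cost function C(q) = b * log(sum_i exp(q_i / b)),
    where q_i is the number of outstanding shares on outcome i."""
    return b * np.log(np.sum(np.exp(np.asarray(q) / b)))

def lmsr_prices(q, b=100.0):
    """Instantaneous prices = gradient of C, i.e. softmax(q / b);
    they are positive and sum to one, like outcome probabilities."""
    z = np.exp(np.asarray(q) / b)
    return z / z.sum()

def trade_cost(q, dq, b=100.0):
    """A trader buying the bundle dq pays C(q + dq) - C(q). The
    market maker's worst-case loss is bounded by b * log(n)."""
    return lmsr_cost(np.asarray(q) + np.asarray(dq), b) - lmsr_cost(q, b)

q = np.zeros(3)
print(lmsr_prices(q))              # uniform prices 1/3 before any trades
print(trade_cost(q, [10, 0, 0]))   # cost of buying 10 shares on outcome 1
```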
