Double Deep Q-Learning for Optimal Execution

Published by: Sebastian Jaimungal

Publication date: 2018

Research field: Finance, Informatics Engineering

Paper language: English

Authors: Brian Ning, Franco Ho Ting Lin, Sebastian Jaimungal

Optimal trade execution is an important problem faced by essentially all traders. Much research into optimal execution uses stringent model assumptions and applies continuous-time stochastic control to solve the resulting problems. Here, we instead take a model-free approach and develop a variation of Deep Q-Learning to estimate the optimal actions of a trader. The model is a fully connected Neural Network trained using Experience Replay and Double DQN with input features given by the current state of the limit order book, other trading signals, and available execution actions, while the output is the Q-value function estimating the future rewards under an arbitrary action. We apply our model to nine different stocks and find that it outperforms the standard benchmark approach on most stocks using the measures of (i) mean and median out-performance, (ii) probability of out-performance, and (iii) gain-loss ratios.
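
The abstract names Double DQN but gives no formulas, so the following is only a minimal sketch of the generic Double DQN target computation, not the authors' implementation. It assumes a discrete action set and that Q-values for the next state are already available as arrays; the function name `double_dqn_targets`, its parameters, and the discount factor are illustrative assumptions.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, done, gamma=0.99):
    """Generic Double DQN regression targets (hypothetical helper).

    rewards:        shape (batch,), immediate rewards
    next_q_online:  shape (batch, n_actions), online-network Q-values at the next state
    next_q_target:  shape (batch, n_actions), target-network Q-values at the next state
    done:           shape (batch,), True where the episode ended
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_online = np.asarray(next_q_online, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)
    done = np.asarray(done, dtype=bool)

    # The online network selects the greedy next action ...
    best_actions = np.argmax(next_q_online, axis=1)
    # ... and the target network evaluates that action.
    evaluated = next_q_target[np.arange(len(rewards)), best_actions]
    # Terminal transitions contribute no bootstrapped value.
    return rewards + gamma * np.where(done, 0.0, evaluated)


# Example call on a batch of two transitions (all numbers are made up).
targets = double_dqn_targets(
    rewards=[1.0, 2.0],
    next_q_online=[[0.1, 0.9], [0.5, 0.2]],
    next_q_target=[[3.0, 4.0], [5.0, 6.0]],
    done=[False, True],
    gamma=0.5,
)
```

Separating action selection from action evaluation is what distinguishes Double DQN from plain DQN, where a single network does both and tends to overestimate Q-values.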

Philippe Casgrain, Brian Ning, Sebastian Jaimungal (2019)

Model-free learning for multi-agent stochastic games is an active area of research. Existing reinforcement learning algorithms, however, are often restricted to zero-sum games, and are applicable only in small state-action spaces or other simplified settings. Here, we develop a new data efficient Deep-Q-learning methodology for model-free learning of Nash equilibria for general-sum stochastic games. The algorithm uses a local linear-quadratic expansion of the stochastic game, which leads to analytically solvable optimal actions. The expansion is parametrized by deep neural networks to give it sufficient flexibility to learn the environment without the need to experience all state-action pairs. We study symmetry properties of the algorithm stemming from label-invariant stochastic games and as a proof of concept, apply our algorithm to learning optimal trading strategies in competitive electronic markets.

Machine Learning; Computer Science and Game Theory; Computational Finance

Universal Trading for Order Execution with Oracle Policy Distillation

Yuchen Fang, Kan Ren, Weiqing Liu (2021)

As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquisition, for a given instrument. In the search for effective execution strategies, recent years have witnessed a shift from the analytical view with model-based market assumptions to a model-free perspective, i.e., reinforcement learning, due to its nature of sequential decision optimization. However, the noisy and imperfect market information available to the policy has made it quite challenging to build sample-efficient reinforcement learning methods that achieve effective order execution. In this paper, we propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution. In particular, this framework leverages a policy distillation method in which an oracle teacher with perfect information approximates the optimal trading strategy and guides the learning of the common policy towards practically optimal execution. Extensive experiments have shown significant improvements of our method over various strong baselines, with reasonable trading actions.

Statistics; Trading and Market Microstructure; Machine Learning

Optimal execution strategies in limit order books with general shape functions

Aurelien Alfonsi, Antje Fruth, Alexander Schied (2010)

We consider optimal execution strategies for block market orders placed in a limit order book (LOB). We build on the resilience model proposed by Obizhaeva and Wang (2005) but allow for a general shape of the LOB defined via a given density function. Thus, we can allow for empirically observed LOB shapes and obtain a nonlinear price impact of market orders. We distinguish two possibilities for modeling the resilience of the LOB after a large market order: the exponential recovery of the number of limit orders, i.e., of the volume of the LOB, or the exponential recovery of the bid-ask spread. We consider both of these resilience modes and, in each case, derive explicit optimal execution strategies in discrete time. Applying our results to a block-shaped LOB, we obtain a new closed-form representation for the optimal strategy, which explicitly solves the recursive scheme given in Obizhaeva and Wang (2005). We also provide some evidence for the robustness of optimal strategies with respect to the choice of the shape function and the resilience-type.
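
To make the block-shaped case concrete, here is a rough sketch of the cost of a sequence of buy market orders in a book with constant depth and exponential volume recovery. It is an illustration under stated assumptions, not the paper's derivation: the unaffected best ask is held constant, trades are equally spaced, and the names `execution_cost`, `depth`, `resilience`, and `dt` are hypothetical.

```python
import math

def execution_cost(trades, depth, resilience, dt, ask=0.0):
    """Total cost of buy market orders in a block-shaped book (illustrative).

    trades:     order sizes, executed at equally spaced times
    depth:      shares available per unit of price above the best ask
    resilience: exponential recovery rate of the consumed volume
    dt:         time between consecutive trades
    ask:        unaffected best ask, assumed constant here
    """
    consumed = 0.0  # volume currently missing from the ask side
    total = 0.0
    for i, size in enumerate(trades):
        if i > 0:
            # The book refills exponentially between trades.
            consumed *= math.exp(-resilience * dt)
        # Missing volume shifts the best ask upward by consumed / depth.
        shift = consumed / depth
        # Walking a block-shaped book costs the shifted price times the
        # size, plus a quadratic term for eating through the depth.
        total += (ask + shift) * size + size * size / (2.0 * depth)
        consumed += size
    return total


# Made-up numbers: one block of 10 shares versus two child orders of 5.
single = execution_cost([10.0], depth=100.0, resilience=1.0, dt=1.0)
split = execution_cost([5.0, 5.0], depth=100.0, resilience=1.0, dt=1.0)
```

With positive resilience, splitting the order lets the book partially refill between child orders, which is why `split` comes out cheaper than `single` in this toy setting.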

Statistics; Trading and Market Microstructure; Probability

Adaptive Execution: Exploration and Learning of Price Impact

Beomsoo Park, Benjamin Van Roy (2012)

We consider a model in which a trader aims to maximize expected risk-adjusted profit while trading a single security. In our model, each price change is a linear combination of observed factors, impact resulting from the trader's current and prior activity, and unpredictable random effects. The trader must learn coefficients of a price impact model while trading. We propose a new method for simultaneous execution and learning - the confidence-triggered regularized adaptive certainty equivalent (CTRACE) policy - and establish a poly-logarithmic finite-time expected regret bound. This bound implies that CTRACE is efficient in the sense that the (ε, δ)-convergence time is bounded by a polynomial function of 1/ε and log(1/δ) with high probability. In addition, we demonstrate via Monte Carlo simulation that CTRACE outperforms the certainty equivalent policy and a recently proposed reinforcement learning algorithm that is designed to explore efficiently in linear-quadratic control problems.

Statistics; Trading and Market Microstructure

Risk-Sensitive Compact Decision Trees for Autonomous Execution in Presence of Simulated Market Response

Svitlana Vyetrenko, Shaojie Xu (2019)

We demonstrate an application of risk-sensitive reinforcement learning to optimizing execution in limit order book markets. We model order execution decisions based on limit order book knowledge as a Markov Decision Process and train a trading agent in a market simulator, which emulates multi-agent interaction by synthesizing market response to our agent's execution decisions from historical data. Due to market impact, executing high-volume orders can incur significant cost. We learn trading signals from market microstructure in the presence of simulated market response and derive explainable decision-tree-based execution policies using risk-sensitive Q-learning to minimize execution cost subject to constraints on cost variance.

Statistics; Trading and Market Microstructure; Machine Learning