
WaveCorr: Correlation-savvy Deep Reinforcement Learning for Portfolio Management

Posted by Saeed Marzban
Publication date: 2021
Paper language: English





The problem of portfolio management represents an important and challenging class of dynamic decision-making problems, where rebalancing decisions need to be made over time while accounting for many factors such as investors' preferences, trading environments, and market conditions. In this paper, we present a new portfolio policy network architecture for deep reinforcement learning (DRL) that can exploit cross-asset dependency information more effectively and achieve better performance than state-of-the-art architectures. In particular, we introduce a new property, referred to as asset permutation invariance, for portfolio policy networks that exploit multi-asset time series data, and design the first portfolio policy network, named WaveCorr, that preserves this invariance property when treating asset correlation information. At the core of our design is an innovative permutation invariant correlation processing layer. An extensive set of experiments is conducted using data from both the Canadian (TSX) and American (S&P 500) stock markets, and WaveCorr consistently outperforms other architectures, with an impressive 3%-25% absolute improvement in average annual return and up to more than 200% relative improvement in average Sharpe ratio. We also measured an improvement by a factor of up to 5 in the stability of performance under random choices of initial asset ordering and weights. This stability was found to be particularly valuable by our industrial partner.
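As a rough illustration of the asset permutation invariance idea, the following is a minimal sketch, not the authors' actual WaveCorr layer (the block structure, channel layout, and shapes are assumptions): each asset's features are mixed with a symmetric (mean) cross-asset summary through shared 1x1 convolution weights, so no asset ordering is privileged and permuting the asset axis permutes the output identically.

```python
import torch
import torch.nn as nn

class PermInvariantCorrBlock(nn.Module):
    """Mixes each asset's features with a symmetric cross-asset summary via a
    1x1 convolution, so every asset is processed by the same weights."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.mix = nn.Conv2d(2 * in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, assets, time)
        pooled = x.mean(dim=2, keepdim=True).expand_as(x)  # order-independent summary
        return torch.relu(self.mix(torch.cat([x, pooled], dim=1)))

# Sanity check: permuting the asset axis permutes the output identically,
# which is what makes downstream portfolio scores order-independent.
block = PermInvariantCorrBlock(4, 8)
x = torch.randn(2, 4, 10, 32)
perm = torch.randperm(10)
assert torch.allclose(block(x)[:, :, perm], block(x[:, :, perm]), atol=1e-5)
```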




Read also

We address a portfolio selection problem that combines active (outperformance) and passive (tracking) objectives. We assume a general semimartingale market model where the assets' growth rate processes are driven by a latent factor. Using techniques from convex analysis, we obtain a closed-form solution for the optimal portfolio and provide a theorem establishing its uniqueness. The motivation for incorporating latent factors is to achieve improved growth rate estimation, an otherwise notoriously difficult task. To this end, we focus on a model where growth rates are driven by an unobservable Markov chain. The solution in this case requires a filtering step to obtain posterior probabilities for the state of the Markov chain from asset price information, which are subsequently used to find the optimal allocation. We show that the optimal strategy is the posterior average of the optimal strategies the investor would have held in each state assuming the Markov chain remains in that state. Finally, we implement a number of historical backtests to demonstrate the performance of the optimal portfolio.
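As a hedged illustration of the posterior-averaging result stated above (the transition matrix, likelihoods, and per-state allocations below are invented for the example, not taken from the paper), a discrete Bayes filter produces state probabilities that weight the per-state optimal strategies:

```python
import numpy as np

def bayes_update(prior, likelihood, transition):
    """One discrete filtering step: propagate the prior through the Markov
    transition matrix, then condition on the per-state observation likelihood."""
    predicted = transition.T @ prior
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# Two hidden regimes; pi_states[k] is the allocation that would be optimal
# if the chain stayed in state k (values here are purely illustrative).
pi_states = np.array([[0.8, 0.2],
                      [0.3, 0.7]])
p = bayes_update(prior=np.array([0.5, 0.5]),
                 likelihood=np.array([0.9, 0.4]),
                 transition=np.array([[0.95, 0.05],
                                      [0.10, 0.90]]))
pi_opt = p @ pi_states  # posterior average of the per-state strategies
```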
Portfolio management problems are often divided into two types: active and passive, where the objective is to outperform or track a preselected benchmark, respectively. Here, we formulate and solve a dynamic asset allocation problem that combines these two objectives in a unified framework. We look to maximize the expected growth rate differential between the wealth of the investor's portfolio and that of a performance benchmark while penalizing risk-weighted deviations from a given tracking portfolio. Using stochastic control techniques, we provide explicit closed-form expressions for the optimal allocation and show how the optimal strategy can be related to the growth optimal portfolio. The admissible benchmarks encompass the class of functionally generated portfolios (FGPs), which include the market portfolio, as the only requirement is that they depend only on the prevailing asset values. Finally, some numerical experiments are presented to illustrate the risk-reward profile of the optimal allocation.
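Schematically, and in assumed notation rather than the paper's own, the combined objective can be read as maximizing a growth-rate differential against the benchmark wealth minus a risk-weighted tracking penalty:

```latex
% V^{\pi}: investor's wealth, V^{b}: benchmark wealth, \rho_t: tracking
% portfolio, \Sigma_t: instantaneous covariance, \kappa: penalty strength.
\max_{\pi}\;
\mathbb{E}\!\left[
  \frac{1}{T}\log\frac{V_T^{\pi}}{V_T^{b}}
  \;-\;\frac{\kappa}{T}\int_0^T (\pi_t-\rho_t)^{\top}\,\Sigma_t\,(\pi_t-\rho_t)\,dt
\right]
```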
We find economically and statistically significant gains when using machine learning for portfolio allocation between the market index and the risk-free asset. Optimal portfolio rules for time-varying expected returns and volatility are implemented with two Random Forest models. One model is employed in forecasting the sign probabilities of the excess return with payout yields. The second is used to construct an optimized volatility estimate. Reward-risk timing with machine learning provides substantial improvements over buy-and-hold in utility, risk-adjusted returns, and maximum drawdowns. This paper presents a new theoretical basis and unifying framework for machine learning applied to both return- and volatility-timing.
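A minimal sketch of the two-model idea follows, with synthetic data and hypothetical choices (feature set, return-magnitude proxy, risk aversion, and a Merton-style sizing rule) standing in for the paper's actual specification:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                    # stand-in payout-yield predictors
excess_ret = rng.normal(0.005, 0.04, size=500)   # synthetic excess market returns
realized_vol = np.abs(rng.normal(0.04, 0.01, size=500))

# Model 1: probability that next period's excess return is positive.
sign_model = RandomForestClassifier(n_estimators=200, random_state=0)
sign_model.fit(X[:-1], (excess_ret[1:] > 0).astype(int))
# Model 2: volatility estimate for the next period.
vol_model = RandomForestRegressor(n_estimators=200, random_state=0)
vol_model.fit(X[:-1], realized_vol[1:])

p_up = sign_model.predict_proba(X[-1:])[0, 1]    # P(excess return > 0)
sigma = vol_model.predict(X[-1:])[0]
mu_hat = (2.0 * p_up - 1.0) * 0.04               # assumed return-magnitude proxy
gamma = 5.0                                      # assumed risk aversion
weight = float(np.clip(mu_hat / (gamma * sigma**2), 0.0, 1.0))  # market weight
```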
We study the Markowitz portfolio selection problem with unknown drift vector in the multidimensional framework. The prior belief about the uncertain expected rate of return is modeled by an arbitrary probability law, and a Bayesian approach from filtering theory is used to learn the posterior distribution of the drift given the observed market data of the assets. The Bayesian Markowitz problem is then embedded into an auxiliary standard control problem that we characterize by a dynamic programming method, and we prove the existence and uniqueness of a smooth solution to the related semi-linear partial differential equation (PDE). The optimal Markowitz portfolio strategy is explicitly computed in the case of a Gaussian prior distribution. Finally, we measure the quantitative impact of learning, updating the strategy from observed data, compared to non-learning, using a constant drift in an uncertain context, and analyze the sensitivity of the value of information w.r.t. various relevant parameters of our model.
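For the Gaussian-prior case where the strategy is explicit, a standard conjugate filtering step (notation assumed here, not the paper's own) keeps the drift posterior Gaussian:

```latex
% Prior b \sim \mathcal{N}(b_0, \Sigma_0), observations dY_t = b\,dt + \sigma\,dW_t:
\Sigma_t=\left(\Sigma_0^{-1}+t\,(\sigma\sigma^{\top})^{-1}\right)^{-1},
\qquad
\hat b_t=\Sigma_t\left(\Sigma_0^{-1}b_0+(\sigma\sigma^{\top})^{-1}Y_t\right),
```

and the learning strategy then substitutes the posterior mean \(\hat b_t\) for the unknown drift in the Markowitz allocation.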
Network slicing has emerged as a new business model for operators, allowing them to sell customized slices to various tenants at different prices. In order to provide better-performing and cost-efficient services, network slicing involves challenging technical issues and calls for intelligent innovations that make resource management consistent with users' activities per slice. In that regard, deep reinforcement learning (DRL), which focuses on how to interact with the environment by trying alternative actions and reinforcing those that produce more rewarding consequences, is a promising solution. In this paper, after briefly reviewing the fundamental concepts of DRL, we investigate its application to some typical resource management scenarios for network slicing, including radio resource slicing and priority-based core network slicing, and demonstrate the advantage of DRL over several competing schemes through extensive simulations. Finally, we discuss the possible challenges of applying DRL to network slicing from a general perspective.
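As a toy illustration only (the environment, state encoding, and reward below are invented, not from the paper), tabular Q-learning captures the try-and-reinforce loop described above for a slice-level resource decision:

```python
import numpy as np

rng = np.random.default_rng(1)
n_slices, n_states = 3, 6          # hypothetical slices and demand states
Q = np.zeros((n_states, n_slices))
alpha, gamma, eps = 0.1, 0.9, 0.1  # learning rate, discount, exploration

state = 0
for _ in range(5000):
    # Epsilon-greedy: mostly exploit, occasionally try an alternative action.
    action = int(rng.integers(n_slices)) if rng.random() < eps else int(Q[state].argmax())
    # Stand-in environment: reward the slice that matches the demand state.
    reward = 1.0 if action == state % n_slices else 0.0
    next_state = int(rng.integers(n_states))
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```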
