No Arabic abstract
The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a sequential decision making framework and study the effects of: (a) using data collected under previously employed policies, which may be sub-optimal and constraint-violating, and (b) imposing desired constraints while computing near-optimal policies with this data. Our framework relies on solving a minimax objective, where one player evaluates policies via off-policy estimators, and the opponent uses an online learning strategy to control constraint violations. We extensively investigate various choices for off-policy estimation and their corresponding optimization sub-routines, and quantify their impact on computing constraint-aware allocation policies. Our study shows promising results for constructing such policies when back-tested on historical equities data, under various regimes of operation, dimensionality and constraints.
Risk control and optimal diversification constitute a major focus in the finance and insurance industries as well as, more or less consciously, in our everyday life. We present a discussion of the characterization of risks and of the optimization of portfolios that starts from a simple illustrative model and ends by a general functional integral formulation. A major theme is that risk, usually thought one-dimensional in the conventional mean-variance approach, has to be addressed by the full distribution of losses. Furthermore, the time-horizon of the investment is shown to play a major role. We show the importance of accounting for large fluctuations and use the theory of Cramer for large deviations in this context. We first treat a simple model with a single risky asset that examplifies the distinction between the average return and the typical return, the role of large deviations in multiplicative processes, and the different optimal strategies for the investors depending on their size. We then analyze the case of assets whose price variations are distributed according to exponential laws, a situation that is found to describe reasonably well daily price variations. Several portfolio optimization strategies are presented that aim at controlling large risks. We end by extending the standard mean-variance portfolio optimization theory, first within the quasi-Gaussian approximation and then using a general formulation for non-Gaussian correlated assets in terms of the formalism of functional integrals developed in the field theory of critical phenomena.
Mean-variance portfolio optimization problems often involve separable nonconvex terms, including penalties on capital gains, integer share constraints, and minimum position and trade sizes. We propose a heuristic algorithm for this problem based on the alternating direction method of multipliers (ADMM). This method allows for solve times in tens to hundreds of milliseconds with around 1000 securities and 100 risk factors. We also obtain a bound on the achievable performance. Our heuristic and bound are both derived from similar results for other optimization problems with a separable objective and affine equality constraints. We discuss a concrete implementation in the case where the separable terms in the objective are piecewise-quadratic, and we demonstrate their effectiveness empirically in realistic tax-aware portfolio construction problems.
The paper solves the problem of optimal portfolio choice when the parameters of the asset returns distribution, like the mean vector and the covariance matrix are unknown and have to be estimated by using historical data of the asset returns. The new approach employs the Bayesian posterior predictive distribution which is the distribution of the future realization of the asset returns given the observable sample. The parameters of the posterior predictive distributions are functions of the observed data values and, consequently, the solution of the optimization problem is expressed in terms of data only and does not depend on unknown quantities. In contrast, the optimization problem of the traditional approach is based on unknown quantities which are estimated in the second step leading to a suboptimal solution. We also derive a very useful stochastic representation of the posterior predictive distribution whose application leads not only to the solution of the considered optimization problem, but provides the posterior predictive distribution of the optimal portfolio return used to construct a prediction interval. A Bayesian efficient frontier, a set of optimal portfolios obtained by employing the posterior predictive distribution, is constructed as well. Theoretically and using real data we show that the Bayesian efficient frontier outperforms the sample efficient frontier, a common estimator of the set of optimal portfolios known to be overoptimistic.
In this paper, we study reinforcement learning (RL) algorithms to solve real-world decision problems with the objective of maximizing the long-term reward as well as satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement with performance guarantees and can handle general types of cumulative multiconstraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms, in terms of reward maximization and constraint satisfaction.
We find economically and statistically significant gains when using machine learning for portfolio allocation between the market index and risk-free asset. Optimal portfolio rules for time-varying expected returns and volatility are implemented with two Random Forest models. One model is employed in forecasting the sign probabilities of the excess return with payout yields. The second is used to construct an optimized volatility estimate. Reward-risk timing with machine learning provides substantial improvements over the buy-and-hold in utility, risk-adjusted returns, and maximum drawdowns. This paper presents a new theoretical basis and unifying framework for machine learning applied to both return- and volatility-timing.