We consider online learning of ensembles of portfolio selection algorithms and aim to regularize risk by encouraging diversification with respect to a predefined risk-driven grouping of stocks. Our procedure uses online convex optimization to control capital allocation to underlying investment algorithms while encouraging non-sparsity over the given grouping. We prove a logarithmic regret for this procedure with respect to the best-in-hindsight ensemble. We applied the procedure with known mean-reversion portfolio selection algorithms using the standard GICS industry sector grouping. Empirical Experimental results showed an impressive percentage increase of risk-adjusted return (Sharpe ratio).
We present a novel online ensemble learning strategy for portfolio selection. The new strategy controls and exploits any set of commission-oblivious portfolio selection algorithms. The strategy handles transaction costs using a novel commission avoidance mechanism. We prove a logarithmic regret bound for our strategy with respect to optimal mixtures of the base algorithms. Numerical examples validate the viability of our method and show significant improvement over the state-of-the-art.
We present an extensive study of the key problem of online learning where algorithms are allowed to abstain from making predictions. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to time-varying feedback graphs, as needed in this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and is adapted to time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a possible, but limited, extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as more standard baselines.
Preventing catastrophic forgetting while continually learning new tasks is an essential problem in lifelong learning. Structural regularization (SR) refers to a family of algorithms that mitigate catastrophic forgetting by penalizing the network for changing its critical parameters from previous tasks while learning a new one. The penalty is often induced via a quadratic regularizer defined by an emph{importance matrix}, e.g., the (empirical) Fisher information matrix in the Elastic Weight Consolidation framework. In practice and due to computational constraints, most SR methods crudely approximate the importance matrix by its diagonal. In this paper, we propose emph{Sketched Structural Regularization} (Sketched SR) as an alternative approach to compress the importance matrices used for regularizing in SR methods. Specifically, we apply emph{linear sketching methods} to better approximate the importance matrices in SR algorithms. We show that sketched SR: (i) is computationally efficient and straightforward to implement, (ii) provides an approximation error that is justified in theory, and (iii) is method oblivious by construction and can be adapted to any method that belongs to the structural regularization class. We show that our proposed approach consistently improves various SR algorithms performance on both synthetic experiments and benchmark continual learning tasks, including permuted-MNIST and CIFAR-100.
Transfer learning has been demonstrated to be successful and essential in diverse applications, which transfers knowledge from related but different source domains to the target domain. Online transfer learning(OTL) is a more challenging problem where the target data arrive in an online manner. Most OTL methods combine source classifier and target classifier directly by assigning a weight to each classifier, and adjust the weights constantly. However, these methods pay little attention to reducing the distribution discrepancy between domains. In this paper, we propose a novel online transfer learning method which seeks to find a new feature representation, so that the marginal distribution and conditional distribution discrepancy can be online reduced simultaneously. We focus on online transfer learning with multiple source domains and use the Hedge strategy to leverage knowledge from source domains. We analyze the theoretical properties of the proposed algorithm and provide an upper mistake bound. Comprehensive experiments on two real-world datasets show that our method outperforms state-of-the-art methods by a large margin.
Reinforcement learning agents that operate in diverse and complex environments can benefit from the structured decomposition of their behavior. Often, this is addressed in the context of hierarchical reinforcement learning, where the aim is to decompose a policy into lower-level primitives or options, and a higher-level meta-policy that triggers the appropriate behaviors for a given situation. However, the meta-policy must still produce appropriate decisions in all states. In this work, we propose a policy design that decomposes into primitives, similarly to hierarchical reinforcement learning, but without a high-level meta-policy. Instead, each primitive can decide for themselves whether they wish to act in the current state. We use an information-theoretic mechanism for enabling this decentralized decision: each primitive chooses how much information it needs about the current state to make a decision and the primitive that requests the most information about the current state acts in the world. The primitives are regularized to use as little information as possible, which leads to natural competition and specialization. We experimentally demonstrate that this policy architecture improves over both flat and hierarchical policies in terms of generalization.