Forecasting stock returns is a challenging problem due to the highly stochastic nature of the market and the vast array of factors and events that can influence trading volume and prices. Nevertheless, it has proven to be an attractive target for machine learning research because even modest levels of prediction accuracy can deliver significant benefits. In this paper, we describe a case-based reasoning approach to predicting stock market returns using only historical pricing data. We argue that one of the impediments to case-based stock prediction has been the lack of a suitable similarity metric for identifying similar pricing histories as the basis for a future prediction; traditional Euclidean and correlation-based approaches are not effective for a variety of reasons. In this regard, a key contribution of this work is the development of a novel similarity metric for comparing historical pricing data. We demonstrate the benefits of this metric and the case-based approach in a real-world application in comparison to a variety of conventional benchmarks.
We describe the impact of the intra-day activity pattern on the autocorrelation function estimator. We obtain an exact formula relating the estimator of the autocorrelation function of a non-stationary process to that of its stationary counterpart. Hence, we prove that the intra-day seasonality of inter-transaction times extends the memory of both the process itself and its absolute value. That is, the relaxation to zero of both processes takes longer.
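For orientation, the quantity under discussion can be illustrated with the standard sample autocorrelation estimator. This is a minimal sketch, not the exact formula derived in the paper; the function name `sample_acf` and the choice of the biased normalisation (dividing by the lag-0 sum) are assumptions made here for illustration.

```python
def sample_acf(series, max_lag):
    """Standard (biased) sample autocorrelation for lags 0..max_lag.

    Illustrative only: the paper's exact relation between the
    non-stationary and stationary estimators is not reproduced here.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    c0 = sum(v * v for v in centered)  # lag-0 sum of squares
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def sample_acf_of_absolute_value(series, max_lag):
    """Same estimator applied to |x_t|, the second process mentioned above."""
    return sample_acf([abs(x) for x in series], max_lag)
```

Applying both functions to a series with and without a periodic activity pattern is one way to observe the slower decay described above.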
A readily interpretable measure of information has recently been proposed, based on a partition obtained by intersecting a random sequence with its moving average. The partition yields disjoint sets of the sequence, which are then ranked according to their size to form a probability distribution function and finally fed into the expression of the Shannon entropy. In this work, this entropy measure is applied to the time series of prices and volatilities of six financial markets. The analysis has been performed on tick-by-tick data sampled every minute over the six years from 1999 to 2004, for a broad range of moving average windows and volatility horizons. The study shows that the entropy of the volatility series depends on the individual market, while the entropy of the price series is practically market-invariant across the six markets. Finally, a cumulative information measure, the `Market Heterogeneity Index', is derived from the integral of the proposed entropy measure. The values of the Market Heterogeneity Index are discussed as possible tools for optimal portfolio construction and compared with those obtained by using the Sharpe ratio, a traditional risk diversity measure.
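The partition-and-entropy procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: a trailing simple moving average, cluster size measured as the number of consecutive points on one side of the moving average, and runs at the two ends of the series counted like any other cluster. The function names are hypothetical and the paper's exact conventions may differ.

```python
import math
from collections import Counter


def moving_average(series, window):
    """Trailing simple moving average, defined from index window-1 onward."""
    running = sum(series[:window])
    out = [running / window]
    for i in range(window, len(series)):
        running += series[i] - series[i - window]
        out.append(running / window)
    return out


def cluster_lengths(series, window):
    """Lengths of runs during which the series stays on one side of its moving average."""
    ma = moving_average(series, window)
    aligned = series[window - 1:]
    sides = [1 if y >= m else -1 for y, m in zip(aligned, ma)]
    lengths, run = [], 1
    for prev, cur in zip(sides, sides[1:]):
        if cur == prev:
            run += 1
        else:  # a crossing closes the current cluster
            lengths.append(run)
            run = 1
    lengths.append(run)  # assumption: the final, unclosed run is kept
    return lengths


def cluster_entropy(series, window):
    """Shannon entropy (natural log) of the empirical cluster-length distribution."""
    lengths = cluster_lengths(series, window)
    total = len(lengths)
    return -sum(
        (c / total) * math.log(c / total) for c in Counter(lengths).values()
    )
```

Evaluating `cluster_entropy` over a range of `window` values gives the kind of curve whose integral the abstract refers to as a cumulative measure.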
In this study, we investigate the determinants that can affect the connected structure of a stock network. The representative index for the topological properties of a stock network is the number of links a stock has with other stocks. We use the multi-factor model, which is widely accepted in the financial literature. In the multi-factor model, common factors act as independent variables while the returns of individual stocks act as dependent variables. We calculate the coefficient of determination, which measures the degree to which the dependent variables are explained by the independent variables. We then investigate the relationship between the number of links in the stock network and the coefficient of determination in the multi-factor model. We use individual stocks included in the market indices of Korea, Japan, Canada, Italy and the UK. We find that the mean coefficient of determination of stocks with a large number of links is higher than that of stocks with a small number of links. These results suggest that common factors are significant determinants that should be taken into account when constructing a stock network. Furthermore, stocks with a large number of links to other stocks can be more strongly affected by common factors.
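The coefficient of determination used above can be illustrated with an ordinary least-squares fit of one stock's returns on a set of common factors. This is a minimal sketch assuming a linear multi-factor model with an intercept; the function names and the plain Gaussian-elimination solver are choices made here for a dependency-free example, not the authors' implementation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(stock_returns, factor_returns):
    """R^2 of regressing stock returns on factors (one row per period)."""
    rows = [[1.0] + list(f) for f in factor_returns]  # prepend intercept
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, stock_returns)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(stock_returns) / len(stock_returns)
    ss_res = sum((y - f) ** 2 for y, f in zip(stock_returns, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in stock_returns)
    return 1.0 - ss_res / ss_tot
```

Computing `r_squared` for every stock and averaging it within groups of stocks sorted by link count would reproduce the type of comparison reported in the abstract.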
Great research efforts have been devoted to exploiting deep neural networks in stock prediction. However, long-range dependencies and the chaotic property of price series remain two major issues that lower the performance of state-of-the-art deep learning models in forecasting future price trends. In this study, we propose a novel framework to address both issues. Specifically, drawing on methods that transform time series into complex networks, we convert market price series into graphs. Then, structural information, namely the associations among temporal points and the node weights, is extracted from the mapped graphs to address the problems of long-range dependencies and the chaotic property. We use graph embeddings to represent the associations among temporal points as inputs to the prediction model. Node weights are used as a priori knowledge to enhance the learning of temporal attention. The effectiveness of the proposed framework is validated using real-world stock data, and our approach obtains the best performance among several state-of-the-art benchmarks. Moreover, in the trading simulations conducted, our framework obtains the highest cumulative profits. Our results supplement the existing applications of complex network methods in finance and offer practical implications for investment decision support in financial markets.
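The abstract does not name the series-to-graph mapping it uses. As one widely used example of such a mapping, the natural visibility graph can be sketched as below; treating node degree as a simple node weight is likewise an assumption made here for illustration, not a description of the authors' framework.

```python
def visibility_graph(series):
    """Natural visibility graph: nodes are time indices.

    Points a < b are linked when every intermediate point c lies strictly
    below the straight line joining (a, series[a]) and (b, series[b]).
    """
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            visible = all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges


def node_degrees(edges, n):
    """Degree of each node, used here as a hypothetical node weight."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg
```

In this mapping, a link between two distant time points is a structural trace of a long-range association, which is the kind of information the framework feeds to the prediction model. The double loop is quadratic in the series length, so it suits short windows rather than full price histories.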
Data augmentation methods in combination with deep neural networks have been used extensively in computer vision on classification tasks, achieving great success; however, their use in time series classification is still at an early stage. This is even more so in the field of financial prediction, where datasets tend to be small, noisy and non-stationary. In this paper, we evaluate several augmentation methods applied to stock datasets using two state-of-the-art deep learning models. The results show that several augmentation methods significantly improve financial performance when used in combination with a trading strategy. For a relatively small dataset ($\approx 30K$ samples), augmentation methods achieve up to $400\%$ improvement in risk-adjusted return performance; for a larger stock dataset ($\approx 300K$ samples), results show up to $40\%$ improvement.
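The abstract does not list the augmentation methods evaluated. As an illustration of what time series augmentation looks like in practice, the sketch below applies two generic transformations, jittering (additive Gaussian noise) and magnitude scaling, to labelled price windows. The function names, the `sigma` parameter, and the dataset layout (a list of `(window, label)` pairs) are assumptions for this example.

```python
import random


def jitter(window, sigma, rng):
    """Add independent Gaussian noise to every point of the window."""
    return [x + rng.gauss(0.0, sigma) for x in window]


def scale(window, sigma, rng):
    """Multiply the whole window by one random factor drawn around 1."""
    factor = rng.gauss(1.0, sigma)
    return [x * factor for x in window]


def augment_dataset(samples, copies, sigma, seed=0):
    """Return the original samples plus `copies` jittered variants of each.

    Labels are carried over unchanged, which assumes the perturbation is
    small enough not to alter the class of the window.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    out = list(samples)
    for window, label in samples:
        for _ in range(copies):
            out.append((jitter(window, sigma, rng), label))
    return out
```

Augmentation of this kind should be applied to the training split only, so that validation and test performance is measured on unmodified data.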