No Arabic abstract
The recent surge in Deep Learning (DL) research of the past decade has successfully provided solutions to many difficult problems. The field of quantitative analysis has been slowly adapting the new methods to its problems, but due to problems such as the non-stationary nature of financial data, significant challenges must be overcome before DL is fully utilized. In this work a new method to construct stationary features, that allows DL models to be applied effectively, is proposed. These features are thoroughly tested on the task of predicting mid price movements of the Limit Order Book. Several DL models are evaluated, such as recurrent Long Short Term Memory (LSTM) networks and Convolutional Neural Networks (CNN). Finally a novel model that combines the ability of CNNs to extract useful features and the ability of LSTMs to analyze time series, is proposed and evaluated. The combined model is able to outperform the individual LSTM and CNN models in the prediction horizons that are tested.
Forecasting the movements of stock prices is one the most challenging problems in financial markets analysis. In this paper, we use Machine Learning (ML) algorithms for the prediction of future price movements using limit order book data. Two different sets of features are combined and evaluated: handcrafted features based on the raw order book data and features extracted by ML algorithms, resulting in feature vectors with highly variant dimensionalities. Three classifiers are evaluated using combinations of these sets of features on two different evaluation setups and three prediction scenarios. Even though the large scale and high frequency nature of the limit order book poses several challenges, the scope of the conducted experiments and the significance of the experimental results indicate that Machine Learning highly befits this task carving the path towards future research in this field.
Time series forecasting is a crucial component of many important applications, ranging from forecasting the stock markets to energy load prediction. The high-dimensionality, velocity and variety of the data collected in these applications pose significant and unique challenges that must be carefully addressed for each of them. In this work, a novel Temporal Logistic Neural Bag-of-Features approach, that can be used to tackle these challenges, is proposed. The proposed method can be effectively combined with deep neural networks, leading to powerful deep learning models for time series analysis. However, combining existing BoF formulations with deep feature extractors pose significant challenges: the distribution of the input features is not stationary, tuning the hyper-parameters of the model can be especially difficult and the normalizations involved in the BoF model can cause significant instabilities during the training process. The proposed method is capable of overcoming these limitations by a employing a novel adaptive scaling mechanism and replacing the classical Gaussian-based density estimation involved in the regular BoF model with a logistic kernel. The effectiveness of the proposed approach is demonstrated using extensive experiments on a large-scale financial time series dataset that consists of more than 4 million limit orders.
We investigate the statistical properties of the EBS order book for the EUR/USD and USD/JPY currency pairs and the impact of a ten-fold tick size reduction on its dynamics. A large fraction of limit orders are still placed right at or halfway between the old allowed prices. This generates price barriers where the best quotes lie for much of the time, which causes the emergence of distinct peaks in the average shape of the book at round distances. Furthermore, we argue that this clustering is mainly due to manual traders who remained set to the old price resolution. Automatic traders easily take price priority by submitting limit orders one tick ahead of clusters, as shown by the prominence of buy (sell) limit orders posted with rightmost digit one (nine).
Managing the prediction of metrics in high-frequency financial markets is a challenging task. An efficient way is by monitoring the dynamics of a limit order book to identify the information edge. This paper describes the first publicly available benchmark dataset of high-frequency limit order markets for mid-price prediction. We extracted normalized data representations of time series data for five stocks from the NASDAQ Nordic stock market for a time period of ten consecutive days, leading to a dataset of ~4,000,000 time series samples in total. A day-based anchored cross-validation experimental protocol is also provided that can be used as a benchmark for comparing the performance of state-of-the-art methodologies. Performance of baseline approaches are also provided to facilitate experimental comparisons. We expect that such a large-scale dataset can serve as a testbed for devising novel solutions of expert systems for high-frequency limit order book data analysis.
We study the analytical properties of a one-side order book model in which the flows of limit and market orders are Poisson processes and the distribution of lifetimes of cancelled orders is exponential. Although simplistic, the model provides an analytical tractability that should not be overlooked. Using basic results for birth-and-death processes, we build an analytical formula for the shape (depth) of a continuous order book model which is both founded by market mechanisms and very close to empirically tested formulas. We relate this shape to the probability of execution of a limit order, highlighting a law of conservation of the flows of orders in an order book. We then extend our model by allowing random sizes of limit orders, hereby allowing to study the relationship between the size of the incoming limit orders and the shape of the order book. Our theoretical model shows that, for a given total volume of incoming limit orders, the less limit orders are submitted (i.e. the larger the average size of these limit orders), the deeper is the order book around the spread. This theoretical relationship is finally empirically tested on several stocks traded on the Paris stock exchange.