Financial time-series analysis and forecasting have been studied extensively over the past decades, yet remain a very challenging research topic. Since the financial market is inherently noisy and stochastic, a majority of the financial time-series of interest are non-stationary and are often obtained from different modalities. These properties present great challenges and can significantly affect the performance of the subsequent analysis and forecasting steps. Recently, the Temporal Attention augmented Bilinear Layer (TABL) has shown strong performance in tackling financial forecasting problems. In this paper, by taking into account the nature of bilinear projections in TABL networks, we propose Bilinear Normalization (BiN), a simple yet efficient normalization layer to be incorporated into TABL networks to tackle potential problems posed by non-stationarity and multimodality in the input series. Our experiments using a large-scale Limit Order Book (LOB) consisting of more than 4 million order events show that BiN-TABL outperforms TABL networks using other state-of-the-art normalization schemes by a large margin.
Recently, to account for low-frequency market dynamics, several volatility models employing high-frequency financial data have been developed. However, in financial markets, we often observe that financial volatility processes depend on economic states, so they have a state-heterogeneous structure. In this paper, to study state-heterogeneous market dynamics based on high-frequency data, we introduce a novel volatility model based on a continuous Ito diffusion process whose intraday instantaneous volatility process evolves depending on an exogenous state variable as well as on its integrated volatility. We call it the state heterogeneous GARCH-Ito (SG-Ito) model. We suggest a quasi-likelihood estimation procedure with the realized volatility proxy and establish its asymptotic properties. Moreover, to test the low-frequency state heterogeneity, we develop a Wald-type hypothesis testing procedure. The results of empirical studies suggest the existence of leverage, investor attention, market illiquidity, stock market comovement, and post-holiday effects in S&P 500 index volatility.
The minute-by-minute moves of the Hang Seng Index (HSI) over a four-year period are analysed and shown to possess statistical features similar to those of other markets. Based on a mathematical theorem [S. B. Pope and E. S. C. Ching, Phys. Fluids A 5, 1529 (1993)], we derive an analytic form for the probability distribution function (PDF) of index moves from fitted functional forms of certain conditional averages of the time series. Furthermore, following a recent work by Stolovitzky and Ching, we show that the observed PDF can be reproduced by a Langevin process with a move-dependent noise amplitude. The form of the Langevin equation can be determined directly from the market data.
A classic problem in physics is the origin of fat tailed distributions generated by complex systems. We study the distributions of stock returns measured over different time lags $\tau$. We find that destroying all correlations without changing the $\tau = 1$ d distribution, by shuffling the order of the daily returns, causes the fat tails almost to vanish for $\tau > 1$ d. We argue that the fat tails are caused by known long-range volatility correlations. Indeed, destroying only sign correlations, by shuffling the order of only the signs (but not the absolute values) of the daily returns, allows the fat tails to persist for $\tau > 1$ d.
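The two shuffling procedures described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of excess kurtosis as a tail measure, the non-overlapping aggregation of log returns, and the toy stochastic-volatility series standing in for real daily returns are all assumptions made for the example.

```python
import numpy as np

def shuffle_all(returns, rng):
    """Permute the daily returns: destroys all temporal correlations
    while leaving the tau = 1 d distribution unchanged."""
    return rng.permutation(returns)

def shuffle_signs(returns, rng):
    """Permute only the signs; absolute values keep their original order,
    so volatility correlations survive."""
    return np.abs(returns) * rng.permutation(np.sign(returns))

def aggregate(returns, tau):
    """Sum non-overlapping blocks of tau daily (log) returns (assumed convention)."""
    n = (len(returns) // tau) * tau
    return returns[:n].reshape(-1, tau).sum(axis=1)

def excess_kurtosis(x):
    """Sample excess kurtosis, used here as a simple (assumed) tail measure."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

# Toy data (assumption): returns with persistent log-volatility, not market data.
rng = np.random.default_rng(0)
n_days = 5000
log_vol = np.zeros(n_days)
for t in range(1, n_days):
    log_vol[t] = 0.98 * log_vol[t - 1] + 0.1 * rng.standard_normal()
returns = np.exp(log_vol) * rng.standard_normal(n_days)

shuffled = shuffle_all(returns, rng)
sign_shuffled = shuffle_signs(returns, rng)

for tau in (1, 5, 20):
    print(tau,
          excess_kurtosis(aggregate(returns, tau)),
          excess_kurtosis(aggregate(shuffled, tau)),
          excess_kurtosis(aggregate(sign_shuffled, tau)))
```

Comparing the three columns for $\tau > 1$ d is one way to check whether the tails of a given series depend on the ordering of absolute values (volatility correlations) rather than on the ordering of signs.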
We study tick-by-tick financial returns belonging to the FTSE MIB index of the Italian Stock Exchange (Borsa Italiana). We confirm previously detected non-stationarities. However, scaling properties reported in the previous literature for other high-frequency financial data are only approximately valid. As a consequence of the empirical analyses, we propose a simple method for describing non-stationary returns, based on a non-homogeneous normal compound Poisson process. We test this model against the empirical findings and find that it can approximately reproduce several stylized facts of high-frequency financial time series. Moreover, using Monte Carlo simulations, we analyze order selection for this model class using three information criteria: Akaike's information criterion (AIC), the Bayesian information criterion (BIC) and the Hannan-Quinn information criterion (HQ). For comparison, we also perform a similar Monte Carlo experiment for the ACD (autoregressive conditional duration) model. Our results show that the information criteria work best for a small number of parameters for the compound Poisson type models, whereas for the ACD model the model selection procedure does not work well in certain cases.
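For reference, the three information criteria named in this abstract can be computed from a fitted model's maximized log-likelihood as in the sketch below. The standard textbook definitions are used; the helper names and the log-likelihood values in the usage example are hypothetical placeholders, not results from the paper.

```python
import math

def information_criteria(loglik, k, n):
    """Return (AIC, BIC, HQ) for a model with maximized log-likelihood `loglik`,
    k free parameters and n observations (standard definitions)."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    hq = -2.0 * loglik + 2.0 * k * math.log(math.log(n))
    return aic, bic, hq

def select_order(candidates, n):
    """Pick the order minimizing each criterion.

    `candidates` maps a model order to (loglik, number_of_parameters).
    Returns a dict {"AIC": order, "BIC": order, "HQ": order}.
    """
    scores = {order: information_criteria(ll, k, n)
              for order, (ll, k) in candidates.items()}
    names = ("AIC", "BIC", "HQ")
    return {name: min(scores, key=lambda order: scores[order][i])
            for i, name in enumerate(names)}

# Hypothetical placeholder values, for illustration only.
candidates = {1: (-520.0, 2), 2: (-510.0, 4), 3: (-509.5, 6)}
chosen = select_order(candidates, n=200)
print(chosen)
```

All three criteria penalize the number of parameters, with BIC penalizing most heavily for large samples and HQ lying between AIC and BIC, which is why they can disagree on the selected order.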
Data normalization is one of the most important preprocessing steps when building a machine learning model, especially when the model of interest is a deep neural network. This is because a deep neural network optimized with stochastic gradient descent is sensitive to the range of the input variables and prone to numerical issues. Unlike other types of signals, financial time-series often exhibit unique characteristics such as high volatility, non-stationarity and multi-modality that make them challenging to work with, often requiring expert domain knowledge to devise a suitable processing pipeline. In this paper, we propose a novel data-driven normalization method for deep neural networks that handle high-frequency financial time-series. The proposed normalization scheme, which takes into account the bimodal characteristic of financial multivariate time-series, requires no expert knowledge to preprocess a financial time-series since this step is formulated as part of the end-to-end optimization process. Our experiments, conducted with state-of-the-art neural networks and high-frequency data from two large-scale limit order books coming from the Nordic and US markets, show significant improvements over other normalization techniques in forecasting future stock price dynamics.