Finn Smith
Modelling volatility persistence: applying Hawkes processes to investigate order book activity and its underlying dynamics.
The problem
Within the trading industry, we often speak of volatility trading. The VIX index is a useful proxy for implied market sentiment, yet it captures only a fraction of inherent volatility. The challenge lies in forecasting latent volatility dynamically, with high precision and over longer time horizons. This task is complicated and requires a decomposition of market dynamics and trader behaviours. Our goal is to elucidate some of this activity across time horizons. We do so by coupling the multiscale Heterogeneous Autoregressive (HAR) model with the Hawkes process and its extensions, such as the Marked Hawkes (MH) process. The study will therefore focus on assessing the efficacy of the combined HAR-RV-MH model relative to traditional ARCH-family benchmarks. We aim to bridge this latent volatility gap by capturing the clustering of endogenous events within event-driven market dynamics.
Theoretical framework
We present a novel method for forecasting long-term volatility that combines the Hawkes process with the Heterogeneous Autoregressive model of Realised Volatility (HAR-RV). We work within the univariate point-process framework, in which the conditional intensity of event arrivals comprises a baseline rate and a self-exciting mechanism. A memory kernel captures the temporal clustering of subsequent arrivals, weighting the contribution of each past event.
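In standard notation (our addition, not fixed by the proposal), the conditional intensity of such a univariate Hawkes process is

```latex
\lambda(t) = \mu + \sum_{t_i < t} \phi(t - t_i),
```

where \mu > 0 is the baseline intensity, the t_i are the times of past events, and \phi \ge 0 is the memory kernel that weights each past event's contribution to the current arrival rate.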
Whether the process is stable is determined by the integral of the kernel, the sum of the contributions of past events. When self-excitation vanishes, the process reduces to a Poisson process, in which the likelihood of future events is constant and independent of the past. The Hawkes process becomes unstable during periods of heightened activity, when event arrivals can grow progressively or even exponentially. This cascade of intensity resembles a feedback loop, which helps us infer the clustering of market events and improve our forecasting accuracy.
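Written out (again in our notation), the stability condition is governed by the branching ratio n, the expected number of events directly triggered by a single event:

```latex
n = \int_0^{\infty} \phi(s)\,ds, \qquad n < 1 \;\text{(stationary)}, \qquad n \ge 1 \;\text{(critical or explosive)}.
```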
The existing literature has examined the heterogeneous autoregressive model and its effectiveness in modelling volatility persistence over time. The model has proved notably successful, exhibiting long-memory characteristics across stock indices and currency markets. The standalone HAR model dissects the autocorrelation of the time series across multiple lags to infer underlying patterns in traders' behaviour. It reflects the hierarchical form of the heterogeneous model, in which daily and weekly components inherit the properties of longer-horizon components. This implies that the dynamics observed on a monthly scale are manifested within the shorter-term volatilities, effectively cascading long-memory properties.
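For reference, the standard daily HAR-RV regression (written in the usual notation of the model) is

```latex
RV_{t+1}^{(d)} = \beta_0 + \beta_d RV_t^{(d)} + \beta_w RV_t^{(w)} + \beta_m RV_t^{(m)} + \varepsilon_{t+1},
```

where RV_t^{(w)} and RV_t^{(m)} are averages of daily realised volatility over the past week (5 trading days) and month (22 trading days), making the cascade across horizons explicit.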
Methodology
The family of kernels is subdivided into parametric and non-parametric kernels. Parametric kernels, such as the exponential and power-law kernels, are traditional methods in statistical theory and are estimated using maximum likelihood estimation.
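As a minimal sketch of such an estimation, assuming an exponential kernel \phi(t) = \alpha e^{-\beta t} (the function name, starting values, and optimiser choice below are illustrative, not our final estimation code):

```python
import numpy as np
from scipy.optimize import minimize

def hawkes_neg_loglik(params, times, T):
    """Negative log-likelihood of a univariate Hawkes process with an
    exponential kernel phi(t) = alpha * exp(-beta * t), observed on [0, T].
    `times` is a sorted array of event timestamps."""
    mu, alpha, beta = params
    if mu <= 0 or alpha < 0 or beta <= 0:
        return np.inf  # keep the optimiser inside the valid region
    # Recursive term A_i = sum_{j < i} exp(-beta * (t_i - t_j))
    A = np.zeros_like(times)
    for i in range(1, len(times)):
        A[i] = np.exp(-beta * (times[i] - times[i - 1])) * (1.0 + A[i - 1])
    log_intensity = np.log(mu + alpha * A).sum()
    # Compensator: integral of the intensity over the observation window
    compensator = mu * T + (alpha / beta) * (1.0 - np.exp(-beta * (T - times))).sum()
    return compensator - log_intensity

# Hypothetical usage with event timestamps in seconds:
# times = np.sort(np.loadtxt("event_times.txt")); T = times[-1]
# fit = minimize(hawkes_neg_loglik, x0=[0.5, 0.5, 1.0],
#                args=(times, T), method="Nelder-Mead")
# mu_hat, alpha_hat, beta_hat = fit.x
```

For this kernel the branching ratio is \alpha/\beta, so the fitted parameters also give a direct check of the stability condition discussed above.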
Empirical tests show that the power-law kernel yields long-memory behaviour, making it suitable for slowly decaying events. This choice of kernel better fits the leptokurtic distribution of the true data-generating process. In other words, we expect to capture aftershocks and co-jumps at extreme frequencies, as well as other long-term dependencies. A long-tailed distribution is also an assumption we make when fitting the HAR-RV model to the data.
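One common parametrisation of the power-law kernel (several variants exist in the literature; this notation is ours) is

```latex
\phi(t) = \frac{\alpha \beta}{(1 + \beta t)^{1 + \gamma}}, \qquad \alpha, \beta, \gamma > 0,
```

whose hyperbolic decay, in contrast to the exponential kernel, assigns non-negligible weight to events in the distant past and thereby generates the long-memory behaviour described above (its branching ratio is \alpha/\gamma).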
The selected kernel will continuously update our forecast, capturing these signals to replicate the temporal dynamics of order arrivals. We will characterise periods of signal clustering using the Fourier transform and the power spectral density (PSD), distinguishing the high- and low-frequency components associated with the self-exciting process.
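A minimal sketch of this diagnostic, assuming event timestamps are binned into a regular count series (the function name, bin width, and segment length are illustrative choices):

```python
import numpy as np
from scipy.signal import welch

def event_psd(times, dt=1.0):
    """Estimate the power spectral density of an event stream by binning
    timestamps into counts and applying Welch's method."""
    # Bin event times into a regularly spaced count series
    edges = np.arange(times.min(), times.max() + dt, dt)
    counts, _ = np.histogram(times, bins=edges)
    # Welch's method averages periodograms over overlapping segments,
    # trading frequency resolution for a lower-variance PSD estimate
    freqs, psd = welch(counts - counts.mean(), fs=1.0 / dt, nperseg=256)
    return freqs, psd

# A self-exciting process should show elevated power at low frequencies
# relative to a Poisson process of the same mean rate, whose PSD is flat.
```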
These methods are shared between the Hawkes process and the HAR-RV model, which makes the joint framework more appealing. Hence, in the context of realised volatility, we aim to reproduce clustering patterns along with volatility shocks.
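The proposal leaves the exact coupling open; one plausible sketch (our assumption, not the final HAR-RV-MH specification) is to feed a daily aggregate of the fitted Hawkes intensity into the HAR regression as an additional regressor:

```python
import numpy as np
import statsmodels.api as sm

def fit_har_hawkes(rv_daily, hawkes_feature):
    """OLS fit of a HAR-RV regression augmented with a Hawkes-derived
    regressor (e.g. the daily mean fitted intensity). `rv_daily` and
    `hawkes_feature` are aligned 1-D arrays of daily observations;
    the alignment conventions below are ours."""
    rv_w = np.convolve(rv_daily, np.ones(5) / 5, mode="valid")    # weekly average
    rv_m = np.convolve(rv_daily, np.ones(22) / 22, mode="valid")  # monthly average
    n = len(rv_m)  # number of days with a full 22-day history
    X = np.column_stack([
        rv_daily[21 : 21 + n - 1],        # RV_t (daily)
        rv_w[17 : 17 + n - 1],            # RV_t^(w), ending on day t
        rv_m[: n - 1],                    # RV_t^(m), ending on day t
        hawkes_feature[21 : 21 + n - 1],  # Hawkes feature on day t
    ])
    y = rv_daily[22 : 22 + n - 1]         # one-day-ahead target RV_{t+1}
    return sm.OLS(y, sm.add_constant(X)).fit()
```

The coefficient on the Hawkes feature would then measure how much the self-exciting event flow adds over the pure HAR cascade.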
Data analysis
One extension to the Hawkes process we did not consider is the non-parametric kernel. This methodology excels at fitting almost any given model, owing to its data-driven robustness. The ideal framework, however, is constrained by interpretability and data accessibility, which is why we favour parametric kernels: they are simpler and can be treated analytically.
The dimensionality of our chosen model requires high-quality intraday data with precise timestamps to obtain the best estimates of realised volatility and of the parameters of the kernel chosen a priori. A marked Hawkes process requires additional information about each event beyond its timestamp, such as the relative size of the event.
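A standard way to write the marked extension (our notation; the impact function g is a modelling choice) is

```latex
\lambda(t) = \mu + \sum_{t_i < t} g(m_i)\,\phi(t - t_i),
```

where m_i is the mark attached to event i, such as its relative size, and g scales how strongly that event excites future activity.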
We obtain our high-frequency data from the NYSE Trade and Quote (TAQ) database, provided through Wharton Research Data Services (WRDS). This exhaustive database supplies transaction data at millisecond resolution, enabling us to dynamically update the intensity of the chosen sequence of events and boost the accuracy of our model.
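As an illustration of the realised-volatility leg of the pipeline, here is a minimal sketch assuming the trade tape has been loaded into a pandas DataFrame with a DatetimeIndex and a 'price' column (this layout is illustrative, not the raw TAQ schema as delivered by WRDS):

```python
import numpy as np
import pandas as pd

def daily_realised_variance(trades, freq="5min"):
    """Compute daily realised variance from a trade tape. `trades` is a
    DataFrame indexed by timestamp with a 'price' column; take the square
    root of the result for realised volatility."""
    # Sample the last trade price on a regular grid to limit microstructure noise
    log_price = np.log(trades["price"].resample(freq).last().dropna())
    # Diff within each day, so overnight gaps are excluded from the returns
    intraday_ret = log_price.groupby(log_price.index.date).diff().dropna()
    # Realised variance: sum of squared intraday returns within each day
    return (intraday_ret ** 2).groupby(intraday_ret.index.date).sum()
```

Five-minute sampling is a common compromise that limits microstructure noise while retaining enough intraday observations per day.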