Tampere University of Technology

TUTCRIS Research Portal

Mid-Price Movement Prediction in Limit Order Books Using Feature Engineering and Machine Learning

Research output: Book/Report › Doctoral thesis › Collection of Articles


Original language: English
Publisher: Tampere University
Number of pages: 80
ISBN (Electronic): 978-952-03-1288-6
ISBN (Print): 978-952-03-1287-9
Publication status: Published - 25 Oct 2019
Publication type: G5 Doctoral dissertation (article)

Publication series

Name: Tampere University Dissertations
ISSN (Print): 2489-9860
ISSN (Electronic): 2490-0028


The increasing complexity of financial trading in recent years has revealed the need for methods that can capture its underlying dynamics. An efficient way to organize this chaotic system is by constructing limit order book ordering mechanisms that operate under price and time filters. A limit order book can be analyzed using linear and nonlinear models.

The thesis develops novel methods for the identification of limit order book characteristics that give traders and market makers an informational edge in their trading. A good proxy for traders and market makers is the prediction of mid-price movement, which is the main target of this thesis. The contributions of this thesis are categorized chronologically into three parts. The first part is the introduction to the literature of the first publicly available limit order book dataset for high-frequency trading for the task of mid-price movement prediction. This dataset is accompanied by an experimental protocol that uses methods inspired by ridge regression and a single-layer feed-forward neural network as classifiers. These classifiers use state-of-the-art limit order book features as inputs for the target task.
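As a rough illustration of the prediction target, the mid-price is the average of the best bid and best ask, and a movement label records whether it rises, falls, or stays put over some horizon. The sketch below is not taken from the thesis: the function names, the `horizon` and `threshold` parameters, and the three-class labelling rule are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def mid_prices(best_bids: Sequence[float], best_asks: Sequence[float]) -> List[float]:
    """Mid-price at each event: average of the best bid and best ask."""
    return [(bid + ask) / 2.0 for bid, ask in zip(best_bids, best_asks)]


def movement_labels(mids: Sequence[float], horizon: int = 1,
                    threshold: float = 0.0) -> List[int]:
    """Label each event by the relative mid-price change `horizon` events ahead.

    Returns 1 (up), -1 (down) or 0 (stationary). The horizon and threshold
    are hypothetical parameters, not values from the thesis.
    """
    labels = []
    for t in range(len(mids) - horizon):
        change = (mids[t + horizon] - mids[t]) / mids[t]
        if change > threshold:
            labels.append(1)
        elif change < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Toy order book snapshots (made-up prices, for illustration only).
bids = [99.0, 100.0, 100.0, 99.0]
asks = [101.0, 102.0, 102.0, 101.0]
mids = mid_prices(bids, asks)
labels = movement_labels(mids)
```

A classifier such as the ridge-regression-inspired or neural network models mentioned above would then be trained to predict these labels from order book features.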

The second contribution of this thesis is the use and development of a wide range of technical and quantitative indicators for the task of mid-price movement prediction via an extensive feature selection process. This feature selection process identifies which features improve predictive performance. The results suggest that the newly introduced quantitative feature, based on an adaptive logistic regression model for online learning, was selected first according to several criteria. These criteria are based on entropy, linear discriminant analysis, and least mean square error.

The third contribution is the introduction of econometric features as inputs to deep learning models for the task of mid-price movement prediction. An extensive comparison against other state-of-the-art hand-crafted features and fully automated feature extraction processes is provided. Furthermore, a new experimental protocol is developed for the task of mid-price prediction to overcome the problem of time irregularities, which characterize high-frequency data. Results suggest that advanced hand-crafted features such as econometric indicators can predict movements of proxies such as the mid-price.

Field of science, Statistics Finland