Tampere University of Technology

TUTCRIS Research Portal

Evaluation of Regression Models: Model Assessment, Model Selection and Generalization Error

Research output: Contribution to journal › Article › Scientific › peer-review


Original language: English
Pages (from-to): 521-551
Journal: Machine Learning and Knowledge Extraction
Issue number: 1
Publication status: Published - 22 Mar 2019
Publication type: A1 Journal article-refereed


When performing a regression or classification analysis, one needs to specify a statistical model. This model should avoid overfitting and underfitting the data and achieve a low generalization error, which characterizes its prediction performance. To identify such a model, one needs to decide which model to select from candidate model families based on performance evaluations. In this paper, we review the theoretical framework of model selection and model assessment, including error-complexity curves, the bias-variance tradeoff, and learning curves for evaluating statistical models. We discuss criterion-based and step-wise selection procedures and resampling methods for model selection, among which cross-validation provides the simplest and most generic means for computationally estimating all required entities. To make the theoretical concepts transparent, we present worked examples for linear regression models. However, our conceptual presentation is extensible to more general models, as well as to classification problems.
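As an illustration of how cross-validation can estimate generalization error and guide model selection for linear regression, here is a minimal sketch. It is not the paper's own worked example. The synthetic quadratic data, the choice of polynomial degree as the model-complexity parameter, k = 5 folds, and the function names `design_matrix`, `kfold_cv_mse` and `select_degree` are all hypothetical. Plotting the cross-validated error against the degree yields an empirical error-complexity curve: low degrees underfit and high degrees risk overfitting.

```python
import numpy as np


def design_matrix(x, degree):
    # Polynomial features 1, x, x^2, ..., x^degree; the model stays linear in its coefficients.
    return np.vander(x, degree + 1, increasing=True)


def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Estimate the generalization error (mean squared error) of a least-squares fit by k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    fold_errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit ordinary least squares on the training folds only.
        beta, *_ = np.linalg.lstsq(design_matrix(x[train], degree), y[train], rcond=None)
        # Evaluate on the held-out fold.
        pred = design_matrix(x[test], degree) @ beta
        fold_errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(fold_errors))


def select_degree(x, y, degrees, k=5, seed=0):
    # Hypothetical helper: CV error for each candidate model; pick the one with the lowest estimate.
    cv = {d: kfold_cv_mse(x, y, d, k, seed) for d in degrees}
    return min(cv, key=cv.get), cv


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 100)
    # Assumed true model: quadratic with Gaussian noise (sd 0.1).
    y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.1, x.size)
    best, cv = select_degree(x, y, range(0, 8))
    for d, err in cv.items():
        print(f"degree {d}: CV MSE = {err:.4f}")
    print("selected degree:", best)
```

Using the same fold assignment (`seed`) for every candidate degree means the models are compared on identical train/test splits, which reduces the variance of the comparison.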
