Tampere University of Technology

Model selection for linear classifiers using Bayesian error estimation

Research output: Contribution to journal › Article › Scientific › peer-review

Details

Original language: English
Pages (from-to): 3739-3748
Number of pages: 10
Journal: Pattern Recognition
Volume: 48
Issue number: 11
DOIs
Publication status: Published - Nov 2015
Publication type: A1 Journal article-refereed

Abstract

Regularized linear models are important classification methods for high-dimensional problems, where regularized linear classifiers are often preferred due to their ability to avoid overfitting. The degrees of freedom of the model are determined by a regularization parameter, which is typically selected using counting-based approaches, such as K-fold cross-validation. For large data, this can be very time-consuming, and, for small sample sizes, the accuracy of the model selection is limited by the large variance of CV error estimates. In this paper, we study the applicability of a recently proposed Bayesian error estimator for the selection of the best model along the regularization path. We also propose an extension of the estimator that allows model selection in multiclass cases and study its efficiency with L1-regularized logistic regression and L2-regularized linear support vector machine. The model selection by the new Bayesian error estimator is experimentally shown to improve the classification accuracy, especially in small sample-size situations, and is able to avoid the excess variability inherent to traditional cross-validation approaches. Moreover, the method has significantly smaller computational complexity than cross-validation. (C) 2015 Elsevier Ltd. All rights reserved.
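
For illustration only, the counting-based baseline that the abstract contrasts against (K-fold cross-validation over a grid of regularization values) can be sketched as follows. This is not the paper's Bayesian error estimator, and it is not the authors' code. To stay self-contained it assumes a small L2-regularized logistic regression trained by gradient descent in place of the classifiers named in the abstract, uses synthetic two-class data, and all function names (`train_logreg`, `kfold_cv_error`, `select_lambda`) and the grid values are hypothetical.

```python
import math
import random


def train_logreg(X, y, lam, epochs=200, lr=0.1):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # The penalty lam * w shrinks the weights; the bias is not penalized.
        w = [wj - lr * (gwj / n + lam * wj) for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Return the predicted class label (0 or 1) for one sample."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def kfold_cv_error(X, y, lam, k=5):
    """Counting-based K-fold estimate: fraction of held-out points misclassified."""
    n = len(X)
    errors = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved folds
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        w, b = train_logreg(X_tr, y_tr, lam)
        errors += sum(predict(w, b, X[i]) != y[i] for i in test_idx)
    return errors / n


def select_lambda(X, y, grid, k=5):
    """Pick the grid value with the smallest CV error (k fits per value)."""
    cv_errors = [kfold_cv_error(X, y, lam, k) for lam in grid]
    best = min(range(len(grid)), key=lambda i: cv_errors[i])
    return grid[best], cv_errors


# Synthetic two-class data (an assumption, not data from the paper).
random.seed(0)
X, y = [], []
for i in range(40):
    label = i % 2
    center = 1.0 if label else -1.0
    X.append([random.gauss(center, 1.0), random.gauss(0.0, 1.0)])
    y.append(label)

grid = [0.001, 0.01, 0.1, 1.0]  # hypothetical regularization values
best_lam, cv_errors = select_lambda(X, y, grid)
print(best_lam, cv_errors)
```

The sketch makes the two costs mentioned in the abstract visible: every grid value requires k separate model fits, and each error estimate is a count over a small number of held-out points, so with small samples it can change noticeably from one fold assignment to another.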

Keywords

  • Logistic regression, Support vector machine, Regularization, Bayesian error estimator, Linear classifier, MULTINOMIAL LOGISTIC-REGRESSION, SUPPORT VECTOR MACHINES, CLASSIFICATION, PERFORMANCE, BOUNDS
