Tampere University of Technology

TUTCRIS Research Portal

An Optimized k-NN Approach for Classification on Imbalanced Datasets with Missing Data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Details

Original language: English
Title of host publication: Advances in Intelligent Data Analysis XV
Subtitle of host publication: 15th International Symposium, IDA 2016, Stockholm, Sweden, October 13-15, 2016, Proceedings
Publisher: Springer
Pages: 387-392
ISBN (Electronic): 978-3-319-46349-0
ISBN (Print): 978-3-319-46348-3
Publication status: Published - 2016
Publication type: A4 Article in a conference publication
Event: INTERNATIONAL SYMPOSIUM ON INTELLIGENT DATA ANALYSIS
Duration: 1 Jan 1900 → …

Publication series

Name: Lecture Notes in Computer Science
Volume: 9897
ISSN (Print): 0302-9743

Abstract

In this paper, we describe our solution to the machine learning prediction challenge of IDA 2016. The task is two-class classification on an imbalanced dataset with missing data. We first develop a k-NN-based imputation method to estimate the missing values. We then formulate a representation tailored to the problem as an optimization scheme that learns the distance and voting weights for k-NN classification. On the challenge metric, the proposed solution outperforms traditional classification methods such as SVM, AdaBoost, and Random Forests.
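
The abstract names two building blocks, k-NN imputation and k-NN classification with distance and voting weights, without giving their exact form. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes a Euclidean distance over jointly observed features, mean imputation from the k nearest donors, and per-feature and per-class weights that are simply passed in. The paper learns these weights through optimization, which is not reproduced here. All function names and the toy data are hypothetical.

```python
import math


def masked_distance(a, b, feature_weights=None):
    """Weighted Euclidean distance over features observed (not None) in both rows."""
    total, used = 0.0, 0
    for j, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue
        w = 1.0 if feature_weights is None else feature_weights[j]
        total += w * (x - y) ** 2
        used += 1
    if used == 0:
        return math.inf  # no shared features: treat as infinitely far apart
    return math.sqrt(total / used)


def knn_impute(rows, k=3):
    """Fill each missing entry with the mean of that feature over the
    k nearest rows that observe it (assumed imputation rule)."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (masked_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:  # leave the gap if no usable donor exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def knn_predict(train_X, train_y, query, k, feature_weights, class_weights):
    """k-NN vote where each neighbour's vote is scaled by its class weight."""
    neighbours = sorted(
        (masked_distance(x, query, feature_weights), label)
        for x, label in zip(train_X, train_y)
    )[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0.0) + class_weights[label]
    return max(votes, key=votes.get)


# Toy data: class 1 is the minority; one value is missing (None).
rows = [[1.0, 2.0], [1.1, None], [5.0, 6.0], [5.2, 6.2], [0.9, 1.8]]
labels = [0, 0, 1, 1, 0]

filled = knn_impute(rows, k=2)
query = [2.9, 3.9]
plain = knn_predict(filled, labels, query, 4, [1.0, 1.0], {0: 1.0, 1: 1.0})
weighted = knn_predict(filled, labels, query, 4, [1.0, 1.0], {0: 1.0, 1: 4.0})
```

In this toy setup the four nearest neighbours of the query are three majority-class points and one minority-class point, so the unweighted vote picks class 0, while a larger voting weight on class 1 flips the decision. This illustrates why voting weights can matter on imbalanced data; the specific weight values are arbitrary.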
