AN ANALYSIS OF FOUR MISSING DATA TREATMENT METHODS FOR SUPERVISED LEARNING

Authors: BATISTA, GUSTAVO E. A. P. A.; MONARD, MARIA CAROLINA

Source: Applied Artificial Intelligence, Volume 17, Numbers 5-6, May-July 2003, pp. 519-533(15)

DOI: https://doi.org/10.1080/713827181

One relevant problem in data quality is missing data. Despite the frequent occurrence and the relevance of the missing data problem, many machine learning algorithms handle missing data in a rather naive way. However, missing data should be treated carefully; otherwise, bias might be introduced into the induced knowledge. In this work, we analyze the use of the k-nearest neighbor algorithm as an imputation method. Imputation denotes a procedure that replaces the missing values in a data set with plausible values. One advantage of this approach is that the missing data treatment is independent of the learning algorithm used, which allows the user to select the most suitable imputation method for each situation. Our analysis indicates that missing data imputation based on the k-nearest neighbor algorithm can outperform the internal methods used by C4.5 and CN2 to treat missing data, and can also outperform mean or mode imputation, a method broadly used to treat missing values.
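
To make the contrast between the two imputation strategies named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes purely numeric attributes, marks missing values with None, uses a Euclidean distance over the attributes observed in both rows, and averages the k nearest donors. The abstract does not specify the distance function, any attribute normalization, or the treatment of nominal attributes (where the mode would replace the mean), so those choices, along with the function names mean_impute and knn_impute and the toy data, are assumptions made here for illustration.

```python
import math


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a, b):
    """Euclidean distance over attributes observed in both rows (assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common attributes, so the rows are not comparable
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=3):
    """Replace each None with the mean of that attribute over the k nearest rows."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which attribute j is observed.
            candidates = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:  # leave the value missing if no donor is available
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data (hypothetical): two well-separated groups, one missing value.
rows = [
    [1.0, 2.0],
    [1.1, 2.2],
    [0.9, None],
    [10.0, 20.0],
    [10.2, 20.4],
]

knn_result = knn_impute(rows, k=2)
mean_result = mean_impute(rows)
print(knn_result[2][1], mean_result[2][1])
```

On this toy data the k-nearest neighbor estimate is roughly 2.1, drawn from the two rows that resemble the incomplete one, while mean imputation returns roughly 11.15, a value that lies between the two groups and is typical of neither. Because both functions return a completed copy of the data, either result can be passed to any learning algorithm, which is the independence property the abstract highlights.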

Document Type: Research Article

Affiliations: University of São Paulo, São Carlos, SP, Brazil

Publication date: 01 May 2003
