A Short Note on Safest Default Missingness Mechanism Assumptions

Authors: Qinbao Song (1); Martin Shepperd (2); Michelle Cartwright (3)

Source: Empirical Software Engineering, Volume 10, Number 2, April 2005, pp. 235-243(9)

Publisher: Springer

Abstract:

A very common problem when building software engineering models is dealing with missing data. To address this, there exists a range of imputation techniques. However, selecting the appropriate imputation technique can itself be a difficult problem. One reason is that these techniques make assumptions about the underlying missingness mechanism, that is, how the missing values are distributed within the data set. The problem is compounded by the fact that, for small data sets, it may be very difficult to determine what the missingness mechanism is. This means there is a danger of using an inappropriate imputation technique. Therefore, it is necessary to determine the safest default assumption about the missingness mechanism for imputation techniques when dealing with small data sets. We examine experimentally two simple and commonly used techniques, Class Mean Imputation (CMI) and k Nearest Neighbors (k-NN), coupled with two missingness mechanisms: missing completely at random (MCAR) and missing at random (MAR). We draw two conclusions. First, for our analysis CMI is the preferred technique since it is more accurate. Second, and more importantly, the impact of the missingness mechanism on imputation accuracy is not statistically significant. This is a useful finding since it suggests that even for small data sets we can reasonably make the weaker assumption that the missingness mechanism is MAR. Thus both imputation techniques have practical application for small software engineering data sets with missing values.
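To make the abstract's terms concrete, here is a minimal sketch of the two imputation techniques and the two missingness mechanisms it names. It is not the authors' experimental code. All function names are hypothetical, and several choices are assumptions made for illustration: k-NN uses root mean squared difference over the features both rows have observed and an unweighted mean of the neighbours, MAR is simulated by letting missingness depend only on a fully observed "driver" column, and every column is assumed to have at least one observed value.

```python
import math
import random


def class_mean_impute(rows, labels):
    """CMI: fill each None with the mean of observed values in the same class."""
    filled = [list(r) for r in rows]
    for c in range(len(rows[0])):
        by_class = {}
        for r, y in zip(rows, labels):
            if r[c] is not None:
                by_class.setdefault(y, []).append(r[c])
        all_obs = [v for vs in by_class.values() for v in vs]
        for i, (r, y) in enumerate(zip(rows, labels)):
            if r[c] is None:
                # Assumption: fall back to the overall column mean
                # when the class has no observed value.
                pool = by_class.get(y) or all_obs
                filled[i][c] = sum(pool) / len(pool)
    return filled


def knn_impute(rows, k=2):
    """k-NN: fill each None with the mean of that column over the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for c, v in enumerate(r):
            if v is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(r, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[c]))
            candidates.sort(key=lambda t: t[0])
            nearest = [val for _, val in candidates[:k]]
            if nearest:  # left as None if no usable neighbour exists
                filled[i][c] = sum(nearest) / len(nearest)
    return filled


def make_mcar(rows, col, rate, rng):
    """MCAR: every value in `col` is dropped with the same probability `rate`."""
    return [[None if (c == col and rng.random() < rate) else v
             for c, v in enumerate(r)] for r in rows]


def make_mar(rows, col, driver, rate, rng):
    """MAR: a value in `col` can be dropped only when the observed `driver`
    value is at or above its median, so missingness depends on observed data."""
    threshold = sorted(r[driver] for r in rows)[len(rows) // 2]
    out = []
    for r in rows:
        drop = r[driver] >= threshold and rng.random() < rate
        out.append([None if (c == col and drop) else v for c, v in enumerate(r)])
    return out
```

A comparison in the spirit of the abstract would remove known values with `make_mcar` or `make_mar`, impute them with each technique, and measure the error against the held-back values. The specific accuracy measure and statistical test used in the paper are not stated in this abstract, so they are left out here.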

Keywords: Software effort prediction; missing data; data imputation; missingness mechanism

Document Type: Research article

DOI: http://dx.doi.org/10.1007/s10664-004-6193-8

Affiliations:
1: Empirical Software Engineering Research Group, School of Design, Engineering and Computing, Bournemouth University, UK, Email: qsong@bmth.ac.uk
2: Empirical Software Engineering Research Group, School of Design, Engineering and Computing, Bournemouth University, UK, Email: mshepper@bmth.ac.uk
3: Empirical Software Engineering Research Group, School of Design, Engineering and Computing, Bournemouth University, UK, Email: mcartwri@bmth.ac.uk

Publication date: 2005-04-01
