WSEAS Transactions on Information Science and Applications
Print ISSN: 1790-0832, E-ISSN: 2224-3402
Volume 9, 2012
A Genetic Algorithm Based Approach for Imputing Missing Discrete Attribute Values in Databases
Authors: ,
Abstract: Missing values create a noisy environment in almost all engineering applications and are an unavoidable problem in data management and analysis. Researchers have introduced many techniques to impute these missing values, but most existing methods are suited to numerical attributes. Only a few methods handle discrete attributes, and a good, sophisticated method is still needed. The proposed approach addresses this need by introducing a new technique, based on a Genetic Algorithm and Bayes’ Theorem, to impute missing discrete attribute values, which often occur in real-world applications. The experimental results clearly show that the proposed approach significantly improves the accuracy of imputation. It outperforms other existing methods even on datasets with missing rates as high as 50%. Rather than relying on highly complex statistical software, we use a simple procedure that does not demand much expertise from the user and is still capable of achieving much better performance. Besides imputing the missing values, the proposed approach also provides information about the cases that behave similarly to those with missing values.
Keywords: Missing values, Numerical attributes, Discrete attributes, Genetic Algorithm, Bayes’ Theorem, Imputation
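Illustration (not taken from the paper): the abstract does not spell out the procedure, so the following is only a minimal sketch of how Bayes’ Theorem alone can be used to impute a missing discrete value from one observed attribute. The function name `impute_discrete`, the dictionary-based row layout, the use of `None` for a missing value, and the add-one (Laplace) smoothing are all assumptions made for this example. The Genetic Algorithm component of the proposed approach is not shown.

```python
from collections import Counter, defaultdict


def impute_discrete(rows, target, evidence):
    """Fill missing (None) values of `target` with the value t that maximises
    P(t | e), which is proportional to P(e | t) * P(t) by Bayes' Theorem.

    Hypothetical helper for illustration only; not the paper's algorithm.
    """
    # Estimate probabilities from rows where both attributes are observed.
    complete = [r for r in rows
                if r[target] is not None and r[evidence] is not None]
    if not complete:
        return [dict(r) for r in rows]  # nothing to learn from

    prior = Counter(r[target] for r in complete)   # counts of each target value
    joint = defaultdict(Counter)                   # joint[t][e] = co-occurrence count
    for r in complete:
        joint[r[target]][r[evidence]] += 1
    n_evidence_values = len({r[evidence] for r in complete})
    n = len(complete)

    def score(t, e):
        p_t = prior[t] / n
        # Add-one smoothing (an assumption) avoids zero likelihoods.
        p_e_given_t = (joint[t][e] + 1) / (prior[t] + n_evidence_values)
        return p_t * p_e_given_t

    imputed = []
    for r in rows:
        r = dict(r)  # copy so the input is left unchanged
        if r[target] is None and r[evidence] is not None:
            # sorted() makes tie-breaking deterministic.
            r[target] = max(sorted(prior), key=lambda t: score(t, r[evidence]))
        imputed.append(r)
    return imputed


rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "sunny", "play": None},
    {"outlook": "rainy", "play": None},
]
filled = impute_discrete(rows, target="play", evidence="outlook")
print([r["play"] for r in filled])
```

In this toy data the two missing `play` values are filled with the class that has the highest posterior given `outlook`. A fuller treatment would condition on several attributes at once, which is where a search procedure such as a Genetic Algorithm could plausibly be used; the abstract does not say how the two are combined.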