Weighted Maximum Likelihood Correlation Coefficient to Handle
Missing Values and Outliers in Data Set
Author(s): Juthaphorn Sinsomboonthong, Saichon Sinsomboonthong
Abstract: This paper proposes an estimator, the weighted maximum likelihood (WML) correlation coefficient, for measuring the relationship between two variables when the dataset contains missing values and outliers. The estimator is derived by applying the conditional probability function to accommodate missing values and by giving more weight to values near the center, while outliers are assigned only a slight weight. These techniques make the proposed method robust when the preliminary assumptions of the data analysis are not met. To assess the quality of the proposed estimator, six methods (the WML, Pearson, median, percentage bend, biweight mid, and composite correlation coefficients) are compared on two criteria, bias and mean squared error, in a simulation study. The simulation results indicate that the WML estimator appears to perform best in the presence of missing values and outliers, especially for small sample sizes and large percentages of outliers, regardless of the level of missing data. For large sample sizes, however, the median correlation coefficient appears to be a good estimator when the linear relationship between the two variables exceeds approximately 0.4, irrespective of the levels of outliers and missing data.
Keywords: maximum likelihood, correlation coefficient, missing, outliers, bias, mean squared error
DOI: 10.37394/23206.2021.20.43
WSEAS Transactions on Mathematics, ISSN / E-ISSN: 1109-2769 / 2224-2880, Volume 20, 2021, Art. #43
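
The two comparison criteria named in the abstract, bias and mean squared error, can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation design, and it does not implement the WML estimator, whose formula is not given in the abstract. It only shows how bias and MSE might be estimated for the Pearson coefficient. The contamination scheme (independent noise with standard deviation 10), the missingness mechanism (values of y dropped completely at random, followed by complete-case analysis), and all function and parameter names are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def generate_sample(n, rho, outlier_frac, missing_frac, rng):
    """Draw n pairs from a standard bivariate normal with correlation rho.

    Assumed (hypothetical) mechanisms, not taken from the paper:
    - with probability outlier_frac, a pair is replaced by independent
      N(0, 10^2) noise;
    - with probability missing_frac, the y value is set to None.
    """
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x, y = z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2
        if rng.random() < outlier_frac:
            x, y = rng.gauss(0, 10), rng.gauss(0, 10)
        if rng.random() < missing_frac:
            y = None
        pairs.append((x, y))
    return pairs


def bias_and_mse(estimator, rho, n, outlier_frac, missing_frac, reps, seed):
    """Monte Carlo estimates of bias and MSE of `estimator` for true value rho."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = generate_sample(n, rho, outlier_frac, missing_frac, rng)
        # Complete-case analysis: keep only pairs with both values observed.
        complete = [(x, y) for x, y in sample if y is not None]
        if len(complete) < 3:
            continue  # too few observed pairs to estimate a correlation
        xs, ys = zip(*complete)
        estimates.append(estimator(xs, ys))
    mean_est = sum(estimates) / len(estimates)
    bias = mean_est - rho
    mse = sum((e - rho) ** 2 for e in estimates) / len(estimates)
    return bias, mse


if __name__ == "__main__":
    for frac in (0.0, 0.2):
        b, m = bias_and_mse(pearson, rho=0.6, n=50, outlier_frac=frac,
                            missing_frac=0.1, reps=500, seed=1)
        print(f"outlier_frac={frac}: bias={b:.4f}, mse={m:.4f}")
```

Any other estimator with the same two-argument signature can be passed in place of `pearson`, so the same loop could in principle compare several coefficients under identical simulated conditions.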