<doi_batch xmlns="http://www.crossref.org/schema/4.4.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="4.4.0"><head><doi_batch_id>724af6c3-722a-428b-a635-fb96a5824f67</doi_batch_id><timestamp>20250214041306984</timestamp><depositor><depositor_name>wseas:wseas</depositor_name><email_address>mdt@crossref.org</email_address></depositor><registrant>MDT Deposit</registrant></head><body><journal><journal_metadata language="en"><full_title>WSEAS TRANSACTIONS ON COMPUTER RESEARCH</full_title><issn media_type="electronic">2415-1521</issn><issn media_type="print">1991-8755</issn><archive_locations><archive name="Portico"/></archive_locations><doi_data><doi>10.37394/232018</doi><resource>http://wseas.org/wseas/cms.action?id=13372</resource></doi_data></journal_metadata><journal_issue><publication_date media_type="online"><month>1</month><day>10</day><year>2025</year></publication_date><publication_date media_type="print"><month>1</month><day>10</day><year>2025</year></publication_date><journal_volume><volume>13</volume><doi_data><doi>10.37394/232018.2025.13</doi><resource>https://wseas.com/journals/cr/2025.php</resource></doi_data></journal_volume></journal_issue><journal_article language="en"><titles><title>Handling Missing Data Techniques: A Meta-Analysis</title></titles><contributors><person_name sequence="first" contributor_role="author"><given_name>Raed</given_name><surname>Alazaidah</surname><affiliation>Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Zarqa University, Zarqa, JORDAN</affiliation></person_name></contributors><jats:abstract xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1"><jats:p>The predictive performance of any classification or regression model depends heavily on the quality of the collected data. Datasets frequently suffer from missing values, and several techniques have therefore been proposed to handle them. 
Consequently, this paper briefly surveys the best-known techniques for handling missing data and aims to identify the most suitable one with respect to several factors, such as the ratio of missing values, the type of attributes in the dataset, the number of instances, and the number of class labels. To this end, seven well-known techniques for handling missing values were evaluated and compared on five datasets with different characteristics, using Accuracy as the evaluation metric. The results revealed that the K-Means technique is the most appropriate for handling missing data, and that the SMO classifier is the best choice of classification model when data are missing.</jats:p></jats:abstract><publication_date media_type="online"><month>2</month><day>14</day><year>2025</year></publication_date><publication_date media_type="print"><month>2</month><day>14</day><year>2025</year></publication_date><pages><first_page>178</first_page><last_page>186</last_page></pages><publisher_item><item_number item_number_type="article_number">18</item_number></publisher_item><ai:program xmlns:ai="http://www.crossref.org/AccessIndicators.xsd" name="AccessIndicators"><ai:free_to_read start_date="2025-02-14"/><ai:license_ref applies_to="am" start_date="2025-02-14">https://wseas.com/journals/cr/2025/a365118-237.pdf</ai:license_ref></ai:program><archive_locations><archive name="Portico"/></archive_locations><doi_data><doi>10.37394/232018.2025.13.18</doi><resource>https://wseas.com/journals/cr/2025/a365118-237.pdf</resource></doi_data><citation_list><citation key="ref0"><doi>10.34028/iajit/20/1/5</doi><unstructured_citation>E. Elbasi, &amp; A. I. Zreikat, (2023). Heart Disease Classification for Early Diagnosis based on Adaptive Hoeffding Tree Algorithm in IoMT Data. International Arab Journal of Information Technology, 20(1), 38-48. </unstructured_citation></citation><citation key="ref1"><unstructured_citation>E. Gibaja, &amp; S. 
Ventura, (2015). A tutorial on multi-label learning. ACM Computing Surveys (CSUR), 47(3), 1-38. </unstructured_citation></citation><citation key="ref2"><doi>10.3390/app13085081</doi><unstructured_citation>R. Alazaidah, G. Samara, S. Almatarneh, M. Hassan, M. Aljaidi, &amp; H. Mansur, (2023). Multi-Label Classification Based on Associations. Applied Sciences, 13(8), 5081. </unstructured_citation></citation><citation key="ref3"><doi>10.1109/incit56086.2022.10067669</doi><unstructured_citation>E. Al Daoud, &amp; G. Samara, (2022, November). Improving the Face Recognition Performance Using Gabor and VGGFace2 Features Concatenation. In 2022 6th International Conference on Information Technology (InCIT) (pp. 187-190). IEEE. </unstructured_citation></citation><citation key="ref4"><doi>10.5455/jjcit.71-1615297634</doi><unstructured_citation>R. Alazaidah, &amp; M. A. Almaiah, (2021). Associative classification in multi-label classification: An investigative study. Jordanian Journal of Computers and Information Technology, 7(2). </unstructured_citation></citation><citation key="ref5"><doi>10.5267/j.ijdns.2023.10.006</doi><unstructured_citation>M. Alzyoud, R. Alazaidah, M. Aljaidi, G. Samara, M. Qasem, M. Khalid, &amp; N. AlShanableh, (2024). Diagnosing diabetes mellitus using machine learning techniques. International Journal of Data and Network Science, 8(1), 179-188. </unstructured_citation></citation><citation key="ref6"><doi>10.1016/j.compbiolchem.2022.107809</doi><unstructured_citation>E. A. Alhenawi, R. Al-Sayyed, A. Hudaib, &amp; S. Mirjalili, (2023). Improved intelligent water drop-based hybrid feature selection method for microarray data processing. Computational Biology and Chemistry, 103, 107809. </unstructured_citation></citation><citation key="ref7"><doi>10.1016/j.ins.2022.12.022</doi><unstructured_citation>X. Zhu, J. Li, J. Ren, J. Wang, &amp; G. Wang, (2023). Dynamic ensemble learning for multi-label classification. Information Sciences, 623, 94-111. 
</unstructured_citation></citation><citation key="ref8"><doi>10.1109/icngn59831.2023.10396670</doi><unstructured_citation>M. S. Alzboon, A. F. Bader, A. Abuashour, M. K. Alqaraleh, B. Zaqaibeh, &amp; M. AlBatah, (2023, November). The Two Sides of AI in Cybersecurity: Opportunities and Challenges. In 2023 International Conference on Intelligent Computing and Next Generation Networks (ICNGN) (pp. 1-9). IEEE. </unstructured_citation></citation><citation key="ref9"><doi>10.21203/rs.3.rs-535520/v1</doi><unstructured_citation>T. Emmanuel, T. Maupong, D. Mpoeleng, T. Semong, B. Mphago, &amp; O. Tabona, (2021). A survey on missing data in machine learning. Journal of Big Data, 8(1), 1-37. </unstructured_citation></citation><citation key="ref10"><doi>10.1109/access.2019.2910287</doi><unstructured_citation>S. Wang, M. Li, N. Hu, E. Zhu, J. Hu, X. Liu, &amp; J. Yin, (2019). K-means clustering with incomplete data. IEEE Access, 7, 69162-69171. </unstructured_citation></citation><citation key="ref11"><doi>10.1007/11548706_36</doi><unstructured_citation>J. W. Grzymala-Busse, L. K. Goodwin, W. J. Grzymala-Busse, &amp; X. Zheng, (2005). Handling missing attribute values in preterm birth datasets. In Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing: 10th International Conference, RSFDGrC 2005, Regina, Canada, August 31-September 3, 2005, Proceedings, Part II 10 (pp. 342-351). Springer Berlin Heidelberg. </unstructured_citation></citation><citation key="ref12"><doi>10.1007/3-540-54563-8_100</doi><unstructured_citation>J. W. Grzymala-Busse, (1991). On the unknown attribute values in learning from examples. In Methodologies for Intelligent Systems: 6th International Symposium, ISMIS'91, Charlotte, NC, USA, October 16-19, 1991, Proceedings 6 (pp. 368-377). Springer Berlin Heidelberg. </unstructured_citation></citation><citation key="ref13"><doi>10.1093/bioinformatics/btg287</doi><unstructured_citation>S. Oba, M. A. Sato, I. Takemasa, M. Monden, K. I. Matsubara, &amp; S. 
Ishii, (2003). A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19(16), 2088-2096. </unstructured_citation></citation><citation key="ref14"><doi>10.1093/bioinformatics/bth499</doi><unstructured_citation>H. Kim, G. H. Golub, &amp; H. Park, (2005). Missing value estimation for DNA microarray gene expression data: local least squares imputation. Bioinformatics, 21(2), 187-198. </unstructured_citation></citation><citation key="ref15"><doi>10.1093/bioinformatics/17.6.520</doi><unstructured_citation>O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, &amp; R. B. Altman, (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-525. </unstructured_citation></citation><citation key="ref16"><doi>10.2991/ijcis.10.1.82</doi><unstructured_citation>I. Triguero, S. González, J. M. Moyano, S. García López, J. Alcalá Fernández, J. Luengo Martín, &amp; F. Herrera Triguero, (2017). KEEL 3.0: an open source software for multistage analysis in data mining. </unstructured_citation></citation><citation key="ref17"><doi>10.20944/preprints202307.1619.v1</doi><unstructured_citation>Z. Salah, &amp; E. A. Elsoud, (2023). Toward Effective Framework for Wireless Intrusion Detection System in Detecting Krack and kr00k attacks in IEEE 802.11. </unstructured_citation></citation><citation key="ref18"><doi>10.11591/ijeecs.v34.i3.pp1627-1642</doi><unstructured_citation>Z. Salah, K. Salah, &amp; E. Elsoud, (2024). Spatial domain noise removal filtering for low-resolution digital images. Indonesian Journal of Electrical Engineering and Computer Science, 34(3), 1627-1642. </unstructured_citation></citation><citation key="ref19"><unstructured_citation>G. Samara, (2020). Intelligent reputation system for safety messages in VANET. Int J Artif Intell, 9(3), 439-447. </unstructured_citation></citation><citation key="ref20"><doi>10.3991/ijoe.v19i08.39433</doi><unstructured_citation>H. A. Owida, B. A. H. 
Moh’d, N. Turab, J. Al-Nabulsi, &amp; S. Abuowaida, (2023). The Evolution and Reliability of Machine Learning Techniques for Oncology. International Journal of Online &amp; Biomedical Engineering, 19(8). </unstructured_citation></citation><citation key="ref21"><doi>10.11591/eei.v11i5.3802</doi><unstructured_citation>H. A. Owida, O. S. M. Hemied, R. S. Alkhawaldeh, N. F. F. Alshdaifat, &amp; S. F. A. Abuowaida, (2022). Improved deep learning approaches for COVID-19 recognition in CT images. Journal of Theoretical and Applied Information Technology, 100(13). </unstructured_citation></citation><citation key="ref22"><doi>10.1007/s11042-023-14347-8</doi><unstructured_citation>A. Mughaid, I. Obeidat, S. AlZu’bi, E. A. Elsoud, A. Alnajjar, A. R. Alsoud, &amp; L. Abualigah, (2023). A novel machine learning and face recognition technique for fake accounts detection system on cyber social networks. Multimedia Tools and Applications, 82(17), 26353-26378. </unstructured_citation></citation><citation key="ref23"><doi>10.1109/acit50332.2020.9300065</doi><unstructured_citation>G. Samara, (2020, November). Wireless sensor network MAC energy-efficiency protocols: a survey. In 2020 21st International Arab Conference on Information Technology (ACIT) (pp. 1-5). IEEE. </unstructured_citation></citation><citation key="ref24"><doi>10.1109/acit53391.2021.9677341</doi><unstructured_citation>I. Hussain, G. Samara, I. Ullah, &amp; N. Khan, (2021, December). Encryption for end-user privacy: a cyber-secure smart energy management system. In 2021 22nd International Arab Conference on Information Technology (ACIT) (pp. 1-6). IEEE. </unstructured_citation></citation><citation key="ref25"><doi>10.1109/access.2021.3110586</doi><unstructured_citation>A. Ghaben, M. Anbar, I. H. Hasbullah, &amp; S. Karuppayah, (2021). Mathematical Approach as Qualitative Metrics of Distributed Denial of Service Attack Detection Mechanisms. IEEE Access. 
DOI: 10.1109/ACCESS.2021.3110586. </unstructured_citation></citation><citation key="ref26"><doi>10.37394/23209.2025.22.10</doi><unstructured_citation>M. R. Al-Mousa, A. S. Al-Sherideh, A. Ghaben, M. Arabiat, M. Alqudah, H. Almimi, &amp; A. Al-Shaikh, (2024). Applicability of IoT-Aware Models in Health-Care Systems: Potential and Challenges. Journal of System and Management Sciences. </unstructured_citation></citation><citation key="ref27"><doi>10.34028/iajit/21/3/7</doi><unstructured_citation>A. Sheta, W. El-Ashmawi, &amp; A. Baareh, (2024, May). Heart Disease Diagnosis Using Decision Trees with Feature Selection Method. The International Arab Journal of Information Technology (IAJIT), 21(3), 427-438. doi: 10.34028/iajit/21/3/7. </unstructured_citation></citation><citation key="ref28"><doi>10.34028/iajit/21/2/8</doi><unstructured_citation>M. Maree, M. Eleyat, &amp; E. Mesqali, (2024, March). Optimizing Machine Learning-based Sentiment Analysis Accuracy in Bilingual Sentences via Preprocessing Techniques. The International Arab Journal of Information Technology (IAJIT), 21(2), 257-270. doi: 10.34028/iajit/21/2/8. </unstructured_citation></citation><citation key="ref29"><doi>10.34028/iajit/21/1/11</doi><unstructured_citation>J. Li, &amp; R. Wang, (2024, January). An Anomaly Detection Method for Weighted Data Based on Feature Association Analysis. The International Arab Journal of Information Technology (IAJIT), 21(1), 117-127. doi: 10.34028/iajit/21/1/11. </unstructured_citation></citation><citation key="ref30"><doi>10.37394/23209.2023.20.16</doi><unstructured_citation>V. Gancheva, I. Georgiev, &amp; V. Todorova, (2023). X-Ray Images Analytics Algorithm based on Machine Learning. WSEAS Transactions on Information Science and Applications, vol. 20, pp. 136-145, https://doi.org/10.37394/23209.2023.20.16. </unstructured_citation></citation><citation key="ref31"><doi>10.37394/23208.2023.20.16</doi><unstructured_citation>T. R. Rani, W. 
Srimal, A. Al Shibli, N. Z. S. Al Bakri, M. Siraj, &amp; T. S. L. Radhika, (2023). Quantile Loss Function Empowered Machine Learning Models for Predicting Carotid Arterial Blood Flow Characteristics. WSEAS Transactions on Biology and Biomedicine, vol. 20, pp. 155-170, https://doi.org/10.37394/23208.2023.20.16.</unstructured_citation></citation></citation_list></journal_article></journal></body></doi_batch>