of them. In this work, we checked the statistical
dependence of discovered functional dependencies on
three public datasets by testing their statistical
significance, and we generated decision trees after
removing the dependent attributes.
The experiments showed that the discovered functional
dependencies are also statistically dependent in
real-world datasets. In addition, the decision trees
gave somewhat better results after some redundant or
dependent attributes were dropped, especially when
the functional dependencies cover a variety of values
in their many-to-one relationships and the Pearson
chi-square value for the attributes is relatively
large. Because FDtool can currently be applied only
to datasets with fewer than 27 attributes, future
work could extend it so that the approach applies
more generally.
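As a minimal sketch of the kind of check summarized above, the
following Python code tests whether one categorical attribute
functionally determines another and computes the Pearson chi-square
statistic for the same pair. It is not the FDtool or SPSS procedure
used in the experiments; the function names and the toy columns
(city, region) are hypothetical and serve only as illustration.

```python
from collections import Counter


def holds_fd(xs, ys):
    """Return True if every value in xs maps to exactly one value in ys."""
    mapping = {}
    for x, y in zip(xs, ys):
        # setdefault stores y the first time x is seen, then returns the stored value
        if mapping.setdefault(x, y) != y:
            return False
    return True


def pearson_chi_square(xs, ys):
    """Return (chi-square statistic, degrees of freedom) for two categorical columns."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("columns must not be empty")
    row_totals = Counter(xs)          # marginal counts of the first attribute
    col_totals = Counter(ys)          # marginal counts of the second attribute
    observed = Counter(zip(xs, ys))   # joint counts; missing cells count as 0
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(x, y)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical toy data: each city belongs to exactly one region (many-to-one).
city = ["a", "a", "b", "b", "c", "c"]
region = ["N", "N", "S", "S", "N", "N"]

fd_holds = holds_fd(city, region)
chi2, dof = pearson_chi_square(city, region)
print(fd_holds, chi2, dof)
```

A large statistic relative to the degrees of freedom suggests that the
two attributes are statistically dependent, which is the situation in
which dropping the determined attribute before building a decision tree
may be worth trying.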
Acknowledgement:
This work was supported by Dongseo University,
“Dongseo Frontier Project” Research Fund of 2021.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2022.19.23