WSEAS Transactions on Information Science and Applications
Print ISSN: 1790-0832, E-ISSN: 2224-3402
Volume 19, 2022
A Comparison of Statistical Dependency and Functional Dependency between Attributes Based on Data
Author:
Abstract: The chi-squared test is a standard statistical test for independence between categorical variables. It is therefore advisable to test the attributes of a dataset and remove redundant ones before supplying the dataset to machine learning algorithms. However, when a dataset has many attributes, as is common with real-world data, it is not easy to choose which pairs of attributes to test for independence. Meanwhile, several automated algorithms have been proposed for discovering functional dependencies from data. Because functional dependencies express many-to-one relationships between attribute values, we conjecture that attributes linked by a discovered functional dependency are also statistically dependent. Automated functional dependency discovery could therefore overcome the problem of choosing attributes for statistical dependency tests. To confirm that discovered functional dependencies indicate statistical dependence between attributes in real-world datasets, we performed experiments on three real-world datasets, using the open-source tool FDtool to discover functional dependencies and SPSS to test their statistical dependence. The experiments confirmed statistical dependence in the discovered functional dependencies and showed improved decision trees after the dependent attributes were removed.
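The relationship between the two notions can be illustrated with a minimal sketch. It is not the FDtool or SPSS workflow used in the paper: the helper names `holds_fd` and `chi_squared` and the toy `rows` table are assumptions made for illustration, and the statistic is computed by hand in plain Python. The sketch checks whether a many-to-one mapping holds between two columns and then computes the Pearson chi-squared statistic for the same pair.

```python
from collections import Counter


def holds_fd(rows, lhs, rhs):
    """Return True if each value of column lhs maps to exactly one value of rhs."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def chi_squared(rows, a, b):
    """Return (Pearson chi-squared statistic, degrees of freedom) for columns a, b."""
    n = len(rows)
    count_a = Counter(row[a] for row in rows)
    count_b = Counter(row[b] for row in rows)
    count_ab = Counter((row[a], row[b]) for row in rows)
    stat = 0.0
    for x, n_x in count_a.items():
        for y, n_y in count_b.items():
            expected = n_x * n_y / n          # count expected under independence
            observed = count_ab.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(count_a) - 1) * (len(count_b) - 1)
    return stat, dof


# Hypothetical toy table: every city belongs to exactly one country.
rows = [
    {"city": "Seoul", "country": "KR"},
    {"city": "Seoul", "country": "KR"},
    {"city": "Busan", "country": "KR"},
    {"city": "Busan", "country": "KR"},
    {"city": "Paris", "country": "FR"},
    {"city": "Paris", "country": "FR"},
    {"city": "Lyon", "country": "FR"},
    {"city": "Lyon", "country": "FR"},
]

fd_city_country = holds_fd(rows, "city", "country")   # many-to-one holds
fd_country_city = holds_fd(rows, "country", "city")   # reverse direction fails
stat, dof = chi_squared(rows, "city", "country")
```

In this toy table the dependency city → country holds, and the statistic is far from the value of zero that perfectly independent columns would give. A real analysis would also compare the statistic with a critical value or p-value and check that the expected cell counts are large enough for the test to be valid, which this tiny example does not satisfy.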
Keywords: Artificial intelligence, machine learning, classification, statistical independence, functional dependency, knowledge modeling, preprocessing, relations, data tables
Pages: 225-236
DOI: 10.37394/23209.2022.19.23