need to collect data for attributes such as density and
residual_sugar for the white wine data set, which could
save effort in data collection.
5 Conclusion
Much research on discovering functional dependencies algorithmically from data has been published, and several efficient algorithms have been proposed. However, when attributes are continuous or numerical, it is quite likely that some of the functional dependencies found by these algorithms are not real functional dependencies. Continuous attributes can take a wide variety of values, a functional dependency must satisfy a many-to-one relationship over all possible attribute values, and the algorithms cannot check the attribute values that do not appear in the data. In this paper, we show how to determine whether the discovered functional dependencies have statistically significant explanatory power by running a multivariate linear regression test for each algorithmically found functional dependency, which compensates for the weakness of the algorithmic methods. We measure explanatory power with the adjusted R-squared, and we also consider other statistics: multicollinearity, the Durbin-Watson test for independence, and the F value for the overall significance of the regression model. For our experiment, we used the Vinho Verde wine quality data set from the UCI Machine Learning Repository, and we found that only 48.7% of the functional dependencies found by the algorithm called FDtool have explanatory power for the red wine data set, and only 30.7% have explanatory power for the white wine data set. From these findings, we conclude that we should be careful when using the functional dependencies found by the algorithm, because it finds too many functional dependencies among numerical attributes.
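The two checks described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline and not FDtool: the helper names `fd_holds` and `regression_diagnostics`, the synthetic data, and the use of ordinary least squares through NumPy are all hypothetical choices made for the example. `fd_holds` tests the many-to-one condition only on the rows present, which is why a continuous left-hand side whose values are all distinct makes almost any dependency appear to hold. `regression_diagnostics` regresses the right-hand side on the left-hand side and reports the adjusted R-squared, the overall F value, the Durbin-Watson statistic, and variance inflation factors (VIF) as a multicollinearity measure. It does not compute p-values, which would additionally require the F distribution.

```python
import numpy as np


def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds on the given rows (a many-to-one test).

    Only value combinations that actually occur in `rows` are examined, so a
    continuous lhs whose values are all distinct passes the check trivially.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False  # the same lhs values map to two different rhs values
        seen[key] = row[rhs]
    return True


def regression_diagnostics(X, y):
    """Regress y (the FD's right-hand side) on X (its left-hand side) by OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    f_value = (r2 / k) / ((1.0 - r2) / (n - k - 1))  # overall F statistic
    dw = float((np.diff(resid) ** 2).sum()) / sse  # near 2 suggests no autocorrelation
    vif = []
    for j in range(k):  # VIF_j = 1 / (1 - R_j^2), regressing X_j on the other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        e = X[:, j] - others @ b
        r2_j = 1.0 - float(e @ e) / float(((X[:, j] - X[:, j].mean()) ** 2).sum())
        vif.append(1.0 / (1.0 - r2_j))
    return {"adj_r2": adj_r2, "f_value": f_value, "durbin_watson": dw, "vif": vif}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    x1, x2, noise = rng.normal(size=(3, n))
    y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)
    rows = [{"noise": a, "y": b} for a, b in zip(noise, y)]
    # The FD noise -> y "holds" only because every sampled noise value is distinct.
    print(fd_holds(rows, ["noise"], "y"))
    # The regression test exposes it: the adjusted R-squared stays close to zero.
    print(regression_diagnostics(noise.reshape(-1, 1), y)["adj_r2"])
    print(regression_diagnostics(np.column_stack([x1, x2]), y))
```

In this sketch a candidate dependency would be kept only if it both passes `fd_holds` and shows a meaningful adjusted R-squared with acceptable diagnostics; the thresholds for "meaningful" and "acceptable" are left to the analyst.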
In addition, as a possible application of the explanatory functional dependencies found among the conditional attributes, we generated random forests after dropping the redundant attributes that appear on the right-hand side of those dependencies, and obtained good results. We can therefore also conclude that the effort of checking wine quality can be reduced by not collecting redundant attribute values, because for mass-produced wines such as Vinho Verde, samples with as few attribute values as possible can be used. A possible direction toward more practical applications is the discretization of numerical attributes to reduce the number of functional dependencies found by the algorithm. This task can be challenging, because many discretization methods are available and selecting the best one is difficult.
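To illustrate why discretization could reduce the number of dependencies found, here is a small sketch using equal-width binning, which is only one of the many discretization methods mentioned above. The helper names `equal_width_bins` and `fd_holds`, the bin count of 5, and the synthetic uniform data are assumptions for illustration only. With raw continuous values every left-hand-side value is unique, so the many-to-one check passes trivially; after binning, many rows share a bin label, and a spurious dependency between independent attributes is rejected.

```python
import random


def fd_holds(rows, lhs, rhs):
    """Many-to-one check of lhs -> rhs on the rows present (hypothetical helper)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True


def equal_width_bins(values, n_bins):
    """Map each value to a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


random.seed(1)
n = 500
x = [random.uniform(0.0, 10.0) for _ in range(n)]
y = [random.uniform(0.0, 10.0) for _ in range(n)]  # y is independent of x

raw_rows = [{"x": a, "y": b} for a, b in zip(x, y)]
x_bins = equal_width_bins(x, 5)
y_bins = equal_width_bins(y, 5)
binned_rows = [{"x": a, "y": b} for a, b in zip(x_bins, y_bins)]

print(fd_holds(raw_rows, ["x"], "y"))     # True: every raw x value is distinct
print(fd_holds(binned_rows, ["x"], "y"))  # False: rows sharing an x bin disagree on y
```

Note that binning can also break genuine dependencies whose values straddle bin boundaries, which is one reason the choice of discretization method matters.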
Acknowledgment:
This work was supported by Dongseo University,
“Dongseo Frontier Project” Research Fund of 2022.
References:
[1] S.E. Ebeler, Linking Flavor Chemistry to
Sensory Analysis of Wine, in: R. Teranishi,
E.L. Wick, I. Hornstein (eds), Flavor
Chemistry, Springer, Boston, MA, 1999,
DOI: 10.1007/978-1-4615-4693-1_35.
[2] C.E. Butzke, S.E. Ebeler, Survey of analytical
method and winery laboratory proficiency,
American Journal of Enology and Viticulture,
Vol.50, No.4, 1999, pp.461-465,
DOI: 10.5344/ajev.1999.50.4.461.
[3] K.R. Dahal, J.N. Dahal, H. Banjade, S. Gaire,
Prediction of wine quality using machine
learning algorithms, Open Journal of
Statistics, Vol.11, No.2, 2021, pp.278-289,
DOI: 10.4236/ojs.2021.112015.
[4] P. Cortez, A. Cerdeira, F. Almeida, T. Matos,
J. Reis, Modeling wine preferences by data
mining from physicochemical properties,
Decision Support Systems, Vol.47, No.4,
2009, pp.547-553.
[5] C.J. Date, Database Design and Relational
Theory: Normal Forms and All That Jazz, 2nd
ed., Apress, 2019.
[6] N. Asghar, A. Ghenai, Automatic Discovery of
Functional Dependencies and Conditional
Functional Dependencies: A Comparative
Study, University of Waterloo, April 2015.
[7] T.Z. Keith, Multiple Regression and Beyond:
An Introduction to Multiple Regression and
Structural Equation Modeling, 3rd ed.,
Routledge, 2019.
[8] C.J. Date, An Introduction to Database
Systems, 8th ed., Pearson, 2003.
[9] N. Asghar, A. Ghenai, Automatic Discovery of
Functional Dependencies and Conditional
Functional Dependencies: A Comparative
Study, University of Waterloo, April 2015.
[10] L. Caruccio, S. Cirillo, V. Deufemia, G.
Polese, Incremental Discovery of Functional
Dependencies with a Bit-vector Algorithm,
Proceedings of the 27th Italian Symposium on
Advanced Database Systems, 2019,
pp.146-157.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.30