Comparative Analysis of Nonlinear Models Developed using Machine
Learning Algorithms
MAJA ROŽMAN1, ALEN KIŠIĆ2, DIJANA OREŠKI3
1Faculty of Economics and Business,
University of Maribor,
Razlagova ulica 14, 2000 Maribor,
SLOVENIA
2Vern University,
Palmotićeva 82/1, 10 000 Zagreb,
CROATIA
3Faculty of Organization and Informatics,
University of Zagreb,
Pavlinska 2, Varazdin,
CROATIA
Abstract: - Machine learning algorithms are increasingly used in a vast spectrum of domains where statistical
approaches were previously used. Algorithms such as artificial neural networks, classification and regression trees,
and support vector machines provide various advantages over traditional linear regression or discriminant
analysis. Advantages such as flexibility, scalability, and improved accuracy in dealing with diverse data types,
nonlinear problems, and dimensionality reduction, compared to traditional statistical methods, have been empirically
demonstrated in many previous research papers. In this paper, two machine learning algorithms are compared
with one statistical method on highly nonlinear data. Results indicate a high level of effectiveness for machine
learning algorithms when dealing with nonlinearity.
Key-Words: - Machine learning, decision tree algorithm, artificial neural network, predictive models, data
characteristics, nonlinear data, artificial intelligence.
5HFHLYHG-XO\5HYLVHG$SULO$FFHSWHG0D\3XEOLVKHG-XQH
1 Introduction
This paper was written as part of the project SIMON:
Intelligent system for automatic selection of the
machine learning algorithms in the social sciences.
The main objective of the SIMON project is to
develop an intelligent system that can automatically
recommend machine learning algorithms in the
social sciences that work better on a particular data
set while taking into consideration the data
properties of educational and business datasets.
A comparative analysis of various machine learning
algorithms across a large number of datasets is a key
component of the project's research. In this paper,
the focus is on educational data and two groups of
machine learning algorithms, machine learning
based on error and machine learning based on
information. These algorithms are compared with an
advanced statistical approach, discriminant
analysis, on a dataset characterized by
a high level of nonlinearity. Data properties are
measured through meta-features. There are various
categories of meta-features. In this research, meta-
features that explain data linearity are employed.
The development of artificial intelligence and machine
learning has transformed many aspects of our lives.
Education is a field that is continuously adapting to
new technologies. Development and application of
artificial intelligence and machine learning in
education have opened new research paths and
possibilities for predicting student performance,
mostly in creating personalized learning approaches
and understanding the factors contributing to
educational success. The combination of powerful
machine learning algorithms and huge amounts of
data created and stored from learning management
systems (LMS) leads to significant scientific
achievements. This paper investigates the predictive
power of two machine learning algorithms based on
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.29
Maja Rožman, Alen Kišić, Dijana Oreški
E-ISSN: 2224-3402
303
Volume 21, 2024
different approaches to learning and one advanced
statistical method for LMS data analysis and the
development of student success predictive models
which serve as the basis for personalized intelligent
systems in education. The research presented here
aims to evaluate the effectiveness and application of
machine learning algorithms on nonlinear data.
The paper is structured as follows. Section two
reviews previous related research papers. Section
three provides a description of data and a brief
overview of machine learning algorithms. Section
four gives insights into research results. Section five
concludes the paper and gives guidelines for future
research.
2 Theoretical Backgrounds
2.1 Machine Learning Algorithms for
Nonlinear Predictive Models’
Development
Machine learning algorithms are playing an
increasingly important role in analyzing datasets
that are characterized by nonlinearity. By employing
diverse approaches to the learning phase and
development of descriptive and predictive models,
machine learning algorithms strive to enhance
accuracy and minimize biases. These algorithms are
particularly beneficial in the educational context, as
they can provide more reliable insights regarding
student performance and instructional strategies,
especially when the data come from technology-enhanced
platforms and/or learning management
systems. Moreover, machine learning algorithms
contribute to the identification of patterns and trends
within educational data, enabling educators and
policymakers to make data-driven decisions to
improve teaching and learning outcomes. However,
the evaluation of machine learning algorithms on
social science datasets also presents some
challenges. It is crucial to examine the
interpretability and explainability of machine
learning algorithms to facilitate understanding of
models. Social science domains are characterized by
complexity and nonlinearity. Evaluation of machine
learning algorithms on educational datasets has the
potential to revolutionize the education sector, but it
requires careful consideration of various factors to
maximize their impact. The following subsection gives
an overview of machine learning algorithms applied to
educational datasets.
2.2 Machine Learning for Predictive
Models’ Development
Machine learning algorithms used on learning
management system data for predicting student
success include bagging, boosting, stacking, and
voting, [1]. The proposed models in the study, [2],
integrate five traditional machine learning
algorithms (DT, RF, GBT, NB, and KNN) with
ensemble techniques, resulting in improved
prediction performance. Another study, [3], explores
the use of bagging algorithms like random forest
and boosting algorithms like adaptive boosting,
stochastic gradient boosting, and extreme gradient
boosting for predicting student performance.
Additionally, the study, [4], compares and analyzes
five ensemble classifiers, including bagging
decision trees, for modeling student behavior from
e-learning data. The authors of [5] performed a
thorough exploration and analysis of two
educational datasets. The proposed ensemble models
achieve high accuracy and low false positive rates.
There are various papers using machine learning
algorithms for student performance prediction, such
as [6] and [7]. For example, the authors of [7] tested
several models for predicting student success, and the
support vector machine algorithm achieved the best
results, with a prediction rate of 87.32%. Project SIMON
researchers also contributed to the research topic by
examining machine learning algorithms applications
in social sciences in general [8], focusing on
educational data and developing predictive models
by using different approaches such as machine
learning based on probability or by comparing
various machine learning approaches, [9].
This research takes a step forward by comparing
machine learning algorithms with an algorithm based
on statistical learning, namely discriminant analysis.
3 Research Methodology
3.1 Learning Management System Data
Educational datasets include a wide range of
information such as demographic details, academic
performance, behavioral patterns, and more. These
datasets provide a comprehensive view of the
student learning process, making them a basis for
applying machine learning algorithms. Nowadays,
LMS data are the primary source because LMSs are
widely used in education.
The dataset used in this research was extracted
from the Knowledge Discovery in Data course, taught
at the Faculty of Organization and Informatics,
University of Zagreb. The course was offered as an
elective at the undergraduate level of the Information
and Business Systems study program. The course was
taught using a blended learning approach, combining
traditional classroom-based lectures and laboratory
exercises with various tasks and activities in the
Moodle learning management system.
Data were extracted from the
Moodle platform for two separate student
generations, yielding a sample of 83
students. Raw data about students' activities were
extracted and measured by the number of log entries
each student made for specific resources and activities. These
included files, forums, student reports, folders,
choices, file submissions, overview reports, pages,
systems, tests, and assignments.
The final course grade was incorporated into the
dataset as a dependent variable, thus enabling the
application of supervised machine learning
algorithms and the development of predictive
models of student success.
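The data preparation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' extraction procedure: the student identifiers, activity names, log events, and grades are hypothetical placeholders, and the helper build_feature_rows is introduced only for this example. It counts log entries per student and activity type and attaches the final grade as the dependent variable.

```python
from collections import Counter, defaultdict

# Hypothetical raw log events: (student_id, activity_type). Illustrative only.
raw_logs = [
    ("s01", "forum"), ("s01", "file"), ("s01", "file"),
    ("s02", "test"), ("s02", "forum"),
    ("s03", "assignment"),
]

# Hypothetical final course grades (the dependent variable).
final_grades = {"s01": 4, "s02": 3, "s03": 2}

# Hypothetical subset of the activity types listed in the text.
ACTIVITY_TYPES = ["file", "forum", "test", "assignment"]


def build_feature_rows(logs, grades, activity_types):
    """Return one (student_id, log_counts, grade) tuple per student."""
    counts = defaultdict(Counter)
    for student_id, activity in logs:
        counts[student_id][activity] += 1
    rows = []
    for student_id in sorted(grades):
        # A Counter returns 0 for activities the student never accessed.
        features = [counts[student_id][a] for a in activity_types]
        rows.append((student_id, features, grades[student_id]))
    return rows


rows = build_feature_rows(raw_logs, final_grades, ACTIVITY_TYPES)
for row in rows:
    print(row)
```

Each resulting row pairs a vector of activity counts with a grade, which is the form a supervised learning algorithm expects.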
3.2 Machine Learning Algorithms
There are numerous research studies investigating
artificial neural networks and decision tree usage for
predictive models in education to predict student
performance, dropout rates, and misconduct
locations, with varying levels of accuracy.
Artificial neural networks and decision trees
were used to predict the performance of students in
a computer science course at Al-Muthanna
University, with a classification accuracy of
77.04%, [10].
Both Decision Trees and Artificial Neural
Networks were used to develop classification
models and generate rules to classify and predict
students' behavior and the location of misconduct on
college campuses, [11].
Artificial Neural Network algorithms and
Decision Tree algorithms were used for constructing
a prediction model of student achievement in
business computer disciplines at the School of
Information and Communication Technology, the
University of Phayao, [12].
Artificial neural networks are an error-based
machine learning approach that learns by adjusting
the weights between neurons, thereby minimizing the
error of the model. The idea is based on
biological neurons and the way they function.
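To make the idea of error-based learning concrete, the following minimal Python sketch trains a single sigmoid neuron by gradient descent on squared error. It is an illustration under stated assumptions, not the network used in this study: the toy data (the logical OR function), the learning rate, the number of epochs, and all function names are chosen only for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Output of a single neuron with a sigmoid activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def mean_squared_error(samples, weights, bias):
    return sum((predict(weights, bias, x) - t) ** 2 for x, t in samples) / len(samples)


def train_neuron(samples, epochs=5000, learning_rate=0.5):
    """Adjust weights step by step so that the squared error decreases."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = predict(weights, bias, inputs)
            # Gradient of 0.5 * (output - target)^2 with respect to the
            # neuron's weighted input (chain rule through the sigmoid).
            delta = (output - target) * output * (1.0 - output)
            weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
    return weights, bias


# Toy data (assumption): the logical OR function.
toy_samples = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
               ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

error_before = mean_squared_error(toy_samples, [0.0, 0.0], 0.0)
weights, bias = train_neuron(toy_samples)
error_after = mean_squared_error(toy_samples, weights, bias)
predictions = [round(predict(weights, bias, x)) for x, _ in toy_samples]
print(error_before, error_after, predictions)
```

A full neural network repeats the same weight update across several layers of neurons, propagating the error backwards from the output.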
Decision trees are an information-based machine
learning approach that learns by identifying the
most informative variables in the data set and
constructing a decision tree model from those
variables.
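One common way to quantify how informative a variable is uses entropy and information gain. The following Python sketch illustrates that measure on hypothetical data. The source does not state which splitting criterion the decision tree in this study uses, so the choice of information gain, the toy rows, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Reduction in entropy obtained by splitting on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: (forum activity, file views) per student.
toy_rows = [("high", "low"), ("high", "high"), ("low", "low"), ("low", "high")]
toy_labels = ["pass", "pass", "fail", "fail"]

gains = [information_gain(toy_rows, toy_labels, i) for i in range(2)]
best_feature = max(range(2), key=lambda i: gains[i])
print(gains, best_feature)
```

In this toy example the first variable separates the classes perfectly, so a tree built on this criterion would split on it first.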
4 Research Results
There are numerous approaches to evaluating and testing
predictive model accuracy. In this case, k-fold
cross-validation is used. In k-fold cross-validation,
the data set is divided into k subsets.
In each of the k iterations, a different subset serves as
the test set, while the other k-1 subsets form the
training set. In this study, ten folds are employed.
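The splitting scheme can be sketched in Python as follows, using the sample size of 83 students and ten folds reported in this paper. The function name k_fold_indices is hypothetical, and the sketch assigns students to folds in their original order; whether the authors shuffled or stratified the data is not stated, and in practice the order is usually randomized first.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_indices(83, 10)
for train, test in splits:
    print(len(train), len(test))
```

Every student appears in exactly one test set, so each model is evaluated once on every observation it was not trained on.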
Table 1. Predictive models’ accuracy
Algorithm                    Accuracy
Artificial neural network    79.83 %
Decision tree                78.25 %
Discriminant analysis        71.09 %
Several inferences may be drawn from the performance
metric in Table 1. The key question is whether the
results can be generalized or whether they are the result of
chance. The aim of statistical significance testing is
to determine how accurately the evaluation
measures reflect classifier behavior. We tested the
algorithms on one domain and compared them pairwise using
paired (matched-samples) t-tests. The significance of
the mean difference is examined at the significance
threshold of 0.05. The null hypothesis for each
comparison in Table 2 is that there is no difference in
the mean performance of the two algorithms, and the
test shows whether this hypothesis can be rejected.
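The test statistic of a paired t-test can be sketched in Python as follows. The per-fold accuracies below are hypothetical placeholders, not the values obtained in this study, and the function name paired_t_statistic is introduced only for this example. The sketch computes the statistic and compares it with 2.262, the two-sided critical value of the t distribution for 9 degrees of freedom at the 0.05 level, instead of computing an exact p-value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for two matched lists of scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-fold accuracies for two models over ten folds.
model_a = [0.80, 0.78, 0.82, 0.79, 0.81, 0.77, 0.80, 0.83, 0.79, 0.80]
model_b = [0.72, 0.70, 0.73, 0.71, 0.70, 0.69, 0.72, 0.74, 0.70, 0.71]

t_value, dof = paired_t_statistic(model_a, model_b)
CRITICAL_T_DF9 = 2.262  # two-sided critical value for df = 9, alpha = 0.05
print(t_value, dof, abs(t_value) > CRITICAL_T_DF9)
```

Pairing the scores fold by fold removes the variation between folds from the comparison, which is why a matched test is appropriate when both models are evaluated on the same splits.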
Table 2. T-test results
Hypothesis                                               T-test (p-value)
H0: Artificial neural network = discriminant analysis    0.004
H0: Decision tree = discriminant analysis                0.007
H0: Artificial neural network = Decision tree            0.05
As seen in Table 2, there are statistically
significant differences between the performance of
the artificial neural network and discriminant
analysis, as well as between the decision tree and
discriminant analysis.
However, there is no statistically significant
difference between the two machine learning
approaches, the artificial neural network and the
decision tree. The results of
statistical testing indicate the superiority of machine
learning approaches over the statistical learning
approach for developing predictive models.
5 Conclusion
In this paper, we have proposed two machine
learning-based student predictive models and one
statistical learning-based student predictive model.
The machine learning models adopt two different
approaches to the development
of predictive models: machine learning based on
error (artificial neural network) and machine
learning based on information (decision tree
algorithm). The results of the performance
evaluation reveal there are statistically significant
differences between machine learning and statistical
learning approaches, but there are no statistically
significant differences between the two different
machine learning approaches.
This paper makes three scientific contributions: (i)
in the field of machine learning, by investigating
how different machine learning approaches handle
educational LMS data, (ii) in the field of statistical
learning, by investigating how discriminant analysis
handles educational LMS data, and (iii) in student
predictive models, by comparing different machine
and statistical learning approaches and demonstrating
which one achieves the best predictive model in this domain.
There are several limitations of the research
presented here. First, only one dataset is used in
algorithm comparison. In future research, we will
extend the comparison to several datasets, covering
several courses from several study programmes at
different faculties in different countries. Also, the
LMS data will be subjected to various machine
learning algorithms, and their performances will be
compared.
Findings from this research could help to tailor
teaching and learning strategies, particularly in
virtual learning environments.
Declaration of Generative AI and AI-assisted
technologies in the Writing Process
During the preparation of this work, the authors
used Paperpal to improve the language of the
manuscript. After using this tool, the authors
reviewed and edited the content as needed and took
full responsibility for the content of the publication.
References:
[1] Saleem, F., Ullah, Z., Fakieh, B., & Kateb, F.
(2021). Intelligent decision support system for
predicting student’s E-learning performance
using ensemble machine learning.
Mathematics, 9(17), 2078, pp. 1-22,
https://doi.org/10.3390/math9172078.
[2] Kumar, M., & Jeet Singh, A. (2022,
September). Process-Based Multi-level
Homogeneous Ensemble Predictive Model for
Analysing Student’s Academic Performance.
In International Conference on Innovative
Computing and Communications:
Proceedings of ICICC 2022, Vol. 1 (pp. 139-
159). Singapore: Springer Nature Singapore.
[3] Lenin, T., & Chandrasekaran, N. (2021).
Learning from Imbalanced Educational Data
Using Ensemble Machine Learning
Algorithms. Webology, 18(SI01), 183-195.
[4] Hassan, H., Ahmad, N. B., & Sallehuddin, R.
(2021). An Empirical Study to Improve
Multiclass Classification Using Hybrid
Ensemble Approach for Students’
Performance Prediction. In Computational
Science and Technology: 7th ICCST 2020,
Pattaya, Thailand, 29-30 August, 2020 (pp.
551-561). Springer Singapore.
[5] Injadat, M., Moubayed, A., Nassif, A. B., &
Shami, A. (2020). Systematic ensemble model
selection approach for educational data
mining. Knowledge-Based Systems, 200,
105992,
https://doi.org/10.48550/arXiv.2005.06647.
[6] Ahamad, M., & Ahmad, N. (2021). Students’
knowledge assessment using the ensemble
methods. International Journal of Information
Technology, 13(3), 1025-1032.
[7] Ouatik, F., Erritali, M., Ouatik, F., &
Jourhmane, M. (2022). Predicting student
success using big data and machine learning
algorithms. International Journal of Emerging
Technologies in Learning, 17(12), 236.
[8] Oreski, D. (2023). Application of Machine
Learning Methods for Data Analytics in
Social Sciences. WSEAS Transactions on
Systems, 22, 69-72,
https://doi.org/10.37394/23202.2023.22.8.
[9] Oreški, D., & Hajdin, G. (2019). Development
and comparison of predictive models based on
learning management system data. WSEAS
Transactions on Information Science and
Applications, 16, 192-201.
[10] Altabrawee, H., Ali, O., & Ajmi, S. (2019).
Predicting Students’ Performance Using
Machine Learning Techniques. Journal of
University of Babylon for Pure and Applied
Sciences, 27(1), 194-205,
https://doi.org/10.29196/JUBPAS.V27I1.2108
[11] Blasi, A., & Alsuwaiket, M. (2020). Analysis
of Students' Misconducts in Higher Education
using Decision Tree and ANN Algorithms.
Engineering, Technology & Applied Science
Research, 10, 6510-6514,
https://doi.org/10.48084/etasr.3927.
[12] Nuankaew, P., Nuankaew, W., Teeraputon,
D., Phanniphong, K., & Bussaman, S. (2020).
Prediction Model of Student Achievement in
Business Computer Disciplines. Int. J. Emerg.
Technol. Learn., 15, 160-181,
https://doi.org/10.3991/ijet.v15i20.15273.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors contributed equally to the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
This paper is supported by the Croatian Science
Foundation under the project SIMON: Intelligent
system for automatic selection of machine learning
algorithms in social sciences, UIP-2020-02-6312.
Conflict of Interest
The authors have no conflicts of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US