aspects on the part of the students, which may
contribute to any difficulties when it comes to
learning when maladaptation occurs.
Reasons for transferring may be linked to various
factors. Academic performance has a huge impact
on students’ retention and transfer [7], [8], [9].
Transferring schools or institutions can affect access
to most degree programs. Transfers are common among
students who are unsure of what they want to study
in college and choose a degree program on a
whim. Another reason is financial difficulty or the
availability of scholarships. Students look for a
college or university to transfer to that
offers free tuition, scholarships, or student loans.
Earning a degree requires a considerable amount of
money; free tuition at state or local colleges and
universities reduces the expenses, but the remaining
costs are still significant. Many other factors significantly contribute
to the decision of students to transfer, and it is a
challenge for every academic institution to know
these in order to propose and create solutions for
student mobility.
The university loses its accomplishments as
students transfer, and the students who transfer or
drop out sacrifice the benefit of the continuity of the
services offered by the university. Low completion
rates due to transferring and dropping out affect not
only the student, but also the systematic changes
projected by university or school reform policies
[10]. Hence, early detection of student risk is
necessary, and should be used for policy making,
particularly in the admission of students to ensure
higher completion rates within the university. Using
the data from the Pangasinan State University
Urdaneta City Campus, this study aims to provide a
model that will describe the students' completion
based on their profiles and provide
recommendations based on the result of the model.
The main objective of this paper is to find a
model that possibly best describes student mobility
at the Pangasinan State University Urdaneta City
Campus as a basis for predicting students' success.
In particular, this study sought to:
1. describe the nature and characteristics of the
collected data;
2. generate a model that possibly best describes
the students’ mobility in Pangasinan State
University Urdaneta City Campus using:
a. Decision Tree Model; and
b. Binary Logistic Regression Model;
3. compare the generated models using Decision
Tree and Binary Logistic Regression based on the
following criteria:
a. Accuracy;
b. Area Under the Curve (AUC); and
c. Sensitivity.
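The three comparison criteria listed above can be illustrated with a minimal sketch. The labels, scores, and 0.5 cutoff below are made-up values for illustration only, and treating "transferred" as the positive class is an assumption; none of these are taken from the study's dataset.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate: correctly flagged positives / all actual positives."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = transferred (assumed positive class), 0 = graduated.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]  # made-up model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # 0.5 cutoff is an assumption

acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
area = auc(y_true, scores)
print(acc, sens, area)
```

Accuracy and sensitivity depend on the chosen cutoff, while AUC is computed from the raw scores and summarizes performance across all cutoffs, which is one reason to report the three together.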
2 Methodology
The classification algorithms were implemented
using RapidMiner and RStudio, both of which are
open-source software primarily used for data
science. The decision tree model is applied to
further understand patterns in students’ mobility.
The model is called a "decision tree" because its
result is a collection of decision nodes that
resembles a tree
when drawn as a graph. The process of
creating decision tree models depends on the
purpose, whether for classification or regression. In
this study, the decision tree model for classification
was applied because the target attribute assigned as
a label, which is the student's status (whether the
student will transfer or graduate), is not numerical.
Thus, the decision tree rule is utilized to separate the
values belonging to different categories or classes.
The criterion used for splitting in this study is the
information gain criterion. The Gini index criterion
was also considered, but it is more applicable for
larger distributions. The accuracy and gain ratio
criteria were also tried, but based on the accuracy,
precision in predicting the transferred class, and area
under the curve (AUC) values, the information gain
method for splitting is more applicable. Also, the
information gain method is well suited to smaller
partitions with a variety of mixed and diverse
values. Training and evaluating the model
requires splitting the dataset into
training and testing datasets. The rule of thumb in
assigning percentages for the training and testing
datasets was implemented, that is, 70% for training
and 30% for testing. Stratified random sampling is
used to preserve the distribution of the label (status)
in both training and testing datasets [11].
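A stratified 70/30 split of this kind can be sketched as follows. This is an illustrative standard-library version, not the RapidMiner or RStudio procedure used in the study; the function name `stratified_split`, the seed, and the 80/20 label counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), splitting each class by train_fraction."""
    rng = random.Random(seed)

    # Group row indices by their label so each class is sampled separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Made-up label column: 80 "graduated" and 20 "transferred" records.
status = ["graduated"] * 80 + ["transferred"] * 20
train_idx, test_idx = stratified_split(status)

train_counts = Counter(status[i] for i in train_idx)
test_counts = Counter(status[i] for i in test_idx)
print(train_counts, test_counts)
```

Because sampling is done within each class, the 80:20 ratio of the hypothetical labels is kept in both partitions, which a plain random split does not guarantee on small datasets.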
Table 1. Comparison of the Four Criteria for
Splitting
[Table body lost in extraction; surviving column
header: Precision (Transferred Class)]
Note: Values are derived based on the actual dataset. The
highest values are in boldface. In AUC, a value closest to
1.00 is the best.
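To make the splitting criteria compared in Table 1 concrete, the sketch below computes three of them (information gain, gain ratio, and Gini index reduction) for a single categorical attribute using their textbook definitions. The attribute `scholarship` and its values are hypothetical, and RapidMiner's internal implementation may differ in detail; the accuracy criterion is not shown.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the labels by the attribute value of each record."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    return list(groups.values())


def information_gain(values, labels):
    """Parent entropy minus the size-weighted entropy of the partitions."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the entropy of the attribute itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


def gini_reduction(values, labels):
    """Parent Gini impurity minus the size-weighted impurity of the partitions."""
    total = len(labels)
    remainder = sum(len(g) / total * gini(g) for g in partition(values, labels))
    return gini(labels) - remainder


# Hypothetical attribute and label columns (not from the study's data).
scholarship = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
status = ["graduated", "graduated", "graduated", "transferred",
          "transferred", "transferred", "transferred", "graduated"]

ig = information_gain(scholarship, status)
gr = gain_ratio(scholarship, status)
gd = gini_reduction(scholarship, status)
print(ig, gr, gd)
```

A tree learner evaluates a score like these for every candidate attribute at a node and splits on the one with the highest value; gain ratio additionally penalizes attributes with many distinct values.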
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2022.19.29
Paulo V. Cenas, Jennifer M. Parrone,
Daniel Bezalel A. Garcia, Frederick F. Patacsil