Unveiling the Power: A Comparative Analysis of Data Mining Tools

through Decision Tree Classification on the Bank Marketing Dataset

ELIF AKKAYA1, SAFIYE TURGAY2

1Department of Electric and Electronic Engineering,

Sakarya University,

54187, Esentepe Campus Serdivan-Sakarya,

TURKEY

2Department of Industrial Engineering,

Sakarya University,

54187, Esentepe Campus Serdivan-Sakarya,

TURKEY

Abstract: - The importance of data mining is growing rapidly, so comparing data mining tools has become important. Data mining is the process of extracting valuable information from large datasets in order to reveal relationships within the data and to make predictions when necessary. This study delves into the dynamic

realm of data mining, presenting a comprehensive comparison of prominent data mining tools through the lens

of the decision tree algorithm. The research focuses on the application of these tools to the BankMarketing

dataset, a rich repository of financial interactions. The objective is to unveil the efficacy and nuances of each

tool in the context of predictive modeling, emphasizing key metrics such as accuracy, precision, recall, and F1-

score. Through meticulous experimentation and evaluation, this analysis sheds light on the distinct strengths

and limitations of each data-mining tool, providing valuable insights for practitioners and researchers in the

field. The findings contribute to a deeper understanding of tool selection considerations and pave the way for

enhanced decision-making in data mining applications. Classification is a data mining task that learns from a

collection of data to accurately predict new cases. The dataset used in this study is the Bank Marketing dataset

from the UCI machine-learning repository. It contains 45211 instances and 17 features. The dataset relates to the direct marketing campaigns (phone calls) of a Portuguese banking institution, and the classification objective is to predict whether a customer will subscribe to a term deposit (variable y). The Decision Tree classification algorithm is used for this task, and the Knime, Orange, Tanagra, RapidMiner, and Weka data mining tools are used to run and compare it.

Key-Words: - Data Mining Tools, BankMarketing Dataset, Feature Selection, Performance Evaluation,

Decision Trees, Evaluation Metrics.

Received: August 31, 2023. Revised: February 5, 2024. Accepted: March 7, 2024. Published: May 13, 2024.

1 Introduction

In a data mining environment where data sizes and data sources grow without apparent limit, the usefulness of the selected toolset is a primary factor in extracting valuable information. This investigation examines the features of a variety of data mining tools by means of the decision tree methodology. The backdrop for this exploration is the BankMarketing dataset, a rich collection of financial interactions that provides a suitable context for predictive analytics in banking.

Data mining is the discovery of patterns, trends, and knowledge in complex datasets. With so many tools available, the most appropriate one has to be chosen. Decision tree classification, which is very commonly used in data mining, is popular for its interpretability and utility. Its capacity to identify complex associations in data makes it a suitable basis for this comparative analysis. The main target of the study is a careful analysis and evaluation of several data mining tools, using the decision tree classifier on the BankMarketing dataset. Through this lens,

WSEAS TRANSACTIONS on COMPUTERS

DOI: 10.37394/23205.2024.23.9

Elif Akkaya, Safiye Turgay

E-ISSN: 2224-2872

Volume 23, 2024

we take a closer look at the particular advantages and disadvantages of each tool. Performance metrics such as accuracy, precision, recall, and F1-score are used to judge the effectiveness of each tool. The main goal of this study is to contribute informed perspectives and practical suggestions on data mining tool selection, thus helping professionals and researchers in their decision-making processes.

The remainder of this paper is organized as follows. Section 2 presents a literature review that grounds the research design and highlights significant findings. Section 3 describes the methodology, including the decision tree algorithm and the settings specific to this experiment. Section 4 explains the criteria and rationale for selecting the data mining tools, reports the evaluation metrics, and compares the results. The conclusions, together with the challenges and limitations, are given in Section 5.

2 Literature Survey

A study of the data mining tools used in decision tree classification is a vital part of analysing the BankMarketing dataset. This literature survey sets out the capabilities of each data mining tool and identifies the strengths and weaknesses of each in addressing the complexities of this dataset. Many studies have focused on the importance of data mining tools in different domains. [1] evaluated the iris dataset using Weka, RapidMiner, and Apache Spark and reported that Weka provided the best accuracy, 98%. [2] evaluated iris data with 3513 records using SPSS-Clementine, RapidMiner, and Weka. Amrita Naik and Lilavati Samant analysed Indian liver disease patient data using the Decision Tree, K-Nearest Neighbor, and Naive Bayes algorithms with the help of WEKA, Rapidminer, Tanagra, Orange, and Knime. Some researchers focused on the day-to-day benefits of data mining tools in business and research, showing their reliability in identifying trends and patterns, [3], [4], [5], [6], [7], [8], [9].

Decision tree classification has become one of the most popular algorithms because of its interpretability and usefulness, [10], [11], [12]. This study analyses in depth the capacity of decision tree algorithms to find meaningful patterns in the BankMarketing dataset, [13], [14], [15].

By reviewing earlier studies, the survey identifies the data mining tools that were used, their performance, their interpretability, and whether they were sufficiently scalable. This shows how these tools have been used in the past, what their shortcomings were, and whether they are appropriate for use in this study, [16], [17], [18], [19], [20].

The literature survey also examines the integrated techniques applied in decision tree classification modeling for prediction on banking datasets. The field is portrayed as growing and constantly evolving with the development of new techniques; unexpected difficulties are recounted alongside the anticipated advances in the forecasting capacity of data mining tools, [21], [22], [23], [24], [25], [26]. Ultimately, the study aims to show the best strategy for mining the BankMarketing data to find the necessary information and to improve data mining methodologies in bank analytics. Previous studies provide a basis for the use of data mining tools and decision tree classification algorithms, whereas this paper concentrates on a single dataset, BankMarketing, and provides an in-depth analysis and comparison of specific tools, [27], [28], [29]. Applying the decision tree classifier to financial datasets adds a new angle to the body of work on data mining technologies, [30], [31], [32], [33], [34], [35].

In the next few sections, we examine the BankMarketing dataset, describe the decision tree classification algorithm, and explain in detail how the comparison is conducted. The aim is to offer, beyond current knowledge, a practical view of tool selection in data mining.

3 Methodology

This section describes, step by step, the procedure used to compare different data mining tools by applying the Decision Tree Classification Algorithm to the BankMarketing dataset. The dataset's features include demographic and financial attributes, a range of economic indicators, and the outcome information of the campaign. A complete grasp of the dataset is essential before moving on to the next stage of analysis, because it leaves no room for misunderstanding the data.


The parameters of the decision tree classifier, namely the number of tree splits, the split measure, and the pruning strategy, had to be balanced so as not to jeopardize the model's ability to generalize.

Before model training, the dataset was checked and corrected for anomalies and missing elements. Categorical variables were encoded, and numerical features were normalized by scaling so that the model could learn correctly. Missing values were imputed using various techniques, and outliers were either replaced or flagged for separate consideration. Each tool was set up with its standard options and settings, with special attention paid to its compatibility with the Decision Tree Classification algorithm. The tools were selected on the basis of factors such as prevalence, ease of use, and accuracy, which are widely accepted and relied on by the data mining community. In the next step, the decision tree classification algorithm was applied to the training set separately in each selected data mining tool rather than through a combined approach. The models' generalization performance was then evaluated by cross-validation (e.g., k-fold cross-validation).
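As an illustration of the k-fold procedure mentioned above, the following is a minimal Python sketch of how sample indices could be divided into folds. The function name `k_fold_indices`, the choice of k = 5, and the sample count of 10 are assumptions made for the example; the study does not state the number of folds, and each tool provides its own cross-validation component.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the samples as evenly as possible: the first n % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical toy setting: 10 samples, 5 folds (illustrative values only).
folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

Each sample appears in exactly one test fold, so every instance is used for evaluation once and for training k - 1 times.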

To gauge the efficacy of each data mining tool, a

comprehensive suite of performance metrics was

employed. These metrics included accuracy,

precision, recall, F1-score, and area under the

receiver operating characteristic curve (AUC-ROC).

The choice of metrics aimed to provide a holistic

evaluation of each tool's predictive capabilities

across various dimensions.

In the study, the accuracy rates of the WEKA, Rapidminer, Tanagra, Orange, and Knime applications on the selected dataset were compared. The visualization features, data structures, platforms, and data transfer options of the data mining tools were examined and presented (Figure 1). Classification was performed with the C4.5 decision tree algorithm. By using accuracy as the analysis parameter, the aim is to determine how accurately each application classifies previously unseen data and to help users choose the right data mining tool for prediction.

RapidMiner: A data mining program written in the Java programming language. It originated as YALE (Yet Another Learning Environment) at the Technical University of Dortmund, Germany. [www.rapidminer.com.].

Weka: A data mining program that started as a small project and is now used by many people all over the world. It is developed in the Java programming language, [7].

Knime: Konstanz Information Miner (KNIME) is software developed by the visual data mining research group of the University of Konstanz on the Eclipse Rich Client Platform. It offers users a software development kit, [8].

Orange: Software developed by the artificial intelligence research team of the Department of Computer and Informatics Sciences, University of Ljubljana, Slovenia, [4].

Tanagra: An open-source program that includes supervised learning algorithms, focusing especially on the visual and interactive construction of decision trees, [9]. Classification and clustering are among the most important methodologies used in data mining.

Decision tree: Decision trees, which produce class results by branching on the features in the dataset and the values of those features, are a frequently used method for supervised learning, [10].

ID3 algorithm: The ID3 algorithm, an entropy-based algorithm, is the most basic and widely used decision tree algorithm. Its goal is to keep the tree depth to a minimum while creating the tree structure. At each step, the attribute whose split yields the lowest entropy is added to the tree, and the data belonging to these attributes can be discrete data that can be counted or continuous data that can be measured, [1].

C4.5 algorithm: The C4.5 algorithm (implemented in Weka as J48) allows both continuous and discrete features. It is very similar to ID3 and was developed to overcome the limitations of the ID3 algorithm, [1].
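To make the entropy-based split selection concrete, the sketch below computes entropy, information gain (the criterion usually associated with ID3), and gain ratio (the criterion usually associated with C4.5) for a single discrete attribute. It is a simplified illustration and not the implementation used by any of the five tools. The attribute values and labels are invented toy data, and thresholding of continuous attributes is not covered.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the value of the candidate attribute."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return groups


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on the attribute."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in partition(values, labels).values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy(values)
    if split_info == 0:  # attribute has a single value: no useful split
        return 0.0
    return information_gain(values, labels) / split_info


# Invented toy data: one binary attribute and the class label for six customers.
housing = ["yes", "yes", "no", "no", "yes", "no"]
subscribed = ["no", "no", "yes", "yes", "no", "no"]

print(round(information_gain(housing, subscribed), 4))
print(round(gain_ratio(housing, subscribed), 4))
```

Normalizing by the split information penalizes attributes with many distinct values, which plain information gain tends to favor.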

Fig. 1: Suggested study process

The outcomes of the data mining tools were compared systematically, and the strengths and weaknesses of each were outlined. Statistical tests such as paired t-tests or Wilcoxon signed-rank tests were employed to evaluate the significance of the observed differences in performance. Along with the general performance metrics, the effects of the selected variables and of the dataset's features on the tools were assessed by sensitivity analyses. This approach was designed to reveal any tool-specific sensitivity or robustness under varying conditions.

This research follows ethical guidelines in data gathering, interpretation, and analysis. Private data in the BankMarketing dataset are anonymized and handled with the utmost confidentiality. The project is committed to abiding by the rules and regulations on data protection and privacy. As a result, other researchers can reproduce the findings, and the research remains transparent. The following step is the comparison of results, which is then discussed in the context of tool selection in data mining.

3.1 Decision Tree

The decision tree predictor node generates a prediction for each instance in the input data. The predictions follow from the decision rules acquired while training the decision tree. The quality of the output can be assessed using several performance measures, for instance accuracy, precision, recall, and F1-score. This evaluation clarifies the model's predictive ability on new data. In some tools or platforms, the interface offers features such as visualizing the decision tree structure or showing the decision-making process. This visualization helps users understand how the model arrives at its predictions and explains the model's behavior. Whether a further iteration is needed depends on the evaluation results. The next step may involve retraining the model and refining its predictive performance. This iterative process allows the model's performance to be improved step by step for the task at hand.

The Decision Tree Predictor node is thus a crucial part of applying decision tree classification, because it translates learned patterns into actionable predictions for new data instances.

Accuracy is a key measure of the capability of classification models, including decision trees. It is the ratio of correctly predicted instances to the total number of instances in the dataset:

\[ \text{Accuracy} = \frac{\text{Number of correctly predicted instances}}{\text{Total number of instances}} \]

- High accuracy indicates that the model makes correct predictions for a considerable share of instances. However, accuracy alone is not sufficient on imbalanced datasets.

- A low accuracy score indicates that the model has difficulty distinguishing between the classes. This may point to problems such as overfitting, underfitting, or the presence of noisy data.

- If the classes in the dataset are imbalanced, accuracy should not be the only criterion. Other measures, such as precision, recall, and F1-score, can provide deeper insight, for instance when the application is mainly concerned with a specific class.

Accuracy should therefore be reported together with other relevant metrics (if applicable), keeping the context of the specific problem in mind.
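The point about imbalanced classes can be illustrated with a small Python sketch. The labels below are invented for the example (nine negative cases and one positive), and the "model" simply predicts the majority class. The helper names `confusion_counts` and `classification_metrics` are hypothetical and not part of any tool compared in this study.

```python
def confusion_counts(y_true, y_pred, positive="yes"):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive="yes"):
    """Accuracy, precision, recall and F1-score; undefined ratios are reported as 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented imbalanced labels: only 1 of 10 customers subscribes.
y_true = ["no"] * 9 + ["yes"]
y_pred = ["no"] * 10  # a trivial model that always predicts the majority class

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The trivial model reaches 90% accuracy while finding none of the positive cases, which is why recall and F1-score are reported alongside accuracy.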

3.2 Confusion Matrix

A confusion matrix is a table that many researchers use to describe the performance of a classification model on test data whose true values are known. It is especially useful for learning what types of mistakes a model is making.

- True Positives (TP): positive instances that the model correctly predicts as positive.

- True Negatives (TN): negative instances that the model correctly predicts as negative.

- False Positives (FP): instances that are in reality not positive but are incorrectly classified as positive by the model. This is known as a Type I error.

- False Negatives (FN): instances that are actually positive but are incorrectly labeled as negative by the model. This is also referred to as a Type II error.

The confusion matrix is shown in Table 1 to make it easier to view.

Table 1. Confusion Matrix

From the confusion matrix, various performance metrics can be derived, including:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \]

\[ \text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

The confusion matrix provides a clear and concise summary of a model's performance, allowing practitioners to assess both the overall accuracy and the specific types of errors made by the model. A decision tree is built so that each internal node represents a decision based on a particular feature, each branch represents an outcome of that decision, and each leaf node represents the final prediction.
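The node, branch, and leaf structure described above can be sketched in a few lines of Python. The tree below is hand-written for illustration only: the feature names `long_call` and `housing_loan` and the class labels are hypothetical and were not learned from the BankMarketing data.

```python
# An internal node is a dict with a feature name and one subtree per feature value;
# a leaf is simply a class label (a string).
toy_tree = {
    "feature": "long_call",
    "branches": {
        "yes": {
            "feature": "housing_loan",
            "branches": {"yes": "no", "no": "yes"},
        },
        "no": "no",
    },
}


def predict(tree, instance):
    """Follow the branch matching the instance's feature value until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        value = instance[node["feature"]]
        node = node["branches"][value]
    return node


customer = {"long_call": "yes", "housing_loan": "no"}
print(predict(toy_tree, customer))
```

Because each prediction corresponds to one root-to-leaf path, the rule that produced it can be read off directly, which is the interpretability property that makes decision trees attractive.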

3.3 Decision Tree Classification Algorithm Implementation using KNIME Version 4.5.2

The decision tree classification algorithm was implemented on the KNIME platform, version 4.5.2. KNIME is a free data analytics platform whose intuitive graphical interface supports the design of workflows that operate on the data. The Bank Marketing dataset was imported into KNIME, which gave access to the necessary functions and allowed the dataset to be manipulated efficiently. Missing values were addressed through imputation techniques, and their effect on the remaining data was taken into account in the further analysis. Categorical labels were encoded, and numerical columns were standardized to keep the input uniform.
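The preprocessing steps named above (imputation, categorical encoding, standardization) are performed in KNIME with graphical nodes. The Python sketch below only illustrates the arithmetic behind them. Mean imputation, one-hot encoding, and z-score standardization are assumed here as common choices; the paper does not specify which variants were used, and the sample values are invented.

```python
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def standardize(values):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Invented sample columns for illustration.
balance = impute_mean([100.0, None, 300.0])
marital_rows, marital_categories = one_hot(["married", "single", "married"])
balance_scaled = standardize(balance)

print(balance)
print(marital_categories, marital_rows)
print(balance_scaled)
```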

The Decision Tree Learner node was used within KNIME to configure the tree-based classification algorithm. Parameters such as tree depth, splitting criterion, and other splitting options were tuned with reference to the analysis results and the dataset characteristics. The configured decision tree algorithm was applied to the training subset of the BankMarketing dataset via the Decision Tree Learner node. To check the model's generalization capability, a k-fold cross-validation approach (with k = [Specify the number of folds]) was applied. Performance metrics, namely accuracy, precision, recall, F1-score, and AUC-ROC, were computed using dedicated KNIME nodes. The same set of steps was followed in each of the other tools so that the experimental design and evaluation measures remained comparable. Parameter sensitivity was examined by changing specific parameters and observing the effect on the overall performance of the decision tree algorithm in KNIME. KNIME served as a complete workflow development environment, covering data preprocessing, decision tree configuration, model training, and evaluation, all with good transparency and reproducibility.

After the metrics and visuals were added, the results were exported for deeper analysis and data representation.

This study uses version 4.5.2 of KNIME, which provides a consistent and user-friendly environment and allows both novice and experienced users to repeat the approach. The final part presents and discusses the findings, shedding more light on the comparison of data mining tools for decision tree classification on the BankMarketing dataset.

3.4 Mathematical Modeling

Formulating a mathematical model for a comparative study of data mining tools based on decision tree classification requires identifying all the components to be included in the investigation. While the specifics of the model may vary with the exact approach and algorithms used, a conceptual outline is as follows:

- The BankMarketing dataset is denoted by \(D\).

- \(D\) is represented as a set of instances \((x_i, y_i)\), where \(x_i\) is a vector of features and \(y_i\) is the corresponding class label.

- \(D\) is split into a training set \(D_{train}\) and a testing set \(D_{test}\) using a specified ratio.

- Let \(M_t\) represent the decision tree model built using tool \(t\).

- The model \(M_t\) is trained on \(D_{train}\) using a decision tree classification algorithm.

- \(M_t\) predicts class labels for the instances in \(D_{test}\).

Define the metrics \(P_t\) (precision), \(R_t\) (recall), \(F1_t\) (F1-score), and \(AUC_t\) (area under the ROC curve) for tool \(t\).

Compare the performance metrics of each tool

to determine their effectiveness in classification.

Perform statistical tests (e.g., paired t-tests) to assess

significant differences.
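As a sketch of the paired t-test mentioned above, the Python snippet below computes the t statistic for two tools evaluated on the same cross-validation folds. The per-fold accuracies are invented numbers used only to show the calculation; they are not results from this study. The standard library has no t-distribution, so the statistic would still need to be compared with a critical value from a t table (or a statistics package) to obtain a p-value.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Invented per-fold accuracies of two hypothetical tools on the same five folds.
tool_a = [0.90, 0.91, 0.89, 0.92, 0.90]
tool_b = [0.88, 0.90, 0.88, 0.89, 0.89]

t_value, dof = paired_t_statistic(tool_a, tool_b)
print(round(t_value, 3), dof)
```

Pairing by fold removes the variation that comes from the folds themselves, so the test looks only at the consistent difference between the two tools.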

1. Accuracy (Acc)

\[ Acc_t = \frac{TP + TN}{TP + TN + FP + FN} \]

2. Precision (P)

\[ P_t = \frac{TP}{TP + FP} \]

3. Recall (R)

\[ R_t = \frac{TP}{TP + FN} \]

4. F1 Score

\[ F1_t = \frac{2 \cdot P_t \cdot R_t}{P_t + R_t} \]

Here TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives obtained by tool \(t\) on the testing set.

Compute the AUC-ROC for the tool \(t\) based on the true positive rate and false positive rate. Use appropriate statistical tests to compare performance metrics between different tools. The mathematical model provides a systematic framework for comparing the performance of data mining tools through decision tree classification on the BankMarketing dataset. It helps to determine the efficacy of each tool and supports a statistical assessment, thus enabling a holistic comparison.
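The AUC-ROC can be computed without drawing the curve, because it equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses this pairwise formulation. The scores are invented, and the quadratic loop is meant for illustration rather than for a dataset of 45211 instances.

```python
def auc_roc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Invented predicted probabilities for four instances.
labels = [1, 0, 1, 0]
predicted = [0.9, 0.8, 0.4, 0.2]
print(auc_roc(labels, predicted))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.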

4 Case Study

This case study summarises the essentials of information extraction and decision tree modeling to give a better understanding of the BankMarketing dataset. Its objectives are a comparison of data mining tools, decision tree classification, and an assessment of key metrics. It considers data mining tools, decision tree classification, their application to bank datasets, and their utility. It also covers the features of the BankMarketing dataset and the target variable (y), together with an overview of the data preprocessing steps, including, but not limited to, the handling of missing values and the encoding of categorical variables.

The decision tree classification algorithm applied is explained in detail, along with any changes to the experimental parameters and tuning, if necessary. Data mining here refers to the process of identifying and using methods and tools for data analysis and modeling; KNIME, Weka, and RapidMiner are examples of such tools. The case study discusses how the selected tools accomplish these tasks and meet the demands of different users. It describes the experimental setup, which includes, among others, data partitioning, decision tree modeling, and performance evaluation. For consistency, the KNIME results refer to KNIME version 4.5.2. Accuracy, precision, recall, and F1-score are presented for each data mining tool.

The results are then compared, highlighting the strengths and weaknesses of each tool, and the findings are interpreted by exploring the reasons for variations in tool performance. The decision tree models generated by each tool are also compared (Figure 2). Figure 2 and Figure 3 show screenshots of the Knime toolbox for the statistics and decision tree modules. Figure 4 shows the Confusion Matrix result output.

Fig. 2: Knime Accuracy Statistics

Fig. 3: Knime Decision Tree

The challenges encountered during the study are discussed, and the limitations of the experimental design and the dataset are acknowledged. This case study aims to provide a holistic view of the comparative analysis of data mining tools through decision tree classification on the BankMarketing dataset, offering insights into the performance and applicability of each tool in a real-world scenario. Decision trees are valuable tools for exploratory data analysis and for building understandable models. However, practitioners often need to consider potential overfitting and explore techniques such as pruning or ensemble methods to enhance predictive capability.


Fig. 4: Knime Performance Scores

Classification is a data mining task that learns from a collection of cases to accurately predict new cases. Many applications, such as Weka, Tanagra, RapidMiner, Knime, and Orange, provide classification with decision trees.

The Bank Marketing dataset used in the study, taken from the UCI site, contains 45211 instances and 17 features. Classification was performed on the Bank Marketing dataset with the C4.5 decision tree algorithm. By using accuracy as the analysis parameter, the aim is to determine how accurately each application classifies previously unseen data and to help users choose the right data mining tool for prediction. Neural Networks gave the best result with 98.66667%. Figure 5 and Figure 6 show the performance scores and confusion matrix screen outputs of the Orange toolbox.

Fig. 5: Orange Performance Scores

Fig. 6: Orange Confusion Matrix

This part briefly summarises the key findings from the comparative analysis of data mining tools through decision tree classification on the BankMarketing dataset. Accuracy, precision, recall, and F1-score are presented for each data mining tool, and these metrics are compared to highlight the strengths and weaknesses of each tool in predictive modeling. Figure 7 shows the Tanagra classifier performance results.

Fig. 7: Tanagra Classifier Performances

The accuracy statistics for each tool are examined in detail, including true positives, true negatives, false positives, and false negatives. The confusion matrices are visualized to give a clear understanding of each model's predictive performance, and key decision nodes and branches are visualized to show the interpretability and complexity of the resulting models. The sensitivity of each data mining tool to variations in parameters or dataset characteristics is explored, giving insight into the robustness and adaptability of the models. Figure 8 and Figure 9 show the Rapidminer Decision Tree and Rapidminer Tree Description screen outputs.

Statistical tests (e.g., paired t-tests) are applied to assess the significance of the observed differences in performance metrics and to identify tools that significantly outperform others. The analysis results are then discussed and interpreted in depth, with explanations for the observed variations in performance and decision tree structures. Figure 10 shows the Weka Classifier Output results.



Fig. 8: Rapidminer Decision Tree

Fig. 9: Rapidminer Tree Description

Fig. 10: Weka Classifier Output

Specific considerations are highlighted for each data mining tool, and recommendations for tool selection are given in the context of the BankMarketing dataset and decision tree classification. The practical implications of the results for the banking domain are discussed, including how the findings can inform decision-making processes and improve predictive modeling in financial institutions. The challenges encountered during the analysis are reflected on, and the limitations of the study and their potential impact on the results are acknowledged. The C4.5 Decision Tree algorithm was applied to the bank marketing dataset in the Knime, Orange, Tanagra, Rapidminer, and Weka tools. When the accuracy values of the tools were compared, it was observed that the Orange tool gave the best result. According to the accuracy results, it was concluded that the Orange tool would be appropriate for classifying datasets similar to the Bank Marketing dataset with the decision tree algorithm. The study can thus help researchers and users choose the right tool and technique for data analysis and prediction (Table 2).

Table 2. Data tools and accuracy results

Recommendations for future research are based on the insights gained and include suggestions for refining the methodology or exploring additional dimensions in subsequent studies. The study underlines the central role of such research in appraising data mining tools for decision tree tasks. From the results obtained, researchers and practitioners gain a fuller picture of how various data mining tools perform on the selected dataset. As a result, practitioners and academics in data mining and predictive modeling can improve their practice.

5 Conclusion

This research set out to examine how different data mining tools operate through a thorough study of the decision tree classifier on the BankMarketing dataset. The main purpose was to determine how effective each tool was in developing predictive models and to disclose the particular advantages and disadvantages of each tool. The major findings and the conclusions drawn from them are summarised below.

The comparative analysis established that performance, measured by accuracy, precision,
recall, and F1-score, differed across the data mining tools. The tools also grow their trees
differently, so the resulting decision trees have their own specific traits, which in turn
affect model interpretability and complexity. The features of the selected tools were
described as well, as a reminder that they must be judged in the context of decision tree
classification. The outcomes of the analysis are relevant to decision-making processes in the
banking domain, since informed predictive modeling feeds into these tasks. Tools whose
advantages appear superior are recommended for the particular types of situations or
scenarios in which those advantages apply.
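The four metrics named in this comparison can all be derived from the counts of a binary confusion matrix. The sketch below assumes the positive class is "customer subscribed"; the function name and the counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that the model actually found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, invented for illustration only.
example = classification_metrics(tp=30, fp=10, fn=20, tn=140)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four values matters when the positive class is rare, because a model can then reach high accuracy while still missing many positive cases.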

Practitioners are advised to choose data mining tools by matching the kind of task at hand to
the tool most appropriate for executing it. Tool-specific observations such as those offered
in this analysis can help stakeholders make informed choices.

This study had certain limitations and encountered some challenges during the analysis, and
these should be considered when interpreting the findings. They also suggest new research
paths, such as improving the methods, evaluating other algorithms, and using additional
datasets. One benefit of this study is that it paves the way for further improvement in the
development of predictive models.

This analysis therefore provides further insight into how the respective tools work and into
the choices involved in using a decision tree classifier. These lessons give end users a
useful perspective when choosing among the readily available tools. The comparative analysis
also makes evident that data play a profound role in classification. In this sense, the study
offers not only a theoretical but also a practical contribution: it helps clarify the current
situation and gives practical recommendations to those working in predictive modeling.

In the quest to unveil the power within the realm

of data mining, this study serves as a beacon,

illuminating the pathways toward informed

decision-making and enhanced predictive modeling

capabilities. The journey continues, beckoning

future researchers to build upon these insights and

propel the field toward new horizons.


Contribution of Individual Authors to the Creation of a Scientific Article (Ghostwriting Policy)

- E. Akkaya, S. Turgay: investigation
- E. Akkaya: validation
- S. Turgay: writing and editing

Sources of Funding for Research Presented in a Scientific Article or Scientific Article Itself

No funding was received for conducting this study.

Conflict of Interest

The authors have no conflicts of interest to declare.

Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US

WSEAS TRANSACTIONS on COMPUTERS

DOI: 10.37394/23205.2024.23.9

Elif Akkaya, Safiye Turgay

E-ISSN: 2224-2872

105

Volume 23, 2024