Improved Naive Bayes Classification for Joint Investment Plan
MUFDA JAMEEL ALRAWASHDEH
Department of Mathematics
Qassim University, Buraidah,
SAUDI ARABIA
Abstract: - Large-scale investments are mostly made by joint investors in different countries, and most of them involve collaboration with financial institutes in those countries. Because governments aspire to develop their countries, they encourage investment. At the same time, financial institutes set guidelines, driven by a profit maximization target, for deciding with whom they will share an investment and collaborate. In this paper, we consider individual investors who seek to collaborate with financial institutes. Naïve Bayes is a well-suited approach to help the decision maker approve or reject such a collaboration. The approach assumes that the variables are conditionally independent given the class; however, this assumption is not always realistic. Hence, this paper improves the accuracy of the Naïve Bayes approach by learning the dependency structure among the feature variables in the model, and applies it to joint investment plan applications. Introducing this new applied problem not only illustrates an application of the field but may also attract researchers from the social sciences to Bayes-based methods, which in turn can contribute new insights to the development of the field.
Key-Words: - Joint Investment; Classification; Naïve Bayes Classifier; Structure Learning.
Received: March 27, 2021. Revised: December 19, 2021. Accepted: January 14, 2021. Published: February 8, 2022.
1 Introduction
With the globalization of trade and investment, investors have become interested in investing in different countries, depending on the willingness of governments and of financial institutes to support the investment. For a country, investment from abroad or from within brings substantial advantages, including job opportunities and national development. Destination countries that investors favor for their investment rules and resources may attract many joint investment proposals, so approving or rejecting each proposal becomes a decision-making problem for the financial institutes. Because of the uncertainty involved, statistical approaches combined with artificial intelligence methods are useful in this regard. Data mining is the application of machine learning methods to large databases [3]. A large volume of data is analyzed to construct a simple model with a meaningful use, such as providing high prediction accuracy by identifying patterns, anomalies and correlations. Data mining has been used in many fields, such as finance [13], manufacturing [10], medicine [15] and telecommunication [19]. Data mining models fall broadly into two types: predictive and descriptive. Data mining builds models from data, using tools that vary both by the type of model built and by the type of algorithm used. In this study, a predictive model is used to analyze the data. A model that predicts the value of a particular attribute related to class membership is called a classification model, or simply a classifier [9].
Classification is a data mining technique that assigns items in a collection to target categories or classes, with the goal of accurately predicting the target class for each case in the data. In our context, for example, it can be used to sort investment proposals into high-, medium- and low-profit classes. More generally, classification identifies the class labels of instances from a set of features or attributes. A classification task usually begins with a data set in which the class assignments are known. For instance, a classification model that predicts investment risk can be developed from data observed for many investment applicants over a period of time. In addition to the past investment rating, the data might include company history, property ownership, years of existence, the number and type of existing and past investments, and so on. The rating would be the target, the other attributes would be the predictors, and the data for each company would constitute a case. Classification targets are discrete and do not imply order; a numerical target, by contrast, takes continuous, floating-point values.
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2022.21.6
Mufda Jameel Alrawashdeh
E-ISSN: 2224-2880
37
Volume 21, 2022
The simplest type of classification
problem is binary classification, in which the target attribute has only two possible outcomes, for example, approved or not approved. Multiclass targets have more than two outcomes, for example, a low, medium, high or unknown rating [9]. A Bayesian network, also known as a Bayes network or directed acyclic graphical model, is a representation of probabilistic knowledge: a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph. A Bayesian network can be denoted by N(G, P), where G is a directed acyclic graph whose nodes are the random variables X = {X1, X2, ..., Xn}, and P is the set of conditional probability tables associated with each Xi [7, 12]. Together, G and P define the joint distribution P(X1, ..., Xn) = P(X1 | Pa(X1)) × ... × P(Xn | Pa(Xn)), where Pa(Xi) denotes the parents of Xi in G. Naïve Bayes is a special case of the Bayesian network in which the class node is the only parent of every attribute node and there are no arcs between attributes. Because of its simple structure, Naïve Bayes is easy to implement and is thus appealing, provided that it gives good results. Simplicity and speed make Naïve Bayes well suited as an exploratory tool. It provides a flexible way of dealing with quantitative and discrete data and allows any number of attributes or classes. Compared with other predictive techniques such as decision trees and neural networks, Naïve Bayes is asymptotically fast and space-efficient, and it examines all of its input [26]. Furthermore, a small amount of bad data does not perturb the result significantly, and the method is relatively insensitive to irrelevant features, which can help when handling many uninformative variables [22]. Naïve Bayes has become a popular tool in data mining and other applications because it is simple, fast and easy to implement [27].
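To make the N(G, P) notation concrete, the sketch below stores a tiny Bayesian network as a mapping from each node to its parents and conditional probability table, and evaluates the joint probability through the factorization over parents. The three-node structure (a class C with an extra arc X1 → X2) and all probabilities are hypothetical, chosen only for illustration; they are not the paper's data.

```python
from itertools import product

# Hypothetical network C -> X1, C -> X2, X1 -> X2 with binary variables.
# Each node maps to (parents, CPT); a CPT key is (parent values..., own value).
network = {
    "C": ((), {("yes",): 0.6, ("no",): 0.4}),
    "X1": (("C",), {("yes", "hi"): 0.7, ("yes", "lo"): 0.3,
                    ("no", "hi"): 0.2, ("no", "lo"): 0.8}),
    "X2": (("C", "X1"), {("yes", "hi", "hi"): 0.9, ("yes", "hi", "lo"): 0.1,
                         ("yes", "lo", "hi"): 0.5, ("yes", "lo", "lo"): 0.5,
                         ("no", "hi", "hi"): 0.4, ("no", "hi", "lo"): 0.6,
                         ("no", "lo", "hi"): 0.1, ("no", "lo", "lo"): 0.9}),
}


def joint_probability(net, assignment):
    """P(x_1, ..., x_n) = product over nodes of P(x_i | parents(x_i))."""
    prob = 1.0
    for node, (parents, cpt) in net.items():
        key = tuple(assignment[p] for p in parents) + (assignment[node],)
        prob *= cpt[key]
    return prob


# Summing the factorised joint over every possible assignment must give 1.
domains = {"C": ("yes", "no"), "X1": ("hi", "lo"), "X2": ("hi", "lo")}
total = sum(joint_probability(network, dict(zip(domains, values)))
            for values in product(*domains.values()))
```

For example, `joint_probability(network, {"C": "yes", "X1": "hi", "X2": "hi"})` multiplies 0.6 × 0.7 × 0.9. In a Naïve Bayes network, every attribute's parent tuple would contain only the class.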
Because real-world data sets frequently involve uncertainty, misclassification of input samples generally limits the utility of the classifier built [11]. The main problem confronting Naïve Bayes is its assumption that the attributes are conditionally independent given the class. Learning the dependency structure among the attributes is therefore necessary. In this paper, we study the interrelationships among the variables and construct a model based on the relationships identified by a hill-climbing algorithm. Hill-climbing (HC) is one of the available score-based structure learning algorithms; it uses directed arcs to represent variable dependencies explicitly [8]. The method is then applied to joint investment approval. Investment is an important part of trade in a country's economy.
Mathematical, statistical and computational methods can produce viable results for decision makers in this sector, and introducing these approaches to problems in economics, trade and negotiation opens new avenues of application. This paper focuses on using statistical tools to address an investment problem, with an approach that can be studied further and used in real problems.
2 Background
International investment is an investment strategy in which an investor diversifies a portfolio by purchasing various financial instruments, such as shares and mutual funds, or by investing to acquire ownership of, or a collaborative stake in, companies across the globe in order to maximize return. In some countries, state-owned industries may open up to international investors for collaborative investment with the government. In other cases, financial institutes try to attract local as well as international investors for joint investment with the aim of maximizing their profit. Faced with many proposals for such joint investments, they need to select the most promising collaborations based on the specifications and experience of the partner investor, with the aim of maximizing the expected return. In response to a call from the financial institute, or through a normal application process, a bank or financial institute lists the applications and classifies each one by its details, such as investment capacity, years of existence in the business, average education level of the management, and so on. The conditional probabilities of different scenarios are then studied to approve or select the best investment options with promising returns. Because it is built on conditional probabilities, a Bayesian network is well suited to this kind of application. Bayesian networks have been used in many applications, including finance [20], medicine [18], the transportation industry [21], the energy sector [5], water studies [12] and agriculture [6]. They have been found to be competitive when compared with neural networks [24], credit scoring models [14] and principal component analysis [25]. Given the uncertainty that financial institutes face, Bayesian networks are valuable in financial applications. The Naïve Bayes classifier is widely used in finance, for example in bankruptcy prediction [1], credit risk assessment [4] and business failure prediction [16].
This research, by contrast, studies the application to joint investment approval. It further attempts to show that learning the dependencies between nodes that the Naïve Bayes classifier assumes to be independent produces better results.
3 The Proposed Approach
3.1 Data specification
This study uses a score-based structure learning algorithm for Bayesian networks to explore the dependencies among the variables. The aim is to show that, by learning the structure among the attributes (variables) of a classification data set, we can obtain higher classification accuracy than the Naïve Bayes classifier. We test this improvement in classification using 10-fold cross-validation. The data for this study were synthesized from an initial data set obtained from a financial firm (as given in Tables (1) and (2)). A more complete data set could be used to test the proposed approach further in future work.
3.2 Design of Naïve Bayes Model
Naïve Bayes is a classification technique based on Bayes' theorem with an assumption of independence among the predictors. Bayes' theorem gives the probability of an event given that another event has already occurred. It is stated mathematically as

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad (1)$$

where A and B are events and P(B) ≠ 0. Here we seek the probability of event A given that event B is true; event B is also termed the evidence. P(A) is the prior probability of A, i.e., the probability of the event before the evidence is seen. The evidence is an attribute value of an unknown instance (here, event B). P(A|B) is the posterior probability of A, i.e., the probability of the event after the evidence is seen. In simple terms, a Naïve Bayes classifier assumes that, within a class, the presence of a particular feature is unrelated to the presence of any other feature. In classification learning, each instance is described by a vector of attribute values, and the training data are a set of instances whose classes are known.
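As a small numerical illustration of Bayes' theorem, the following sketch computes P(A|B) with exact fractions, expanding the evidence P(B) by the law of total probability. The events and probabilities are hypothetical, not taken from the paper's data.

```python
from fractions import Fraction


def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A) P(A) / P(B), with the evidence expanded as
    P(B) = P(B|A) P(A) + P(B|not A) P(not A)."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    if p_b == 0:
        raise ValueError("Bayes' theorem requires P(B) != 0")
    return p_b_given_a * p_a / p_b


# Hypothetical: A = "proposal approved", B = "investor has a long track record".
print(bayes_posterior(Fraction(3, 10), Fraction(4, 5), Fraction(2, 5)))  # 6/13
```

Observing B raises the probability of approval from the prior 3/10 to the posterior 6/13, because B is twice as likely under A as under its complement.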
We can then predict the class of a test instance from the evidence provided by the training data. In Fig. (1), the attributes X1, X2, ..., Xn are independent of each other given the class. For Naïve Bayes, n = 13, and all 13 variables of our data are assumed to be conditionally independent of each other given the class. Because Naïve Bayes ignores the dependencies among the variables, its accuracy can suffer when those dependencies are strong. Hence, we use structure learning on the data to study the dependencies among the feature variables.
3.2.1 Training
Let each training record be represented as (X, C), where X = (x1, x2, ..., xn) is the vector of its n attribute values and C is its class, which takes one of the m possible values c1, c2, ..., cm. In essence, the Bayes classifier is the probability model

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}, \qquad (2)$$
Table 1. Information and Variables of the Data
Table 2. Description of the data set (based on Table
(1))
where P(C) is the prior probability of class C, P(X) is the probability of the observed attribute vector X, and P(X|C) is the class-conditional probability of X given C. In practice, the joint conditional probability P(X|C) cannot be estimated directly, so the components xi of X are assumed to be conditionally independent of each other given the class. Thus,

$$P(X \mid C) = \prod_{i=1}^{n} P(x_i \mid C). \qquad (3)$$
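A minimal sketch of the training step, assuming discrete attributes: the priors P(C) and the class-conditional tables P(xi | C) that enter the Naïve Bayes product are estimated by counting. The Laplace smoothing constant `alpha` is an assumption added so that unseen attribute values do not receive zero probability; the paper does not state whether smoothing was used, and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Estimate P(C) and P(x_i | C) from discrete training data by counting.

    rows   -- list of tuples of discrete attribute values (x_1, ..., x_n)
    labels -- class value of each row
    alpha  -- Laplace smoothing constant (an assumption, not from the paper)
    """
    n_attrs = len(rows[0])
    class_counts = Counter(labels)
    # Values observed for each attribute; needed for the smoothing denominator.
    domains = [sorted({row[i] for row in rows}) for i in range(n_attrs)]
    # counts[i][c][v]: number of rows of class c whose attribute i equals v.
    counts = [defaultdict(Counter) for _ in range(n_attrs)]
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            counts[i][c][v] += 1

    priors = {c: k / len(labels) for c, k in class_counts.items()}
    likelihoods = [
        {c: {v: (counts[i][c][v] + alpha) / (class_counts[c] + alpha * len(domains[i]))
             for v in domains[i]}
         for c in class_counts}
        for i in range(n_attrs)
    ]
    return priors, likelihoods


# Hypothetical toy records: (investment capacity, years in business) -> approved?
rows = [("high", "long"), ("high", "short"), ("low", "short"), ("low", "long")]
labels = ["Yes", "Yes", "No", "No"]
priors, likelihoods = train_naive_bayes(rows, labels)
print(priors)                 # {'Yes': 0.5, 'No': 0.5}
print(likelihoods[0]["Yes"])  # {'high': 0.75, 'low': 0.25}
```

With `alpha=0` the estimates reduce to plain relative frequencies.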
3.2.2 Prediction
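We make the prediction from the evidence provided by the training data by calculating the posterior probability, i.e., the final belief after the evidence has been propagated through the Bayes network. Substituting the independence assumption into Bayes' theorem, the posterior probability P(C|X) of each class c_j is calculated as given in Eq. (4) ([28]), and the instance is assigned to the class with the largest posterior; since P(X) is the same for every class, it can be dropped from that comparison:

$$P(c_j \mid X) = \frac{P(c_j)\prod_{i=1}^{n} P(x_i \mid c_j)}{P(X)}, \qquad \hat{c} = \arg\max_{c_j} P(c_j)\prod_{i=1}^{n} P(x_i \mid c_j). \qquad (4)$$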
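The prediction rule can be sketched as follows, assuming the probability tables are stored per attribute in the layout `likelihoods[i][class][value]`. Log-probabilities are summed to avoid numerical underflow when many attributes are multiplied, and the final normalization over classes plays the role of dividing by P(X). The tables below are hypothetical.

```python
import math


def predict(priors, likelihoods, x):
    """Return (best class, posterior distribution) for attribute tuple x.

    P(c | x) is proportional to P(c) * prod_i P(x_i | c); summing logs avoids
    underflow, and normalising over classes stands in for dividing by P(X).
    """
    log_scores = {
        c: math.log(p) + sum(math.log(likelihoods[i][c][v]) for i, v in enumerate(x))
        for c, p in priors.items()
    }
    top = max(log_scores.values())  # log-sum-exp trick for stable normalisation
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    posteriors = {c: w / z for c, w in weights.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical tables in the layout likelihoods[i][class][value].
priors = {"Yes": 0.5, "No": 0.5}
likelihoods = [
    {"Yes": {"high": 0.75, "low": 0.25}, "No": {"high": 0.25, "low": 0.75}},
    {"Yes": {"long": 0.5, "short": 0.5}, "No": {"long": 0.5, "short": 0.5}},
]
label, post = predict(priors, likelihoods, ("high", "long"))
print(label, round(post["Yes"], 4))  # Yes 0.75
```

Any zero entry in a table would make `math.log` fail, which is one practical reason to smooth the estimates during training.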
3.3 Design of Proposed Method
The proposed method adds relationships among the attributes by using Bayesian network structure learning. Structure learning for Bayesian networks is the task of learning the structure of the directed acyclic graph (DAG) from data; in other words, it determines which variables influence each other, establishing the dependencies between variables through arcs. Learning the structure of a Bayesian network is an NP-hard problem, so heuristic search is a practical approach. Any heuristic approach can be used for this purpose, including genetic algorithms, particle swarm optimization and so on. However, the hill-climbing method is computationally effective, easy to implement and one of the popular approaches in this regard [8]. Hence, in this study, we use the Hill-Climbing (HC) algorithm to search the space of possible network structures. Using the bnlearn package in R to implement the HC algorithm [23], we determine whether relationships exist among the variables. If they do, arcs are added to represent the dependencies, as shown in Fig. (2). Fig. (2) shows a few additional arcs, between nodes X1 and X3, X2 and Xn, and X3 and Xn, which indicates the presence of dependencies among these variables.
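The paper relies on bnlearn's implementation of hill climbing. To show what such a search does, here is a minimal, self-contained sketch (not bnlearn's code) under stated assumptions: the score is BIC, one common choice for discrete data; the moves are single-arc addition, deletion and reversal; the search starts from the empty graph and stops at a local maximum. Every candidate graph is rescored from scratch for clarity, whereas practical implementations update only the nodes whose parent sets changed. The toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def bic_node(data, node, parents):
    """BIC contribution of one node given its parent set: the maximised
    log-likelihood of its CPT minus (log N / 2) x number of free parameters."""
    n = len(data)
    r = len({row[node] for row in data})                           # node states
    q = math.prod(len({row[p] for row in data}) for p in parents)  # parent configs
    joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(k * math.log(k / parent_counts[pa]) for (pa, _), k in joint.items())
    return loglik - 0.5 * math.log(n) * q * (r - 1)


def is_acyclic(nodes, arcs):
    """Kahn's algorithm: a DAG iff every node can be removed in topological order."""
    indegree = {v: 0 for v in nodes}
    for _, child in arcs:
        indegree[child] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for parent, child in arcs:
            if parent == v:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return removed == len(nodes)


def network_score(data, nodes, arcs):
    """Total BIC: the score decomposes into a sum of per-node terms."""
    return sum(bic_node(data, v, sorted(p for p, c in arcs if c == v)) for v in nodes)


def hill_climb(data, nodes, max_iter=100):
    """Apply the single arc addition, deletion or reversal that most increases
    the score; stop when no move improves it (a local maximum)."""
    arcs = frozenset()
    current = network_score(data, nodes, arcs)
    for _ in range(max_iter):
        candidates = []
        for a, b in permutations(nodes, 2):
            if (a, b) in arcs:
                candidates.append(arcs - {(a, b)})               # delete a -> b
                candidates.append((arcs - {(a, b)}) | {(b, a)})  # reverse a -> b
            elif (b, a) not in arcs:
                candidates.append(arcs | {(a, b)})               # add a -> b
        best, best_score = None, current
        for cand in candidates:
            if is_acyclic(nodes, cand):
                s = network_score(data, nodes, cand)
                if s > best_score + 1e-9:
                    best, best_score = cand, s
        if best is None:
            break
        arcs, current = best, best_score
    return set(arcs), current


# Hypothetical toy data: X2 copies X1, while X3 is independent of both.
data = [{"X1": x1, "X2": x1, "X3": x3}
        for x1 in (0, 1) for x3 in (0, 1) for _ in range(25)]
arcs, score = hill_climb(data, ["X1", "X2", "X3"])
print(arcs)  # a single arc between X1 and X2; no arc touches X3
```

Because BIC is score-equivalent, X1 → X2 and X2 → X1 score the same here; the search keeps whichever it finds first, which is why learned arc directions should not be read as causal.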
3.4 Cross-validation
A popular way to evaluate and compare models is by their ability to make predictions for “out-of-sample” data, that is, future or unseen observations, using what was learned from the observed data. Cross-validation can be used to test which of the models under consideration learns the most from our data and therefore makes the better predictions. Hence, the last step of our analysis is cross-validation. Cross-validation uses one subset of a partition of the data as a test set: the remaining data are used to learn (train) a model, and the test set is used to validate it [17]. To obtain a stable estimate, we use 10-fold cross-validation, which performs the fitting procedure ten times, each time training on 90% of the data and holding out the remaining 10% for validation. To do so, the data are first randomly partitioned into 10 equally (or almost equally) sized segments, or folds. In each of the ten rounds, one fold is held out for testing while the remaining nine folds are used for training, as shown in Fig. (3), so that each fold is used once as the test set and nine times in the training set [2]. The accuracies (or, equivalently, the errors) obtained from all ten partitions are averaged, so the two methods can be compared in terms of their error estimates.
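The 10-fold procedure described above can be sketched as follows. The `fit` and `predict` callbacks are hypothetical placeholders for whichever classifier is being evaluated (Naïve Bayes or the improved model); a trivial majority-class classifier stands in here only to keep the example runnable.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and split it into k folds of equal or nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(rows, labels, fit, predict, k=10, seed=0):
    """Each fold is the test set once and part of the training set k-1 times;
    the k fold accuracies are averaged."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, rows[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Placeholder classifier for illustration: always predicts the majority class.
def fit_majority(train_rows, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


rows = list(range(111))  # hypothetical stand-ins for the 111 cases
labels = ["Yes"] * 111
print(cross_validate(rows, labels, fit_majority, predict_majority))  # 1.0
```

Fixing the `seed` makes the partition reproducible, so two classifiers can be compared on exactly the same folds.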
Fig. 1: An Illustration of the Naïve Bayes network
structure
Fig. 2: A Naïve Bayes network structure with added
relationships after structural learning
4 Results and Discussion
4.1 Naïve Bayes model
The structure of the Naïve Bayes network is shown in Fig. (4). The graph shows that the class node points directly to each of the 13 attributes and that there are no arcs among the 13 attributes themselves. The Naïve Bayes network therefore makes the strong assumption that the 13 variables are conditionally independent given the investment approval decision: once the decision is known, a change in one variable carries no information about the others.
4.1.1 Performance Metric of the Naïve Bayes Model
For this binary classification problem, with the two outcomes ‘Approved’ (coded ‘Yes’) and ‘Rejected’ (coded ‘No’), the framework yields the 2x2 confusion (error) matrix given in Table (3), which contains a total of 111 instances. The proportion of properly classified instances is 90/111 = 0.8108, so the accuracy of the Naïve Bayes model is 0.8108 × 100% = 81.08%.
4.2 Structural Learning Model (The
Proposed Method)
Fig. (5) shows that the learned graph contains no undirected arcs. After the learned relationships are added to the model, there are more arcs among the variables than in Fig. (4), which implies that several dependencies exist among the variables.
From the graph, the capital of the financial institute depends on the type of investment and on the investor's debt. In addition, the relationship between investors and financial institutes is influenced by the investment amount they propose. The size of the company depends on three variables: the capital provided, the income of the applicant and the income tax.
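How the learned arcs enter classification is not spelled out in the paper. One common reading, assumed in the sketch below, is an augmented Naïve Bayes: each attribute's probability table is conditioned on the class and on any attribute parents found by structure learning, so P(c | x) ∝ P(c) ∏ P(xi | c, pa(xi)). The variable names echo the dependency of capital on debt described above, but the states and probabilities are hypothetical.

```python
def predict_augmented(priors, cpts, parents, x):
    """Posterior over classes for a Naive Bayes augmented with attribute arcs.

    priors  -- {class: P(c)}
    parents -- {attribute: tuple of attribute parents (besides the class)}
    cpts    -- {attribute: {(class, parent values..., value): probability}}
    x       -- {attribute: observed value}
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, value in x.items():
            key = (c,) + tuple(x[p] for p in parents[attr]) + (value,)
            score *= cpts[attr][key]  # P(x_i | c, pa(x_i))
        scores[c] = score
    z = sum(scores.values())  # equals P(x) under the model
    return {c: s / z for c, s in scores.items()}


# Hypothetical model: "capital" has the extra parent "debt".
priors = {"Yes": 0.5, "No": 0.5}
parents = {"debt": (), "capital": ("debt",)}
cpts = {
    "debt": {("Yes", "low"): 0.8, ("Yes", "high"): 0.2,
             ("No", "low"): 0.4, ("No", "high"): 0.6},
    "capital": {("Yes", "low", "large"): 0.9, ("Yes", "low", "small"): 0.1,
                ("Yes", "high", "large"): 0.3, ("Yes", "high", "small"): 0.7,
                ("No", "low", "large"): 0.5, ("No", "low", "small"): 0.5,
                ("No", "high", "large"): 0.2, ("No", "high", "small"): 0.8},
}
post = predict_augmented(priors, cpts, parents, {"debt": "low", "capital": "large"})
print({c: round(p, 3) for c, p in post.items()})  # {'Yes': 0.783, 'No': 0.217}
```

With every `parents` entry set to `()` and class-only tables, the function reduces to the plain Naïve Bayes rule.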
4.2.1 Performance Metric of the Proposed
Method
Table (4) shows the performance of the proposed method on the same 111 instances. For the Naïve Bayes with added relationships, the proportion of properly classified instances is 99/111 = 0.8919, so its accuracy is 0.8919 × 100% = 89.19%. Compared with the accuracy obtained from Table (3), this is an improvement of 8.11 percentage points over the standard Naïve Bayes method.
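The accuracy figures in Tables (3) and (4) follow from the confusion matrices as the diagonal (properly classified) count divided by the total. The sketch below shows that computation; the tiny label lists are hypothetical, while the counts 90 and 99 are inferred from the reported proportions 0.8108 and 0.8919 on 111 instances (no other integer counts reproduce those four-digit values).

```python
def confusion_matrix(actual, predicted, classes=("Yes", "No")):
    """Rows are actual classes, columns are predicted classes."""
    pos = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[pos[a]][pos[p]] += 1
    return m


def accuracy(m):
    """Properly classified instances (the diagonal) over all instances."""
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / sum(map(sum, m))


# Hypothetical labels, only to exercise the functions.
m = confusion_matrix(["Yes", "Yes", "No", "No"], ["Yes", "No", "No", "No"])
print(m, accuracy(m))  # [[1, 1], [0, 2]] 0.75

# Counts implied by the reported proportions on 111 instances.
nb, improved = 90 / 111, 99 / 111
print(f"{nb:.4f} {improved:.4f} {100 * (improved - nb):.2f}")  # 0.8108 0.8919 8.11
```

The 8.11 difference is in percentage points: nine more of the 111 instances are classified correctly.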
4.3 Cross-Validation for Naïve Bayes and
Proposed Method
The cross-validation used here partitions the data into 10 stratified folds (folds that preserve the class proportions of the full data), testing on one fold while training on the other nine. This process is repeated 10 times, and the average accuracy across the ten runs is reported. The proportions and percentages for both methods are compared in Table (5).
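Stratification can be sketched as dealing the members of each class round-robin across the folds after shuffling, so that every fold keeps roughly the class proportions of the full data. The 70/41 class split below is hypothetical; the paper does not report the class balance.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds that each keep roughly the class proportions
    of the full data: shuffle each class, then deal it round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for members in by_class.values():
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        offset += len(members)  # next class continues where this one stopped
    return folds


labels = ["Yes"] * 70 + ["No"] * 41  # hypothetical split of 111 cases
folds = stratified_folds(labels)
print([len(f) for f in folds])  # [12, 11, 11, 11, 11, 11, 11, 11, 11, 11]
```

Here every fold receives exactly 7 'Yes' cases and 4 or 5 'No' cases, so each test fold mirrors the overall class balance.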
4.4 Discussion
In this study, the Naïve Bayes model was compared with the proposed method, an improved Naïve Bayes model. We first analyzed the data with the Naïve Bayes model and obtained its classification accuracy under the assumption that the feature variables are conditionally independent of one another. In many real cases, however, this assumption is not realistic. Therefore, a hill-climbing algorithm was used to study the interrelationships among the variables, and from these we built an improved Naïve Bayes model that incorporates the learned structure and relationships among the feature variables. Using the bnlearn package in R, we ran the analysis for both the Naïve Bayes model and the improved model.
Fig. 3: An illustration of 10-fold cross-validation
Table 3. Confusion matrix of the predictions made by the Naïve Bayes model
Table 4. Confusion matrix of the predictions made by the Naïve Bayes with added relationships
Fig. 4: Naïve Bayes structure for the 13 variables
Fig. 5: The Naïve Bayes network structure with added relationships after structure learning with the hill-climbing algorithm on the features of the data set
Table 5. Cross-validation tables for Naïve Bayes
and Naïve Bayes with added relationship
Table 6. Comparison of the accuracy percentage
between Naïve Bayes and the proposed method
A data set of 111 samples with 13 variables is used to demonstrate the proposed method. Continuous data were discretized; this is usually carried out as a first step towards making such data suitable for numerical evaluation and implementation on digital computers. The results show that the improved Naïve Bayes outperforms Naïve Bayes in classification accuracy: 81.08% for Naïve Bayes versus 89.19% for the improved Naïve Bayes. Furthermore, we ran 10-fold cross-validation for both the Naïve Bayes model and the improved Naïve Bayes model, and these results also show that the improved Naïve Bayes performs better than the Naïve Bayes method.
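The paper does not say which discretization method was used. As one possibility, the sketch below assumes equal-frequency (quantile) binning into three labelled levels; the income values are hypothetical.

```python
def equal_frequency_cuts(values, n_bins=3):
    """Cut points that place roughly the same number of observations in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * j) // n_bins] for j in range(1, n_bins)]


def discretize(value, cuts, labels=("low", "medium", "high")):
    """Return the label of the bin containing value (bins are [cut_{j-1}, cut_j))."""
    for cut, label in zip(cuts, labels):
        if value < cut:
            return label
    return labels[len(cuts)]


incomes = [12.5, 30.0, 18.2, 55.1, 41.7, 23.9, 67.3, 35.4, 49.8]  # hypothetical
cuts = equal_frequency_cuts(incomes)
levels = [discretize(v, cuts) for v in incomes]
print(cuts, levels)
```

Equal-width (interval) binning is a common alternative; the choice changes which cases share a level and therefore the estimated probability tables.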
5 Conclusion and Future Work
Naïve Bayes is a useful method in many applications, including investment approval by financial institutes. Since it has shown good performance in bank-related prediction, it can also be applied to similar problems. A better prediction system may help governments and financial institutes make decisions. Its application is not limited to finance; it can be extended to other fields where uncertainty exists. In this study, partially synthesized data were used to demonstrate the proposed approach. The results show that learning the interdependencies among variables, which the Naïve Bayes method normally assumes to be independent, produces better accuracy on this data set.
In future work, we aim to test the proposed approach on complete and more complex data. Competition among multiple financial institutes is another direction to consider, since investors may apply for several collaborative investments and decide based on the outcomes of their applications; a game-theoretic model could therefore be coupled with the Bayesian prediction proposed in this research. Applications to other problem domains are a further direction to explore.
References:
[1] Aghaie A., Saeedi A., Using Bayesian Networks for Bankruptcy Prediction: Empirical Evidence from Iranian Companies, Proceedings of the 2009 International Conference on Information Management and Engineering, Kuala Lumpur, Malaysia, 3-5 April 2009, 450-455.
[2] Airola A., Pahikkala T., Waegeman W., De Baets B., Salakoski T., An experimental comparison of cross-validation techniques for estimating the area under the ROC curve, J. Comput. Stat. & Data Anal., 55, 4, 2011, 1828-1844.
[3] Alpaydin E., Introduction to Machine Learning, MIT Press, USA, 2004, 54-55.
[4] Antonakis A. C., Sfakianakis M. E., Assessing Naïve Bayes as a Method for Screening Credit Applicants, J. App. Stat., 36, 5, 2009, 537-545.
[5] Cinar D., Kayakutlu G., Scenario analysis using Bayesian networks: A case study in energy sector, Knowledge-Based Systems, 23, 3, 2010, 267-276.
[6] Drury B., Valverde-Rebaza J., Moura M. F., de Andrade Lopes A., A survey of the applications of Bayesian networks in agriculture, Eng. App. Artif. Intell., 65, 2017, 29-42.
[7] Friedman N., Geiger D., Goldszmidt M., Bayesian network classifiers, Mach. Learn., 29, 2, 1997, 131-163.
[8] Gamez J. A., Mateo J. L., Puerta J. M., Learning Bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood, Data Min. Knowl. Discov., 22, 1, 2011, 106-148.
[9] Hand D. J., Principles of data mining, Drug Safety, 30, 7, 2007, 621-622.
[10] Harding J. A., Shahbaz M., Kusiak A., Data mining in manufacturing: a review, J. Manuf. Sci. Eng., 128, 4, 2006, 969-976.
[11] Hsieh N. C., Hung L. P., A data driven ensemble classifier for credit scoring analysis, J. Exp. Sys. Appl.: An Int. J., 37, 1, 2010, 534-545.
[12] Henriksen H. J., Rasmussen P., Brandt G., Von Buelow D., Jensen F. V., Public participation modelling using Bayesian networks in management of groundwater contamination, Env. Model. and Soft., 22, 8, 2007, 1101-1113.
[13] Kadam S., Raval M., Data mining in finance, Int. J. Eng. Trends Technol., 16, 2014, 377-381.
[14] Koh H. C., Tan W. C., Goh C. P., A Two-step Method to Construct Credit Scoring Models with Data Mining Techniques, Int. J. Bus. Inf., 1, 1, 2006, 96-118.
[15] Lavrac N., Zupan B., Data mining in medicine, in Data Mining and Knowledge Discovery Handbook, Springer, Boston, USA, 2005, 1107-1137.
[16] Li H., Sun J., Wu J., Predicting business failure using classification and regression tree: An empirical comparison with popular classical statistical methods and top classification mining methods, Exp. Sys. with Appl., 37, 8, 2010, 5895-5904.
[17] Maunder M. N., Harley S. J., Using cross validation model selection to determine the shape of nonparametric selectivity curves in fisheries stock assessment models, Fisheries Res., 110, 2, 2011, 283-288.
[18] McLachlan S., Dube K., Hitman G. A., Fenton N. E., Kyrimi E., Bayesian networks in healthcare: Distribution by medical condition, Artif. Intell. Med., 107, 2020, 17.
[19] Nadaf M., Kadam V., Data mining in telecommunication, Int. J. Adv. Comput. Theory Eng., 2, 2013, 926.
[20] Neil M., Fenton N., Using Bayesian networks to model the operational risk to information technology infrastructure in financial institutions, J. of Financ. Transform., 22, 2008, 131-138.
[21] Ni D., Leonard J. D., Markov chain Monte Carlo multiple imputation using Bayesian networks for incomplete intelligent transportation systems data, Transp. Res. Rec., 1935, 1, 2005, 57-67.
[22] Ratanamahatana C. A., Gunopulos D., Scaling up the Naive Bayesian Classifier: Using Decision Trees for Feature Selection, Proceedings of Workshop on Data Cleaning and Preprocessing: in ICDM'02, Maebashi, Japan, December 9-12, 2002.
[23] Scutari M., bnlearn - An R Package for Bayesian Network Learning and Inference, 2021, Available online: (accessed on 9th July 2021).
[24] Shorouq F. E., Saad G. Y., Applying Neural Networks for Loan Decisions in the Jordanian Commercial Banking System, J. Comput. Sci. Net. Secur., 10, 1, 2010, 209-214.
[25] Sustersic M., Mramor D., Zupan J., Consumer credit scoring models with limited data, J. Exp. Sys. with Appl., 36, 3, 2009, 4736-4744.
[26] Xhemali D., Hinde C. J., Stone R. G., Naïve Bayes vs. decision trees vs. neural networks in the classification of training web pages, Inter. J. Comput. Sci., 4, 1, 2009, 16-23.
[27] Zhang C. L., Xue G. R., Yu Y., Zha H. Y., Web-Scale Classification with Naive Bayes, Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain, 20-24 April 2009, 1083-1084.
[28] Zhang W., Gao F., An Improvement to Naive Bayes for Text Classification, Procedia Engineering, 15, 2011, 2160-2164.
Conflict of Interest:
The author states no conflict of interest.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
Authors state no funding involved.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US