A new machine learning method for bank credit risk analysis

ZAYNAB HJOUJI, MOHAMED M’HAMDI
Sidi Mohamed Ben Abdellah University, BP 42, Fez 30000
MOROCCO

Abstract: We present a new approach for predicting the creditworthiness of borrowers, which we call the “Method of separating the learning set into two regions”. The goal of this approach is to build two regions from a training set. To predict the solvency of a borrower, it then suffices to identify which of the two regions contains the borrower's characteristic vector; if the vector does not belong to either region, the credit decision requires further analysis. To test our approach, we use a large set of real and recent credit data obtained from the UCI repository as well as a real credit database of a Moroccan bank. The creditworthiness of borrowers is analyzed using two performance indicators: classification accuracy and, as a robustness criterion, the AUC of the ROC curve. The proposed model was compared with three traditional machine learning algorithms: LR, RBF-NN and MLP-NN. The experimental results show the superiority of the proposed approach.

Keywords: Artificial Intelligence, Systems Theory, Machine Learning, Credit risk prediction.

Received: June 25, 2021. Revised: March 19, 2022. Accepted: April 23, 2022. Published: May 20, 2022.

WSEAS TRANSACTIONS on CIRCUITS and SYSTEMS
DOI: 10.37394/23201.2022.21.12
E-ISSN: 2224-266X
Volume 21, 2022

1. Introduction

Creditworthiness (CW) of a corporate entity or individual is determined using different credit scoring models. A high credit score corresponds to high creditworthiness; this score is determined on the basis of the large customer databases that banks have generally built up over the years [1]. Customers do not all behave the same way in terms of financial performance; banks therefore need to distinguish their good customers from their bad ones, and they need a credit scoring (CS) system to do so [2]. Article [3] defines CS as a means of analyzing the likelihood that an applicant will default on repayments, in order to avoid financial losses. It is important to collect information from bank customers and other financial institutions in order to manage credit risks and, at the same time, to decide whether or not to lend money to a client. In other words, this process helps to separate good borrowers from bad ones: borrowers with clean records can be classified by banks as “good borrowers”, while those without such records can be considered “bad borrowers”. It is worth noting that such a simple selection process may not guarantee a correct classification. Hence, new, accurate automated systems that reduce prediction errors are urgently needed in order to handle large and complex CS datasets [4].

To deal with this challenge, IT systems have become very popular among scientists and institutions in the last several years. Over the past decades, several scientific studies have attempted to assess the creditworthiness of bank customers using different predictive models [5]. A large number of data mining (DM) and machine learning (ML) techniques have been used for this purpose, including support vector machines (SVMs), neural networks (NNs), decision trees (DTs), logistic regression, fuzzy systems, etc. Each of these studies analyzed different datasets to show the effectiveness of its methods. In general, discriminating between low and high credit risks by developing new predictive systems is one of the most popular research areas in the field of financial forecasting.

As the main contribution and novelty of this work, we introduce a new method for predicting the financial distress of credit applicants, called splitting the learning set into two regions (SLS2Rs). The goal of the proposed method is to construct two regions from a learning set: the first, named "Solvency Region", contains the feature vectors of the elements that settled their credits on time, and the second, named "Non-Solvency Region", contains the feature vectors of the elements that failed to repay their credits. To predict the risk of a customer default, it is then enough to know which of the two regions contains the customer's feature vector; if the vector does not belong to either region, the credit decision requires further analysis. To evaluate the performance and demonstrate the effectiveness of our method, a series of experimental tests and a comparative study are carried out. The results obtained suggest that the proposed methodology is very promising in the field of bank credit risk prediction and that it could be applied to any other CS dataset as well.

2. Creditworthiness Banking Detection Models: a Brief Review of Literature

Credit scoring (CS) models have been developed by banks and researchers to improve the assessment of creditworthiness during the credit evaluation process. In this section, we review some widely used predictive credit scoring techniques for detecting creditworthy borrowers, in order to create a baseline for selecting an appropriate tool for developing a banking creditworthiness prediction model. Note that this study reviews only the most commonly used techniques, as it would be almost impossible to cover all techniques applied in credit scoring. Typically, the existing literature surveys on borrower creditworthiness prediction or credit scoring models show that most of these models are either statistical [6] or artificial intelligence (AI) [7] based methods.

2.1 Statistical Models for Detection of Creditworthiness

A credit scoring solution can be built using statistical models, including Multiple Discriminant Analysis (MDA) [8], Logistic Regression (LR) [8]-[9], the Bayesian approach [10]-[11], Probit analysis [12], multiple regression, and others. These models have proven quite effective, but mainly for relatively simple credit risk prediction problems. Some of these techniques are widely applied for prediction and diagnosis in the banking credit risk assessment literature, notably Multiple Discriminant Analysis and Logistic Regression [13]. MDA was initially applied by [14] to analyze financial distress, bankruptcy and default risks. However, the use of this method has frequently been criticized because credit data are often categorical and because the covariance matrices of the good and bad credit classes are unlikely to be equal [15], [16].

In parallel with the MDA approach, LR is becoming a common alternative for building credit-scoring models. It has emerged as the technique of choice for predicting dichotomous outcomes and is regarded as one of the most appropriate techniques in the credit risk assessment literature. The authors of [17] stress that logistic regression algorithms perform best among all statistical credit risk assessment algorithms. In this context, several studies have shown the effectiveness of the logistic regression approach versus the LDA approach in detecting creditworthy borrowers. As this model is widely used, a large number of its applications have been reported in the literature [18].

2.2 Artificial Intelligence Models for Detection of Creditworthiness

In contrast to statistical methods, and in order to improve prediction performance in detecting the creditworthiness (CW) of bank borrowers, artificial intelligence and soft computing techniques have emerged. The main AI methods for CW prediction are Artificial Neural Networks (ANNs) [19]-[20], Support Vector Machines (SVMs) [20]-[21], Fuzzy Logic [21], Decision Trees (DT) [22], K-Nearest Neighbor (K-NN) [23], Random Forests (RFs) [24], Genetic Algorithms [25]-[26], and others.

AI tools are computer-based techniques, of which the Artificial Neural Network (ANN or NN) is the most common for bankruptcy prediction, because it has shown greater predictive accuracy than other techniques in CW prediction or credit scoring models, owing to its associative memory, generalization capability, flexibility, robustness, and higher classification accuracy [27]. Many studies have employed neural network algorithms for modelling credit risk in comparison with other CW prediction methods [13], [28]. The authors of [29] compare Bayesian networks (NB) with an Artificial Neural Network (ANN) algorithm based on back propagation for predicting the recovered value in a credit operation. They find that both the ANN and the NB models provide reliable outcomes, but that the ANN is more effective for predicting credit risk, with an average score of 82%. Further, the authors of [30] explore a practical approach based on neural networks that would help bankers predict the non-payment risk of companies asking for a loan. To evaluate the performance of their technique, they compare it with discriminant analysis, using a correlation test on a sample of 86 Tunisian companies and 15 financial ratios over the period from 2005 to 2007. The results show that the neural network technique is more accurate in terms of predictability. In the same vein, the research conducted in [31] suggests an ensemble technique, bagging with neural networks, for creditworthiness assessment. Using four measurement criteria (accuracy, specificity, sensitivity and the AUC of the ROC curve), the proposed model showed promising results and outperformed other models on a Bosnian commercial bank dataset, on feature-selected datasets, and on two real-world credit datasets (German and Australian). The authors conclude that the proposed model is empirically suitable for further use in assessing the creditworthiness of applicants.

In the same context, Lin et al. [32] discussed the application of the classification function and of artificial neural networks such as MLP and RBF in identifying the risk categories of the studied firms. The results showed that the artificial neural network and the classification function can effectively support the credit evaluation of applicants. The authors of [33] examined the credit decision using logistic regression and an RBF neural network. The results showed that the logistic regression model was superior to the radial basis function (RBF) model in terms of overall accuracy rate, whereas the radial basis function model was better at identifying likely defaulters. Recently, Yiping G. [34] presented a credit risk assessment algorithm based on a BP neural network; the simulation results showed that, compared with the traditional LR algorithm, the proposed model has higher classification accuracy and can effectively reduce investors' risk.


3. Development of the Proposed Model

This model requires a quantity of data accumulated by the bank to form a learning set; the larger this set, the better the predictions. The learning set consists of the bank's credit customers, which can be classified into two groups with reference to the opinion of the bank's credit manager. The set of successful customers contains all the cases that managed to repay their credit on time, i.e. they are considered solvent customers; each element of this category is denoted by 0 (Table 1 in blue). The set of unsuccessful customers contains the elements that failed to repay their credit, i.e. they are considered non-solvent customers; each element of this set is denoted by 1 (Table 1 in grey).

Table 1. The repartition of the studied categories.

Class 1: the group containing the S solvent (successful) customers, labelled 0 (matrix in gray). The centroid of this class is:

$$C_0 = \frac{1}{S} \sum_{i=1}^{S} X_i \tag{1}$$

Class 2: the group containing the NS non-solvent (non-successful) customers, labelled 1 (matrix in blue). The centroid of this class is:

$$C_1 = \frac{1}{NS} \sum_{i=1}^{NS} X_i \tag{2}$$

The worst element among successful customers is the element $X_w$ that is furthest from the centroid of this class (see Fig. 4). The radius $r_0$ is therefore defined as:

$$r_0 = d(C_0, X_w) = \max \{ d(C_0, X_i),\ i = 1, \dots, S \} \tag{3}$$

The best element among non-successful customers is the element $X_b$ furthest from the centroid of this class. The radius $r_1$ is therefore defined as:

$$r_1 = d(C_1, X_b) = \max \{ d(C_1, X_i),\ i = 1, \dots, NS \} \tag{4}$$

where $d$ is the Euclidean distance, defined for two vectors $Y^1$ and $Y^2$ with $P$ components by:

$$d(Y^1, Y^2) = \sqrt{\sum_{i=1}^{P} \left( y_i^1 - y_i^2 \right)^2} \tag{5}$$

We constitute the following two regions. The first is the ball with center $C_0$ and radius $r_0$:

$$R_0(C_0, r_0) = \{ X \in \mathbb{R}^{P} \ / \ d(C_0, X) \le r_0 \} \tag{6}$$

Fig. 1. The region $R_0(C_0, r_0)$ with center $C_0$ and radius $r_0$.

The second is the region with center $C_1$ and radius $r_1$:

$$R_1(C_1, r_1) = \{ X \in \mathbb{R}^{P} \ / \ d(C_1, X) \le r_1 \} \tag{7}$$

Fig. 2. The region $R_1(C_1, r_1)$ with center $C_1$ and radius $r_1$.

1) The set of feature vectors of successful customers is completely included in the region $R_0(C_0, r_0)$, i.e.

$$X_i \in R_0(C_0, r_0), \quad \forall\, i = 1, \dots, S \tag{8}$$

2) The set of feature vectors of non-successful customers is completely included in the region $R_1(C_1, r_1)$, i.e.

$$X_i \in R_1(C_1, r_1), \quad \forall\, i = 1, \dots, NS \tag{9}$$

Proof:

1) According to equation (6), to show that a vector $X_i$ belongs to the region $R_0(C_0, r_0)$, it is enough to show that $d(C_0, X_i) \le r_0$.


From equation (3), $d(C_0, X_w)$ is the maximum of the set of distances between $C_0$ and the feature vectors of successful customers $X_i$, $i = 1, \dots, S$. This means that:

$$d(C_0, X_i) \le d(C_0, X_w) = r_0 \quad \text{for all } i = 1, \dots, S \tag{10}$$

Therefore,

$$X_i \in R_0(C_0, r_0), \quad \forall\, i = 1, \dots, S$$

2) Following the same procedure as in remark 1, according to equation (4), $d(C_1, X_b)$ is the maximum of the set of distances between $C_1$ and the feature vectors of non-successful customers $X_i$, $i = 1, \dots, NS$. This means that:

$$d(C_1, X_i) \le d(C_1, X_b) = r_1 \quad \text{for all } i = 1, \dots, NS \tag{11}$$

Therefore,

$$X_i \in R_1(C_1, r_1), \quad \forall\, i = 1, \dots, NS$$
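The construction in equations (1) to (4) and the two remarks just proved can be illustrated with a minimal NumPy sketch. The function name `fit_regions` and the toy learning set are hypothetical and introduced only for illustration (the original experiments were run in Matlab); the sketch assumes labels 0 for successful and 1 for non-successful customers, as in the text.

```python
import numpy as np

def fit_regions(X, y):
    """Build the two balls from a learning set.

    X: (n_samples, n_features) array of feature vectors.
    y: labels, 0 = successful customer, 1 = non-successful customer.
    Returns (C0, r0, C1, r1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    C0 = X0.mean(axis=0)                        # centroid, equation (1)
    C1 = X1.mean(axis=0)                        # centroid, equation (2)
    r0 = np.linalg.norm(X0 - C0, axis=1).max()  # radius, equation (3)
    r1 = np.linalg.norm(X1 - C1, axis=1).max()  # radius, equation (4)
    return C0, r0, C1, r1

# Hypothetical toy learning set with two features per customer
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
C0, r0, C1, r1 = fit_regions(X, y)

# Remarks 1 and 2: every training vector lies in the ball of its own class
in_R0 = np.linalg.norm(X[y == 0] - C0, axis=1) <= r0
in_R1 = np.linalg.norm(X[y == 1] - C1, axis=1) <= r1
```

Because each radius is the largest distance from a centroid to the members of its own class, both boolean arrays contain only `True` values, which is exactly what equations (8) and (9) state.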

Fig. 3. The distribution of the learning set $\{X_i,\ i = 1, \dots, N\}$ into two regions. Legend: feature vectors of successful customers; feature vectors of non-successful customers; the two regions; centroid of class 0; centroid of class 1; worst element among successful customers; best element among non-successful customers.

Fig. 4. Representation of learning sets: (a) separable learning set; (b) non-separable learning set.

In summary, using the previous remarks, we deduce the procedure followed to predict the bank credit risk of a customer based on a learning set accumulated by the bank. We proceed in the following phases.

Phase of splitting the learning set into two regions:

This phase builds two spherical zones by splitting the learning set into two regions $R_1(C_1, r_1)$ and $R_0(C_0, r_0)$; the first contains the risky elements and the second contains the non-risky elements. Depending on the nature of the set under consideration, we follow one of the following two cases:

Case 1: If the learning set is separable (Fig. 4 (a)), we follow these steps:

- Step 1: We calculate the barycentre $C_0$ of all successful customers.
- Step 2: We calculate the barycentre $C_1$ of all non-successful customers.
- Step 3: We determine the worst element $X_w$ among the successful customers and the radius $r_0$.
- Step 4: We determine the best element $X_b$ among the non-successful customers and the radius $r_1$.

Case 2: If the learning set is non-separable (Fig. 4 (b)), we build the two regions using the following optimization problem:

Find $X_w$ and $X_b$ such that:

$$d(C_0, X_w) + d(C_1, X_b) = \max \{ d(C_0, X_i) + d(C_1, X_j),\ i = 1, \dots, S;\ j = 1, \dots, NS \}$$

under the constraint

$$R_0(C_0, r_0) \cap R_1(C_1, r_1) = \emptyset \tag{12}$$

i.e.

Find $X_w$ and $X_b$ such that:

$$d(C_0, X_w) + d(C_1, X_b) = \max \{ d(C_0, X_i) + d(C_1, X_j),\ i = 1, \dots, S;\ j = 1, \dots, NS \}$$

under the constraint

$$d(C_0, X_w) + d(C_1, X_b) < d(C_0, C_1) \tag{13}$$
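One possible reading of the Case 2 problem is sketched below: the centroids stay fixed at the class barycentres, and the pair of elements defining the radii is chosen to maximise the sum of the two radii while the two balls remain disjoint, i.e. while constraint (13) holds. This reading, the brute-force search over all pairs, and the name `choose_radii` are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np

def choose_radii(X0, X1):
    """Pick radii maximising r0 + r1 subject to r0 + r1 < d(C0, C1).

    X0: feature vectors of successful customers, shape (S, P).
    X1: feature vectors of non-successful customers, shape (NS, P).
    Returns (C0, r0, C1, r1).
    """
    X0 = np.asarray(X0, dtype=float)
    X1 = np.asarray(X1, dtype=float)
    C0, C1 = X0.mean(axis=0), X1.mean(axis=0)
    d0 = np.linalg.norm(X0 - C0, axis=1)   # d(C0, X_i), i = 1..S
    d1 = np.linalg.norm(X1 - C1, axis=1)   # d(C1, X_j), j = 1..NS
    gap = np.linalg.norm(C0 - C1)          # d(C0, C1)
    sums = d0[:, None] + d1[None, :]       # every candidate r0 + r1
    feasible = np.where(sums < gap, sums, -np.inf)  # constraint (13)
    if not np.isfinite(feasible).any():
        raise ValueError("no pair (i, j) satisfies constraint (13)")
    i, j = np.unravel_index(np.argmax(feasible), feasible.shape)
    return C0, d0[i], C1, d1[j]

# Hypothetical overlapping classes on a line
X0 = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
X1 = np.array([[3.0, 0.0], [5.0, 0.0], [7.0, 0.0]])
C0, r0, C1, r1 = choose_radii(X0, X1)
gap = np.linalg.norm(C0 - C1)
```

In this toy example the Case 1 radii would both equal 2, so the two balls would overlap (2 + 2 > 3); the search instead returns radii whose sum is 2, the largest feasible value. When several pairs reach the same sum, `argmax` keeps the first one, which is an arbitrary tie-break.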

Remark: In any case, we can separate the database used into two regions $R_0(C_0, r_0)$ and $R_1(C_1, r_1)$ such that $R_0(C_0, r_0) \cap R_1(C_1, r_1) = \emptyset$.

Phase of prediction of CW / credit risk for a new customer:

Step 1: Extract the feature vector $X$ of the customer.

Step 2: Calculate the distances $d(C_0, X)$ and $d(C_1, X)$.

Step 3: Verify that:

- If $d(C_0, X) \le r_0$, i.e. $X \in R_0(C_0, r_0)$, there is no risk and the customer's application is strongly accepted.
- If $d(C_1, X) > r_1$ and $d(C_0, X) > r_0$, we compare $d(C_1, X)$ and $d(C_0, X)$:
  - If $d(C_0, X) > d(C_1, X)$, the credit application is weakly rejected.
  - If $d(C_1, X) > d(C_0, X)$, the credit application is weakly accepted.

It should be noted that when the credit application is weakly rejected or weakly accepted, the final decision is left to the discretion of the bank manager.
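The prediction phase can be sketched as a small decision function. The name `predict` and the numeric values in the usage lines are hypothetical. The "strongly rejected" branch for a vector falling inside the non-solvency region is an assumption added by symmetry; the steps above do not spell it out. A tie between the two distances is resolved here as "weakly accepted", which is also an arbitrary choice.

```python
import numpy as np

def predict(x, C0, r0, C1, r1):
    """Return the credit decision for a feature vector x."""
    x = np.asarray(x, dtype=float)
    d0 = np.linalg.norm(x - np.asarray(C0, dtype=float))  # d(C0, X)
    d1 = np.linalg.norm(x - np.asarray(C1, dtype=float))  # d(C1, X)
    if d0 <= r0:
        return "strongly accepted"       # X lies in R0(C0, r0)
    if d1 <= r1:
        # Assumption: symmetric case, not stated explicitly in the text.
        return "strongly rejected"       # X lies in R1(C1, r1)
    # X belongs to neither region: fall back to the nearest centroid.
    return "weakly rejected" if d1 < d0 else "weakly accepted"

# Hypothetical regions: two unit balls centred at (0, 0) and (10, 0)
C0, r0 = np.array([0.0, 0.0]), 1.0
C1, r1 = np.array([10.0, 0.0]), 1.0
decisions = [predict(p, C0, r0, C1, r1)
             for p in ([0.5, 0.0], [3.0, 0.0], [8.0, 0.0], [10.0, 0.5])]
```

The two "weak" outcomes are the ones that, according to the text, remain at the discretion of the bank manager.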

4. Data Collection and Variable Definition

In this section, we describe the bank credit databases (international and Moroccan) on which we apply and implement our proposed method.

4.1 International Bank Credit Datasets Description

For this work, we use three real-life credit datasets (South German, Australian, and Taiwanese), which are publicly available from the UCI machine-learning repository [35]. We chose these three credit datasets because they are very frequently used in the credit-scoring field, especially to test the performance of classification models, which conveniently allows us to test the classification performance of the proposed model and to compare the results with other reference models.

4.2 Moroccan Bank Credit Datasets Description

The Moroccan dataset is provided by one of the commercial banks in Morocco. This customer credit application dataset consists of 1000 examples described by 14 predictive features, of which 788 observations (78.8%) are classified as creditworthy borrowers and 212 observations (21.2%) as non-creditworthy borrowers. This research uses a dichotomous outcome variable, non-creditworthiness (Yes = 1, No = 0).

The classification goal is to predict the non-creditworthiness of borrowers:

Dependent variable: creditworthiness of borrowers
0 = Creditworthy borrowers.
1 = Non-creditworthy borrowers.

4.3 Performance Metrics / Measurement Criterion

In order to evaluate the performance of creditworthiness prediction models, various evaluation criteria can be used, such as classification accuracy, recall or sensitivity, prediction rate, false alarm rate, specificity, the AUC of the ROC curve, the F-measure, the Kolmogorov-Smirnov test, and the Gini coefficient, among others. The criteria used in this empirical study are classification accuracy and the AUC of the ROC curve, complemented by box plots of the predicted pseudo-probabilities.

- The AUC value of the ROC curve:
The ROC (Receiver Operating Characteristic) curve is a useful tool for evaluating the effectiveness of methods and visualizing their capabilities, particularly in the field of credit risk assessment.

- The classification accuracy rate:
The classification accuracy is defined as

$$\text{Accuracy (\%)} = \frac{NCCC}{NCT} \times 100\% \tag{14}$$

where NCCC is the number of correctly classified cases and NCT is the number of cases used in the test.

5. Experiment Results and Analysis

In this section, we discuss the methodology for implementing the proposed model and evaluate its performance against each compared method, using the measurement criteria presented in Section 4.3, by reporting the results obtained on the international and Moroccan datasets. This section is divided into two parts: international credit dataset results and Moroccan credit dataset results.

5.1 Experimental Tests and Comparative Study on International Banks

Implementation Process for Comparative Analysis

We test the performance of our approach, based on splitting the learning set into two regions (one risky and the other not risky), on three real-life datasets (South German, Australian and Taiwanese). These real-life datasets classify credit applicants, described by a set of attributes, as good or bad credit risks, and have been successfully used for credit scoring and evaluation systems in many previous works.

We divide each database into two sets, one for learning and the other for model validation. The validation set is further divided into five subsets of testing data S1, S2, ..., S5. We then provide a comparative study of the performance of our proposed predictive model and other well-known and widely used models in the field of borrower creditworthiness prediction: Logistic Regression (LR), Radial Basis Function Neural Networks (RBF-NN), and Multilayer-Perceptron Neural Networks (MLP-NN), the latter two being robust neural network models in the area of credit risk prediction.

To measure the predictive ability of each method, we selected the classification accuracy rate as an appropriate metric for predicting the creditworthiness of borrowers.

All our numerical experiments were performed in Matlab 2017 on an HP PC with an Intel(R) Core(TM) i5-5200U CPU @ 2.20 GHz and 4 GB of RAM, running Windows 7.
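The two evaluation criteria of Section 4.3 can be sketched in a few lines. The function names and the toy labels and scores are hypothetical. The sketch assumes that label 1 (non-creditworthy) is the positive class and that `scores` are predicted pseudo-probabilities of that class; the AUC is computed through its pairwise interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, ties counting one half) rather than by integrating the ROC curve.

```python
import numpy as np

def accuracy_percent(y_true, y_pred):
    """Equation (14): 100 * NCCC / NCT."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(y_true == y_pred)

def auc_pairwise(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties = 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # all positive/negative pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Hypothetical test subset: 1 = non-creditworthy, 0 = creditworthy
y_true = np.array([0, 0, 1, 1, 0])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20])
y_pred = (scores >= 0.5).astype(int)     # assumed 0.5 cut-off

acc = accuracy_percent(y_true, y_pred)
auc = auc_pairwise(y_true, scores)
```

On these toy values four of the five cases are classified correctly, and five of the six positive/negative pairs are ranked in the right order. The pairwise form is quadratic in the number of cases, which is acceptable for test subsets of a few hundred customers.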


International credit datasets results

Tables 2, 3 and 4 show the results of predicting borrowers' creditworthiness for the three databases. From these results, we can see that our proposed predictive method, based on splitting the learning set into two regions, outperformed the other tested methods on all five tested subsets.

Table 2. Comparison of the four creditworthiness prediction methods on the South German dataset for the test sets S1 to S5.

Method      S1        S2        S3        S4        S5
LR          98.71%    93.19%    90.11%    79.12%    75.22%
RBF         99.63%    94.07%    90.77%    81.08%    76.33%
MLP         99.81%    94.43%    91.85%    83.12%    78.42%
Proposed    100%      96.84%    94.73%    91.54%    89.11%

Table 3. Comparison of the four creditworthiness prediction methods on the Australian credit dataset.

Method      S1        S2        S3        S4        S5
LR          95.60%    90.18%    87.00%    76.01%    69.93%
RBF         96.52%    90.85%    87.66%    79.21%    73.22%
MLP         96.70%    91.32%    88.74%    80.01%    75.31%
Proposed    99.65%    98.67%    94.62%    91.43%    89.02%

Table 4. Comparison of the four creditworthiness prediction methods on the Taiwan credit dataset.

Method      S1        S2        S3        S4        S5
LR          93.45%    88.36%    81.48%    71.61%    69.89%
RBF         94.63%    91.11%    88.07%    77.12%    70.47%
MLP         95.55%    90.96%    86.61%    80.42%    75.12%
Proposed    99.71%    98.72%    95.03%    90.98%    89.44%

5.2 Experimental Tests and Comparative Study on Moroccan Bank

Implementation Process for Comparative Analysis

To demonstrate the practicability and performance of our proposed predictive approach, which consists of splitting the learning set into two regions, this section presents a comparative analysis with some widely used creditworthiness prediction methods: Artificial Neural Networks, including the Multilayer-Perceptron network (MLP) and the Radial Basis Function network (RBF), and Logistic Regression (LR).

Prediction by RBF neural network model

The RBF classification results, by partition and overall, are presented in Table 5. As shown, the RBF network correctly classified 578 out of 694 clients in the training sample and 238 out of 306 clients in the test sample. Overall, 83.3% of training cases and 77.8% of test cases were correctly classified.

Table 5. RBF-NN classification.

Sample      Observed    Predicted NO    Predicted YES    Percent correct
Training    NO          529                              95.8%
            YES                                          34.5%
            Overall     89.6%           10.4%            83.3%
Testing     NO          220                              93.2%
            YES                                          25.7%
            Overall     88.9%           11.1%            77.8%

The box plots of the predicted pseudo-probabilities are displayed in Fig. 5. For the dependent variable (customer classification), the chart displays box plots of the predicted pseudo-probabilities over the whole dataset. The first boxplot, starting from the left, shows the predicted probability of the "Non-defaulting customer" category for customers observed as creditworthy. The second boxplot shows, for customers observed in the "Non-defaulting customer" category, the predicted probability of the "Defaulting customer" category. The third boxplot shows, for customers observed in the "Defaulting customer" category, the predicted probability of the "Non-defaulting customer" category. The rightmost boxplot shows the predicted probability of the "Defaulting customer" category for customers that actually belong to that category.

Fig. 5. Predicted-by-observed chart for RBF-NN.

The ROC curve of the RBF network prediction method, based on the combined training and test samples, is presented in Fig. 6; the corresponding AUC value is 0.712 (Table 10).

Prediction by MLP neural network model

The classification results for the MLP-NN model, by partition and overall, are reported in Table 6. As shown, the MLP network correctly classified 579 out of 694 clients in the training sample and 245 out of 306 clients in the test sample. Overall, 83.4% of training cases and 80.1% of test cases were correctly classified.
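As a quick consistency check, the overall percentages quoted above for the RBF and MLP networks can be recomputed from the reported counts with equation (14). The snippet uses only figures stated in the text; the dictionary layout is an illustrative choice.

```python
# (correctly classified cases, cases in the sample), as reported in the text
reported = {
    ("RBF-NN", "training"): (578, 694),
    ("RBF-NN", "testing"): (238, 306),
    ("MLP-NN", "training"): (579, 694),
    ("MLP-NN", "testing"): (245, 306),
}

# Equation (14): Accuracy (%) = 100 * NCCC / NCT, rounded to one decimal
accuracy = {key: round(100.0 * correct / total, 1)
            for key, (correct, total) in reported.items()}

for (method, sample), value in accuracy.items():
    print(f"{method:7s} {sample:9s} {value:.1f}%")
```

The recomputed values (83.3%, 77.8%, 83.4% and 80.1%) agree with the percentages given in the text.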


Fig. 6. ROC curve for RBF-NN method.

Table 6. MLP-NN classification.

Sample      Observed     Predicted NO    Predicted YES    Percent correct
Training    NO           519                              94.0%
            YES                                           42.3%
            Overall %    86.6%           13.4%            83.4%
Testing     NO           217                              91.9%
            YES                                           40.0%
            Overall %    84.6%           15.4%            80.1%

Fig. 7 shows box plots of the predicted pseudo-probabilities. For the dependent variable (customer classification), the chart displays box plots of the predicted pseudo-probabilities over the whole dataset. The first boxplot from the left shows the predicted probability of the "Non-defaulting customer" category for customers observed as creditworthy. The second boxplot shows, for customers observed in the "Non-defaulting customer" category, the predicted probability of the "Defaulting customer" category. The third boxplot shows, for customers observed in the "Defaulting customer" category, the predicted probability of the "Non-defaulting customer" category. The rightmost boxplot shows the predicted probability of the "Defaulting customer" category for customers that actually belong to that category.

Fig. 7. Predicted-by-observed chart for MLP-NN.

The ROC curve of the MLP network predictive model, based on the training and test samples together, is shown in Fig. 8. The area under this curve is 0.744: if a customer in the "Defaulting customer" category and a customer in the "Non-defaulting customer" category are randomly selected, there is a 0.744 probability that the pseudo-probability predicted by the model for the first customer to be in the "Defaulting customer" category is greater than the pseudo-probability predicted by the model for the second customer to be in the "Defaulting customer" category.

Fig. 8. ROC curve for MLP-NN method.

Prediction by Logistic Regression model

The current study used 694 cases to build the logistic regression scoring model and 306 cases to assess the developed model. The chi-square result testing the significance of the LR model is presented in Table 14. It provides statistical evidence of a relationship between the selected variables and the dependent variable: the probability associated with the chi-square statistic (144.989) is less than 0.05. In addition, the classification ability of the LR model is summarized in Table 7. The correct predictions are reported in the diagonal cells, while the off-diagonal cells contain the incorrect predictions. In the training sample, 87.1% of the non-defaulting clients and 33.3% of the defaulting clients were classified correctly, and the overall correct classification rate of the LR model was 78% with a threshold of 0.5.

Table 7. Logistic Regression classification results.

            Training cases (Predicted)            Testing cases (Predicted)
Observed    No      Yes     Percent correct      No      Yes     Percent correct
No          500             87.1%                196             83.1%
Yes                         33.3%                                71.4%
Overall                     78%                                  80.4%
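The layout of the classification tables in this section (observed class in rows, predicted class in columns, percent correct per row and overall) can be reproduced from predicted pseudo-probabilities with a short sketch. The function name, the toy data, and the use of `>=` at the 0.5 threshold are assumptions for illustration; the text only states that a threshold of 0.5 was used.

```python
import numpy as np

def classification_table(y_true, prob_default, threshold=0.5):
    """Cross-tabulate observed vs. predicted classes.

    y_true: observed labels, 0 = non-defaulting (No), 1 = defaulting (Yes).
    prob_default: predicted pseudo-probability of the defaulting class.
    Returns (table, percent_correct_per_class, overall_percent_correct).
    """
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(prob_default, dtype=float) >= threshold).astype(int)
    table = np.zeros((2, 2), dtype=int)
    for observed, predicted in zip(y_true, y_pred):
        table[observed, predicted] += 1      # rows: observed, cols: predicted
    per_class = 100.0 * np.diag(table) / table.sum(axis=1)
    overall = 100.0 * np.trace(table) / table.sum()
    return table, per_class, overall

# Hypothetical sample of six customers
y_true = [0, 0, 0, 0, 1, 1]
prob_default = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4]
table, per_class, overall = classification_table(y_true, prob_default)
```

The diagonal of `table` holds the correct predictions and the off-diagonal cells the errors, matching the reading of Table 7 given in the text.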


Furthermore, the developed method was tested using a testing

subset of 306 cases which of (236 No defaulting clients and

70 defaulting clients) that was not used to create the model.

The overall classification rate for the testing sample was

80,4%. In fact, the LR credit-scoring model performed better

when classifying No-defaulting clients (83,1%) than

classifying defaulting clients (71,4%). Similarly, to evaluate

the performance of the logistic regression model, we choose

the ROC curve of this model based on the combined learning

and testing samples illustrated in Fig.9. below. We can

observe that the model performed better in terms of the ROC

curve.

Fig.9. ROC curve for LR model.

Prediction by our predictive proposed method

By using, the same learning and testing sample applied in the

assessment of the three-credit risk prediction methods on our

proposed predictive method, we achieved the following

findings:

Table 8. Our predictive proposed method summary.

Training

Cross Entropy Error

286,684

Incorrect Predictions

16,5%

Stopping Rule Used

1 consecutive step(s) with no

decrease in error

Training Time

0:00:00,59

Testing

Cross Entropy Error

128,636

Incorrect Predictions

15,6%

Our predictive proposed method summary, presented in Table

8, contains information about the results of the training and

testing sample in which the percentage of incorrect prediction

in the training set was 16.5% and for the testing set was only

15.6%, or the least percentage of incorrect prediction of the

other methods evaluated. In fact, the small value (= 128.636)

of the cross-entropy error in the test sample signals the

robustness of our predictive proposed method in predicting

creditworthiness of borrowers.

As Table 9 illustrates, our predictive proposed method

correctly classified 579 out of 694 clients in the training

sample and 261 out of 306 clients in the test sample. Overall,

83.4% of training cases and 85.3% of test cases were correctly

classified.

Table 9. Our predictive proposed method classification

results.

Sample

Observed

Predicted

YES

correct

Training

499

94,3%

YES

48,5%

Overall

84,1%

15,9%

83,4%

Testing

240

92,7%

YES

44,7%

Overall

86,9%

13,1%

85,3%

As observed in the ROC plot presented in Fig.10. Our

predictive proposed method performed statistically better than

other credit risk assessment methods.

Fig. 10. ROC curve for the proposed predictive method.

Table 10. Summary of the results of the compared methods.

| Methods | Overall accuracy | AUC value |
|---|---|---|
| RBF-NN | 77.8% | 0.712 |
| MLP-NN | 80.1% | 0.744 |
| LR | 80.4% | 0.755 |
| Proposed method | 85.3% | 0.809 |

From the comparison of the predictive capability of the four creditworthiness prediction methods, it is apparent that our proposed predictive method provided better results, as illustrated in Table 10. Our proposed predictive method correctly classified 85.3% of the test cases, which is better than the Radial Basis Function network (77.8%), the Multilayer Perceptron (80.1%), and the Logistic Regression method (80.4%). Therefore, on this dataset, our proposed method is more accurate than the other credit risk assessment methods. Fig. 11 shows the ROC curves of the classification models tested in this study. One can see that our proposed predictive method (orange curve) achieved better performance in terms of the ROC curve than the three other methods on our dataset. We conclude that the proposed method obtained the best performance on our Moroccan dataset.
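To make the AUC comparison concrete, the following sketch computes the area under the ROC curve as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation). It is a generic illustration, not the software used in this study; the function name and the toy labels and scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: label 1 marks the positive class, score is the model output.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the values in Table 10 (0.712 to 0.809) should be read.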

WSEAS TRANSACTIONS on CIRCUITS and SYSTEMS

DOI: 10.37394/23201.2022.21.12

Zaynab Hjouji, Mohamed Mhamdi

E-ISSN: 2224-266X

115

Volume 21, 2022

Fig.11. ROC curves obtained by the different compared

methods.

6. Conclusion

To summarize, we proposed a method for predicting the creditworthiness of borrowers, which we call the method of splitting the learning set into two regions, one risky and one not risky. We compared it with three contemporary machine-learning methods to identify the most efficient and best-performing model. After describing the international and Moroccan datasets to which we applied our proposed predictive method, we compared the models on two performance evaluation metrics: classification accuracy and the AUC value of the ROC curve. In the experimental results, the proposed method outperformed the other classifiers, with an AUC value of 0.809 and an accuracy of 85.3%. Based on the test results, we conclude that our proposed method, based on splitting the learning set into two regions, is the most favorable classification model among those compared, since it gives the highest accuracy and the best performance in identifying the creditworthiness of borrowers.

Acknowledgment

The authors express their sincere gratitude to the editor and the reviewers and acknowledge their valuable comments and contributions.


Contribution of Individual Authors to the Creation of a Scientific Article (Ghostwriting Policy)

Zaynab Hjouji carried out the simulation and statistics of the empirical study.

Mohamed M’hamdi was responsible for the planning of the article.

Conflict of Interest

The authors have no conflicts of interest to declare

that are relevant to the content of this article.

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en

Sources of Funding for Research Presented in a

Scientific Article or Scientific Article Itself

No funding was received for conducting this study.