Building Machine Learning Models for Fraud Detection in Customs
Declarations in Senegal
DJAMAL ABDOUL NASSER SECK
Faculty of Science and Technology,
Cheikh Anta Diop University of Dakar,
BP 5005 Dakar-Fann,
SENEGAL
Abstract: - To improve the customs declaration control system in Senegal, we propose fraud risk prediction
models built with machine learning methods such as Neural Networks (MLP), Support Vector Machine (SVM),
Random Forest (RF) and eXtreme Gradient Boosting (XGBoost). These models were built from historical
customs declaration data and then tested on a part of the data reserved for this purpose to evaluate their
prediction performance according to the metrics of accuracy, precision, recall, and F1-Score. The RF model
proved to be the more performant model and is followed, in order, by the XGBoost model, and the MLP and
SVM models.
Key-Words: - fraud detection, customs declarations, machine learning, supervised method, model, prediction,
binary classification.
Received: June 14, 2023. Revised: January 25, 2024. Accepted: February 13, 2024. Published: March 26, 2024.
1 Introduction
Fraud detection in customs declarations is of
great importance for a country. Indeed, customs
fraud creates economic and security risks for a
country because it can lead to loss of financial
revenue or compromise national security
through the entry of illicit or dangerous goods.
In Senegal, the Customs Administration, which
is in charge of collecting revenue and
combating fraud, has set up an automated
system to manage the risks of fraud in
declarations during the importation of goods.
This system is essentially based on two
techniques: customs intelligence and targeting
of risky declarations. Intelligence is the process
of obtaining information about fraud activity
through certain people called informants. This
information is then used to obtain customs
intelligence. However, the use of these
techniques, which are essentially based on the
human perception of fraud, involves a lot of
subjectivity in the system and can distort the
targeting of risky transactions. Thus, to bring
more objectivity and accuracy to the current
fraud risk management system, we propose in
this paper a fraud detection solution based on
artificial intelligence with the use of machine
learning models.
In the rest of this paper, we will first talk
about the context and the problem of our
proposal. Next, we're going to talk about
machine learning, covering some general
information and presenting some of its methods.
Then, we will present the models for detecting
fraud in customs declarations that we propose
by explaining the methodology of their
construction and presenting the results of their
performance tests. We will finish with a
conclusion and perspectives.
2 Context and Issues
The fraud risk management system set up by
Senegalese customs is essentially based on the
intelligence and knowledge of the customs
officer in terms of fraud. It is a decision support
tool that directs verification officers to the
appropriate types of controls. Thus, the
collected information, combined with the
experience of the customs officer, makes it
possible to identify risk criteria. Subsequently,
these criteria are prioritized according to their
impact on fraud, and risk profiles are defined by
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.20
Djamal Abdoul Nasser Seck
E-ISSN: 2224-3402
208
Volume 21, 2024
combining the values of the identified risk
criteria. The defined profiles will serve as the
basis for targeting risky declarations. Thus, the
declarations concerned by this targeting will
systematically be subject to a physical check,
while the declarations deemed to be safe will
only be subject to a documentary check. It
should be noted that the results of the checks
carried out by the auditors are entered into the
information system. As a result, a fraud
database is created. However, this database is
not sufficiently exploited to create, for example,
a model for calculating the risk of fraud on
customs declarations. Thus, the current risk
management system of the Senegalese Customs
presents a problem of objectivity in the
assessment of the risk of fraud insofar as the
targeting of risk declarations is essentially
based on the information collected by the
verification officers as well as their knowledge
of the history of fraud.
In this context, to overcome the inadequacies
of the Senegalese customs' fraud risk
management system, we propose to exploit the
collected data to build fraud detection models
using supervised machine learning methods.
3 Machine Learning Methods
In this section, we give a definition of machine
learning and present some of its methods that
we used to build the models for detecting fraud
in customs declarations.
3.1 Definition of Machine Learning
Machine Learning, [1], is a sub-field of
artificial intelligence whose goal is to give
machines the ability to discover patterns in data
for decision-making using learning methods
that can be supervised in the case of prediction
problems, [2], [3], or unsupervised in the case
of other types of problems.
A supervised machine learning method
learns from the data the mappings between
inputs represented by variables X1, X2, .., XP
and outputs represented by a variable Y to find
a function φ such that Y = φ(X1, X2, .., XP)
that will serve as a model to predict Y for any
new input data ω given its attributes X1(ω),
X2(ω), .., XP(ω).
The variable Y to be predicted is called the
dependent variable and the variables X1, X2, ...,
XP are called independent variables.
The prediction task is a classification if Y is
a qualitative variable and a regression if Y is a
quantitative variable.
In this paper, we propose binary
classification models to predict whether or not a
customs declaration is fraudulent. These models
are built from historical customs declaration
data with the following supervised machine
learning methods: Multilayer Perceptron,
Support Vector Machine, Random Forest, and
Extreme Gradient Boosting.
3.2 Multilayer Perceptron
The multilayer Perceptron, [4], is a type of
artificial neural network used for classification
or regression tasks. An artificial neural network
is a machine learning model that is inspired by
the functioning of the human brain and is
composed of artificial neurons organized in
layers and connected by weighted connections.
Each artificial neuron is a computational unit
that receives input data through its input
connections, adds a value called activation
threshold or bias, and applies an activation
function to give an output.
In a multilayer perceptron, there is an input
layer, one or more hidden layers, and an output
layer. Every neuron of a layer is connected to
all neurons in the next layer and there are no
connections between neurons in the same layer.
The connections between neurons are
characterized by weights that are real numbers.
The multilayer perceptron is a feed-forward
neural network, which means that information
flows from the input layer to the output layer.
Neurons in the input layer have no bias or
activation function. They only receive the input
data and send them to the neurons in the first
hidden layer that processes them with their
activation function, and then, in turn, send their
outputs to the neurons in the next layer, and so
on until the output layer. The outputs of the
neurons in this last layer are the result of the
prediction with the neural network and depend
on the weights of the connections between the
neurons in the network.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.20
Djamal Abdoul Nasser Seck
E-ISSN: 2224-3402
209
Volume 21, 2024
Before using the multilayer perceptron to
predict the output for input data, it must first be
trained with a set of input-output data called a
training set to find the optimal weights that
enable to have good prediction results. This
training phase carried out with the back-
propagation algorithm, [4], [5] and the gradient
descent optimization method, consists of
iteratively modifying the weights of the
connections between neurons to minimize the
prediction error.
3.3 Support Vector Machines
Support Vector Machines, [6], [7] is a machine
learning method whose operating principle
consists of reducing the problem of
classification to that of finding a hyperplane
that will separate, in the space of characteristics,
the examples belonging to different classes and
maximize the distance between these classes.
Such a hyperplane is called the optimal
hyperplane and the distance between the classes
is called the margin. The closest examples,
which alone are used for the determination of
the optimal hyperplane, are called support
vectors.
Among the SVM models, we can distinguish
between linear SVMs and nonlinear SVMs.
Linear SVMs are the simplest because they
make it easy to find the optimal hyperplane. For
nonlinear SVMs, the idea is to achieve, via a
kernel function, [8], a nonlinear transformation
of the data space to allow a linear separation of
the examples in the new space.
3.4 Random Forest
Random Forest, [9], is a special case of bagging
(bootstrap aggregating), [10], which is an
ensemble method whose principle is to combine
forecasts from several independent models to
reduce the variance and therefore the error of
prediction. These models are built from
bootstrap samples obtained by random draw
with replacement from the same data set. In the
case of the Random Forest method, each of
these models is a decision tree, [1], [11], [12],
constructed by the recursive partitioning of a
bootstrap sample. Each partitioning is based on
a test on a cut-off variable chosen from a subset
of input variables selected at random. The final
prediction for a given example is the majority
of the predictions of the different trees in the
case of classification and the average in the case
of regression.
3.5 Extreme Gradient Boosting
Extreme Gradient Boosting, [13], is a special
case of boosting, [14], which is an ensemble
method whose principle is to sequentially
aggregate weak prediction models (weak
learners) into a performing model. These
models are built successively on different
weight distributions of the examples of the
training sample, each model being trained to
correct the errors of those preceding it. Extreme
Gradient Boosting builds decision tree models
and is an optimization of the gradient boosting
method which is a high-performance boosting
method that uses the gradient of the loss
function for the calculation of example weights
when building each new model.
4 Building of the Detection Models of
Customs Fraud
In this section, we first present the data we use
for the building of our fraud detection models in
customs declarations. We also describe the pre-
processing operations we apply to these data to
improve the quality of the models. Then, we
explain the building of these models on a part of
the data used as the training set. Then, using
performance metrics, we compare the
performance of these models on the other part
of the data used as a test set.
4.1 Presentation of the Data
The dataset we use to build our fraud detection
models consists of 25254 examples of customs
declarations characterized by the following 11
variables:
REGIME : the Customs procedure,
PRODUIT : the imported product,
MODE_DE_CONDITIONNEMENT :
the packaging mode of the product ,
CODE_ORIGINE : the origin of the
goods,
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.20
Djamal Abdoul Nasser Seck
E-ISSN: 2224-3402
210
Volume 21, 2024
CODE_PROVENANCE : the
provenance of the goods,
VALEUR_CAF : the value of the goods,
POIDS_NET : the weight,
CODE_DESTINATAIRE : the importer
of the goods,
CODE_DECLARANT : the customs
declarant,
MODE_DE_PAIEMENT : the method
of payment,
FRAUDE : the result of the inspection.
FRAUDE variable, which represents the
result of the inspection, is the dependent
variable. It is a class variable with two distinct
values: 0 which means no fraud and 1 which
means fraud. Of the other variables, which are
the independent variables, only VALEUR_CAF
and POIDS_NET are quantitative variables.
The others are qualitative variables. Figure 1
shows the dataset.
Fig. 1: The dataset
The info() method of the pandas package
gives information about the dataset such as the
number of examples, the list of variables with
their types, and their respective non-zero value
counts.
For our dataset, there are 25254 examples
and as many non-zero values for each of the
variables, which means that there are no
missing values as shown in Figure 2.
Fig. 2: Information on the data
4.2 Data Sampling
To build our fraud detection models, we divide
the data into a training set and a test set. The
training set is used to build the models with the
chosen machine learning methods while the test
set is used to evaluate the prediction
performance of the models and thus evaluate
their ability to generalize to new data.
To do the data splitting, we use the
train_test_split( ) method of the
model_selection module included in the Scikit-
learn, [15], python package to select 67% of the
data to obtain a training set of 16920 examples
and 33% to obtain a test set of 8334 examples.
4.3 Balancing the Class Distribution in the
Training Set
Analyzing the distribution of classes (FRAUDE
= 0 and FRAUDE = 1) in the training set, we
observe a significant imbalance, with about
98.55% of the examples in class 0 and only
1.45% in class 1 as shown in Figure 3.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.20
Djamal Abdoul Nasser Seck
E-ISSN: 2224-3402
211
Volume 21, 2024
Fig. 3: Class distribution in the training set
It is important to solve this class imbalance
problem before building our models from the
training sample because it can lead to a bias in
the models which will then tend to predict the
majority class (FRAUDE = 0).
The class imbalance can be corrected with
the oversampling technique, which consists of
artificially increasing the number of examples
of the minority class. For this, we use the
SMOTENC( ) method of the
imblearn.over_sampling module , which gives
us a learning set of 30015 examples, of which
55.56% are from class 0 and 44.44% are from
class 1 as shown in Figure 4.
Fig. 4: Class distribution in the training set
after over-sampling
4.4 Numerical Encoding of the Qualitative
Variables
The python implementations, [15], [16] of
the machine learning methods we use to build
our models mostly require the data to be
numerical. Thus, it is necessary to perform a
numerical encoding of the qualitative variables
of our dataset.
The type of numeric encoding that is
appropriate for the qualitative variables in our
dataset is one hot encoding because they are
nominal qualitative variables whose values are
unordered categories.
Applying one hot encoding to a nominal
qualitative variable that has n distinct categories
consists of replacing it with n corresponding
binary numeric variables. For each example in
the dataset, the binary variable corresponding to
the observed category is set to 1 and the other
binary variables are set to 0. For example, if a
nominal qualitative variable Xj has 3 distinct
categories a, b, and c, then it will be replaced by
3 binary variables Xj_a, Xj_b, Xj_c as shown in
Figure 5.
Fig. 5: One hot encoding of a qualitative
variable
But if we apply this encoding technique
directly to the qualitative variables in our
dataset, the number of binary variables
generated will be very high, because these
variables mostly have high numbers of distinct
categories.
To avoid this problem of exploding the
number of variables, we limit the one hot
encoding by replacing each of the qualitative
variables with a maximum of ten binary
variables corresponding to its ten most frequent
categories. The other categories will be grouped
under a new category that will be dropped. So,
for any example in the dataset, if the observed
category is one of the most frequent categories,
then the corresponding binary variable will be
set to 1 and the others will be set to 0. If the
observed category is one of the other categories,
then all binary variables will be set to 0.
4.5 Scaling of the Quantitative Variables
After the numerical encoding of the qualitative
variables, we need to scale all of our variables
to avoid having biases in the models we want to
build with our training set. To do this, we use
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.20
Djamal Abdoul Nasser Seck
E-ISSN: 2224-3402
212
Volume 21, 2024
min-max normalization to transform the values
of these variables into values between 0 and 1.
4.6 Building of the Models
After the data pre-processing, we build our
fraud detection models using the machine
learning methods presented above, namely:
Neural Networks (MLP), Support Vector
Machine (SVM), Random Forest (RF), and
eXtreme Gradient Boosting (XGBoost).
We use the Scikit-learn, [15], python
package and XGBoost, [16], library to sample
the data and build and test the models.
Sampling consists of dividing the data into
two samples: a training sample for building the
models and a test sample for testing the models.
For the construction of the models, the
FRAUDE characteristic is the dependent
variable. It is a qualitative variable whose
values are the classes to be predicted: 0 and 1.
The other characteristics are the independent
variables.
Each built model is tested to evaluate its
prediction performance.
4.7 Prediction Performance of the Models
To evaluate the prediction performance of our
models, we use the metrics of accuracy,
precision, recall, and F1-score.
The accuracy of a model is the proportion of
correct predictions over all the predictions of
the classes for the examples of the test set.
Accuracy, recall, and F1-score measure the
predictive performance of a model relative to
one of the classes in the test sample that is
considered the positive class. Examples in this
positive class are positive examples, while those
in the other classes are negative examples.
The precision of a model is the proportion of
positive examples predicted as positive (true
positives) over all the examples predicted as
positive. It measures the model's ability not to
predict the positive class for a negative
example.
The recall of a model is the proportion of
examples predicted as positive over all the
positive examples. It measures the model's
ability to predict the positive class for a positive
example.
A good model is one with high values of
accuracy and recall. However, as the accuracy
increases, the recall decreases and vice versa.
To solve such a dilemma, we use the F1-score,
the harmonic mean of the accuracy and the
recall. An optimal value of the F1-score
corresponds to both an optimal value of the
precision and an optimal value of the recall.
Accuracy, recall, and F1-score are calculated
relatively to each of the classes of the test set.
After, we find the averages of the results
weighted by the number of examples in the
classes to get the overall values of these
metrics.
Table 1 shows the precision, recall and F1-
score values of our fraud detection models
relative to the classes 0 (no fraud) and 1 (fraud)
and the supports (numbers of examples) of
those classes.
Table 1. Precision recall and F1-score of the
models relative to the classes
Class
Support
SVM
XGBoost
MLP
Precision
0
8217
0.99
0.99
0.99
0.99
1
117
0.14
0.06
0.12
0.06
Recall
0
8217
0.97
0.90
0.95
0.90
1
117
0.37
0.46
0.50
0.44
F1-score
0
8217
0.98
0.94
0.97
0.94
1
117
0.20
0.10
0.19
0.10
Table 2 shows the overall prediction
performance of the models on the test set,
represented by the accuracy and the averages of
the precision, of the recall and of the F1- score.
Table 2. Global prediction performance of the
models
Model
Precision
Recall
F1-score
Accuracy
RF
0.98
0.96
0.97
0.96
SVM
0.98
0.89
0.93
0.89
XGBoost
0.98
0.94
0.96
0.94
MLP
0.98
0.89
0.93
0.89
According to the results in Table 1 and Table
2, the random forest (RL) model is the most
efficient, followed by the Extreme Gradient
Boosting (XGB) model. Then, the SVM
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.20
Djamal Abdoul Nasser Seck
E-ISSN: 2224-3402
213
Volume 21, 2024
(Support Vector Machines) model and the
Perceptron Multilayer (MLP) model follow
with the same performance.
5 Conclusion
In this paper, we proposed machine learning
models for the detection of fraud in customs
declarations in Senegal. These models were
built using data from customs declarations and
using the following supervised learning
methods:: Multilayer Perceptron, Support
Vector Machines, Random Forest, and Extreme
Gradient Boosting. The model obtained with
Random Forest was found to perform the best
according to the performance measures we
used, namely precision, recall, F1-score, and
accuracy. Then follow, in order, the model
obtained with Extreme Gradient Boosting and
the models obtained with Multilayer Perceptron
and Support Vector Machines.
In perspective, it would be interesting to
combine these models to form an ensemble
model that would be very efficient in fraud
detection.
These models could be integrated into the
Senegalese Customs' fraud risk management
system to improve the efficiency of controls,
and facilitate the work of customs officers.
References:
[1] T. Mitchell, “Machine learning”, McGraw
Hill, 1997.
[2] D. A. N. Seck and F. B. R. Diakité,
"Supervised Machine Learning Models for the
Prediction of Renal Failure in Senegal," 2023
International Conference on Control,
Artificial Intelligence, Robotics &
Optimization (ICCAIRO), Crete, Greece,
2023, pp. 94-98, DOI:
10.1109/ICCAIRO58903.2023.00022.
[3] N. Paranoan, S. Y. Sabandar, A. Paranoan, E.
Pali, I. Pasulu, "The Effect of Prevention
Measures, Fraud Detection, and Investigative
Audits on Efforts to Minimize Fraud in The
Financial Statements of Companies, Makassar
City Indonesia," WSEAS Transactions on
Information Science and Applications, vol. 19,
pp. 54-62, 2022,
https://doi.org/10.37394/23209.2022.19.6.
[4] P. J. Werbos, “Beyond Regression: New Tools
for Prediction and Analysis in the Behavioral
Sciences”, Doctoral Dissertation, Harvard
University, Cambridge, 1974.
[5] D. E. Rumelhart, G. E. Hinton, R. J. Williams
(1986) , “Learning representations by back-
propagating errors”, Nature, Vol 323, 533-
536.
[6] B. E. Boser, I. M. Guyon, and V. N. Vapnik,
“A training algorithm for optimal margin
classifiers”, In Proceedings of the fifth annual
workshop on Computational learning theory,
pages 144–152, 1992.
[7] C. Cortes and V. Vapnik, “Support-vector
networks”, Machine learning, 20(3):273–297,
1995.
[8] N. Cristianini and J. Shawe-Taylor, “An
introduction to support vector machines and
other kernel-based learning methods”,
Cambridge University Press, 2000, DOI:
10.1017/CBO9780511801389.
[9] L. Breiman, “Random forests”, Machine
learning, 45(1):5–32, 2001.
[10] L. Breiman, “Bagging predictors”, Machine
Learning 24(2), 123-140, 1996.
[11] J. Quinlan, “C4.5: Programs for Machine
Learning”, Morgan Kaufman, San Mateo,
California, 1993.
[12] L. Breiman, J. Friedman, R. Olshen, C. Stone,
“Classification and Regression Trees”,
Wadsworth, Belmont, California, 1984.
[13] T. Chen and C. Guestrin, “Xgboost: A
scalable tree boosting system”, In
Proceedings of the 22nd Acm sigkdd
international conference on knowledge
discovery and data mining, pages 785–794,
2016.
[14] R. Schapire, “The strength of weak
learnability”, Machine Learning, 5(2):197–
227, 1990.
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V.
Michel, B. Thirion, O. Grisel, M. Blondel, P.
Prettenhofer, R. Weiss, V. Dubourg, J.
Vanderplas, A. Passos, D. Cournapeau.,
“Scikit-learn: Machine Learning in Python”,
JMLR 12, pp. 2825-2830, 2011.
[16] T. Chen, C. Guestrin, “XGBoost: A Scalable
Tree Boosting System”, In: Proceedings of
the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and
Data Mining, New York, NY, USA: ACM;
2016, p. 785–94, (KDD '16), Available from:
http://doi.acm.org/10.1145/2939672.2939785.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.20
Djamal Abdoul Nasser Seck
E-ISSN: 2224-3402
214
Volume 21, 2024
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.20
Djamal Abdoul Nasser Seck
E-ISSN: 2224-3402
215
Volume 21, 2024