The Construction of a Model for Predicting Users' Repeat Purchase
Behavior and its Impact on the Economic Efficiency of Enterprises
QIAN LYU
Lyceum of the Philippines University Manila Campus,
Manila 1002,
PHILIPPINES
*Corresponding Author
Abstract: - Current methods for predicting user repeat purchase behavior in e-commerce enterprises are limited in efficiency and accuracy. To address this, an intelligent machine-learning-based prediction model of user repeat purchase behavior is proposed. To improve the quality of the experimental data, Kernel Principal Component Analysis (KPCA) and the Synthetic Minority Oversampling Technique (SMOTE) are first used to preprocess the data. Repeat purchase behavior is then predicted with a Support Vector Machine (SVM), using the Smooth Support Vector Machine (SSVM) as the feature classifier. To overcome the SSVM's drawbacks, a Sparrow Search Algorithm (SSA) improved with multiple strategies (ISSA) is proposed. On this basis, an ISSA-SSVM intelligent prediction model of user repeat purchase behavior is constructed to predict user repeat purchase behavior efficiently. The results showed that the fitness value of the ISSA-SSVM algorithm was always higher than that of the other algorithms as the number of iterations increased. It also converged quickly, reaching a fitness value of 94.6% at 13 iterations. The model's error value is 0.14, its loss value 0.20, F1 value 0.957, recall 0.965, MAE 8.52, degree of fit 0.992, prediction accuracy 97.92%, and AUC 0.995, all better than those of the other two models. The ISSA-SSVM developed in this work therefore outperforms the compared models in forecasting customers' repeat purchasing behavior. The approach helps e-commerce businesses implement precision marketing and strengthens their competitive advantages.
Key-Words: - Repeat purchase behavior; Smooth support vector machine; Sparrow search algorithm; Economic
efficiency; E-commerce; machine learning
Received: August 15, 2022. Revised: July 14, 2023. Accepted: August 16, 2023. Published: September 21, 2023.
1 Introduction
E-commerce has grown rapidly with the development of
information and Internet technologies and is now one
of the most popular ways to shop. As more businesses
adopt e-commerce platforms and the industry matures,
these platforms have accumulated large amounts of
user behavior data. Analyzing these data reveals users'
consumption habits and consumption patterns. In this
context, users' repeat purchase behavior has long been
a research hotspot. Predicting it can help e-commerce
enterprises achieve precision marketing, increase
product sales, and improve their economic benefits.
In current research, intelligent prediction of users'
repeat purchase behavior is generally based on machine
learning techniques such as the Support Vector Machine
(SVM), Random Forest (RF), and LightGBM.
However, the prediction accuracy of existing machine
WSEAS TRANSACTIONS on COMPUTER RESEARCH
DOI: 10.37394/232018.2023.11.28
Qian Lyu
E-ISSN: 2415-1521
303
Volume 11, 2023
learning-based prediction models of repeat purchase
behavior is low, so they cannot adequately support
precision marketing in e-commerce companies.
To address this problem, the study uses the Relief
algorithm for feature selection and a Smooth Support
Vector Machine (SSVM) as the feature classifier.
Because the SSVM performs unsatisfactorily with
manually chosen parameters, an improved Sparrow
Search Algorithm (ISSA) is used to optimize it.
Finally, an ISSA-SSVM-based intelligent prediction
model of users' repeat purchase behavior is
constructed to predict user behavior with high
precision and thus improve the economic efficiency
of e-commerce enterprises. The study makes two main
contributions: first, it classifies user behavior
features with an SSVM to predict repeat purchase
behavior; second, it proposes an improved SSA and
uses it to optimize the SSVM for higher model
accuracy, [1], [2], [3]. The rest of the paper has four
parts: the first reviews recent related research; the
second constructs the ISSA-SSVM intelligent
prediction model of repeat purchase behavior; the
third analyzes the model's performance; and the last
summarizes the study.
2 Related Works
Rapidly developing e-commerce platforms have
attracted a huge number of businesses and users.
This growth has affected traditional shopping and
created development opportunities for many
e-commerce enterprises. Predicting users' repeat
purchase behavior helps these enterprises understand
users' preferences and purchasing habits, enabling
precision marketing and higher returns, [4], [5].
Consequently, repeat purchase prediction has
attracted many researchers. Mishra R et al.
systematically reviewed the recent literature to
identify the factors influencing consumer purchase
decisions in omnichannel retailing and to forecast
repeat buying behavior, [6]. Sheth analyzed the
impact of COVID-19 on consumer behavior and
explored whether old consumption habits would
return or die out, [7]. Shahab et al. proposed a
refined likelihood model and applied it to analyzing
and predicting consumer purchase behavior; the
results showed that the model predicted consumer
behavior more accurately, [8]. Sheth Jan analyzed the
correlation between marketing strategies and
consumer behavior, exploring the factors that
influence repeat purchase behavior and providing new
ideas for predicting it, [9]. Han reviewed the latest
research advances to analyze consumer behavior
patterns in the tourism and hospitality industries and
explored how the factors behind repeat purchase
behavior affect the development of these industries,
[10]. Suma and Hills employed data mining
techniques to analyze and predict the repeat
purchasing behavior of customers in India in order to
forecast market demand for refurbished gadgets, [11].
Sarkar and De Bruyn argued that deep learning can
effectively replace manual feature engineering and
built a marketing analytics model on a long
short-term memory (LSTM) network response model;
the results showed that the model accurately analyzes
and predicts consumer purchase behavior, enhancing
marketing effectiveness, [12]. Anitha and Patil
constructed an RFM (Recency, Frequency, Monetary)
model to analyze and predict customers' repeat
purchase behavior, supporting targeted marketing
strategies that improve corporate turnover, [13].
SVM is a generalized linear binary classifier widely
used in classification tasks such as image recognition
and text classification. SVM effectively avoids the
curse of dimensionality and can solve both linear and
nonlinear problems, but it also has drawbacks that
limit the classification efficiency and accuracy of the
model. For this reason, many scholars have proposed
strategies to optimize SVMs and have applied the
optimized SVMs in various fields. Cervantes et al.
comprehensively surveyed recent SVM research and
discussed optimization improvements, application
paths, challenges, and development
trends of SVMs, [14]. Yu et al. conducted a
comprehensive bibliometric analysis of SVM research
in China and its research trends, analyzed the effects
of different SVM improvement methods, and provided
theoretical support for improving and applying SVM,
[15]. Fan and Sharma combined SVM with the least
squares support vector machine (LSSVM) to design an
engineering cost prediction model that predicts
engineering costs efficiently and accurately and
supports cost control in engineering projects, [16].
Neelakandan and Paulraj proposed an automatic
learning model of SVM based on the optimization of
balanced meta-automata (CA) for intelligent and
efficient data prediction, offering a reference for SVM
optimization and its practical applications, [17].
Hamdan designed a statistical SVM-based recognition
model that recognizes handwritten characters
efficiently and improves recognition efficiency, [18].
[19], used an improved SVM as a classifier to
recognize dermoscopic images and used the results to
assist in diagnosing melanoma; the approach achieved
good recognition accuracy and can support clinical
diagnosis, [19]. Xiao et al. enhanced the SVM and
proposed an integrated least squares support vector
machine model to predict stock prices; the results
demonstrated its high prediction accuracy, [20]. [21],
designed an improved SVM model for imbalanced data
sets and applied it to fault diagnosis in autonomous
vehicles; the results showed high diagnostic accuracy
and a potential to significantly increase the safety of
self-driving cars, [21].
In summary, predicting users' repeat purchase
behavior is of great significance for improving
enterprises' economic benefits. Recent research has
studied repeat purchase intention extensively, but such
intention is difficult to quantify: it is shaped by
subjective factors that cannot be measured directly,
and it is unstable. As a result, current methods for
predicting user repeat purchase behavior are not
efficient or accurate enough. To solve these problems,
this study starts from measurable user behavior data
and mines the characteristics of users and merchants
to predict repeat purchase behavior. SSA is used to
enhance the performance of the SVM, and feature
selection gives more weight to the smaller (minority)
class of samples. Features chosen in this way
distinguish such samples more accurately and help
address data imbalance. A multi-strategy improved
SSA is proposed and used to optimize the SVM,
yielding an ISSA-SSVM intelligent prediction model
of repeat purchase behavior with higher prediction
accuracy. This research helps merchants identify users
with repeat purchase intentions and thus implement
precision marketing. It also promotes closer
connections between merchants and users, which
benefits the economic efficiency of e-commerce
enterprises.
3 Intelligent Prediction Model
Construction of User's Repeat
Purchase Behavior Using ISSA-SSVM
3.1 Data Preprocessing based on KPCA
and SMOTE
With the consent of an e-commerce enterprise, real
user behavior data were obtained from its management
system to build a dataset. Analyzing these data reveals
users' basic characteristics and consumption behavior
patterns. However, the constructed dataset suffers
from missing information and data redundancy, which
make analysis more difficult, so the data must be
preprocessed. First, to address the large number of
features and the high dimensionality of
user behavior data, Kernel Principal Component
Analysis (KPCA) is used for dimensionality
reduction, [22], [23]. KPCA extends PCA by using a
kernel function to implicitly map the data into a
higher-dimensional feature space and performing PCA
there, which captures nonlinear structure that linear
PCA misses. The effect of KPCA on data
classification is shown in Figure 1.
[Figure 1 plot: (a) PCA and (b) KPCA projections of the data in the (x1, x2) plane]
Fig. 1: The classification effect of KPCA on data
Let the dataset to be processed contain samples
$x_i, i = 1, 2, \ldots, n$, which KPCA maps into a
high-dimensional feature space as
$\phi(x_i), i = 1, 2, \ldots, n$. The sample covariance
matrix $C$ in this space is given by Equation (1).

$$C = \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)\,\phi(x_i)^{T} \qquad (1)$$
In Equation (1), $\phi(x_i)$ is the mapping vector of
the sample $x_i$ in the high-dimensional space, with
dimension $D \times 1$ ($D > d$), and $d$ is the number
of features of the original samples. Eigendecomposition
of $C$ gives Equation (2).

$$\frac{1}{n}XX^{T}\nu_j = \lambda_j\nu_j, \quad j = 1, 2, \ldots, d \qquad (2)$$
In Equation (2), $\lambda_j$ is an eigenvalue of the
sample covariance matrix $C$; $\nu_j$ is the
corresponding $D \times 1$ eigenvector in the
high-dimensional space; and $X$ is the matrix of
mapped samples given by Equation (3).

$$X = \left[\phi(x_1), \phi(x_2), \ldots, \phi(x_n)\right] \qquad (3)$$
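Because each eigenvector lies in the span of the
mapped samples, $\nu_j$ can be written as a linear
combination of the samples' high-dimensional
mapping vectors, as in Equation (4).

$$\nu_j = X\alpha \qquad (4)$$

In Equation (4), $\alpha$ is a column vector of
dimension $n \times 1$. Substituting Equation (4) into
Equation (2) and multiplying both sides on the left by
$X^{T}$ gives Equation (5).

$$\lambda_j X^{T}X\alpha = \frac{1}{n}X^{T}XX^{T}X\alpha \qquad (5)$$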
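Introducing the kernel matrix $K = X^{T}X$, whose
entries are $K_{ik} = k(x_i, x_k) = \phi(x_i)^{T}\phi(x_k)$,
Equation (5) reduces (for nonzero eigenvalues) to
Equation (6).

$$K\alpha = n\lambda_j\alpha \qquad (6)$$

The coefficients $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)^{T}$
are obtained by solving Equation (6). For a sample
$x_i$ in the dataset, projecting $\phi(x_i)$ onto the
direction $\nu_j$ gives its nonlinear dimensionality
reduction result $x_i^{\mathrm{new}}$, as shown in
Equation (7).

$$x_i^{\mathrm{new}} = \nu_j^{T}\phi(x_i) = \sum_{k=1}^{n}\alpha_k\,k(x_k, x_i) \qquad (7)$$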
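The derivation in Equations (1) to (7) can be followed numerically. The sketch below is a minimal NumPy implementation under two assumptions: an RBF kernel $k(a, b) = \exp(-\gamma\|a - b\|^2)$ (the kernel used in the study is not specified), and double-centering of the kernel matrix, a standard KPCA step that the equations above leave implicit. The function names and the parameter `gamma` are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix, k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca(X, n_components=2, gamma=1.0):
    """Project the n samples in X onto the top kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Center phi(x) in feature space by double-centering the kernel matrix
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eq. (6): K alpha = (n * lambda_j) alpha; eigh returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(Kc)
    top = np.argsort(eigvals)[::-1][:n_components]
    mu, alphas = eigvals[top], eigvecs[:, top]
    if np.any(mu <= 0):
        raise ValueError("requested more components than positive eigenvalues")
    # Rescale alpha so that nu_j = X alpha has unit norm (mu * ||alpha||^2 = 1)
    alphas = alphas / np.sqrt(mu)
    # Eq. (7) with the centered kernel: projection of every training sample
    return Kc @ alphas
```

Each column of the result holds the projections of Equation (7) for one retained direction $\nu_j$; the rescaling uses $\|\nu_j\|^2 = \alpha^{T}K\alpha$.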
Combining the above steps reduces the dimensionality
of the sample data. Besides high dimensionality, the
dataset constructed in this study is also
class-imbalanced, which further affects the accuracy
of subsequent user behavior analysis. When the
numbers of samples differ greatly across categories,
the classification model easily overfits, which lowers
its classification accuracy. To avoid this effect, the
samples in the experimental dataset are preprocessed
with the Synthetic Minority Oversampling Technique
(SMOTE) so that the classes are roughly
balanced. The principle of SMOTE algorithm can be
represented in Figure 2.
[Figure 2 plot: panels (a) and (b) in the feature plane (f1, f2), showing a minority sample xi, its neighbor yi and, in (b), the generated composite instance]
Fig. 2: The principle of SMOTE algorithm
As Figure 2 shows, SMOTE first calculates the
distance between each minority-class sample and the
other samples, using the Euclidean distance in
Equation (8).

$$d(x, y) = \sqrt{(y_1 - x_1)^2 + \cdots + (y_n - x_n)^2} \qquad (8)$$
After the distances between the minority-class
samples and the other samples are obtained, the
imbalance ratio of the experimental dataset is
calculated, and the sampling ratio $N$ is set from the
result. Repeated SMOTE sampling then brings the
classes in the dataset to a balanced 1:1 ratio. For each
minority sample $x$, $N$ samples $\tilde{x}$ are selected
from its neighborhood, and a new sample is
constructed from each of them as in Equation (9).

$$x_{\mathrm{new}} = x + \mathrm{rand}(0,1) \times (\tilde{x} - x) \qquad (9)$$

In Equation (9), $\mathrm{rand}(0,1)$ is a random number
between 0 and 1.
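Equations (8) and (9) describe standard SMOTE. A minimal NumPy sketch follows; it assumes that the neighbors $\tilde{x}$ are the $k$ nearest minority samples, with $k = 5$ (a common default that the text does not specify), and that oversampling continues until the classes reach 1:1. The function names are illustrative.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolation, Eq. (9)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # Eq. (8): pairwise Euclidean distances between minority samples
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]  # k nearest minority neighbors
    base = rng.integers(0, n, size=n_new)        # the x each new sample starts from
    pick = neighbors[base, rng.integers(0, k, size=n_new)]  # the x~ it moves toward
    gap = rng.random((n_new, 1))                 # rand(0, 1) in Eq. (9)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

def balance(X, y, minority=1, rng=None):
    """Oversample the minority class until it matches the majority class (1:1)."""
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    X_syn = smote(X_min, n_new, rng=rng)
    return np.vstack([X, X_syn]), np.concatenate([y, np.full(n_new, minority)])
```

Every synthetic point lies on a segment between two minority samples, so the minority class grows without leaving its own region of feature space.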
This completes the oversampling of the dataset.
KPCA-SMOTE assumes that every class has a center,
so artificially generated minority samples should also
move toward that center as the data are expanded. The
method therefore replaces the SMOTE neighbor of
each sample with the center of the minority class, so
that the expanded minority samples concentrate
around the minority-class center. Figure 3 shows the
data preprocessing process based on the
KPCA-SMOTE algorithm.
[Figure 3 flowchart: start; calculate the minority-class center; extend the minority-class data; compute the Euclidean distance to the center and delete far-away minority points according to the imbalance ratio; fuse the expanded data with the original data and update the dataset; repeat until the imbalance ratio is reached; train and test the classification model; obtain the classification result; finish]
Fig. 3: Data preprocessing process based on the
KPCA-SMOTE algorithm
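The center-based variant described around Figure 3 (interpolating toward the minority-class center, then deleting generated points far from it) can be sketched as follows. The text does not state how many candidates are generated before pruning, so the over-generation factor `over = 2.0` and the rule "keep the candidates closest to the center" are assumptions for illustration; the function name is hypothetical.

```python
import numpy as np

def center_oversample(X_min, n_new, over=2.0, rng=None):
    """Center-based oversampling sketch: interpolate toward the class center, then prune."""
    rng = np.random.default_rng(rng)
    center = X_min.mean(axis=0)              # minority-class center
    m = int(np.ceil(over * n_new))           # hypothetical over-generation before pruning
    base = X_min[rng.integers(0, len(X_min), size=m)]
    gap = rng.random((m, 1))
    cand = base + gap * (center - base)      # random linear interpolation toward the center
    dist = np.linalg.norm(cand - center, axis=1)
    keep = np.argsort(dist)[:n_new]          # delete the candidates farthest from the center
    return cand[keep]
```

Because every candidate lies between an original sample and the center, the generated points are never farther from the center than the original minority samples.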
In Figure 3, the KPCA-SMOTE algorithm first
calculates the center of the minority-class samples.
The minority class is then expanded, with new samples
generated by random linear interpolation. Newly
generated points that lie relatively far from the
minority-class center are deleted. The expanded
minority-class data are merged with the original data
according to the imbalance ratio and then classified
by the classification algorithm.
Some users' records lack information such as gender
or age. However, only 0.38% of users lack age
information and 0.42% lack gender information, so
these gaps have no noticeable effect on the overall
classification results. Missing ages were filled with
the mean age, and missing genders were filled with
the mode. In the study, users' consumption behaviors
and gender information are encoded as integers from
0 to 3, with the four behaviors (click, add-to-cart,
purchase, and collection) taking the values 0 to 3; any
value outside this range is treated as abnormal data.
Abnormal data made up 0.65% of the dataset
constructed for the study. Because they would reduce
the model's accuracy, they were removed. The final
user behavior information is listed in Table 1.
Table 1. Information on User Behavior
Field          Description
Merchant_id    Unique ID of the e-commerce merchant
User_id        Unique ID of the user
Age_range      User age range
Brand_id       Brand (trademark) ID of the product
Cat_id         ID of the category to which the product belongs
Gender         User gender: 0 is female, 1 is male
Label          Whether the user shows repeat purchase behavior
Action_type    User behavior category
Time_stamp     Time (month, day)
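The cleaning rules described before Table 1 (mean imputation for age, mode imputation for gender, and removal of out-of-range behavior codes) can be sketched in plain Python. The record layout (dictionaries keyed by the Table 1 field names, with `None` marking a missing value) and the order in which codes 0 to 3 map onto the four behaviors are assumptions; the function name is illustrative.

```python
from statistics import mean, mode

VALID_ACTIONS = {0, 1, 2, 3}  # click, add-to-cart, purchase, collection (code order assumed)

def clean(records):
    """Impute missing Age_range/Gender and drop rows with abnormal Action_type codes."""
    ages = [r["Age_range"] for r in records if r["Age_range"] is not None]
    genders = [r["Gender"] for r in records if r["Gender"] is not None]
    age_fill = mean(ages)        # missing age -> mean value
    gender_fill = mode(genders)  # missing gender -> most frequent value
    cleaned = []
    for r in records:
        if r["Action_type"] not in VALID_ACTIONS:
            continue             # abnormal code: remove the record
        r = dict(r)              # copy so the input records are not modified
        if r["Age_range"] is None:
            r["Age_range"] = age_fill
        if r["Gender"] is None:
            r["Gender"] = gender_fill
        cleaned.append(r)
    return cleaned
```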
3.2 Construction of SSVM based on
Multi-strategy Improved SSA Optimization
To profile user behavior, features are mined along
three dimensions: user, enterprise, and
user-enterprise. The Relief algorithm then selects the
features most correlated with repeat purchase,
including the user's purchase frequency, purchase
conversion rate, and repeat purchase frequency at the
enterprise. These features are fed into the SVM for
classification, from which the user's repeat purchase
behavior is predicted. The basic structure of the
support vector machine is shown in Figure 4.
[Figure 4 diagram: input X = (X(1), X(2), ..., X(n)); kernel units K(X, X1), K(X, X2), ..., K(X, Xn); bias b; output Y]
Fig. 4: The Basic Structure of Support Vector
Machines
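The Relief feature selection mentioned above scores each feature by how well it separates a sample from its nearest neighbor of the other class (nearest miss) compared with its nearest neighbor of the same class (nearest hit). A minimal NumPy sketch for the binary case follows, assuming min-max-scaled features and Manhattan distance, a common choice; the study's exact settings are not given.

```python
import numpy as np

def relief(X, y, n_iter=None, rng=None):
    """Binary Relief: weight features by nearest-miss minus nearest-hit differences."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0                      # constant features end up with zero weight
    Xs = (X - lo) / span                       # scale so each feature difference is in [0, 1]
    m = n if n_iter is None else n_iter
    w = np.zeros(d)
    for i in rng.integers(0, n, size=m):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to every sample
        dist[i] = np.inf                       # exclude the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / m
    return w
```

Features with larger weights are kept; the cut-off is a design choice that the text does not specify.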
The basic principle of SVM classification is to find a
hyperplane in the feature space that maximizes the
distance between the hyperplane and the nearest
points on either side of it (the margin), [24]. To
further improve the classification accuracy and
efficiency of the SVM, this study modifies the SVM
objective function as shown in Equation (10).

$$\min_{\omega, b, \xi}\ \frac{\nu}{2}\sum_{i=1}^{N}\xi_i^{2} + \frac{1}{2}\left(\|\omega\|^{2} + b^{2}\right), \quad \text{s.t. } y_i\left(\omega^{T}x_i + b\right) + \xi_i \ge 1,\ i = 1, \ldots, N \qquad (10)$$
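In Equation (10), $\omega$ is the weight vector; $b$ is
the bias, a constant; $\xi_i$ is the slack (relaxation)
variable of the model; and $\nu$ is the manually set
penalty parameter. At the optimum,
$\xi_i = \left(1 - y_i(\omega^{T}x_i + b)\right)_{+}$, where
$(\cdot)_{+}$ is the non-differentiable plus function. This
plus function is approximated by the integral of the
Sigmoid function, and a smoothing factor $\alpha$ is
introduced to make the SVM smooth; the model can
then be represented by Equation (11).

$$\min_{\omega, b}\ \frac{\nu}{2}\sum_{i=1}^{N} p\left(1 - y_i\left(\omega^{T}x_i + b\right), \alpha\right)^{2} + \frac{1}{2}\left(\|\omega\|^{2} + b^{2}\right), \quad p(x, \alpha) = x + \frac{1}{\alpha}\ln\left(1 + e^{-\alpha x}\right) \qquad (11)$$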
In Equation (11), $x_i$ and $y_i$ are the $i$-th input
feature vector and its class label, respectively. With
this smooth approximation, Equation (11) solves
linear classification problems. For nonlinear
classification, a kernel function is introduced after the
approximation function; it maps the problem into a
high-dimensional space in which it becomes linear,
[25]. This process is illustrated in Figure 5.
[Figure 5 diagram: feature mapping from a low-dimensional nonlinear problem to a high-dimensional linear one]
Fig. 5: Ascending Dimension Transformation for
Nonlinear Classification Problems
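Equation (11) is smooth, so it can be minimized with a standard quasi-Newton solver. The sketch below trains a linear SSVM with SciPy's L-BFGS-B, using $p(x, \alpha)$ from Equation (11), whose derivative is the Sigmoid $1/(1 + e^{-\alpha x})$. The values `nu = 10` and `alpha = 5` are placeholders, not the tuned parameters of the study. For the nonlinear case, one option is to replace the feature matrix with a kernel matrix $K(X, X)$; that variant is not shown.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def smooth_plus(r, alpha):
    """p(r, alpha) = r + (1/alpha) * ln(1 + exp(-alpha * r)), a smooth version of max(r, 0)."""
    return r + np.logaddexp(0.0, -alpha * r) / alpha

def train_ssvm(X, y, nu=10.0, alpha=5.0):
    """Minimize Eq. (11) for a linear SSVM; labels y must be in {-1, +1}."""
    n, d = X.shape

    def objective(theta):
        w, b = theta[:d], theta[d]
        r = 1.0 - y * (X @ w + b)            # 1 - y_i (w^T x_i + b)
        p = smooth_plus(r, alpha)
        s = expit(alpha * r)                 # dp/dr is the Sigmoid
        value = 0.5 * nu * np.sum(p ** 2) + 0.5 * (w @ w + b * b)
        coef = -nu * p * s * y               # chain rule through r
        grad = np.append(X.T @ coef + w, np.sum(coef) + b)
        return value, grad

    res = minimize(objective, np.zeros(d + 1), jac=True, method="L-BFGS-B")
    return res.x[:d], res.x[d]

def predict(X, w, b):
    """Sign of the decision function, returned as labels in {-1, +1}."""
    return np.where(X @ w + b >= 0.0, 1, -1)
```

Supplying the analytic gradient (`jac=True`) lets L-BFGS-B converge in few iterations, which reflects the efficiency argument for smoothing the SVM objective.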
To increase the classification accuracy of the
model, an SSVM model is built based on the
information above. The performance of the SSVM
model can be considerably influenced by the proper
parameter setup, because the penalty factor and the
kernel function parameters have a strong impact on it.
These two parameters are usually chosen through
repeated experiments, keeping the values that give the
best performance. However, this approach is
time-consuming, and the selected parameters are often
unsatisfactory. To address this problem, a
multi-strategy collaborative improved SSA is
proposed to optimize the SSVM. SSA is a swarm
intelligence optimization technique that imitates the
foraging and anti-predation behaviors of sparrows; it
is used here to find the best SSVM parameter values
because of its simple structure and strong
optimization performance. Because the standard SSA
easily falls into local optima and has weak
convergence ability, the study uses Chebyshev chaotic
mapping to initialize the SSA population, as shown in
Equation (12).

$$x_{t+1} = \cos\left(t \cdot \cos^{-1} x_t\right) \qquad (12)$$
In Equation (12), $x_t$ is the position of sparrow
individual $t$ in the initial population. The Chebyshev
chaotic mapping increases the diversity of the initial
SSA population and thus helps avoid premature
convergence. During the iterative process, the
discoverers in SSA occupy positions with high fitness
values and guide the other individuals toward food. In
the traditional SSA, different discoverers often
communicate little with each other, which can trap the
algorithm in local optima. The study therefore
introduces the golden sine strategy to improve the
discoverers' position update, as shown in
Equation (13).
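$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t}\,\left|\sin r_1\right| - r_2 \sin r_1 \left|c_1 X_{P,j}^{t} - c_2 X_{i,j}^{t}\right|, & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \ge ST \end{cases} \qquad (13)$$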
In Equation (13), $r_1$ and $r_2$ are random numbers
in the ranges $[0, 2\pi]$ and $[0, \pi]$, respectively,
which mainly determine the flight distance and
direction of the individual sparrow; $c_1$ and $c_2$ are
golden-section coefficients, which guide the
individual toward the optimal solution and help the
algorithm converge; $X_{i,j}^{t}$ is the current position of
individual $i$ in dimension $j$ at iteration $t$; $X_{P}^{t}$ is
the position with the largest fitness value among all
discoverers; $R_2 \in [0, 1]$ is the warning value and
$ST \in [0.5, 1]$ is the safety threshold; $Q$ is a normally
distributed random number; and $L$ is a $1 \times D$
matrix whose elements are all 1, where $D$ is the
dimension of the problem. Following the whale
optimization algorithm (WOA), a convergence factor
$a$ is introduced to regulate the convergence speed and
search ability of SSA during the iterations. The linear
convergence strategy of WOA drives the algorithm
into local search too early, which harms convergence
in the later stage. To address this, an improved
nonlinear convergence factor adjustment strategy is
proposed, as shown in Figure 6.
[Figure 6 plot: convergence factor a (0 to 2.0) versus epoch (0 to 500) for the original and improved strategies]
Fig. 6: Nonlinear Convergence Factor Adjustment
Strategy
Based on the above algorithm, the intelligent
prediction of users' repeat purchase behavior with the
ISSA-optimized SSVM comprises the steps shown in
Figure 7.
[Figure 7 flowchart: start; initialize SSA and SVM parameters; normalize; compute the fitness of the updated and reverse individuals; rank by fitness and compute the warning value; update discoverer, follower and danger-aware sparrow positions; recompute fitness and update the current optimal fitness and position; repeat until the maximum number of iterations is reached; output the optimal parameters and the optimal SVM model; finish]
Fig. 7: Intelligent prediction process of user repeat
purchase behavior based on ISSA-SSVM algorithm
First, the input and output of the intelligent
prediction model of repeat purchasing behavior are
defined: the extracted feature quantities serve as the
model's input, and the behavior category is its output.
The data are normalized and divided into training and
test samples. Next, the initial parameters of the
sparrow search algorithm and the SVM are set. The
training samples are classified, and each sparrow's
fitness is computed as the cross-validation accuracy; a
reverse sparrow population is then generated. All
individuals are sorted by fitness: those with higher
fitness act as discoverers, and the rest act as followers.
While the warning value is below the safety threshold,
the environment is safe and the sparrows search
widely; once it reaches the threshold, they switch to
anti-predation behavior. The follower positions are
then updated; followers with relatively low fitness
move to other locations to improve their fitness. The
fitness value of the current iteration is compared with
the best fitness found so far, and the global optimum
is updated. The positions of the danger-aware
(reconnaissance) sparrows are updated to strengthen
the population's global search capacity. Finally, the
iteration count is checked against the termination
criterion; if it is not met, the procedure returns to the
fitness evaluation step. When the maximum number
of iterations is reached, the algorithm stops, outputs
the optimal parameters, and builds the ISSA-optimized
SSVM prediction model. Together, these steps
complete the improvement of SSA and yield the
ISSA-SSVM intelligent prediction model of user
repeat purchase behavior.
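The loop of Figure 7 can be sketched as a compact sparrow search in NumPy. This is a minimal sketch, not the study's implementation. It minimizes a generic function `f`; for parameter tuning one could pass, for example, one minus the cross-validation accuracy of the SSVM as a function of the penalty and kernel parameters. It uses Chebyshev initialization with a fixed map order of 4 (an assumption; Equation (12) writes the order as $t$) and the golden-sine discoverer update of Equation (13), and it follows the standard SSA rules for followers and danger-aware sparrows. The reverse-learning step and the nonlinear convergence factor are omitted because their exact forms are not given, and `pd_ratio`, `sd_ratio` and `st` are assumed defaults.

```python
import numpy as np

def chebyshev_init(pop, dim, lb, ub, order=4, rng=None):
    """Chaotic initial population via x <- cos(order * arccos(x)), mapped to [lb, ub]."""
    rng = np.random.default_rng(rng)
    x = rng.uniform(-1.0, 1.0, dim)
    seq = np.empty((pop, dim))
    for t in range(pop):
        x = np.cos(order * np.arccos(x))
        seq[t] = x
    return lb + (seq + 1.0) / 2.0 * (ub - lb)

def issa(f, dim, lb, ub, pop=20, iters=50, pd_ratio=0.2, sd_ratio=0.1, st=0.8, rng=None):
    """Minimize f over the box [lb, ub]^dim; returns (best_position, best_value)."""
    rng = np.random.default_rng(rng)
    lb, ub = np.full(dim, lb, dtype=float), np.full(dim, ub, dtype=float)
    tau = (np.sqrt(5.0) - 1.0) / 2.0                    # golden ratio coefficient
    c1 = -np.pi * (1 - tau) + np.pi * tau               # golden-section coefficients
    c2 = -np.pi * tau + np.pi * (1 - tau)
    X = chebyshev_init(pop, dim, lb, ub, rng=rng)
    fit = np.array([f(x) for x in X])
    n_pd, n_sd = max(1, int(pd_ratio * pop)), max(1, int(sd_ratio * pop))
    g = int(np.argmin(fit))
    g_best, g_fit = X[g].copy(), fit[g]
    for _ in range(iters):
        idx = np.argsort(fit)                           # best (lowest f) first
        X, fit = X[idx], fit[idx]
        best, worst = X[0].copy(), X[-1].copy()
        r_alarm = rng.random()                          # warning value R2
        # Discoverers: golden-sine update when safe, random walk Q*L otherwise (Eq. 13)
        for i in range(n_pd):
            if r_alarm < st:
                r1, r2 = rng.uniform(0, 2 * np.pi), rng.uniform(0, np.pi)
                X[i] = X[i] * abs(np.sin(r1)) - r2 * np.sin(r1) * np.abs(c1 * best - c2 * X[i])
            else:
                X[i] = X[i] + rng.normal() * np.ones(dim)
        X[:n_pd] = np.clip(X[:n_pd], lb, ub)
        xp = X[int(np.argmin([f(x) for x in X[:n_pd]]))].copy()
        # Followers: standard SSA rules
        for i in range(n_pd, pop):
            if i + 1 > pop / 2:                          # hungry followers fly elsewhere
                X[i] = rng.normal() * np.exp((worst - X[i]) / (i + 1) ** 2)
            else:                                       # others move near the best discoverer
                A = rng.choice([-1.0, 1.0], size=dim)
                X[i] = xp + np.dot(np.abs(X[i] - xp), A) / dim * np.ones(dim)
        # Danger-aware sparrows
        for i in rng.choice(pop, size=n_sd, replace=False):
            if fit[i] > fit[0]:
                X[i] = best + rng.normal() * np.abs(X[i] - best)
            else:
                X[i] = X[i] + rng.uniform(-1, 1) * np.abs(X[i] - worst) / (fit[i] - fit[-1] + 1e-12)
        X = np.clip(X, lb, ub)
        fit = np.array([f(x) for x in X])
        i = int(np.argmin(fit))
        if fit[i] < g_fit:                              # keep the global best
            g_best, g_fit = X[i].copy(), fit[i]
    return g_best, g_fit
```

For example, `issa(lambda x: float(np.sum(x ** 2)), dim=3, lb=-5.0, ub=5.0)` searches for the minimum of a simple quadratic; tuning the SSVM would replace that lambda with a cross-validation objective over its two parameters.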
4 Performance Analysis of ISSA-SSVM
User Repeat Purchase Behavior
Prediction Model
With the enterprise's consent, real user behavior data
from an e-commerce enterprise management system
were used to construct the dataset, which was split
into a training set and a test set at a ratio of 7:3. The
training set was used to train the models and the test
set to evaluate their performance. The Genetic
Algorithm (GA)-optimized XGBoost model
(GA-XGBoost) and the Stacking ensemble fusion
model (Stacking) are two state-of-the-art models for
forecasting customers' repeat buying behavior, and the
prediction performance of the ISSA-SSVM model is
compared with theirs. First, the three models are
trained on the training set; Figure 8 shows how their
error and loss values change during training. The
ISSA-SSVM model converges earliest, requiring only
82 iterations, which is 23 and 53 fewer than the
Stacking model and the GA-XGBoost model,
respectively. This shows that the ISSA-SSVM model
converges better than the other two models. In
Figure 8(a), the error value of each model no longer
changes once the model has fully converged. Overall,
compared with the other two models, the ISSA-SSVM
model needs fewer iterations to converge fully during
training, and its error and loss values after
convergence are lower. This indicates that the
ISSA-SSVM model converges well and completes
training faster, which improves the efficiency of
predicting users' repeat purchase
behavior.
[Figure 8 plots: (a) Error and (b) Loss versus epoch (0 to 350) for ISSA-SSVM, Stacking and GA-XGBoost]
Fig. 8: Changes in error and loss values
The objective (fitness) function is defined as the
classification accuracy on the training samples, which
is to be maximized. The SVM parameters obtained
with ISSA were compared with those obtained with
SSA, PSO, and GWO. Figure 9 shows the iteration
curves of the cross-validation accuracy for each
algorithm.
[Figure 9 plot: fitness (%) versus number of iterations (0 to 100) for GWO, PSO, SSA and ISSA-SSVM]
Fig. 9: Optimization curve of classification accuracy
Figure 9 shows that the fitness of the GWO algorithm
stops improving early: from 20 iterations on, it
remained unchanged at 82.3%. This indicates that
GWO is the most likely of the algorithms to fall into a
local optimum during optimization. The PSO
algorithm handles discrete optimization problems
poorly, which prevents it from reaching higher
classification accuracy. As the number of iterations
increases, the fitness value of ISSA-SSVM remains
consistently higher than that of the other algorithms.
It also converges quickly: at 13 iterations its fitness
value reaches 94.6%. Owing to the added
reconnaissance and early-warning mechanisms and the
dynamic reverse learning mechanism, the algorithm
achieves high classification accuracy on the training
samples, so the parameters found by ISSA can be used
to build the prediction model of repeat purchase
behavior. Two metrics,
F1 and Recall, are used to analyze the performance
in Figure 10. The F1 value of the ISSA-SSVM model
is 0.957, which is 0.005 and 0.009 higher than that of
the GA-XGBoost model and the Stacking model,
respectively. The Recall value of the ISSA-SSVM
model in Figure 10(b) is 0.965, which is 0.005 and
0.011 higher than that of the Stacking model and the
GA-XGBoost model, respectively.
[Figure 10 plots: (a) F1 and (b) Recall versus epoch (20 to 100) for ISSA-SSVM, Stacking and GA-XGBoost]
Fig. 10: F1 and Recall values for three models
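For reference, the F1 and Recall values in Figure 10 follow the standard definitions. A minimal sketch follows, assuming binary labels with repeat buyers as the positive class; the averaging scheme used in the study is not stated, and the function name is illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Since F1 is the harmonic mean of precision and recall, a model can only have a high F1 if both of them are high.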
Figure 11 shows the MAE of the three models on the test set. The MAE of the ISSA-SSVM model is 8.52, which is 1.34 and 2.56 lower than that of the Stacking model and the GA-XGBoost model, respectively.
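The section does not state which quantity the MAE is computed over. Assuming the usual definition, MAE = (1/n) Σ |y_i - ŷ_i|, the mean absolute difference between targets and predictions, where lower is better. A minimal sketch:

```python
from typing import Sequence

def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|y_i - yhat_i|); lower values mean closer predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))  # 1.0
```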
Fig. 11: MAE values for three models (MAE versus epoch for ISSA-SSVM, Stacking and GA-XGBoost)
Figure 12 compares the fit of the three models on the test set. The ISSA-SSVM model fits the test data better, and its predicted values are more closely aligned with the actual values. The goodness-of-fit value of the ISSA-SSVM model in Figure 12(a) is 0.992, which is 0.006 and 0.014 higher than that of the Stacking model and the GA-XGBoost model, respectively.
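The section does not define the goodness-of-fit value. Assuming it is the coefficient of determination R² = 1 - SS_res / SS_tot, a value of 0.992 would mean the predictions explain about 99.2% of the variance of the targets. A minimal sketch of that computation:

```python
from typing import Sequence

def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total variance
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant targets")
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))          # 1.0  (perfect fit)
print(r_squared(y, [2.5] * 4))  # 0.0  (no better than predicting the mean)
```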
WSEAS TRANSACTIONS on COMPUTER RESEARCH
DOI: 10.37394/232018.2023.11.28
Qian Lyu
E-ISSN: 2415-1521
311
Volume 11, 2023
Fig. 12: Fit of three models (estimate versus target value; panels: (a) ISSA-SSVM, (b) Stacking, (c) GA-XGBoost)
Figure 13 shows the prediction accuracy of the three models on the test set. The ISSA-SSVM model achieves the highest prediction accuracy, and its accuracy rises faster and to a higher level during the iterative process. In Figure 13, the prediction accuracy of the ISSA-SSVM model reaches 97.92%, which is 1.05% and 2.04% higher than that of the Stacking model and the GA-XGBoost model, respectively.
Fig. 13: Accuracy values for three models (accuracy/% versus epoch for ISSA-SSVM, Stacking and GA-XGBoost)
Figure 14 shows the ROC curves of the three models on the training and test data. The AUC values of the ISSA-SSVM model are noticeably higher than those of the Stacking model and the GA-XGBoost model. Figure 14(a) shows that the AUC value of the ISSA-SSVM model on the training set is 0.988, which is 0.08 and 0.15 higher than that of the Stacking model and the GA-XGBoost model, respectively. In Figure 14(b), the AUC value of the ISSA-SSVM model on the test set is 0.995, which is 0.12 and 0.17 higher than that of the Stacking model and the GA-XGBoost model, respectively. Taken together, these results show that the ISSA-SSVM repeat purchase behavior prediction model constructed in this study predicts users' repeat purchase behavior efficiently and accurately. It can therefore provide data support for formulating precise marketing strategies for e-commerce enterprises, which has positive significance for improving their economic benefits.
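The AUC values in Fig. 14 summarize each ROC curve as a single number. A standard equivalent reading, used in the sketch below, is that AUC is the probability that a randomly chosen positive example (here, assumed to be a repeat buyer) receives a higher score than a randomly chosen negative one, with ties counting as half. The scores are illustrative, not the paper's outputs.

```python
from typing import Sequence

def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Equivalent to the area under the ROC curve; ties count as half a win.
    Uses O(P * N) pairwise comparisons, fine for illustration but not large data.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the gap between training (0.988) and test (0.995) values reported above is small on this scale.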
Fig. 14: AUC values for three models (ROC curves, true positive rate versus false positive rate; (a) training set, (b) test set; ISSA-SSVM, Stacking and GA-XGBoost)
To evaluate the ISSA-SSVM-based prediction model of user repeat purchase behavior more intuitively, the experiment also compares it with three current advanced prediction models: the GA-SVM model, the GWO-SVM model, and the SSA-SVM model. The prediction results of each model for users' repeat purchase behavior are shown in Table 2.
Table 2. Comparison of four prediction models
Parameter            GA-SVM   GWO-SVM   SSA-SVM   ISSA-SSVM
Penalty parameter    98.2     1000      10.8      137.3
Kernel parameter     0.04     0.10      0.17      0.01
Accuracy rate (%)    93.5     90.1      92.2      98.8
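Table 2 reports the penalty parameter C and the kernel parameter found by each optimizer. This section does not name the kernel; assuming a Gaussian (RBF) kernel, the kernel parameter plays the role of γ in K(x, z) = exp(-γ‖x - z‖²). In the standard smooth SVM (SSVM) formulation of Lee and Mangasarian, the non-differentiable plus function max(x, 0) in the SVM objective is replaced by p(x, α) = x + (1/α) log(1 + e^(-αx)), which makes the objective smooth. The sketch below implements both functions in a numerically stable form. It is not the author's implementation, and γ = 0.01 is simply the ISSA-SSVM entry of Table 2 read as an RBF γ.

```python
import math
from typing import Sequence

def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Gaussian kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def smooth_plus(x: float, alpha: float) -> float:
    """SSVM smoothing of max(x, 0): p(x, alpha) = x + log(1 + exp(-alpha*x)) / alpha.

    Uses softplus(u) = max(u, 0) + log1p(exp(-|u|)) so exp never overflows.
    """
    u = -alpha * x
    softplus = max(u, 0.0) + math.log1p(math.exp(-abs(u)))
    return x + softplus / alpha

gamma = 0.01  # ISSA-SSVM kernel parameter from Table 2 (assumed to be RBF gamma)
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma))  # 1.0
for alpha in (1.0, 10.0, 100.0):
    # the approximation tightens toward max(x, 0) as alpha grows
    print(alpha, smooth_plus(-1.0, alpha), smooth_plus(1.0, alpha))
```

A small γ such as 0.01 makes the kernel decay slowly with distance, giving a smoother decision boundary, while C controls how strongly training errors are penalized.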
Table 2 shows that the model based on the ISSA-SSVM algorithm has the highest accuracy in predicting users' repeat purchase behavior, at 98.8%. This is 5.3, 8.7, and 6.6 percentage points higher than the GA-SVM, GWO-SVM, and SSA-SVM models, respectively, which demonstrates the superior predictive performance of the proposed algorithm.
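The gaps quoted from Table 2 are absolute differences in accuracy, i.e. percentage points (for example, 98.8 - 93.5 = 5.3). The short computation below reproduces them from the table and, for contrast, also computes the relative improvement over each baseline, which is a different quantity and should not be confused with the percentage-point gap.

```python
# Accuracy rates (%) from Table 2
accuracy = {"GA-SVM": 93.5, "GWO-SVM": 90.1, "SSA-SVM": 92.2, "ISSA-SSVM": 98.8}

def gaps_in_points(table: dict, best_name: str) -> dict:
    """Absolute accuracy gap of best_name over every other model (percentage points)."""
    best = table[best_name]
    return {n: round(best - a, 1) for n, a in table.items() if n != best_name}

def relative_gaps(table: dict, best_name: str) -> dict:
    """Relative improvement of best_name over each model, as % of that model's accuracy."""
    best = table[best_name]
    return {n: round(100 * (best - a) / a, 2) for n, a in table.items() if n != best_name}

print(gaps_in_points(accuracy, "ISSA-SSVM"))  # {'GA-SVM': 5.3, 'GWO-SVM': 8.7, 'SSA-SVM': 6.6}
print(relative_gaps(accuracy, "ISSA-SSVM"))
```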
5 Conclusion
Intelligent and accurate prediction of users' repeat purchase behavior can help e-commerce companies develop precise marketing strategies, thereby increasing product sales and economic benefits. To this end, this study proposes an ISSA-SSVM prediction model of user repeat purchase behavior. The results showed that the
ISSA-SSVM model converged earlier, requiring only 82 iterations, which was 23 and 53 fewer than the Stacking model and the GA-XGBoost model, respectively. Its error value was 0.14, which was 0.12 and 0.15 lower than that of the Stacking model and the GA-XGBoost model, respectively, and its loss value was 0.20, lower than that of both the Stacking model and the GA-XGBoost model. Its F1 value was 0.957, which was 0.005 and 0.009 higher than that of the Stacking model and the GA-XGBoost model, respectively, and its Recall value was 0.965, which was 0.005 and 0.011 higher, respectively. Its MAE value was 8.52, which was 1.34 and 2.56 lower than that of the Stacking model and the GA-XGBoost model, respectively, and its goodness-of-fit value reached 0.992, which was 0.006 and 0.014 higher, respectively. Its prediction accuracy reached 97.92%, which was 1.05% and 2.04% higher than that of the Stacking model and the GA-XGBoost model, respectively, and its AUC value was 0.995, which was 0.12 and 0.17 higher, respectively. The model can therefore predict users' repeat purchase behavior efficiently and accurately, providing data support for the formulation of precise marketing strategies by e-commerce enterprises, which has positive significance for improving their economic benefits. Because the experiment used user behavior data from only one e-commerce company, the results may contain some randomness, which could cause differences between the experimental results and real-world performance. Subsequent studies should therefore enlarge the experimental data set to improve the reliability of the results.
References:
[1] Bawack R E, Wamba S F, Carillo K D A, Akter
S. Artificial intelligence in E-Commerce: A
bibliometric study and literature review.
Electronic Markets, Vol.32, No.1, 2022, pp.
297-338.
[2] Pandiangan S M T. Effect of packaging design
on repurchase intention to the politeknik IT&B
medan using e-commerce applications. Journal
of Production, Operations Management and
Economics (JPOME), Vol.2, No.1, 2022, pp.
15-21.
[3] Kedah Z. Use of e-commerce in the world of
business. Startupreneur Bisnis Digital (SABDA
Journal), Vol.2, No.1, 2023, pp. 51-60.
[4] Lee M, Kwon W, Back K J. Artificial
intelligence for hospitality big data analytics:
developing a prediction model of restaurant
review helpfulness for customer decision
making. International Journal of
Contemporary Hospitality Management,
Vol.33, No.6, 2021, pp. 2117-2136.
[5] Mathew V, Soliman M. Does digital content
marketing affect tourism consumer behavior?
An extension of technology acceptance model.
Journal of Consumer Behaviour, Vol.20, No.1,
2021, pp. 61-75.
[6] Mishra R, Singh R K, Koles B. Consumer
decision-making in Omnichannel retailing:
Literature review and future research agenda.
International Journal of Consumer Studies,
Vol.45, No.2, 2021, pp. 147-174.
[7] Sheth J. Impact of Covid-19 on consumer
behavior: will the old habits return or die?
Journal of Business Research, Vol.117, 2022,
pp. 280-283.
[8] Shahab M H, Ghazali E, Mohtar M. The role
of elaboration likelihood model in consumer
behaviour research and its extension to new
technologies: a review and future research
agenda. International Journal of Consumer
Studies, Vol.45, No.4, 2021, pp. 664-689.
[9] Sheth J. New areas of research in marketing
strategy, consumer behavior, and marketing
analytics: the future is bright. Journal of
Marketing Theory and Practice, Vol.29, No.1,
2021, pp. 3-12.
[10] Han H. Consumer behavior and environmental
sustainability in tourism and hospitality: A
review of theories, concepts, and latest
research. Journal of Sustainable Tourism,
Vol.29, No.7, 2021, pp. 1021-1042.
[11] Suma V, Hills S M. Data mining based
prediction of demand in Indian market for
refurbished electronics. Journal of Soft
Computing Paradigm (JSCP), Vol.2, No.2,
2020, pp. 101-110.
[12] Sarkar M, De Bruyn A. LSTM response
models for direct marketing analytics:
Replacing feature engineering with deep
learning. Journal of Interactive Marketing,
Vol.53, No.1, 2021, pp. 80-95.
[13] Anitha P, Patil M M. RFM model for customer
purchase behavior using K-Means algorithm.
Journal of King Saud University-Computer
and Information Sciences, Vol.34, No.5, 2022,
pp. 1785-1792.
[14] Cervantes J, Garcia-Lamont F,
Rodríguez-Mazahua L, Lopez A. A
comprehensive survey on support vector
machine classification: Applications,
challenges and trends. Neurocomputing,
Vol.408, 2020, pp. 189-215.
[15] Yu D, Xu Z, Wang X. Bibliometric analysis of
support vector machines research trend: A case
study in China. International Journal of
Machine Learning and Cybernetics, Vol.11,
2020, pp. 715-728.
[16] Fan M, Sharma A. Design and implementation
of construction cost prediction model based on
SVM and LSSVM in industries 4.0. International
Journal of Intelligent Computing and
Cybernetics, Vol.14, No.2, 2021, pp. 145-157.
[17] Neelakandan S, Paulraj D. An automated
exploring and learning model for data
prediction using balanced CA-SVM. Journal
of Ambient Intelligence and Humanized
Computing, Vol.12, 2021, pp.4979-4990.
[18] Hamdan Y B. Construction of statistical SVM
based recognition model for handwritten
character recognition. Journal of Information
Technology, Vol.3, No.2, 2021, pp. 92-107.
[19] Balasubramaniam V. Artificial intelligence
algorithm with SVM classification using
dermascopic images for melanoma diagnosis.
Journal of Artificial Intelligence and Capsule
Networks, Vol.3, No.1, 2021, pp. 34-42.
[20] Xiao C, Xia W, Jiang J. Stock price forecast
based on combined model of
ARIMA-LS-SVM. Neural Computing and
Applications, Vol.32, 2020, pp. 5379-5388.
[21] Shi Q, Zhang H. Fault diagnosis of an
autonomous vehicle with an improved SVM
algorithm subject to unbalanced datasets. IEEE
Transactions on Industrial Electronics, Vol.68,
No.7, 2020, pp. 6248-6256.
[22] Guo Y, Mustafaoglu Z, Koundal D. Spam
detection using bidirectional transformers and
machine learning classifier algorithms. Journal
of Computational and Cognitive Engineering,
Vol.2, No.1, 2022, pp. 5-9.
[23] Jain K, Saxena A. Simulation on supplier side
bidding strategy at day-ahead electricity
market using ant lion optimizer. Journal of
Computational and Cognitive Engineering,
Vol.2, No.1, 2023, pp. 17-27.
[24] Long X M, Chen Y J, Zhou J. Development of
AR experiment on electric-thermal effect by
open framework with simulation-based asset
and user-defined input. Artificial Intelligence
and Applications, Vol.1, No.1, 2023, pp. 52-57.
[25] Islam A, Othman F, Sakib N, et al. Prevention
of shoulder-surfing attack using shifting
condition with the digraph substitution rules.
Artificial Intelligence and Applications, Vol.1,
No.1, 2023, pp. 58-68.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The author contributed to the present research at all stages, from the formulation of the problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The author has no conflict of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US