The Construction of a Model for Predicting Users' Repeat Purchase Behavior and its Impact on the Economic Efficiency of Enterprises

QIAN LYU

Lyceum of the Philippines University Manila Campus,

Manila 1002,

PHILIPPINES

*Corresponding Author

Abstract: - To address the limited efficiency and accuracy of current methods for predicting users' repeat purchase behavior in e-commerce enterprises, an intelligent prediction model based on machine learning is proposed. To improve the quality of the experimental data, Kernel Principal Components Analysis (KPCA) and the Synthetic Minority Oversampling Technique (SMOTE) are first used to preprocess the data. Repeat purchase behavior is then predicted with a Support Vector Machine (SVM): the Smooth Support Vector Machine (SSVM) is employed as the feature classifier, and a Sparrow Search Algorithm (SSA) improved by multi-strategy optimization (ISSA) is proposed to overcome the SSVM's drawbacks. On this basis, an ISSA-SSVM intelligent prediction model of users' repeat purchase behavior is constructed to predict that behavior efficiently. The results show that the fitness value of the ISSA-SSVM algorithm is always higher than that of the other algorithms as the number of iterations increases, and that it converges quickly: at 13 iterations, the fitness value reaches 94.6%. The model's error value is 0.14, its loss value is 0.20, its F1 value is 0.957, its recall value is 0.965, its MAE value is 8.52, its fit degree is 0.992, its prediction accuracy is 97.92%, and its AUC value is 0.995, all of which are better than those of the other two models. The ISSA-SSVM model developed in this work therefore outperforms the compared models in forecasting customers' repeat purchase behavior. The approach helps e-commerce businesses implement precision marketing, which in turn benefits their economic efficiency.

Key-Words: - Repeat purchase behavior; Smooth support vector machine; Sparrow search algorithm; Economic

efficiency; E-commerce; machine learning

Received: August 15, 2022. Revised: July 14, 2023. Accepted: August 16, 2023. Published: September 21, 2023.

1 Introduction

E-commerce has grown rapidly with the development of information and Internet technologies and is now one of the most popular ways to shop. As more businesses join e-commerce platforms and the industry matures, these platforms have accumulated large volumes of user behavior data. Analyzing these data reveals users' consumption habits and patterns. In this context, users' repeat purchase behavior has long been a research hotspot. Predicting it helps e-commerce enterprises market precisely, improve product sales, and increase their economic benefits. In current research, intelligent prediction of users' repeat purchase behavior is generally based on machine learning techniques such as the Support Vector Machine (SVM), Random Forest (RF), and LightGBM. However, the prediction accuracy of existing machine

WSEAS TRANSACTIONS on COMPUTER RESEARCH

DOI: 10.37394/232018.2023.11.28

Qian Lyu

E-ISSN: 2415-1521

303

Volume 11, 2023

learning-based models for predicting users' repeat purchase behavior is low, so they cannot help e-commerce companies achieve precise marketing. To address this problem, the study uses the Relief algorithm for feature selection and the Smooth Support Vector Machine (SSVM) as the feature classifier. In addition, because the performance of the SSVM alone is unsatisfactory, an improved Sparrow Search Algorithm (ISSA) is used to optimize it. Finally, an ISSA-SSVM-based intelligent prediction model of users' repeat purchase behavior is constructed to predict that behavior with high precision and thus improve the economic efficiency of e-commerce enterprises. The study makes two main contributions: the first is the classification of user behavior features with the SSVM, which predicts users' repeat purchase behavior; the second is an improved SSA, which is used to optimize the SSVM for model accuracy, [1], [2], [3]. The remainder of the study has four parts: the first reviews and analyzes recent related research; the second constructs the intelligent prediction model of users' repeat purchase behavior using ISSA-SSVM; the third analyzes the model's performance; and the last summarizes the research.

2 Related Works

A huge number of e-commerce businesses and users have been drawn to rapidly developing e-commerce platforms. The rapid development of e-commerce has affected traditional shopping methods and brought development opportunities to many e-commerce enterprises. Predicting users' repeat purchase behavior can help e-commerce enterprises accurately grasp users' preferences and purchasing habits, realize precise marketing, and improve their benefits, [4], [5]. User repeat purchase behavior prediction has therefore received attention from many researchers. To determine the variables affecting consumer purchase decisions in omnichannel retailing and to forecast customers' repeat buying behavior, Mishra et al. conducted an in-depth review of the pertinent recent literature, [6]. Sheth analyzed the impact of COVID-19 on consumer behavior, [7]. Shahab et al. proposed a refined likelihood model and applied it to the analysis and prediction of consumer purchase behavior. Results showed that the model predicted consumer behavior more accurately, [8]. Sheth analyzed the correlation between marketing strategies and consumer behavior, exploring the factors that influence consumers' repeat purchase behavior and providing new ideas for predicting it, [9]. Han reviewed recent research results, analyzing the patterns of consumer behavior in the tourism and hospitality industries and exploring how the factors behind consumers' repeat purchase behavior influence the development of those industries, [10]. To forecast the market demand for refurbished gadgets, Suma and Hills employed data mining techniques to analyze and predict the repeat purchasing behavior of customers in India, [11]. Sarkar and De Bruyn concluded that deep learning techniques can effectively replace feature engineering and constructed a marketing analytics model based on a long short-term memory (LSTM) network response model. Results showed that the model can accurately analyze and predict consumer purchase behavior, thus enhancing marketing effectiveness, [12]. Anitha and Patil constructed an RFM (Recency, Frequency, Monetary) model and used it for predictive analysis of customers' repeat purchase behavior, in order to develop targeted marketing strategies that improve corporate turnover, [13].

SVM is a generalized linear binary classifier with wide applications in classification problems such as image recognition and text classification. SVM largely avoids the curse of dimensionality and can solve both linear and nonlinear problems, but it also has drawbacks that affect the classification efficiency and accuracy of the model. For this reason, many scholars have proposed strategies to optimize SVMs and have applied optimized SVMs to various fields. Cervantes et al. conducted a comprehensive survey of current research results related to SVMs and discussed the optimization improvements, application paths, challenges, and development

trends of SVMs, [14]. Yu et al. conducted a comprehensive econometric analysis of relevant research in China and of research trends in SVM, and analyzed the effects of different SVM improvement methods, providing theoretical support for the improvement and application of SVM, [15]. Fan and Sharma combined SVM with the least squares support vector machine (LSSVM) to design an engineering cost prediction model that predicts engineering costs efficiently and accurately and provides data support for cost control of engineering projects, [16]. Neelakandan and Paulraj proposed an automatic learning model of SVM based on the optimization of balanced meta-automata (CA) to achieve intelligent and efficient data prediction, which provides a reference for SVM optimization as well as for practical applications, [17]. Hamdan designed a statistical SVM-based intelligent recognition model that recognizes handwritten characters efficiently and improves the efficiency of character recognition, [18]. In [19], an improved SVM was used as a classifier to recognize dermoscopic images, and the recognition results were used to assist in the diagnosis of melanoma. According to the results, the approach has good recognition accuracy and can help diagnose melanoma clinically, [19]. To predict stock prices, Xiao et al. enhanced the SVM and proposed a least squares support vector machine integrated model. The results demonstrated the model's high prediction accuracy, [20]. In [21], an improved SVM model for unbalanced datasets was designed to enhance SVM performance and was used for autonomous vehicle fault diagnosis. The findings demonstrated that the model has high diagnostic accuracy and may significantly increase the safety of self-driving automobiles, [21].

To sum up, predicting users' repeat purchase behavior is of great significance for improving enterprises' economic benefits. According to the most recent research, repeat purchase intention has been the subject of numerous studies, although it cannot be quantified. Moreover, because this kind of intention depends on subjective factors that cannot be measured and is unstable, the efficiency and accuracy of current methods for predicting users' repeat purchase behavior are not high enough. To solve these problems, this research starts from measurable user behavior data and mines the characteristics of users and merchants to predict users' repeat purchase behavior. The SVM algorithm is enhanced with the SSA method for higher performance, and the feature selection process gives more consideration to the smaller set of samples. From the standpoint of feature selection, the features chosen in this manner can differentiate such samples more accurately and address the issue of data imbalance. A multi-strategy optimized SSA is proposed and used to optimize the SVM, building an ISSA-SSVM intelligent prediction model of users' repeat purchase behavior to increase prediction accuracy. This research helps merchants identify users with repeat purchase intentions and thus achieve precision marketing. At the same time, it promotes more connections between merchants and users, which has positive significance for the economic benefits of e-commerce enterprises.

3 Intelligent Prediction Model Construction of User's Repeat Purchase Behavior Using ISSA-SSVM

3.1 Data Preprocessing based on KPCA and SMOTE

After communication and consent, real user behavior data were obtained from an e-commerce enterprise management system to build a dataset. Analyzing these data yields the basic characteristics of users and their consumption behavior patterns. However, the constructed dataset contains missing information and redundant data, which make the analysis more difficult, so the data must be pre-processed. First, to handle the excessive number of features and the high dimensionality of

user behavior data, Kernel Principal Components Analysis (KPCA) is used for dimensionality reduction, [22], [23]. KPCA builds on PCA by introducing a kernel function that maps the data to be processed into a higher-dimensional feature space, in which the principal components are then extracted. The effect of KPCA on data classification is shown in Figure 1.

[Figure 1 panels: (a) PCA, (b) KPCA; axis label x1]

Fig. 1: The classification effect of KPCA on data

Let the samples in the dataset to be processed be $x_i,\ i = 1, 2, \dots, n$, which are raised in dimension using KPCA and denoted as $\phi(x_i),\ i = 1, 2, \dots, n$. The covariance matrix $C$ of the mapped samples is calculated as shown in Equation (1).

$$C = \sum_{i=1}^{n} \phi(x_i)\,\phi(x_i)^{T} \tag{1}$$

In Equation (1), $\phi(x_i)$ is the mapping vector of the sample $x_i$ in the high-dimensional space with dimension $D \times 1$ ($D \gg d$), and $d$ represents the number of features of the sample covariance matrix $C$. By performing eigendecomposition on Equation (1), Equation (2) is obtained.

$$\phi(X)\,\phi(X)^{T} v_j = \lambda_j v_j,\quad j = 1, 2, \dots, d \tag{2}$$

In Equation (2), $\lambda_j$ is an eigenvalue of the sample covariance matrix $C$; $v_j$ is the eigenvector corresponding to $\lambda_j$ after the dimensionality is raised to $D \times 1$; $\phi(X)$ is calculated according to Equation (3).

$$\phi(X) = \left[\phi(x_1), \phi(x_2), \dots, \phi(x_n)\right] \tag{3}$$

The eigenvector $v_j$ can be expressed as a linear combination of the samples' high-dimensional mapping vectors, as shown in Equation (4).

$$v_j = \phi(X)\,\alpha \tag{4}$$

In Equation (4), $\alpha$ is a column vector with dimension $n \times 1$. Combining Equation (2) and Equation (4) gives Equation (5).

$$\phi(X)^{T}\phi(X)\,\phi(X)^{T}\phi(X)\,\alpha = \lambda_j\,\phi(X)^{T}\phi(X)\,\alpha \tag{5}$$

In Equation (5), introducing the kernel function $K = \phi(X)^{T}\phi(X)$, Equation (6) can be constructed.

$$K\alpha = \lambda_j \alpha \tag{6}$$

The vector $\alpha = [\alpha_1, \dots, \alpha_n]^{T}$ is solved according to Equation (6). For a given sample $x_i$ in the dataset, projecting $\phi(x_i)$ onto the direction $v_j$ gives the nonlinear dimensionality reduction result $x_i^{new}$, as shown in Equation (7).

$$x_i^{new} = v_j^{T}\,\phi(x_i) \tag{7}$$
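As an illustration only, the eigenproblem of Equation (6) and the projection of Equation (7) can be sketched in plain Python for the first principal direction. The RBF kernel, the value of `gamma`, the kernel-centering step, the use of power iteration, and all function names are assumptions made for this sketch; the study does not state which kernel or solver it uses.

```python
import math
import random


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel type and gamma are assumptions."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def centered_kernel_matrix(samples, gamma=1.0):
    """Build K[i][j] = k(x_i, x_j) and center it in feature space."""
    n = len(samples)
    k = [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]
    row_mean = [sum(row) / n for row in k]
    total_mean = sum(row_mean) / n
    # K is symmetric, so column means equal row means.
    return [[k[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: largest eigenvalue and unit eigenvector of a PSD matrix."""
    rng = random.Random(seed)
    n = len(matrix)
    v = [rng.random() for _ in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    return eigenvalue, v


def first_kernel_component(samples, gamma=1.0):
    """Project every training sample onto the first kernel principal direction."""
    kc = centered_kernel_matrix(samples, gamma)
    eigenvalue, alpha = leading_eigenpair(kc)
    # Rescale alpha so that v = phi(X) alpha has unit length (alpha^T K alpha = 1).
    alpha = [a / math.sqrt(eigenvalue) for a in alpha]
    n = len(samples)
    # x_i^new = v^T phi(x_i) = sum_j alpha_j * K[i][j]
    return [sum(alpha[j] * kc[i][j] for j in range(n)) for i in range(n)]


# Two well-separated toy clusters (made-up data, not from the study).
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
projections = first_kernel_component(samples, gamma=0.5)
```

On this toy data the two clusters receive projections of opposite sign, which is the separation effect Figure 1 refers to. Further components would be obtained from the remaining eigenvectors of the same matrix.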

With the above steps, the dimensionality of the sample data in the dataset is reduced. Besides the high dimensionality of the sample features, which affects the results of user behavior analysis, the dataset constructed by the study is also unbalanced across classes. This imbalance likewise affects the accuracy of the subsequent user behavior analysis. When the dataset contains many samples and the number of samples varies significantly across categories, the classification model over-fits easily, which reduces classification accuracy. The samples in the experimental dataset are therefore pre-processed using the Synthetic Minority Oversampling Technique (SMOTE) to avoid this effect and ensure that the samples are roughly

balanced. The principle of the SMOTE algorithm is illustrated in Figure 2.

[Figure 2 panels: (a), (b); legend: generated composite instance]

Fig. 2: The principle of SMOTE algorithm

In Figure 2, the distance between each minority-class sample and the other samples is calculated first, using the Euclidean distance given in Equation (8).

$$d(x, y) = \sqrt{(y_1 - x_1)^2 + \dots + (y_n - x_n)^2} \tag{8}$$

After obtaining the distances between the minority-class samples and the other samples, the imbalance ratio of the samples in the experimental dataset is calculated, and the sampling ratio $N$ is set from the result. Repeated sampling by the SMOTE algorithm brings the classes in the dataset to a balanced ratio of 1:1. For any minority-class sample $x$, $N$ samples $\tilde{x}$ are selected within a certain range around it to construct new samples, as shown in Equation (9).

$$x_{new} = x + rand(0,1) \cdot (\tilde{x} - x) \tag{9}$$

In Equation (9), $rand(0,1)$ is a random function that returns a value between 0 and 1.
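A minimal sketch of the interpolation in Equations (8) and (9) is given below, assuming that neighbors are chosen among the `k` nearest minority-class samples. The value of `k`, the toy data, and the function names are hypothetical and are not taken from the study.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two feature vectors, as in Equation (8)."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)))


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by random linear interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself).
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(x, m))[:k]
        x_tilde = rng.choice(neighbours)
        r = rng.random()  # plays the role of rand(0, 1) in Equation (9)
        synthetic.append(tuple(a + r * (b - a) for a, b in zip(x, x_tilde)))
    return synthetic


# Toy example: 4 minority samples, a majority class of 10, target ratio 1:1.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 10
new_points = smote(minority, n_new=majority_size - len(minority))
```

Each synthetic point lies on the segment between a minority sample and one of its neighbors, so the new samples stay inside the region already occupied by the minority class.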

With the above, the oversampling of the samples in the dataset is completed. KPCA-SMOTE assumes that every class has a center, so artificially constructed minority-class samples should also move toward that center as the data are expanded. The method therefore replaces the neighbor sample used by SMOTE with the center of the minority-class samples, so that the expanded minority-class samples tend toward the center of their class. Figure 3 shows the data preprocessing process based on the KPCA-SMOTE algorithm.

[Figure 3 flowchart: Start; calculate the minority-class sample center; expand the minority-class sample data; calculate the Euclidean distance between the minority-class center and each minority-class sample and delete the sample points far from the center according to the imbalance ratio; fuse the expanded data with the original sample data; check whether the imbalance ratio is reached (Yes: update the total dataset, train and test the classification model, get the classification result); Finish]

Fig. 3: Data preprocessing process based on the KPCA-SMOTE algorithm
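The center-based variant described above can be sketched as follows. This is one possible reading of the flowchart, not the study's implementation: the rule of generating `oversample_factor` times more candidates than needed and keeping only those closest to the center is an assumption, as are all names and the toy data.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def center_smote(minority, n_new, oversample_factor=2, seed=0):
    """Interpolate minority samples toward their class center, then prune."""
    rng = random.Random(seed)
    center = centroid(minority)
    candidates = []
    for _ in range(n_new * oversample_factor):
        x = rng.choice(minority)
        r = rng.random()
        # Same interpolation as SMOTE, with the class center as the target.
        candidates.append(tuple(a + r * (c - a) for a, c in zip(x, center)))
    # Keep the n_new candidates closest to the center; drop the far ones.
    candidates.sort(key=lambda p: math.dist(p, center))
    return candidates[:n_new]


minority = [(1.0, 1.0), (1.4, 0.8), (0.7, 1.2), (1.1, 1.5)]
synthetic = center_smote(minority, n_new=5)
center = centroid(minority)
```

Because every candidate lies between an original sample and the center, no synthetic point can be farther from the center than the farthest original minority sample.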

In Figure 3, the KPCA-SMOTE algorithm first calculates the minority-class sample center. The samples are then expanded, and new minority-class samples are obtained by random linear interpolation. New sample points in the expanded sample that are relatively far from the minority-class center are deleted according to the imbalance ratio. The expanded minority-class data are then merged with the original data and classified by the classification algorithm. Some users' data are missing information such as gender and age. However, the proportions of users with missing age and gender information are very small, 0.38% and 0.42%, respectively, so they have no obvious impact on the overall classification results. For users with missing age information, the field was filled with the average age value; for users with missing gender information, the field was filled with the mode (the most frequent value). In the study, the four integers from 0 to 3 are used to represent users' consumption behaviors, including click, add to cart, purchase, and collection. Any value outside this range is considered abnormal data. Abnormal data made up 0.65% of the dataset constructed for the study. Because abnormal data affect the model's accuracy, they are removed. The final table of user behavior information is shown in Table 1.

Table 1. Information on User Behavior

| Field | Description |
| --- | --- |
| Merchant_id | Unique ID code of the e-commerce merchant |
| User_id | User's unique ID code |
| Age_range | User age range |
| Brand_id | Trademark identification of goods |
| Cat_id | Identification of the category to which the product belongs |
| Gender | User gender, 0 is female, 1 is male |
| Label | Identifier of whether the user has repeat purchase behavior |
| Action_type | User behavior category |
| Time_stamp | Time (month, day) |
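The cleaning rules described before Table 1 (mean filling for age, mode filling for gender, removal of behavior codes outside 0 to 3) can be sketched as below. The lowercase dictionary keys mirror the fields of Table 1, but the record layout, the use of `None` for missing values, and the function name are assumptions made for illustration.

```python
from statistics import mean, mode


def clean_records(records, valid_actions=(0, 1, 2, 3)):
    """Fill missing age/gender values and drop rows with abnormal action codes."""
    ages = [r["age_range"] for r in records if r["age_range"] is not None]
    genders = [r["gender"] for r in records if r["gender"] is not None]
    age_fill = mean(ages)        # average of the known age values
    gender_fill = mode(genders)  # most frequent known gender code
    cleaned = []
    for r in records:
        if r["action_type"] not in valid_actions:
            continue  # abnormal record: action code outside 0..3
        row = dict(r)
        if row["age_range"] is None:
            row["age_range"] = age_fill
        if row["gender"] is None:
            row["gender"] = gender_fill
        cleaned.append(row)
    return cleaned


# Made-up records; user 4 carries an abnormal action code.
records = [
    {"user_id": 1, "age_range": 2, "gender": 0, "action_type": 0},
    {"user_id": 2, "age_range": None, "gender": 0, "action_type": 2},
    {"user_id": 3, "age_range": 4, "gender": None, "action_type": 1},
    {"user_id": 4, "age_range": 3, "gender": 1, "action_type": 7},
]
cleaned = clean_records(records)
```

In this sketch the fill values are computed from all known entries before abnormal rows are dropped; computing them after the filter would be an equally reasonable choice.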

3.2 Construction of SSVM based on Multi-strategy Improved SSA Optimization

To profile user behavior, feature mining is performed

in three dimensions: user, enterprise, and

user-enterprise, and the Relief algorithm can select

features to obtain feature attributes that are more

correlated with user repeat purchase, including user

purchase frequency, purchase conversion rate, and

repeat purchase frequency in that enterprise. These

features are input into the SVM for classification and

identification, and the user's repeat purchase

behavior can be predicted. The basic structure of the

support vector machine is shown in Figure 4.

[Figure 4 diagram: input X with components X(1), X(2), ..., X(n); kernel units K(X,X1), K(X,X2), ..., K(X,Xn); bias b; output Y]

Fig. 4: The Basic Structure of Support Vector Machines

The basic principle of SVM for feature classification is to locate a hyperplane within the feature space such that the distance between the nearest points on either side of this hyperplane is maximized, [24]. To further improve the SVM's classification accuracy and efficiency, this study modifies the SVM objective function as shown in Equation (10).

$$\min_{w,\,b,\,\xi}\ \frac{C}{2}\,\lVert \xi \rVert_2^2 + \frac{1}{2}\left(\lVert w \rVert_2^2 + b^2\right) \tag{10}$$

In Equation (10), $w$ is the weight; $b$ is the bias value, which is a constant; $\xi$ is the relaxation (slack) variable of the model; and $C$ is the penalty parameter set manually. In Equation (10), the plus function is approximated by a smooth function $p$, the integral of the Sigmoid function, and a smoothness factor $a$ is introduced to make the SVM smooth, at which point the model can be represented by Equation (11).

$$\min_{w,\,b}\ \frac{C}{2}\sum_{i} p\big(1 - y_i (w x_i + b),\ a\big)^2 + \frac{1}{2}\left(\lVert w \rVert_2^2 + b^2\right) \tag{11}$$

In Equation (11), $x_i$ and $y_i$ are the $i$th input feature vector and the $i$th classification category, respectively. A linear classification problem can be solved by introducing the approximation function in Equation (11). To solve a nonlinear classification problem, the approximation function is introduced together with a kernel function, which raises the problem into a high-dimensional space where it maps to a linear problem, [25]. This process is illustrated in Figure 5.

[Figure 5 diagram: low-dimensional nonlinearity, feature mapping, high-dimensional linearity]

Fig. 5: Ascending Dimension Transformation for Nonlinear Classification Problems
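To make the smoothing step of Equation (11) concrete, the sketch below evaluates a smoothed objective for a linear classifier. It assumes the commonly used smooth approximation p(x, a) = x + (1/a) ln(1 + exp(-a x)), which is the integral of the Sigmoid function; the study does not spell out its exact form, and the parameter values, toy data, and function names are hypothetical.

```python
import math


def smooth_plus(x, a):
    """p(x, a) = x + (1/a) * ln(1 + exp(-a * x)), evaluated in a stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-a * abs(x))) / a


def ssvm_objective(w, b, samples, labels, c=1.0, a=5.0):
    """Smoothed objective for a linear classifier f(x) = w.x + b."""
    penalty = 0.0
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Smooth replacement for the squared hinge term max(0, 1 - margin)^2.
        penalty += smooth_plus(1.0 - margin, a) ** 2
    return 0.5 * c * penalty + 0.5 * (sum(wi * wi for wi in w) + b * b)


# Toy data with labels +1 / -1 (made up for illustration).
samples = [(2.0, 2.0), (1.5, 2.5), (-2.0, -1.0), (-1.0, -2.5)]
labels = [1, 1, -1, -1]
good = ssvm_objective((1.0, 1.0), 0.0, samples, labels)
bad = ssvm_objective((-1.0, -1.0), 0.0, samples, labels)
```

Because the smoothed objective is differentiable everywhere, it can be minimized with gradient-based or Newton-type methods. A separating hyperplane (`good`) yields a much smaller value than one that misclassifies every sample (`bad`).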

To increase the classification accuracy of the model, an SSVM model is built on the basis described above. The performance of the SSVM model depends considerably on proper

parameter settings, because the penalty factor and the kernel function parameter strongly affect it. These two parameters are generally set by running multiple experiments and selecting the values that make the model perform best. However, this method is time-consuming, and the selected parameters are often unsatisfactory. To address this problem, a multi-strategy collaboratively improved SSA is proposed to optimize the SSVM. SSA is a swarm intelligence optimization technique that imitates the foraging and anti-predation behaviors of sparrows; it is used to determine the best parameter values for the SSVM because it has a simple structure and performs well in finding optimal parameters. To address SSA's tendency to fall into local optima and its weak convergence, the study initializes the SSA population with a Chebyshev chaotic map, as shown in Equation (12).

$$x_{t+1} = \cos\left(t \cos^{-1}(x_t)\right) \tag{12}$$

In Equation (12), $x_t$ is the location of sparrow individual $t$ in the initial population. The Chebyshev chaotic mapping strategy increases the diversity of the initial SSA population, thus avoiding premature convergence. During the iterative process, the discoverers in SSA occupy positions with high fitness values so that they can better guide the other individuals toward the food. In traditional SSA, there is often a lack of communication between different discoverers, which may lead the algorithm to fall into local optima. To address this issue, the study introduces the golden sine technique, which improves the discoverers' position update strategy, as shown in Equation (13).

   

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t}\,\lvert \sin r_1 \rvert - r_2 \sin r_1 \,\lvert c_1 X_{P}^{t} - c_2 X_{i,j}^{t} \rvert, & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \ge ST \end{cases} \tag{13}$$

In Equation (13), $r_1$ and $r_2$ are two random numbers in the ranges $[0, 2\pi]$ and $[0, \pi]$, respectively, which mainly affect the flight distance and direction of the individual sparrow; $c_1$ and $c_2$ are two golden-section coefficients, which guide the individual toward the optimal solution and help the algorithm converge; $X_{i,j}^{t}$ indicates individual $i$'s current position in dimension $j$; $X_{P}^{t}$ is the position with the largest fitness value occupied by all the discoverers; $R_2$ and $ST$ are the warning value and the safety threshold; $Q$ is a normally distributed random number; $L$ is a $1 \times D$ matrix whose elements are all 1, and $D$ is the dimension of the problem to be solved. Borrowing from the whale optimization algorithm (WOA), a convergence factor is introduced to regulate the convergence speed and the optimum-seeking ability of SSA during the iterative process. The linear convergence strategy of WOA causes the algorithm to enter local search in the early stage, which affects its convergence in the later stage. To address this issue, an improved nonlinear convergence factor adjustment strategy is proposed, which is presented in Figure 6.

[Figure 6 plot: convergence factor versus epoch (0 to 500); curves: original strategy, improvement strategy]

Fig. 6: Nonlinear Convergence Factor Adjustment Strategy
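The Chebyshev chaotic initialization of Equation (12) can be sketched as follows. The seed value `x0`, starting the order index at 2 (so that the first step is not the identity map), the linear mapping from [-1, 1] onto the search box, and the two search ranges are all assumptions for illustration; the study does not report these details.

```python
import math


def chebyshev_sequence(x0, length, first_order=2):
    """Iterate x_{t+1} = cos(t * arccos(x_t)); values stay in [-1, 1]."""
    xs = [x0]
    for t in range(first_order, first_order + length - 1):
        clipped = max(-1.0, min(1.0, xs[-1]))  # guard against rounding drift
        xs.append(math.cos(t * math.acos(clipped)))
    return xs


def init_population(n_sparrows, bounds, x0=0.7):
    """Map one chaotic sequence onto the search box, one value per coordinate."""
    dim = len(bounds)
    chaos = chebyshev_sequence(x0, n_sparrows * dim)
    population = []
    for i in range(n_sparrows):
        row = chaos[i * dim:(i + 1) * dim]
        population.append([lo + (z + 1.0) / 2.0 * (hi - lo)
                           for z, (lo, hi) in zip(row, bounds)])
    return population


# Hypothetical search ranges for the SSVM penalty factor and kernel parameter.
bounds = [(0.01, 100.0), (0.001, 10.0)]
population = init_population(n_sparrows=20, bounds=bounds)
```

Each sparrow is then a candidate pair of SSVM parameters, and the chaotic sequence spreads the initial candidates over the search box instead of relying on a plain uniform random draw.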

Based on the above algorithm, the intelligent

prediction process of users' repeat purchase behavior

obtained by ISSA-optimized SVM mainly includes

the following steps. The SVM optimization process

for ISSA is shown in Figure 7.

[Figure 7 flowchart: Start; normalization; initialize SSA parameters and SVM parameters; calculate updated fitness values and reverse individual fitness values; update the discoverers according to fitness ranking and calculate the warning value; update finder and follower locations; update the locations of sparrows aware of danger; calculate the fitness value of the updated location; update the current optimal fitness value and optimal position; check whether the maximum number of iterations is reached (Yes: output the optimal parameters, obtain the optimal SVM model); Finish]

Fig. 7: Intelligent prediction process of user repeat purchase behavior based on ISSA-SSVM algorithm

Step 1: Establish the input and output of the intelligent prediction model of users' repeat purchase behavior. The feature quantities of the extracted data serve as the model's input, and the behavior category serves as its output.

Step 2: Normalize the data and divide them into training samples and test samples, then set the initialization parameters of the sparrow search algorithm and the SVM.

Step 3: Classify the training samples and use the cross-validation accuracy as each sparrow's fitness value, then generate a reverse sparrow population. Sort all individuals in the population by fitness; those with higher fitness values become the discoverers and the others become the followers.

Step 4: Update the discoverers' locations. In a safe state the sparrows can search extensively; if the warning value is exceeded, the population exhibits anti-predator behavior.

Step 5: Update the followers' locations. Individuals with relatively low fitness values need to search other locations to improve their fitness.

Step 6: Compare this iteration's fitness value with the best fitness value found so far and update the global optimal information.

Step 7: Update the locations of the reconnaissance sparrows to enhance the population's global search capability.

Step 8: Check whether the number of iterations satisfies the termination criterion. If not, return to Step 3. When the maximum number of iterations is reached, the algorithm stops, outputs the optimal parameters, and builds the intelligent prediction model of users' repeat purchase behavior based on the ISSA-optimized SVM.

With the above, the improvement of SSA is complete and the ISSA-SSVM intelligent prediction model of users' repeat purchase behavior is constructed.

4 Performance Analysis of ISSA-SSVM User Repeat Purchase Behavior Prediction Model

After communication and agreement, real user behavior data were obtained from an e-commerce enterprise management system to construct a dataset. The dataset was split into a training set and a test set at a 7:3 ratio. The training set was used to train the models, and the test set to evaluate their performance. The Genetic Algorithm (GA)-optimized XGBoost model (GA-XGBoost) and the Stacking integrated fusion model (Stacking) are currently two cutting-edge models for forecasting customers' repeat buying behavior, so the prediction performance of the ISSA-SSVM model is compared with theirs. First, the ISSA-SSVM model, the Stacking model, and the GA-XGBoost model are trained on the training set. Figure 8 displays how the models' error and loss values changed during training. The ISSA-SSVM model converges earlier than the other two models, requiring only 82 iterations, which is 23 and 53 fewer than the Stacking model and the GA-XGBoost model, respectively. In Figure 8(a), the error value of each model no longer changes after the model fully converges. Compared with the other two models, the ISSA-SSVM model thus requires fewer iterations to fully converge during training, and its error and loss values after convergence are lower. This indicates that the ISSA-SSVM model converges well and completes training faster, thus improving the efficiency of predicting users' repeat purchase

behavior.

[Figure 8 panels: (a) Error versus epoch, (b) Loss versus epoch; curves: ISSA-SSVM, Stacking, GA-XGBoost]

Fig. 8: Changes in error and loss values

The objective function is defined as the highest classification accuracy on the training samples. The results of optimizing the SVM parameters with SSA, particle swarm optimization (PSO), and the grey wolf optimizer (GWO) were compared with those of the ISSA optimization. Figure 9 shows the optimization iteration curve of the cross-validation accuracy of each algorithm.

[Figure 9 plot: fitness (%) versus number of iterations (0 to 100); curves: GWO, PSO, SSA, ISSA-SSVM]

Fig. 9: Optimization curve of classification accuracy

Figure 9 shows that the fitness of the GWO algorithm is lower than that of the other algorithms as the number of iterations increases. After 20 iterations, its fitness no longer changes (82.3%), which indicates that the GWO method is the most likely to become trapped in a local optimum during optimization. The PSO algorithm handles discrete optimization problems poorly, which prevents it from achieving higher classification accuracy. The fitness value of the ISSA-SSVM algorithm is consistently greater than that of the other algorithms as the number of iterations rises. It also converges quickly: at 13 iterations, the fitness value reaches 94.6%. Owing to the added reconnaissance and early warning mechanism and the dynamic reverse learning mechanism, the algorithm achieves high classification accuracy on the training samples. The parameters found by the ISSA-SSVM algorithm can therefore be used to build the prediction model of users' repeat purchase behavior. Two metrics, F1 and Recall, are used to analyze model performance in Figure 10. The ISSA-SSVM model's F1 value is 0.957, which is 0.005 and 0.009 higher than those of the GA-XGBoost model and the Stacking model, respectively. The Recall value of the ISSA-SSVM model in Figure 10(b) is 0.965, which is 0.005 and 0.011 higher than the values for the Stacking model and the GA-XGBoost model, respectively.

[Figure 10: panels (a) F1 and (b) Recall plotted against Epoch for the ISSA-SSVM, Stacking, and GA-XGBoost models]

Fig. 10: F1 and Recall values for three models
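For reference, Recall and F1 follow directly from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions; the function name `recall_and_f1` and the labels are hypothetical illustrations, not data from the experiment.

```python
def recall_and_f1(y_true, y_pred, positive=1):
    """Compute Recall and F1 for the positive class from two label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


# Hypothetical labels: 1 = repeat purchase, 0 = no repeat purchase.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
recall, f1 = recall_and_f1(y_true, y_pred)
print(recall, f1)
```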

Figure 11 shows the MAE of the three models on the test set. The MAE value of the ISSA-SSVM model is 8.52, which is 1.34 and 2.56 lower than those of the GA-XGBoost model and the Stacking model, respectively.

[Figure 11: MAE plotted against Epoch for the ISSA-SSVM, Stacking, and GA-XGBoost models]

Fig. 11: MAE values for three models
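The MAE is the average absolute difference between predicted and actual values, so a lower value means predictions lie closer to the targets. A minimal sketch with hypothetical numbers:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions (illustrative only).
targets = [10, 20, 30]
predictions = [12, 18, 33]
print(mean_absolute_error(targets, predictions))
```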

The fit of the three models was verified on the test set, as shown in Figure 12. The ISSA-SSVM model fits the test set data better, and its predicted values are more closely aligned with the actual values. The fit value of the ISSA-SSVM model in Figure 12(a) is 0.992, which is 0.006 and 0.014 higher than the fit values of the Stacking model and the GA-XGBoost model, respectively.


[Figure 12: Estimate plotted against Target value for (a) ISSA-SSVM, (b) Stacking, and (c) GA-XGBoost]

Fig. 12: Fit of three models
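The text does not state how the fit value is computed. Assuming it corresponds to the coefficient of determination (R squared), which is a common choice but only an assumption here, it could be obtained as sketched below. The function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical target values and estimates (illustrative only).
target = [1, 2, 3, 4]
estimate = [1.1, 1.9, 3.2, 3.9]
print(round(r_squared(target, estimate), 3))
```

A value close to 1 means the estimates track the targets closely, which is how the fit values above are read.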

Figure 13 shows the prediction accuracy of the three models on the test set. The ISSA-SSVM model achieves higher prediction accuracy, and its accuracy rises faster and further during the iterative process. In Figure 13, the prediction accuracy of the ISSA-SSVM model reaches 97.92%, which is 1.05% and 2.04% higher than that of the Stacking model and the GA-XGBoost model, respectively.

[Figure 13: Accuracy (%) plotted against Epoch for the ISSA-SSVM, Stacking, and GA-XGBoost models]

Fig. 13: Accuracy values for three models

Figure 14 shows the ROC curves of the three models on the training and test data. The AUC values of the ISSA-SSVM model are noticeably greater than those of the Stacking model and the GA-XGBoost model. Figure 14(a) shows that the AUC value of the ISSA-SSVM model on the training set is 0.988, which is 0.08 and 0.15 higher than that of the Stacking model and the GA-XGBoost model, respectively. In Figure 14(b), the AUC value of the ISSA-SSVM model on the test set is 0.995, which is 0.12 and 0.17 higher than that of the Stacking model and the GA-XGBoost model, respectively. Taken together, these results show that the ISSA-SSVM model constructed in this study predicts users' repeat purchase behavior both accurately and efficiently. It can therefore provide data support for e-commerce enterprises in formulating precise marketing strategies, which has positive significance for improving their economic benefits.

[Figure 14: ROC curves (true positive rate against false positive rate) for the ISSA-SSVM, Stacking, and GA-XGBoost models on (a) the training set and (b) the test set]

Fig. 14: AUC values for three models
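The AUC can be interpreted as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison, which is equivalent to the area under the ROC curve; the function name `auc` and the sample scores are hypothetical and not taken from the experiment.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted scores (illustrative only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))
```

The pairwise form costs one comparison per positive-negative pair, which is adequate for a small illustration; larger datasets are usually handled with a rank-based formula.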

To evaluate the ISSA-SSVM-based prediction model of users' repeat purchase behavior more intuitively, the experiment compares it with three current, more advanced prediction models: the GA-SVM model, the GWO-SVM model, and the SSA-SVM model. The prediction results of each model for users' repeat purchase behavior are shown in Table 2.

Table 2. Comparison of four prediction models

Parameter            GA-SVM    GWO-SVM    SSA-SVM    ISSA-SSVM
Penalty parameter    98.2      1000       10.8       137.3
Kernel parameter     0.04      0.10       0.17       0.01
Accuracy rate (%)    93.5      90.1       92.2       98.8

Table 2 shows that the model based on the ISSA-SSVM algorithm has the highest accuracy, 98.8%, in predicting users' repeat purchase behavior. Its accuracy is 5.3, 8.7, and 6.6 percentage points higher than that of the GA-SVM model, GWO-SVM model, and SSA-SVM model, respectively. The results demonstrate the superior predictive performance of the research algorithm.
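The accuracy gains quoted above can be reproduced directly from the accuracy row of Table 2. The short check below uses only the values reported in the table; the variable names are illustrative.

```python
# Accuracy rates (%) as reported in Table 2.
accuracy = {"GA-SVM": 93.5, "GWO-SVM": 90.1, "SSA-SVM": 92.2, "ISSA-SSVM": 98.8}

best = accuracy["ISSA-SSVM"]
# Difference in percentage points between ISSA-SSVM and each comparison model.
gains = {name: round(best - value, 1)
         for name, value in accuracy.items() if name != "ISSA-SSVM"}
print(gains)  # {'GA-SVM': 5.3, 'GWO-SVM': 8.7, 'SSA-SVM': 6.6}
```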

5 Conclusion

Intelligent and accurate prediction of users' repeat purchase behavior can help e-commerce companies develop precise marketing strategies, thereby increasing their product sales and economic benefits. To this end, the study proposes an ISSA-SSVM model for predicting users' repeat purchase behavior. The results showed that the ISSA-SSVM model converged earlier, requiring only 82 iterations, which was 23 and 53 fewer than the Stacking model and GA-XGBoost model, respectively. Its error value was 0.14, which was 0.12 and 0.15 lower than those of the Stacking model and GA-XGBoost model, respectively, and its loss value was 0.20, which was lower than those of both the Stacking model and the GA-XGBoost model. The F1 value was 0.957, which was 0.005 and 0.009 higher than the Stacking model and GA-XGBoost model, respectively; the Recall value was 0.965, which was 0.005 and 0.011 higher than the Stacking model and GA-XGBoost model, respectively; the MAE value was 8.52, which was 1.34 and 2.56 lower than the Stacking model and GA-XGBoost model, respectively; and the goodness-of-fit value reached 0.992, which was 0.006 and 0.014 higher than the Stacking model and GA-XGBoost model, respectively. The prediction accuracy reached 97.92%, which is 1.05% and 2.04% higher than the Stacking model and GA-XGBoost model, respectively. The AUC value is 0.995, which is 0.12 and 0.17 higher than the Stacking model and GA-XGBoost model, respectively. The model thus achieves efficient and accurate prediction of users' repeat purchase behavior, providing data support for e-commerce enterprises in formulating precise marketing strategies, which has positive significance for improving their economic benefits. Because the experiment used user behavior data from only one e-commerce company, the results may contain some randomness and may differ from results obtained in practice. Subsequent studies therefore need to enlarge the sample of experimental data to improve the reliability of the results.



Contribution of Individual Authors to the

Creation of a Scientific Article (Ghostwriting

Policy)

The author contributed in the present research, at all

stages from the formulation of the problem to the

final findings and solution.

Sources of Funding for Research Presented in a

Scientific Article or Scientific Article Itself

No funding was received for conducting this study.

Conflict of Interest

The author has no conflict of interest to declare.

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US
