A Clustering Algorithm for Cross-border E-commerce Customer Segmentation
1SHULING YANG, 2YAN HOU
1Foreign Language School of Jilin Normal University, Siping, CHINA
2Management School of Jilin Normal University, Siping, CHINA
Abstract—With the deepening of reform and opening up, cross-border e-commerce has made great progress and plays a very important role in today's society. Cross-border e-commerce is not only a place for commodity trading, but also a key channel for information communication when commodities are traded. Clustering analysis is one of the common technologies in the field of data mining, and it has unique advantages in customer segmentation. This paper first improves the selection of the initial clustering centers of the K-means clustering algorithm. To address the defects of existing approaches, such as long running time and poor accuracy when the sample points corresponding to multiple maximum density parameter values are taken as initial clustering centers, an improved scheme based on quadratic density is proposed and applied to customer value segmentation. The research shows that the improved K-means clustering algorithm significantly improves the quality of clustering, thereby improving the effectiveness and pertinence of cross-border e-commerce marketing activities.
Keywords: K-means clustering algorithm; Cross-border e-commerce; Customer segmentation
Received: May 29, 2021. Revised: March 19, 2022. Accepted: April 27, 2022. Published: May 20, 2022.
1. Introduction
Cross-border e-commerce here refers specifically to cross-border electronic commerce platform enterprises, including both third-party cross-border e-commerce platforms and self-built cross-border e-commerce platforms. In cross-border electronic commerce transactions, the cross-border e-commerce platform is the network hub of transaction activities: it is not only the medium for displaying and browsing commodities, but also the place where commodities are traded, bridging the supply and consumption of commodities [1]. Commodity trading in cross-border e-commerce can be cumbersome and complicated, which has given rise to many different operation modes. How to operate cross-border e-commerce reasonably and effectively so as to better serve people in the future is a problem that the relevant departments of cross-border e-commerce need to focus on [2].
In recent years, data mining clustering algorithms have become the most important means of achieving customer segmentation, and the K-means algorithm is the most widely used among them. Many scholars have improved it and applied it [3-4], or carried out in-depth performance comparisons between the K-means algorithm and other clustering analysis methods [5]. Literature [6] puts forward a new method for selecting the initial
clustering center, which can get a better initial clustering center
without setting a threshold, but it needs to scan the data set to be
measured several times and calculate the distance of the
corresponding data, which leads to greater computational
complexity than other algorithms. Literature [7] proposed an
algorithm for optimizing the initial center of K-means
algorithm. When calculating the density of objects, this
algorithm adopted the density-sensitive similarity measure and
generated the initial cluster center of samples. Literature [8]
proposed a new effective clustering function. In order to reduce
the influence of outliers on the clustering results of the
algorithm, the weighted K-means method is used to improve
the traditional algorithm and get the clustering center.
Compared with the traditional K-means, its clustering results
are more effective. Literature [9] proposed an improved
K-means algorithm based on genetic algorithm, and adopted a
customer behavior segmentation model based on k-means
algorithm to segment customers. Literature [10] uses the fuzzy C-means clustering algorithm for customer clustering; it provides a quantitative basis for the feature analysis of customer groups and obtains satisfactory customer clustering results. On the basis of a comprehensive analysis of the grid clustering algorithm and the K-means clustering algorithm, literature [11] proposed an algorithm based on the minimum clustering unit, in order to remedy the problem that improper selection of the initial points in the K-means clustering algorithm greatly affects the classification results.
As a key component of customer relationship management,
customer segmentation has gradually become an important
premise for enterprises to apply customer relationship
management. Through customer segmentation, enterprises can
not only better identify different needs of different customers
for enterprises, but also provide different services to different
customers, thus improving customer satisfaction and loyalty.
Enterprises can also identify potentially valuable customers in the customer
group and enhance the competitiveness of enterprises. The
main work of this paper is to improve the K-means clustering
algorithm and integrate a single clustering algorithm with the
idea of ensemble learning, and then apply the clustering
ensemble algorithm to customer segmentation of cross-border
e-commerce.
2. General Method and Process of Customer Segmentation
Generally speaking, customer segmentation can be carried
out according to the following three customer attributes [12]:
(1) External attributes
For example, the geographical distribution of customers, the products owned by customers, and the organizational ownership of customers (enterprise users, individual users, government users, etc.). This kind of stratification is usually the simplest and most intuitive, but it is also a relatively coarse classification: within each customer level we still do not know which customers contribute more to the enterprise and which contribute less.
(2) Intrinsic attributes
Intrinsic attributes are attributes determined by the internal
factors of customers, such as gender, age, beliefs, hobbies,
income, family members, credit, personality and value
orientation, etc.
(3) Characteristics of consumption behavior
From consumption behavior we can grasp customers' real consumption habits and tendencies, and this usually gives ideal results in practice. However, classification by consumption behavior also has its limitations: it can only be applied to existing customers. For potential customers, whose consumption behavior has not yet started, such classification is of course impossible.
Customer segmentation can generally be divided into five
steps (as shown in Figure 1):
Figure 1 Customer segmentation steps
(1) Segmentation by customers' general characteristics
To classify customers by their general characteristics, the main factors to consider are: regional characteristics, such as urban or rural areas, city scale and level of economic development; demographic background, such as age, sex, education level, nature of the work unit, and position or rank; and psychological factors, such as personality characteristics and level of moral development.
(2) Customer value segmentation
Customers' contributions to the enterprise differ according to their consumption levels. Therefore, after customers are segmented by their general characteristics, they should be divided into several grades according to their contribution to the enterprise, such as high-quality customers, potential customers, general customers, small customers and blacklisted customers.
(3) Segmentation by common customer demand
On the basis of the first two segmentation steps, select the high-quality customers and potential high-quality customers of the enterprise as the target, analyze the demand characteristics of each kind of customer, formulate enterprise strategies under the guidance of customer demand, and finally provide personalized products and services for each customer group.
(4) Select a clustering method suited to the enterprise's data characteristics
Clustering algorithms are unsupervised learning algorithms. When using clustering technology to segment customers, we should choose an appropriate algorithm according to the needs of the enterprise, the characteristics of the customers and the collected data, so as to discover the true distribution of the data.
(5) Evaluate the customer segmentation model
The purpose of a customer segmentation model is to divide customers into different clusters according to their various characteristics. According to the needs of the enterprise, customers in the same cluster should have similar contribution and consumption tendencies, while customers in different clusters should differ as much as possible in these respects. These characteristics can be measured by the mean and variance of the customer attributes.
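As an illustration of step (5), the following minimal Python sketch (assuming a pandas DataFrame with an illustrative cluster-label column named "cluster"; the column names are not taken from the paper's data) shows how the per-cluster mean and variance of customer attributes could be inspected:

```python
import pandas as pd

def profile_clusters(customers: pd.DataFrame, label_col: str = "cluster") -> pd.DataFrame:
    """Per-cluster mean and variance of every numeric customer attribute."""
    numeric = customers.select_dtypes("number").drop(columns=[label_col], errors="ignore")
    return numeric.join(customers[label_col]).groupby(label_col).agg(["mean", "var"])

# Example with made-up data:
# customers = pd.DataFrame({"cluster": [0, 0, 1, 1],
#                           "spend": [120.0, 90.0, 300.0, 280.0],
#                           "orders": [3, 2, 7, 6]})
# print(profile_clusters(customers))
```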
3. Research Method
3.1 Clustering Analysis Algorithm
Clustering is a main technology in data mining. The process of grouping a set of objects into multiple classes composed of similar objects is called clustering [13-14]. After grouping, objects in the same class are similar to each other, while objects in different classes are different. Cluster analysis is often used as the first step of data mining: it preprocesses the data, and other algorithms are then used to further analyze the obtained classes. Clustering algorithms can be divided into partitioning methods, hierarchical methods, density-based methods, grid-based methods and model-based methods.
The K-means algorithm is one of the partition-based clustering algorithms. It uses an iterative hill-climbing procedure to discover clusters and cluster centers from unlabeled data sets. Its purpose is to divide N samples into M clusters so that the sum of squared errors between the data samples in each cluster and the mean value of that cluster is minimized.
The workflow of the K-means clustering algorithm is as follows: first, randomly select k samples as the initial cluster centers; then calculate the distance between each sample and every cluster center, assign each sample to the class of its nearest cluster center, and recalculate the adjusted cluster center of each new class; repeat this process until the cluster centers of two consecutive iterations no longer change, at which point the adjustment of samples ends and the algorithm has converged.
Let $C_i$ denote a cluster, $p$ a sample point in $C_i$, and $c_i$ the center point (mean value) of $C_i$. The difference between $p$ and $c_i$ can be measured by $d(p, c_i)$, the Euclidean distance between the two points.
The quality of $C_i$ can be measured by an objective function that represents the sum of squared errors between the center point $c_i$ and all the other sample points in $C_i$. Generally, the following function is used:

$$E(I)=\sum_{i=1}^{k}\sum_{p\in C_i} d(p, c_i)^{2} \qquad (1)$$

where $E(I)$ denotes the sum of squared errors over all sample objects in the data set, $p$ denotes a sample point, and $c_i$ denotes the center point of cluster $C_i$.
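As an illustration, a minimal NumPy sketch of the objective in Eq. (1) is given below; the array names are illustrative assumptions, not part of the paper:

```python
import numpy as np

def sse(X: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    """E(I): sum over clusters C_i of the squared distances d(p, c_i)^2."""
    diffs = X - centers[labels]        # each sample minus the center of its own cluster
    return float(np.sum(diffs ** 2))
```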
Execution flow of the K-means algorithm:
Algorithm: K-means clustering algorithm
Input: data set $D$ of size $n$; number of clusters to generate, $k$.
Output: the $k$ resulting clusters.
Steps:
(1) Given a data set $D$ containing $n$ sample points and a value of $k$, randomly select $k$ sample points as the initial cluster centers $c_j$, $j = 1, 2, \dots, k$.
(2) Cluster the data set $D$ for the first time: compute the distances $d(x_i, c_j)$, $i = 1, 2, \dots, n$, $j = 1, 2, \dots, k$, from every sample point that is not an initial center to each cluster center, find the minimum distance, assign each sample point to the corresponding cluster, and obtain the designated $k$ clusters.
(3) According to the obtained clusters, recalculate the mean value of the data samples in each cluster and take it as the new cluster center $c_j$, $j = 1, 2, \dots, k$.
(4) Repeat steps (2) and (3) until the cluster centers of two consecutive iterations no longer change or the objective function converges.
(5) Output the resulting $k$ clusters.
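A compact Python sketch of the loop described in steps (1)-(5) is given below for illustration; it assumes a NumPy array X of shape (n, p) and is a simplified sketch rather than the exact implementation used in the paper:

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, max_iter: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]           # step (1)
    for _ in range(max_iter):
        # step (2): assign every sample to its nearest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step (3): recompute each center as the mean of its cluster
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # step (4): stop when the centers of two consecutive iterations match
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers                                           # step (5)
```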
Before K-means clustering, not only must the value of the cluster number $k$ be given in advance, but the initial cluster centers must also be given. The initial values therefore have a great influence on the clustering results; if they are not selected properly, the clustering results will contain large errors. The K-means algorithm thus depends strongly on the initial cluster centers. If an outlier is selected as an initial value, the clustering criterion function will converge slowly and the clustering result will be unstable, which is a difficult problem in the K-means algorithm.
3.2 Improvement of the K-means Algorithm
In order to achieve better clustering results, we have made corresponding improvements to address the shortcomings of the K-means algorithm [15-16].
First, to determine the optimal value of $k$, we enumerate all possible cases exhaustively. The specific process is as follows: execute the K-means algorithm once for every value of $k$ ($1 \le k \le n$) to obtain the corresponding clustering result; calculate the value $S$ of the distance function after clustering with each value of $k$; and choose the smallest $S$ among all the distance functions, whose corresponding $k$ is the optimal value we are looking for.
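For illustration, the exhaustive search over k described above could be sketched as follows; the use of the within-cluster sum of squares as the distance function S, and the candidate range for k, are assumptions made for this sketch, and the helper functions kmeans and sse refer to the sketches given earlier:

```python
import numpy as np

def best_k(X: np.ndarray, k_max: int) -> int:
    """Run K-means once for each candidate k and return the k with the smallest S."""
    scores = {}
    for k in range(2, k_max + 1):                 # one K-means run per value of k
        labels, centers = kmeans(X, k)            # K-means sketch from Section 3.1
        scores[k] = sse(X, labels, centers)       # S value of the distance function
    return min(scores, key=scores.get)            # k corresponding to the smallest S
```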
To address the blind selection of initial center points in the K-means algorithm, we use a new distance function to select the initial cluster centers according to the distribution characteristics of the sample data, so as to improve clustering speed and quality [17].
Let $Q$ be a data set containing $n$ data samples, let the final number of clusters be $k$, and let the number of data variables (dimensions) be $p$. The distance function between any two data points is defined as:

$$d(x,y)=(x_1-y_1)^2+(x_2-y_2)^2+\cdots+(x_p-y_p)^2 \qquad (2)$$

The minimum distance between a data point $x$ and the data set $Q$ is defined as:

$$d_{\min}(x,Q)=\min\{\, d(x,y) \mid y\in Q \,\} \qquad (3)$$

The maximum distance between a data point $x$ and the data set $Q$ is defined as:

$$d_{\max}(x,Q)=\max\{\, d(x,y) \mid y\in Q \,\} \qquad (4)$$
The restriction condition for a data point $y$ to be included in the data set $Q$ is defined as:

$$d(x,y) \ge d_l = \frac{\max\limits_{x,y\in[1,n]} d(x,y) - \min\limits_{x,y\in[1,n]} d(x,y)}{k} \qquad (5)$$
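A small NumPy sketch of the distance measures in Eqs. (2)-(4) is given below for illustration; treating Eq. (2) as a sum of squared coordinate differences follows the reconstruction above and is an assumption of this sketch:

```python
import numpy as np

def d(x: np.ndarray, y: np.ndarray) -> float:
    """Eq. (2): distance between two p-dimensional data points."""
    return float(np.sum((x - y) ** 2))

def d_min(x: np.ndarray, Q: np.ndarray) -> float:
    """Eq. (3): minimum distance between x and the points in Q."""
    return min(d(x, y) for y in Q)

def d_max(x: np.ndarray, Q: np.ndarray) -> float:
    """Eq. (4): maximum distance between x and the points in Q."""
    return max(d(x, y) for y in Q)
```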
The specific steps of the improved K-means algorithm are as follows:
(1) Enter the data set and initialize the parameters.
(2) Run the iterative process:
1) Run the iterative process and obtain the clustering result for the current $k$; if the condition holds, go to 3), otherwise proceed to the next step.
2) Check whether the clustering result converges; if so, calculate $Sil(k)$, record it, and go to 4); otherwise return to the previous step.
3) Check whether the cluster centers meet the convergence condition; if they converge, obtain the $k$ clusters and calculate $Sil_{\max}$. If $Sil(k) < Sil(k-1)$, then $H = H + 1$; if $Sil(k) > Sil_{\max}$, set $H = 0$.
4) Check whether $H > k^{1/2}/2$, whether $k$ equals 2, and whether the number of cycles meets the termination condition; if so, go to 5), otherwise return to 1).
5) Examine the number of optimal clusters corresponding to $Sil_{\max}$; if it is 2, calculate the Hartigan index and compare it.
6) Output the cluster number and the cluster centers.
(3) Use the output cluster number $k$ and cluster centers to initialize the K-means algorithm.
(4) Run the K-means algorithm to obtain the final clustering result.
According to the above steps, the flow chart of the improved K-means algorithm can be drawn, as shown in Figure 2 below:
Figure 2 Flow chart of improved k-means algorithm
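For illustration, a simplified sketch of the spirit of this initialization is given below: it picks k well-separated initial centers with a farthest-first (max-min) rule based on the distance measures above and then runs the standard K-means loop seeded with them. The silhouette and Hartigan-index checks of the flow chart are omitted, so this is an interpretation of the idea rather than the exact procedure:

```python
import numpy as np

def select_initial_centers(X: np.ndarray, k: int) -> np.ndarray:
    """Pick k mutually distant initial centers (farthest-first / max-min rule)."""
    centers = [X[0]]                                    # start from an arbitrary point
    while len(centers) < k:
        # for every sample, its distance to the nearest already chosen center
        d_min_all = np.min(np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :],
                                          axis=2), axis=1)
        centers.append(X[int(np.argmax(d_min_all))])    # farthest point becomes a center
    return np.array(centers)

def improved_kmeans(X: np.ndarray, k: int, max_iter: int = 100):
    centers = select_initial_centers(X, k)              # deterministic initialization
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```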
3.3 Application of the Improved K-means Algorithm in Cross-border E-commerce Customer Segmentation
3.3.1 Data acquisition
Today, with the vigorous development of e-commerce,
various e-commerce websites have accumulated a large amount
of data information, such as customer information, sales
information, commodity information, etc. These data play a
vital role in the formulation of sales strategies or promotion
strategies of enterprises [18-19]. How to obtain effective data
of customer segmentation is also the key to our success.
The data used in this paper come from a cosmetics e-commerce website. Because the data source contains a large amount of complex data, we need to extract the data actually used from the database, including the customer information table, commodity information table, customer order table, etc.
3.3.2 Data preprocessing
The original data usually have defects such as being nonstandard, repetitive or incomplete. They can be repaired by data cleaning, such as missing-value processing, removal of isolated points and deletion of noisy data, so as to make the data as consistent as possible.
In the enterprise database, when customers register their
accounts, some options are optional, and some options may be
sensitive information, which will cause customers to be
reluctant to fill in information, leaving a lot of missing values in
the database. Therefore, before data analysis, we must first deal
with the missing values. Commonly used methods to deal with
missing values include: manual processing, estimated filling
and so on.
Noise refers to repeated, erroneous and incomplete data. Erroneous data can be detected by statistical principles [20]: roughly speaking, data that deviate from the mean by more than two standard deviations in either direction can be regarded as noise. Incomplete data are data with incomplete information; for example, the commonly used language information left by some customers is not complete enough, which can also become potential noise. Duplicate data are, simply put, records of the same information repeated; if a customer's consumption behavior is recorded twice as the same transaction, it will certainly distort the related analysis.
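A hedged pandas sketch of these cleaning steps (missing-value filling, the two-standard-deviation noise rule and duplicate removal) is shown below; the column name "amount" is illustrative and not taken from the paper's data:

```python
import pandas as pd

def clean(df: pd.DataFrame, col: str = "amount") -> pd.DataFrame:
    df = df.drop_duplicates().copy()                  # remove repeated records
    df[col] = df[col].fillna(df[col].mean())          # simple estimated filling
    mean, std = df[col].mean(), df[col].std()
    noise = (df[col] - mean).abs() > 2 * std          # two-standard-deviation rule
    return df[~noise]                                 # drop the flagged noise records
```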
Each customer record has many dimensions whose attribute types are not consistent: some are numeric, some are Boolean, and some are text [21]. To facilitate correct calculation later, the source records must be transformed to meet data mining standards. Here, the attribute values are normalized and mapped to the interval [0, 1] with the following formula:

$$A_i' = \frac{A_i - \min(A_i)}{\max(A_i) - \min(A_i)} \qquad (6)$$

where $A_i$ denotes the $i$-th attribute value of record $A$, $\min(A_i)$ and $\max(A_i)$ denote the minimum and maximum values of the $i$-th attribute over all elements, and $A_i'$ is the normalized value.
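For illustration, the min-max normalization of Eq. (6) could be applied to a customer table as in the following sketch; column handling details (e.g. constant columns) are simplified:

```python
import pandas as pd

def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Map every numeric attribute into [0, 1] as in Eq. (6)."""
    numeric = df.select_dtypes("number")
    scaled = (numeric - numeric.min()) / (numeric.max() - numeric.min())
    # non-numeric columns are kept unchanged; constant columns would need special care
    return df.assign(**{c: scaled[c] for c in numeric.columns})
```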
Because the final segmentation results need to be counted, the customers' REFERENCE attribute is replaced by ordinal numbers recorded from 1, with each number representing one customer.
3.3.3 Establishment of the customer behavior segmentation model
Another important step before applying the segmentation
algorithm is to establish the segmentation model, that is, to determine which segmentation method can obtain better segmentation results. After determining the attribute indexes for segmentation, divide the customer groups with the chosen segmentation method, and finally extract the group characteristics of each customer group.
In this application, after the customer data are pre-processed, a segmentation model is established according to the relevant customer attributes. The data are then integrated and segmented using the selective clustering ensemble algorithm, different customer groups are identified according to the segmentation results, and corresponding marketing suggestions are provided by extracting the group characteristics. The overall process of the application case is shown in Figure 3:
Figure 3 Overall flow chart of subdivision scheme
Given the large number of customer data attributes and the different data types, customers are first divided into six groups according to the customers' REFERENCE attribute; at this stage each group contains only customer IDs.
4. Analysis and Discussion
4.1 Performance Analysis of the Improved K-means Algorithm
To test the feasibility of the optimization algorithm proposed in this paper, this section uses four different data sets from the UCI repository: the customers data set, the Iris data set, the seeds data set and the Wine data set. These data sets already carry class labels in the UCI repository, so we can accurately and intuitively calculate the accuracy and experimental effect of each clustering algorithm on them.
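For illustration, clustering accuracy on such labelled data can be computed by mapping each cluster to its majority true class, as in the sketch below; the use of scikit-learn to load the data is an assumption of this sketch, not part of the paper's setup:

```python
import numpy as np
from sklearn.datasets import load_iris

def clustering_accuracy(labels: np.ndarray, y_true: np.ndarray) -> float:
    """Map each cluster to its majority true class and count the matches."""
    correct = 0
    for c in np.unique(labels):
        members = y_true[labels == c]
        correct += np.bincount(members).max()     # size of the majority class
    return correct / len(y_true)

# Example: X, y = load_iris(return_X_y=True)
#          labels, _ = kmeans(X, k=3)             # K-means sketch from Section 3.1
#          print(clustering_accuracy(labels, y))
```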
We compare the algorithms in terms of initial cluster center selection time, clustering accuracy, overall running time and other aspects, and comprehensively weigh the advantages and disadvantages of each algorithm. The experimental environment is a Windows 7 64-bit operating system with the Eclipse integrated development environment. Because the traditional K-means clustering algorithm selects its initial center points randomly, it is not included in this comparison; only the time spent by the OLD-means algorithm and by the improved K-means algorithm of this paper in selecting the initial center points is compared.
Both algorithms produce fixed results when obtaining the initial center points, but depending on the operating conditions of the machine and platform each run differs slightly in time, so each algorithm is run 10 times on the experimental platform and the average computation time is taken, as shown in Figure 5:
Figure 5 Time comparison of two algorithms in selecting
initial center point
As can be seen from Figure 5, for the given four kinds of data
sets, the time consumed by the improved k-means algorithm in
selecting the initial center point is reduced to varying degrees
compared with the OLD-means algorithm. Except for the
customers data set, the reduction rate of the other three data sets
is obvious.
Moreover, the Iris and Wine data sets contain many equal density parameter values in local ranges, while the values jump greatly over the whole range; the improvement measures in this paper are therefore fully exploited during clustering and effectively shorten the time needed to find the initial center points.
In each dataset, the traditional K-means clustering algorithm
is run 10 times, and the highest accuracy, the lowest accuracy
and the average accuracy are obtained respectively. However,
the OLD-means algorithm and the improved K-means algorithm of this paper produce fixed results when selecting the initial clustering centers, so a single run is enough to obtain their clustering accuracy. Figure 6 shows the accuracy of each algorithm on the different data sets:
4. Analysis and Discussion
4.1 Performance Analysis of Improved
k-means Algorithm
PROOF
DOI: 10.37394/232020.2022.2.17
Shuling Yang, Yan Hou
E-ISSN: 2732-9941
142
Volume 2, 2022
Figure 6 Accuracy comparison of the three algorithms
From Figure 6 we can see that the traditional K-means algorithm shows a large gap between its highest and lowest accuracy, so it is highly unstable. On the Iris, seeds and Wine data sets, the clustering accuracy of the improved K-means algorithm shows a certain improvement over the OLD-means algorithm, while the improvement on the customers data set is smaller.
Next, from the perspective of the overall operation of the clustering algorithms, we compare the time consumed by the three algorithms on the different data sets; each is run 10 times on the experimental platform and the average is taken. Figure 7 shows the overall time consumption of the three algorithms.
Figure 7 Overall time consumption of the three algorithms
It can be seen from Figure 7 that the traditional K-means algorithm generates its initial cluster centers randomly, so it consumes less time than the other two algorithms, but at the cost of instability and low accuracy. The latter two algorithms are stable and give good clustering results, although they spend considerable time calculating the density parameter values of the sample points. At the same time, the overall running time of the improved K-means algorithm is shorter than that of the OLD-means algorithm.
To sum up, the tests of initial-center selection time, algorithm accuracy and overall running time show that the improved K-means algorithm proposed in this paper achieves a better optimization effect.
4.2 Cluster Analysis of Cross-border E-commerce Customer Segmentation
According to the customer segmentation results obtained by the improved K-means algorithm and the original data, we can obtain the consumption characteristics of the customers in each category, as shown in Table 1.
Table 1 Customer consumption characteristics of different customer categories
Customer category | Customer consumption characteristics
Category 1 | Average consumption times 7.24; average consumption amount 982.37
Category 2 | Average consumption times 7.631; average consumption amount 956.01
Category 3 | Average consumption times 7.71; average consumption amount 979.25
Category 4 | Average consumption times 7.82; average consumption amount 977.52
Combining Table 1 with the original data, we can draw the following conclusions:
Category 4 contains the largest number of customers and is characterized by fewer consumption times and a lower average consumption amount. Combined with the relevant customer information, it can be seen that most customers in this category have a low educational background, low income, and unevenly distributed ages and locations.
Category 3 has the fewest customers; their average number of consumption times is the lowest, but their average consumption amount is very high. These customers are observed to have high academic qualifications and high incomes.
Category 2 has fewer customers than Category 4 but more than the other categories. These customers have a higher average number of consumption times and a higher average consumption amount. Most of them are around 30 years old, with average education and income, and most come from second- and third-tier cities.
The average number of consumption times of Category 1 customers is slightly lower than that of Category 2 customers, but their consumption amount is very high. Most of them are highly educated, have high incomes, live in first-tier cities, and are generally 35 to 45 years old. They visit the website frequently and spend large amounts.
Customer category information, customer quantity ratio and
value are shown in Table 2 and Figure 8.
&OXVWHUDQDO\VLVRIFURVVERUGHU
HFRPPHUFHFXVWRPHUVHJPHQWDWLRQ
PROOF
DOI: 10.37394/232020.2022.2.17
Shuling Yang, Yan Hou
E-ISSN: 2732-9941
143
Volume 2, 2022
Table 2 Customer category information table
Customer category | Location | Academic degree | Age | Income
Category 1 | First-tier cities | Well-educated | 35-45 | High income
Category 2 | Second- and third-tier cities | Common | Around 30 | Common
Category 3 | First-tier cities | Well-educated | 25-35 | High income
Category 4 | Unevenly distributed | Low education | Unevenly distributed | Low income
Figure 8 Customer ratio and value ratio
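For illustration, the customer-quantity ratio and value ratio summarized in Figure 8 could be derived as in the following sketch, assuming a DataFrame with illustrative "category" and "total_amount" columns:

```python
import pandas as pd

def category_ratios(df: pd.DataFrame) -> pd.DataFrame:
    counts = df.groupby("category").size()                 # customers per category
    value = df.groupby("category")["total_amount"].sum()   # value created per category
    return pd.DataFrame({"customer_ratio": counts / counts.sum(),
                         "value_ratio": value / value.sum()})
```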
According to the comparison of customer quantity ratio and value ratio above, it can be concluded that:
Category 4 has the largest number of customers, but the value they create is not high, so we call them lead customers; there is no need to invest too many resources in them.
Category 3 customers, although the smallest in number, create higher value and belong to potential customers. Enterprises should try their best to retain these customers and develop them into higher-value customers.
Category 2 customers are average in number and create less value. Enterprises should use part of their resources to move such customers closer to Category 3 customers.
Category 1 customers are not the largest group, but they have created nearly 50% of the profits, so this class of customers is called Platinum customers. They bring huge profits to the enterprise, and the enterprise should focus its limited resources on maintaining them.
5. Conclusion
In recent years, with the rapid development of information technology and cross-border electronic commerce, a huge amount of commercial information has accumulated in the databases of cross-border e-commerce enterprises. If useful information can be extracted from this massive and complicated information, put to use, and turned into high-precision marketing schemes, it can help enterprises maintain and develop their resource advantages in fierce global competition. In this paper, the advantages and disadvantages of clustering analysis algorithms are analyzed, and a clustering algorithm based on quadratic density optimization is proposed on the basis of a deep understanding of clustering analysis. Building on the improved K-means clustering algorithm, base clusterings are selected by extracting sample subsets and calculating normalized mutual information values, and the clustering ensemble is implemented using voting rules. This model can obtain more robust and accurate clustering results by further combining multiple clustering fusion results. On several test data sets and in the actual project implementation, the model has achieved satisfactory clustering results.
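For illustration, the selection of base clusterings by normalized mutual information could be sketched as follows; the final voting-based combination is only indicated, and all names are illustrative:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score as nmi

def select_base_clusterings(clusterings: list, keep: int) -> list:
    """Keep the base clusterings that agree most (highest average NMI) with the rest."""
    scores = []
    for i, ci in enumerate(clusterings):
        others = [nmi(ci, cj) for j, cj in enumerate(clusterings) if j != i]
        scores.append(np.mean(others))
    best = np.argsort(scores)[::-1][:keep]        # indices of the most consistent runs
    return [clusterings[i] for i in best]
    # the selected clusterings would then be combined by a voting rule
```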
In this paper, the customer segmentation results are divided into segmentation based on customer value and segmentation based on customer behavior, and corresponding marketing strategies are given. However, the two aspects have not been combined to analyze the evaluation results. Therefore, as a next step, the two aspects can be studied together to see whether this can further help enterprise marketing strategy.
Acknowledgments
This work is supported by the 2019 Doctoral Research Startup Project of Jilin Normal University (Project No. 2019037).
References
[1] Wu Qiping, Wu Chengmao. A fast and robust clustering segmentation algorithm for kernel space graphics[J]. CAAI Transactions on Intelligent Systems, 2019, 14(4): 804-811.
[2] Liu Z, Xiang B, Song Y, et al. An Improved Unsupervised Image
Segmentation Method Based on Multi-Objective Particle Swarm
Optimization Clustering Algorithm[J]. Computers, Materials & Continua, 2019(2): 11.
[3] Li M. Study on the Grouping of Patients with Chronic Infectious Diseases
Based on Data Mining[J]. Journal of Biosciences and Medicines, 2019,
07(11):119-135.
[4] Hartini S, Gata W, Kurniawan S, et al. Cosmetics Customer
Segmentation and Profile in Indonesia Using Clustering and
Classification Algorithm[J]. Journal of Physics Conference Series, 2020,
1641:012001.
[5] Chinta S S. Kernelised Rough Sets Based Clustering Algorithms Fused
With Firefly Algorithm for Image Segmentation[J]. International Journal
of Fuzzy System Applications, 2019, 8(4):25-38.
[6] Liu F. 3D Block Matching Algorithm in Concealed Image Recognition
and E-Commerce Customer Segmentation[J]. IEEE Sensors Journal,
2019, PP(99):1-1.
[7] Nurmalasari, Mukhayaroh A, Marlina S, et al. Implementation of
Clustering Algorithm Method for Customer Segmentation[J]. Journal of
Computational and Theoretical Nanoscience, 2020, 17(2):1388-1395.
[8] Audrey, Guo. Four Keys of Cross-border E-commerce After Orders[J].
2021(2015-2):12-13.
[9] Long V T. Research on the Influence of Transportation Services Quality
on Purchasing Intention of Customer in E-Commerce - Evidence from
Purchasing Intention of Vietnamese Consumer in Cosmetic Industry[J].
International Journal of Social Science and Education Research, 2020,
3(5):45-53.
[10] Zhang Z, Wang Y, Yang J. Deep quantised portrait matting[J]. IET
Computer Vision, 2020, 14(6):339-349.
[11] Zhang Shi. Application of the SOM-K-Means Cluster Algorithm in Customer Segmentation of Retail Banks[J]. Journal of Panzhihua University, 2019, 36(5): 66-70.
[12] Hu F, Chen H, Wang X. An Intuitionistic Kernel-Based Fuzzy C-Means
Clustering Algorithm With Local Information for Power Equipment
Image Segmentation[J]. IEEE Access, 2020, 8:4500-4514.
[13] Brust A F, Payton E J, Hobbs T J, et al. Application of the Maximum Flow-Minimum Cut Algorithm to Segmentation and Clustering of
Materials Datasets[J]. Microscopy and Microanalysis, 2019,
25(4):924-941.
[14] Singh H. Improving Customer Segmentation in E-Commerce using
Predictive Neural Network[J]. International Journal of Advanced Trends
in Computer Science and Engineering, 2020, 9(2):2326-2331.
[15] Lei X, Ouyang H, Xu L. Kernel-Distance-Based Intuitionistic Fuzzy
c-Means Clustering Algorithm and Its Application[J]. Pattern
Recognition and Image Analysis, 2019, 29(4):592-597.
[16] Hasan F J, Mia M J. Association Rules and Clustering on Sparse Data of a
Leading Online Retailer[J]. International Journal of Computer Science
and Information Security, 2019, 17(4):112-116.
[17] Zhao H H, Luo X C, Ma R, et al. An Extended Regularized K-Means
Clustering Approach for High-Dimensional Customer Segmentation
With Correlated Variables[J]. IEEE Access, 2021, PP(99):1-1.
[18] Xie Zhongyang. Application and exploration of Python-based clustering
method in e-commerce customer segmentation[J]. Digital Technology
and Application, 2019, 37(03):230-231.
[19] Deng Xiaoyi, Jin Chun, Higuchi Ryoyuki, et al. KSP hybrid clustering algorithm for customer segmentation in mobile commerce[J]. 2021(2011-4): 54-61.
[20] Wang Jian, Yin Lifei, Xu Jiadong. Research and application of LNG
customer segmentation based on cluster analysis[J]. Chemical
Management, 2019(6): 2.
[21] Huang Xuejin, Xie Chunxia. Research on the STP strategy of
Chongqing's cross-border e-commerce retail export[J]. Jiangsu Business
Forum, 2020(6): 4.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the Creative
Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US