Corporate Accounting Management Risks Integrating Improved
Association Rules and Data Mining
HAIYAN LI
Claro M Recto Academy of Advanced Studies,
Lyceum of the Philippines University,
PHILIPPINES
Abstract: - With the development of the times, enterprises need to face more data in operational decision-
making. Traditional data analysis strategies cannot handle the growing amount of data well, and the accuracy of
analysis will also decrease when faced with uneven data types. The research uses a corporate accounting
management risk analysis technology that combines big data algorithms and improved clustering algorithms.
This method combines big data processing ideas with a clustering algorithm that incorporates improved
weighting parameters. The results show that on the data sets DS1, DS2, and DS3, the NMI values of the GMM
algorithm are all 0; while the NMI values of the MCM algorithm correspond to 0.9291, 0.9088 and 0.8881
respectively. At the same time, the Macro-F1 values of the Verify2 algorithm correspond to 0.9979, 0.9501,
and 0.9375 respectively, and the recognition accuracy of the data remains above 85%. In the running time
comparison, when the number of samples in the data set reaches 5,000, the calculation time of the Verify2
algorithm remains within 5 seconds. In terms of practical application results, the study selected the profitability
risk indicators of 40 companies for analysis. After conducting risk ratings, it can be seen that companies No. 5,
6, 7, and 39 have the highest risk levels, and companies No. 33 and 34 have the lowest risk levels. After
conducting risk assessments on the 40 selected listed companies, the risk level of net asset income
of each company remained at level 5, and the risk level of earnings per share remained at level 3. The above
results show that this technology has good performance in terms of calculation accuracy and calculation time,
can assess enterprise risks, and can provide data support for enterprise operation decisions.
Key-Words: - Data mining; Big data; Clustering algorithm; Risk assessment; Association rules; Enterprise
operation decisions.
Received: January 12, 2024. Revised: April 14, 2024. Accepted: June 16, 2024. Published: July 22, 2024.
1 Introduction
With the development of economic globalization
and computer technology, the market economy has
entered a new stage, [1]. More and more market
data need to be mined, analyzed, and processed.
Enterprises that can take advantage of the new
environment must have excellent data processing
and analysis capabilities, [2], [3]. Accounting
management risks not only affect the financial
health of enterprises but also directly affect their
sustainable development and market
competitiveness. Therefore, accurately identifying
and effectively controlling risks in accounting
management has become an important issue in
enterprise management. Traditional accounting
management risk analysis mainly relies on the
analysis of financial statements and internal
auditing. Although these methods can to some
extent reveal the financial situation of enterprises,
they have obvious shortcomings in handling large
amounts of complex data, predicting potential risks,
and exploring deep-seated risk factors, [4]. As
enterprises need to consider more and more data in
the risk decision-making process, traditional data
analysis methods cannot effectively handle complex
data information, and it is difficult to solve the
problem of data unevenness and data type. In recent
years, various artificial intelligence algorithms have
developed rapidly. Among them, improved
association rules can improve the efficiency and
accuracy of data processing through optimization
algorithms. The powerful ability of data mining
technology to process big data gives this method
clear advantages when dealing with complex
enterprise management problems, [5]. The
application of this method not only helps enterprises
discover and respond to financial risks promptly, but
also promotes the optimization of internal controls
and scientific management decisions. Against this
background, this experiment proposes a new method
that integrates improved association rules and data
mining technology, aiming to analyze corporate
WSEAS TRANSACTIONS on COMPUTER RESEARCH
DOI: 10.37394/232018.2024.12.34
Haiyan Li
E-ISSN: 2415-1521
348
Volume 12, 2024
accounting management risks more
comprehensively and in greater depth. First, the
data analysis process is set up and the risk
assessment indicators for corporate accounting
management are selected. Then a data mining
algorithm is selected to process the corporate data,
and a cluster analysis algorithm is combined with a
big data algorithm to mine the accounting
management-related data effectively. Finally, the
effectiveness and reliability of the proposed method
are verified through performance testing and
application analysis.
There are two innovative points in the research.
The first point is that the research uses a
combination of big data and clustering algorithms.
The second point is that the study combines
enterprise risk assessment with accounting work
experience and conducts risk assessment on
enterprises from a professional perspective.
The main content of the research is divided into
four parts. The first part is a summary of the current
domestic and foreign research on corporate
accounting management risk technologies related to
big data and improved clustering algorithms; the
second part introduces the corporate accounting
management method proposed in this study, which
integrates improved association rules
and data mining; the third part is the performance
analysis and application effect testing of the
constructed algorithm; the fourth part is a summary
of the methods and results of the entire article, and
also analyzes the development direction of future
research.
2 Literature Review
Management risk research can reduce corporate
risks and increase industry profits. With the
development of new technologies, management risk
assessment research has attracted more and more
attention from scholars. [6], used the K-Means data
mining method when classifying poverty lines
according to counties/cities in North Sumatra
Province to understand the poverty risk in cities.
The research results provide information for the
economic allocation of the North Sumatra
government to further overcome the poverty
problem in the region. [7], used the particle swarm
optimization algorithm when studying management
risks in public sector organizations. The research
results indicate that RM issues are not well
integrated at the MACS level, a thorough cultural
change is still needed, and future RM research must
provide empirical data on integration in practice to
reduce management risks. [8], used the R
bibliometric application to process and analyze data
when studying the development trends of
environmental accounting published in domestic
and foreign journals. The research results show that
the most popular keywords at present are energy,
environment, and assessment. [9], investigated how
students view Facebook's help in accounting
learning from aspects such as ease of use,
usefulness, attitudes towards Facebook usage
activities, and student performance. The findings
indicate that student performance and course
learning outcomes are most likely to improve when
students actively participate through the course
Facebook group. [10], used a method of readjusting
learning and teaching strategies when studying the
impact and response measures of COVID-19 in
accounting education. The study identifies issues
that need to be addressed during the
recovery and redesign phases of crisis management
and sets a new research agenda for accounting
education research.
[11], used data mining methods when
summarizing the use of traditional Chinese medicine
to preserve ejection fraction in the treatment of heart
failure. The database was established using
Microsoft Excel 2019, and then the apriori
algorithm and hclust function were used in R-Studio
(version 4.0.3) for association rule analysis and
hierarchical clustering analysis respectively.
Research results show that the treatment methods
for this disease are to replenish qi, warm yang,
activate blood circulation, and diuresis. Astragalus
and salvia are the basic compatibility of traditional
Chinese medicine. [12], used data mining methods
when studying the prescription patterns of different
dosage forms of Chinese herbal medicine in the
treatment of rheumatoid arthritis (RA) and their
impact on immune and inflammatory indicators.
Each prescribed herbal medicine was quantified and
standardized against the knowledge base to build a
database of RA treatment formulas. The research
results show that immune and inflammatory
indicators have been significantly improved after
treatment with traditional Chinese medicine
granules and decoction pieces, and there is a long-
term correlation between comprehensive evaluation
indicators and intervention measures. [13],
conducted a systematic and comprehensive review
of various data mining tasks and techniques when
studying research trends in the field of data mining.
The research results introduce various practical
applications of data mining and challenges and
problems faced by the field of data mining research.
When studying the main impact of e-learners'
satisfaction on the e-learning process, [14], used
data mining technology to identify relevant factors
that affect student learning outcomes. The research
results illustrate the impact of e-learning on student
performance. When studying spatial data mining
components, scholars such as [15], adopted the
extended adjacency spatial clustering method based
on density and grid and used middleware
technology to complete an agricultural geographic
information system based on MapXtreme. The
research results show that this method solves the
problem of agricultural informatization and
improves the optimization performance.
Through analysis of existing literature, we found
that although traditional accounting management
methods can identify and control risks to a certain
extent, they have limitations in processing large-
scale data sets, predicting future trends, and
revealing deep correlations. In contrast, improved
association rules and data mining technology can
more comprehensively identify risks and predict
possible risk trends through efficient data analysis
capabilities, thereby providing a more scientific
basis for corporate decision-making. In addition,
current research also emphasizes the important role
of data mining technology in discovering hidden
patterns, identifying abnormal behaviors, and
optimizing internal controls of enterprises.
Combined with improved association rules, this
approach can improve the effectiveness of enterprise
risk management while ensuring the accuracy and
efficiency of data analysis. Given this, the
experiment proposes a corporate accounting
management risk assessment method that integrates
association rules and data mining to further analyze
the application of this method in different types of
enterprises and diversified business environments,
and how to better integrate these technologies into
the enterprise's risk management framework to
provide more effective support for the sustainable
development and risk control of enterprises.
3 Model Construction of Risk
Analysis Technology that Combines
Big Data Algorithms and Cluster
Analysis Algorithms
With the development of society, the pressure of
data analysis faced by enterprises in the decision-
making process is gradually increasing. The study
designed data mining technology that combines big
data and clustering algorithms to solve this problem.
The study introduces the operating ideas of the risk
assessment model and the selection criteria of risk
factors. According to the characteristics of corporate
accounting data, a more suitable improved
clustering algorithm is selected to process these
data, and the design ideas of the improved clustering
algorithm are introduced.
3.1 The Construction Idea of Risk Analysis
Technical Model and the Selection
Method of Risk Assessment Indicators
For a long time, accounting work has focused on
financial accounting and ignored the importance of
management accounting. With the development of
global economic integration, complex market
information has brought greater pressure to
corporate decision-making. At the same time, the
development of Internet-related technologies also
provides support for accounting management work,
[16]. Management accounting work based on data
mining can conduct value analysis on massive data
and provide information support for corporate
decision-making. In data mining work, research is
based on big data methods and combined with
cluster analysis algorithms to improve the
performance of data analysis algorithms. During the
analysis process, due to differences in data types
and sources, data analysis algorithms need to
combine multiple processing algorithms to build a
complete application process. The study set up the
data analysis process based on the common data
types in corporate accounting work. The specific
flow chart is shown in Figure 1.
[Figure 1 flowchart: Business Understanding Stage → Data Understanding Stage → Data acceptance stage → Modeling → Evaluation → Deployment phase]
Fig. 1: Workflow diagram of data mining
In Figure 1, the study divides the data mining
process into the business understanding stage, data
understanding stage, data preparation stage, model
establishment stage, model evaluation stage, and
algorithm deployment stage. Among them, the main
purpose of the business understanding stage is to set
analysis indicators based on the objective nature of
the problem and initially judge the number and type
of influencing factors. The main purpose of the data
understanding stage is to quantitatively analyze the
indicators set in the business understanding stage
and convert various types of abstract data into
quantifiable standards to facilitate subsequent data
classification work, [17]. The main work of the data
preparation stage is to sort out the collected
quantitative data, filter out erroneous data, and
reduce the dimensionality of the data to improve the
speed of data processing. In the subsequent model
establishment and evaluation stages, the data is
sorted to establish an analysis model of the data, and
then through multiple iterations, the actual fit of the
model is gradually improved. Finally, in the
algorithm deployment stage, the results obtained
from data mining are analyzed with the actual
situation and finally compiled into a report or
management accounting report. In selecting the
enterprise risk analysis indicators, the study took
the enterprise's financial capabilities as the
selection criterion and finally formulated four
groups of indicators. Among them, the profitability
indicators of the enterprise are shown in Figure 2.
[Figure 2 diagram: Corporate profitability, broken down into net profit, gross margin, ROE, basic earnings per share, and return on total assets]
Fig. 2: Enterprise profitability indicators
Figure 2 briefly describes the influencing factors
of a company's profitability. Corporate profitability
reflects a company's ability to create profits. Among
them, net profit margin and gross profit margin
reflect the company's ability to generate income
within a certain period, the company's ability to
resist risks, and the company's operational fault
tolerance. The remaining three indicators describe
the investment value of the company, affect its
financing ability, and reflect its
development potential. Afterward, the
study set up the solvency indicators of the
enterprise. The specific solvency indicators are
shown in Table 1.
Table 1. Debt paying ability indicators of enterprises
Financial index
Quick ratio
Current ratio
Cash ratio
Asset liability ratio
Interest coverage ratio
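As an illustration of how the five indicators in Table 1 are usually computed, the following minimal sketch uses common textbook definitions. The `BalanceSheet` fields and the example figures are hypothetical and do not come from the study's data; the paper does not state which exact definitions it used.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    """Hypothetical inputs; field names are illustrative, not from the paper."""
    current_assets: float
    inventory: float
    cash_and_equivalents: float
    current_liabilities: float
    total_assets: float
    total_liabilities: float
    ebit: float               # earnings before interest and taxes
    interest_expense: float


def solvency_indicators(b: BalanceSheet) -> dict:
    """Compute the five Table 1 indicators from common textbook definitions."""
    return {
        "quick_ratio": (b.current_assets - b.inventory) / b.current_liabilities,
        "current_ratio": b.current_assets / b.current_liabilities,
        "cash_ratio": b.cash_and_equivalents / b.current_liabilities,
        "asset_liability_ratio": b.total_liabilities / b.total_assets,
        "interest_coverage_ratio": b.ebit / b.interest_expense,
    }


# Made-up figures for a single company.
example = BalanceSheet(
    current_assets=500.0, inventory=200.0, cash_and_equivalents=100.0,
    current_liabilities=250.0, total_assets=1000.0, total_liabilities=400.0,
    ebit=120.0, interest_expense=30.0,
)
ratios = solvency_indicators(example)
```

The first three ratios describe short-term solvency, while the asset-liability ratio and interest coverage ratio describe long-term solvency.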
In Table 1, the solvency of an enterprise is divided
into long-term solvency and short-term solvency.
Although solvency does not directly affect the
operation of the enterprise, it affects the credit of
the enterprise, [18], which indirectly reduces the
financing ability of enterprises and increases the
difficulty of enterprise development. After setting evaluation
indicators for the company's profitability and
solvency, the study also analyzed the company's
operational capabilities. The specific operational
capability evaluation indicators are shown in Figure
3.
[Figure 3 diagram: Enterprise operational capability, broken down into total asset turnover, inventory turnover rate, and accounts receivable turnover rate]
Fig. 3: Operational capability indicators of enterprises
In Figure 3, the study uses the company's turnover
capacity as an evaluation index to measure the
quality of the company's operating capabilities.
Good liquidity reflects an enterprise's risk
tolerance and good credit in the market, so such an
enterprise can show better resilience when encountering
operational risks. Finally, the study also included
the growth ability of enterprises as one of the
evaluation indicators of enterprise risk management.
Among them, the various evaluation factors of
specific enterprise growth capability indicators are
shown in Figure 4.
[Figure 4 diagram: Enterprise growth capability, broken down into total assets growth rate, operating revenue growth rate, and net profit growth rate]
Fig. 4: Growth capability indicators of enterprises
In Figure 4, the assessment of corporate growth
capabilities reflects the company's ability to
continuously expand its development potential
through business activities. Among the enterprise
growth capability indicators, the total assets growth
rate reflects the overall growth trend of the
enterprise. The revenue growth rate reflects the
current growth rate of the company and also reflects
the increase in the number of products sold and the
number of services provided by the company. Net
profit growth reflects the growth rate of the
company's disposable funds and the error tolerance
of risk decisions.
3.2 Algorithm Design Ideas and Key
Parameter Solution Methods for Risk
Assessment Technology
After completing the setting of enterprise risk
assessment indicators, it is necessary to select a data
mining algorithm to process enterprise data. The
study uses a combination of cluster analysis
algorithm and big data algorithm to complete the
data mining task in enterprise risk management
analysis, [19]. Since accounting risk management
data are non-uniform, the research uses a
Spark-based non-uniform data clustering algorithm
to perform data mining, [20]. Traditional clustering
algorithms require more iterative calculations when
performing cluster division tasks, but Spark changes
the calculation mode of traditional clustering
algorithms and reduces the computational
complexity. The specific parallel computing process
is shown in Figure 5.
[Figure 5 flowchart: Start → Read data and create RDD object → Data vectorization and caching → Randomly select k initial cluster center points → Map data objects to the nearest cluster → Reduce global cluster partitioning and update cluster centers → Convergence or not? (N: return to the map step; Y: output clustering results) → End]
Fig. 5: Improved clustering algorithm
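The map-and-reduce loop in Figure 5 can be sketched without Spark. The following minimal example is an illustration under stated assumptions, not the study's implementation: plain Python lists stand in for RDD partitions, the distance is squared Euclidean, and all function names (`map_partition`, `reduce_partials`, `parallel_style_kmeans`) are hypothetical.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))


def map_partition(partition, centers):
    """Map step: per-partition coordinate sums and counts for each cluster."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in partition:
        k = nearest(point, centers)
        counts[k] += 1
        for j, value in enumerate(point):
            sums[k][j] += value
    return sums, counts


def reduce_partials(partials, centers):
    """Reduce step: merge the partial sums and recompute the global centers."""
    new_centers = []
    for k, old in enumerate(centers):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:                 # keep the center of an empty cluster
            new_centers.append(list(old))
            continue
        new_centers.append([sum(sums[k][j] for sums, _ in partials) / total
                            for j in range(len(old))])
    return new_centers


def parallel_style_kmeans(partitions, k, max_iter=100, tol=1e-9, seed=0):
    """Iterate map and reduce until the centers stop moving."""
    rng = random.Random(seed)
    all_points = [p for part in partitions for p in part]
    centers = [list(p) for p in rng.sample(all_points, k)]
    for _ in range(max_iter):
        partials = [map_partition(part, centers) for part in partitions]
        new_centers = reduce_partials(partials, centers)
        shift = max(sum((a - b) ** 2 for a, b in zip(c_new, c_old))
                    for c_new, c_old in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:                # convergence test from Figure 5
            break
    return centers


# Two hypothetical partitions holding two well-separated groups of points.
partitions = [
    [(0.0, 0.1), (0.2, 0.0), (10.0, 10.1)],
    [(0.1, 0.2), (9.9, 10.0), (10.2, 9.8)],
]
centers = sorted(parallel_style_kmeans(partitions, k=2))
```

Only the small per-cluster sums and counts travel from the map step to the reduce step, which is what lets the partitions be processed independently on separate worker nodes.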
In Figure 5, the parallel processing idea can be
divided into six stages. In the first stage, data
is imported locally and converted into RDD objects
to divide task nodes. In the second stage, the data is
initialized and the representative points of the initial
clusters are obtained, and then the representative
points are sent to all working nodes. In the
third stage, the sample points are divided into
clusters based on the center points. In the fourth
stage, the data returned by the working nodes are
collected and the global cluster
center points are recalculated. In the fifth stage, the
results are analyzed based on actual needs and it is
decided whether it is necessary to continue iterating.
In the sixth stage, the clustering results are output
and the evaluation conclusions of corporate
accounting risk management are drawn. In the
process of constructing a probability model for
non-uniform data, there are often differences in cluster
densities. To characterize this difference, it is
necessary to design a probability density function
for each attribute $j$ of the sample. The specific
form is shown in formula (1).
$$p(x_j, v_{kj}, w_{kj}, \sigma_k) = \frac{\sqrt{w_{kj}}}{\sqrt{2\pi}\,\sigma_k} \exp\left(-\frac{w_{kj}\,(x_j - v_{kj})^2}{2\sigma_k^2}\right) \tag{1}$$
In formula (1), $\sigma_k^2$ represents the variance of
cluster $C_k$, and $v_{kj}$ and $w_{kj}$ are the center
and the feature weight of attribute $j$ in that cluster.
In actual situations, a sample usually has
multiple attributes. To model this, the study
estimates the probability of the sample vector as the
product of the marginal densities given by
formula (1). The resulting form of the density
function is shown in formula (2).
$$p(x, v_k, w_k, \sigma_k) = \prod_{j=1}^{D} p(x_j, v_{kj}, w_{kj}, \sigma_k) \tag{2}$$
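The density in formulas (1) and (2) can be evaluated directly. The sketch below assumes the feature-weighted Gaussian form $p = \sqrt{w_{kj}}/(\sqrt{2\pi}\sigma_k)\,\exp(-w_{kj}(x_j - v_{kj})^2/(2\sigma_k^2))$ for one attribute, which is a reconstruction chosen because it integrates to one. The function names and numeric values are hypothetical.

```python
import math


def attribute_density(x_j, v_kj, w_kj, sigma_k):
    """Weighted 1-D Gaussian density of one attribute (assumed form of formula (1))."""
    coeff = math.sqrt(w_kj) / (math.sqrt(2.0 * math.pi) * sigma_k)
    return coeff * math.exp(-w_kj * (x_j - v_kj) ** 2 / (2.0 * sigma_k ** 2))


def sample_density(x, v_k, w_k, sigma_k):
    """Product of the per-attribute densities over all D attributes (formula (2))."""
    density = 1.0
    for x_j, v_kj, w_kj in zip(x, v_k, w_k):
        density *= attribute_density(x_j, v_kj, w_kj, sigma_k)
    return density


# With w_kj = 1 the attribute density reduces to an ordinary Gaussian.
peak = attribute_density(0.0, 0.0, 1.0, 1.0)

# A crude Riemann sum checks that the weighted density still integrates to about 1.
step = 0.001
area = sum(attribute_density(-10.0 + i * step, 0.0, 4.0, 1.5) * step
           for i in range(20001))
```

A larger weight narrows the density along that attribute, so attributes with large weights contribute more to separating the clusters.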
In formula (2), $p(x)$ represents the probability
density of any sample in the cluster. After obtaining
the probability density of any sample, it is
also necessary to consider that the same data may be
included in multiple clusters. Therefore, constraints
need to be set, as shown in formula (3).
$$\forall k: \alpha_k > 0, \qquad \sum_{k=1}^{K} \alpha_k = 1 \tag{3}$$
In formula (3), $\alpha_k$ is the size marker of
cluster $k$. In the process of data processing, the
size of the cluster needs to represent the weight
information of each cluster. Therefore, the model of
non-uniform data should be rewritten as shown in
formula (4).
$$L = \prod_{k=1}^{K} \prod_{x \in C_k} \alpha_k\, p(x, v_k, w_k, \sigma_k) \tag{4}$$
In formula (4), the parameters $\alpha_k$, $v_k$, $w_k$,
and $\sigma_k$ of all clusters form the set of model
parameters. Based on the above model, the clustering
problem can be transformed into an optimization
problem: obtain the parameters that maximize the
weighted likelihood. The specific form of the data
processing model is shown in formula (5),
$$J_1 = \max \sum_{k=1}^{K} |C_k| \ln \alpha_k + \sum_{k=1}^{K} \sum_{x \in C_k} \ln p(x, v_k, w_k, \sigma_k) \tag{5}$$
Formula (5) is obtained by applying a logarithmic
transformation to the original model. Substituting
formula (1) and formula (2) into the transformed
expression and removing the constant terms gives
the rewritten form shown in formula
(6),
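$$J_1 = \max \sum_{k=1}^{K} |C_k| \ln \alpha_k + \frac{1}{2} \sum_{k=1}^{K} \sum_{j=1}^{D} |C_k| \ln \frac{w_{kj}}{\sigma_k^2} - \frac{1}{2} \sum_{k=1}^{K} \sum_{j=1}^{D} \sum_{x \in C_k} \frac{w_{kj}\,(x_j - v_{kj})^2}{\sigma_k^2} \tag{6}$$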
Formula (6) is the optimization objective
function of the algorithm.
Differences between different clusters are
represented by differences in constant terms. In the
clustering algorithm, it is necessary to constrain the
value of the feature weights. The specific form of
the traditional feature
weight constraint is shown in formula (7),
$$\sum_{j=1}^{D} w_{kj} = 1, \qquad 0 \le w_{kj} \le 1, \qquad j = 1, 2, \ldots, D \tag{7}$$
Although the feature weight constraint in formula (7)
used by traditional clustering algorithms
can well solve the problem of sample feature
information loss, it weakens the algorithm's
feature selection. Therefore, the study adopts an
improved feature weight constraint, the
specific form of which is shown in formula (8),
$$\prod_{j=1}^{D} w_{kj} = 1, \qquad w_{kj} > 0, \qquad j = 1, 2, \ldots, D \tag{8}$$
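The difference between the two constraints can be seen on a small example. The sketch below assumes that formula (7) requires the weights of a cluster to sum to one and that formula (8) requires their product to be one; both readings are reconstructions, and the raw weights and function names are hypothetical.

```python
import math


def normalize_sum(raw):
    """Traditional constraint (assumed formula (7)): weights sum to 1, each in [0, 1]."""
    total = sum(raw)
    return [r / total for r in raw]


def normalize_product(raw):
    """Improved constraint (assumed formula (8)): positive weights with product 1."""
    log_mean = sum(math.log(r) for r in raw) / len(raw)
    geometric_mean = math.exp(log_mean)
    return [r / geometric_mean for r in raw]


raw_weights = [0.5, 2.0, 4.0]          # hypothetical positive raw importances
w_sum = normalize_sum(raw_weights)
w_prod = normalize_product(raw_weights)
```

Under the sum constraint every weight is squeezed into [0, 1], whereas under the product constraint important features can receive weights above 1 and unimportant ones weights below 1, so the spread between features is preserved.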
The weight constraint in formula (8)
effectively amplifies the differences between
features and improves the classification accuracy of
the algorithm. Afterwards, the Lagrange multiplier
method is used to introduce the constraints on the
parameters into the objective
function, and the objective function can
be rewritten as shown in formula (9),
$$J_2 = J_1 + \sum_{k=1}^{K} \lambda_k \left(1 - \prod_{j=1}^{D} w_{kj}\right) + \xi \left(1 - \sum_{k=1}^{K} \alpha_k\right) \tag{9}$$
In formula (9), the coefficients of the two constraint
terms are the Lagrange multipliers. Since
the above objective function is a nonlinear function,
it is difficult to obtain the global optimal solution
directly. Therefore, the parameters are solved one at
a time while the others are held fixed. First, fix the
parameters $w$, $v$, and $\sigma$ and obtain the cluster
assignment $z_i$ of each sample. The calculation
formula is shown in formula
(10),
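$$z_i = \arg\max_{k} G_k(x_i), \qquad G_k(x_i) = \alpha_k \prod_{j=1}^{D} \frac{\sqrt{w_{kj}}}{\sqrt{2\pi}\,\sigma_k} \exp\left(-\sum_{j=1}^{D} \frac{w_{kj}\,(x_{ij} - v_{kj})^2}{2\sigma_k^2}\right) \tag{10}$$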
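The assignment step of formula (10) can be sketched as a hard assignment in log space, which avoids numerical underflow when the number of attributes is large. The sketch assumes the weighted Gaussian component $G_k(x) = \alpha_k \prod_j \sqrt{w_{kj}}/(\sqrt{2\pi}\sigma_k)\,\exp(-w_{kj}(x_j - v_{kj})^2/(2\sigma_k^2))$, which is a reconstruction; the cluster parameters and names below are hypothetical.

```python
import math


def log_component_score(x, alpha_k, v_k, w_k, sigma_k):
    """log G_k(x): log of alpha_k times the weighted Gaussian density of x."""
    score = math.log(alpha_k)
    for x_j, v_kj, w_kj in zip(x, v_k, w_k):
        score += 0.5 * math.log(w_kj) - math.log(math.sqrt(2.0 * math.pi) * sigma_k)
        score -= w_kj * (x_j - v_kj) ** 2 / (2.0 * sigma_k ** 2)
    return score


def assign(x, clusters):
    """Return the index k that maximizes G_k(x) (hard assignment)."""
    scores = [log_component_score(x, c["alpha"], c["v"], c["w"], c["sigma"])
              for c in clusters]
    return max(range(len(clusters)), key=scores.__getitem__)


clusters = [
    {"alpha": 0.9, "v": [0.0, 0.0], "w": [1.0, 1.0], "sigma": 1.0},  # large cluster
    {"alpha": 0.1, "v": [5.0, 5.0], "w": [2.0, 0.5], "sigma": 0.5},  # small, dense cluster
]
labels = [assign(p, clusters) for p in [(0.2, -0.1), (4.8, 5.3), (2.5, 2.5)]]
```

Because the logarithm is monotonic, maximizing the log score selects the same cluster as maximizing $G_k$ itself.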
In formula (10), by comparing the probability of
each Gaussian component, each sample is assigned
to the cluster with the highest probability, which
gives the clusters $C_k$. After the clusters $C_k$
are obtained, the solution for the parameter
$\alpha_k$ is
shown in formula (11),
$$\alpha_k = \frac{|C_k|}{N} \tag{11}$$
In formula (11), $|C_k|$ is the number of samples in
the cluster and $N$ is the total number of samples.
Because the number of samples differs between
clusters in non-uniform data, the variance of each
cluster is computed separately, as shown in
formula (12),
$$\sigma_k^2 = \frac{\sum_{j=1}^{D} \sum_{x_i \in C_k} w_{kj}\,(x_{ij} - v_{kj})^2}{D\,|C_k|} \tag{12}$$
After the variance expression of each cluster is
obtained from formula (12), fix the parameters $w$
and $\sigma$ and calculate the parameter $v$;
the expression of the
parameters can be obtained as shown in formula
(13),
$$v_{kj} = \frac{\sum_{x_i \in C_k} x_{ij}}{|C_k|} \tag{13}$$
Following formula (13), by fixing the parameters $v$
and $\sigma$ and calculating the parameter $w$, the
parameter
expression (14) can be obtained,
$$w_{kj} = \frac{\left(|C_k| - 2\lambda_k\right) \sigma_k^2}{X_{kj}} \tag{14}$$
In formula (14), the specific expressions of the
quantities $X_{kj}$ and $\lambda_k$ are as shown in formula (15),
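$$X_{kj} = \sum_{x_i \in C_k} (x_{ij} - v_{kj})^2, \qquad \lambda_k = \frac{1}{2} \left( |C_k| - \frac{1}{\sigma_k^2} \Big( \prod_{j=1}^{D} X_{kj} \Big)^{1/D} \right) \tag{15}$$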
After limiting the value range of
the parameters and obtaining each parameter
separately through formulas (10) to (15), the clustering algorithm is constructed.
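The update step for a single cluster can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the weights are taken to satisfy a product-to-one constraint, and eliminating the Lagrange multiplier under that assumption gives the closed form $w_{kj} = (\prod_{j'} X_{kj'})^{1/D} / X_{kj}$, which is derived here and not quoted from the paper. The function name and sample points are hypothetical, and every attribute is assumed to have non-zero scatter.

```python
import math


def update_cluster(points, n_total):
    """Closed-form parameter updates for one cluster C_k given its member points."""
    size = len(points)
    dim = len(points[0])
    alpha = size / n_total                                      # formula (11)
    v = [sum(p[j] for p in points) / size for j in range(dim)]  # formula (13)
    # X_kj: within-cluster scatter of attribute j (must be > 0 for the log below).
    scatter = [sum((p[j] - v[j]) ** 2 for p in points) for j in range(dim)]
    # Weights under the assumed product constraint: geometric mean of X_k over X_kj.
    geo_mean = math.exp(sum(math.log(s) for s in scatter) / dim)
    w = [geo_mean / s for s in scatter]
    # Formula (12): weighted scatter averaged over D * |C_k|.
    sigma2 = sum(w_j * s for w_j, s in zip(w, scatter)) / (dim * size)
    return {"alpha": alpha, "v": v, "w": w, "sigma2": sigma2}


cluster_points = [(1.0, 10.0), (2.0, 14.0), (3.0, 12.0), (2.0, 12.0)]
params = update_cluster(cluster_points, n_total=10)
```

In this example the second attribute has four times the scatter of the first, so it receives a quarter of the weight (0.5 against 2.0), and the product of the two weights is 1.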
4 Performance Analysis and
Application Effects of Risk
Assessment Technology Combining
Big Data Algorithms and Improved
Clustering Algorithms
In actual production and life, enterprise decision-
making often has to face complex data types and
huge amounts of data. To better deal with these
problems, the research uses a non-uniform
clustering algorithm to complete the problems of
data mining and data processing. To study the data
processing capabilities of the clustering algorithm,
the study used a random number generation function
to generate three data sets. By changing the data
dimensions in the data set and setting the variance to
enhance the data dispersion, we simulate the
situation of uneven data types in actual situations.
Among them, the specific situation of the data set is
shown in Table 2.
Table 2. Characteristics of synthetic datasets
Datasets   Clusters   Dimensions   Variance
DS1        20:50:00   10           0.21:0.14
DS2        2000:100   50           0.90:0.64
DS3        5000:200   100          1.64:1.34
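A data set of this kind can be generated with a random number function as sketched below. The sketch reads the "Clusters" column as per-cluster sample counts and the "Variance" column as per-cluster variances, which is an assumption about Table 2. The spacing between cluster centers, the seed, and the function name are hypothetical choices.

```python
import random


def make_imbalanced_dataset(sizes, variances, dim, spacing=5.0, seed=0):
    """Draw Gaussian clusters with unequal sizes and variances.

    sizes and variances are per-cluster values (an assumed reading of Table 2);
    spacing between cluster centers is a hypothetical choice.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for k, (n_k, var_k) in enumerate(zip(sizes, variances)):
        center = [k * spacing] * dim
        std = var_k ** 0.5
        for _ in range(n_k):
            points.append([rng.gauss(c, std) for c in center])
            labels.append(k)
    return points, labels


# Shaped like the DS2 row: 2000 versus 100 samples in 50 dimensions.
points, labels = make_imbalanced_dataset([2000, 100], [0.90, 0.64], dim=50)
```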
In Table 2, the three synthetic data sets gradually
increase in sample number and data complexity to
examine the data processing performance of the
improved clustering algorithm. To verify the
performance of the improved clustering algorithm
(labeled Verify2 in the results), the study selected
the GMM algorithm and the MCM algorithm for
comparison. The specific comparison results are
shown in Table 3.
Table 3. Clustering results of different algorithms
Measure    Datasets   MCM      GMM      Verify2
Macro-F1   DS1        0.4237   0.5595   0.9979
Macro-F1   DS2        0.4132   0.5331   0.9501
Macro-F1   DS3        0.4477   0.5148   0.9375
NMI        DS1        0.9291   0        0.0058
NMI        DS2        0.9088   0        0.0147
NMI        DS3        0.8881   0        0.0320
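For reference, the two measures in Table 3 can be computed as in the following sketch. It assumes that predicted cluster ids have already been matched to the true class ids for Macro-F1, and it normalizes mutual information by the geometric mean of the two entropies, which is one of several common NMI conventions. The paper does not state which convention it used, and the labels below are hypothetical.

```python
import math
from collections import Counter


def macro_f1(true_labels, pred_labels):
    """Unweighted mean of per-class F1; cluster ids are assumed matched to class ids."""
    scores = []
    for c in sorted(set(true_labels)):
        tp = sum(t == c and p == c for t, p in zip(true_labels, pred_labels))
        fp = sum(t != c and p == c for t, p in zip(true_labels, pred_labels))
        fn = sum(t == c and p != c for t, p in zip(true_labels, pred_labels))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels, pred_labels):
    """Mutual information divided by sqrt(H(true) * H(pred))."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    count_t, count_p = Counter(true_labels), Counter(pred_labels)
    mi = sum((c / n) * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    h_t, h_p = entropy(true_labels), entropy(pred_labels)
    if h_t == 0.0 or h_p == 0.0:
        return 0.0        # a single-cluster labeling carries no information
    return mi / math.sqrt(h_t * h_p)


truth = [0] * 8 + [1] * 2            # imbalanced ground truth
perfect = list(truth)
all_in_one = [0] * 10                # every sample placed in the majority cluster
```

With these definitions a labeling that puts every sample into one cluster has an NMI of exactly 0 while still scoring above 0 on Macro-F1, which shows that the two measures can diverge sharply on imbalanced data.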
In Table 3, on the three data sets of DS1, DS2,
and DS3, the F1 values of the Verify2 algorithm are
0.9979, 0.9501, and 0.9375 respectively; the F1
values of the GMM algorithm are 0.5595, 0.5331,
and 0.5148 respectively; the F1 values of the MCM
algorithm correspond to 0.4237, 0.4132 and 0.4477.
In addition, the NMI values of the GMM algorithm
on the three data sets are 0, while the NMI values of
the MCM algorithm on the three data sets
correspond to 0.9291, 0.9088, and 0.8881
respectively; in addition, the NMI values of the
Verify2 algorithm correspond to 0.0058, 0.0147,
and 0.0320 respectively. This shows that the
improved clustering algorithm shows higher
recognition accuracy, and the recognition accuracy
of the data remains above 85%. Since the amount of
market data is generally large, the study also
tested the robustness of the three algorithms on
the synthetic data sets by gradually increasing the
number of samples and recording the change in the
calculation time of each algorithm. The
results are shown in Figure 6.
[Figure 6 chart: calculation time (s), 0 to 16, against number of samples, 0 to 5000, for the Verify2, MCM, and GMM algorithms]
Fig. 6: Running time of different algorithms on the
dataset
In Figure 6, as the amount of data increases, the
running time of the three algorithms shows an
increase in varying degrees. When the amount of
data increases to 5000, the running time of the three
algorithms reaches its maximum value. At this point, the
running times of the GMM, MCM,
and Verify2 algorithms are 14.54s, 6.89s, and 4.98s
respectively. Throughout the entire experiment, the
running time of the Verify2 algorithm is
lower than that of the other two algorithms, and even
when the amount of data reaches 5,000 it remains
below 5 seconds. The reason why the calculation time of the
GMM algorithm is significantly higher than that of
the Verify2 algorithm is that the Verify2 algorithm
uses a spectral clustering algorithm, which saves
time when performing matrix operations and makes
the algorithm less affected by them. The comparison
shows that the Verify2 algorithm took less time
during the experiment while processing the same
amount of sample data as the other two
methods. This suggests that when the Verify2
algorithm is applied to an enterprise accounting
management system, the system processes data
faster than with the other
two algorithms. Then, non-uniform clustering is
combined with big data algorithms to handle the
prediction of corporate accounting risks.
The study uses the 16 corporate risk indicators
mentioned above: gross profit margin, net
profit margin, return on equity, basic earnings per
share, return on total assets, quick ratio, current
ratio, cash ratio, asset-liability ratio, interest
coverage ratio, total asset turnover rate, accounts
receivable turnover rate, inventory turnover rate,
total asset growth rate, net profit growth rate, and
operating income growth rate. These are labeled
with the 16 letters a to p, respectively. To explore the
correlation between each influencing factor and
enterprise risk, the Pearson coefficient between each
pair of risk indicators is shown in Figure 7.
[Figure 7, four panels plotting correlation coefficient (vertical axis) against coefficient type a to p (horizontal axis): (a) Parameter relationship of a-d; (b) Parameter relationship of e-h; (c) Parameter relationship of i-l; (d) Parameter relationship of m-p]
Fig. 7: Correlation coefficients of different risk
indicators
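The pairwise Pearson coefficients plotted in Figure 7 can be computed as in the following minimal sketch. The indicator values are hypothetical and cover only three of the sixteen indicators; the function names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of indicator name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values of three indicators across five companies.
indicators = {
    "a": [21.0, 12.5, 8.0, 3.5, 15.0],
    "b": [4.0, 1.5, 0.5, -6.0, 2.0],
    "g": [1.1, 1.8, 2.4, 0.9, 1.5],
}
matrix = correlation_matrix(indicators)
```

Each row of `matrix` corresponds to one group of bars in a panel of Figure 7: the coefficient of one indicator against every other indicator.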
In Figure 7, by analyzing and processing the
correlation of risk factors, the multi-dimensional
situation of each risk factor can be obtained. Since
the risk factors at this time are generally discretized,
although the specific values of each factor can be
obtained, the risk level reflected by the data is not
yet clear. Therefore, it is also necessary to transform
risk factors from discrete data into continuous data
through relevant financial data indicators. After
completing the data processing of the influencing
factors, the risk factors are fed into the
improved non-uniform clustering algorithm to
analyze and predict enterprise risks. To reflect the
performance of the accounting risk management
technology, the study selected the profitability risk
indicators of 40 companies for analysis and
compiled statistics and risk ratings for their gross
profit margin, net profit margin, net income from
assets, and earnings per share. The specific values
and risk rating data of gross profit margin and net
profit margin are shown in Figure 8.
[Figure 8: (a) Gross profit margin and net profit margin curve; x-axis: company serial number; y-axis: profit margin (%). (b) Gross profit margin and net profit margin risk scores; x-axis: company serial number; y-axis: risk score.]
Fig. 8: Comparison of changes in gross profit
margin and net profit margin before and after the
transformation
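The source does not state the rule used to convert an indicator value into a risk score of the kind plotted in Figure 8(b). As an illustration only, the sketch below assumes a simple min-max rule that maps each value to an integer level from 1 (safest) to 10 (riskiest). The function name risk_scores, its parameters, and the sample margins are hypothetical.

```python
def risk_scores(values, levels=10, higher_is_safer=True):
    """Map continuous indicator values to integer risk levels 1..levels.

    Assumption: levels are assigned by min-max scaling, so the weakest
    company gets the highest level. This is an illustrative rule, not
    the scoring scheme used in the study.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # All companies are identical: assign the middle level.
        return [(levels + 1) // 2] * len(values)
    scores = []
    for v in values:
        # Distance from the safest value, measured in units of the range.
        distance = (hi - v) if higher_is_safer else (v - lo)
        scores.append(1 + round(distance * (levels - 1) / (hi - lo)))
    return scores


# Hypothetical profit margins (%) for four companies.
margins = [20.0, 10.0, 0.0, -10.0]
print(risk_scores(margins))  # [1, 4, 7, 10]
```

A min-max rule is sensitive to outliers: a single strongly negative margin compresses the scores of all other companies. A rank-based or quantile-based rule would be a reasonable alternative under the same interface.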
Figure 8(a) shows the gross profit margin and net
profit margin curves. Both values fluctuate across
company serial numbers. For net profit margin,
Company No. 39 has the highest value, 4.56, and
Company No. 23 has the lowest, -62.85. For gross
profit margin, Company No. 13 has the highest
value, 25.89, and Company No. 6 has the lowest,
-3.44. Figure 8(b) shows the gross profit margin and
net profit margin risk scores, in which the
experiment converts the data into specific risk
levels. After the risk rating, companies No. 5, 6, 7,
and 39 have the highest risk levels, while companies
No. 33 and 34 have the lowest risk levels. After
mining and processing corporate revenue data, the
risk management technology integrated with big data
reflects the company's risk profile. Next, the
experiment selected 40 companies and analyzed
their net asset income and earnings per share. The
specific results are shown in Figure 9.
[Figure 9: (a) Return on equity curve; x-axis: company serial number; y-axis: return on equity (%). (b) Earnings per share curve; x-axis: company serial number; y-axis: earnings per share. (c) Return on equity and risk score of earnings per share; x-axis: company serial number; y-axis: risk score.]
Fig. 9: Radar chart of net asset income and earnings
per share before and after restructuring
Figure 9 shows that companies No. 24, 26, 38,
and 39 have higher income risks, while companies
No. 14 and 15 have lower income risks. Analyzing
the income data of each enterprise yields a clear risk
level that can serve as a data reference for enterprise
investment and risk avoidance from an accounting
perspective. For example, Company No. 39 has high
risk levels not only for gross profit margin and net
profit margin but also for net asset income and
earnings per share, making it a very risky investment
target. The shares of companies No. 14 and 15 have
higher returns, which makes them better investment
targets. The gross profit margin and net profit
margin risk levels of Company No. 33 are low,
which indicates that the company is making steady
profits. These results show that data mining
technology that combines big data and clustering
algorithms can provide information support for
enterprise risk assessment and for the quantitative
scoring of various enterprise data.
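The cross-indicator reading above, in which a company is flagged when it scores as high risk on every indicator, can be sketched as a small filter. The function name consistently_high_risk, the threshold, and all scores below are hypothetical and serve only to illustrate the reasoning. They are not results from the study.

```python
def consistently_high_risk(score_table, threshold):
    """Return companies whose risk score meets the threshold on every indicator.

    score_table maps indicator name -> {company_id: risk_score}.
    """
    if not score_table:
        return []
    # Only companies scored on every indicator can be compared.
    common = set.intersection(*(set(s) for s in score_table.values()))
    return sorted(
        c for c in common
        if all(scores[c] >= threshold for scores in score_table.values())
    )


# Hypothetical risk scores (1 = lowest risk, 10 = highest risk).
scores = {
    "gross_profit_margin": {14: 3, 33: 1, 39: 9},
    "net_profit_margin": {14: 4, 33: 1, 39: 10},
    "earnings_per_share": {14: 2, 33: 5, 39: 8},
}
flagged = consistently_high_risk(scores, threshold=8)
print(flagged)  # [39]
```

Requiring a high score on every indicator is a conservative choice. A weighted average of the scores would be an alternative when some indicators matter more than others.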
5 Conclusion
With the development of computer technology and
the acceleration of global economic processes,
enterprises need to face increasing competitive
risks. Traditional risk assessment methods cannot
adapt well to larger data structures and uneven data
types. Therefore, the study uses accounting risk
management technology that combines big data
algorithms and non-uniform clustering algorithms to
provide data support for corporate risk decisions.
Experimental results show that on the three data sets
of DS1, DS2, and DS3, the maximum F1 values of
the GMM algorithm and the MCM algorithm are
0.5595 and 0.4477 respectively; while the maximum
F1 value of the Verify2 algorithm is 0.9979. At the
same time, on the three data sets, the NMI value of
the GMM algorithm is 0. In the running time
comparison, the Verify2 algorithm consistently has
the shortest running time as the amount of data
increases. When the sample data volume reaches
5,000, the running time of the Verify2 algorithm
still stays within 5 seconds. The experiment
selected 40
companies as research objects and analyzed their
profitability risk indicators. When gross profit
margin and net profit margin are used as influencing
factors, it can be seen that companies No. 5, 6, 7,
and 39 have the highest risk levels, and companies
No. 33 and 34 have the lowest risk levels. When the
data related to net asset income and earnings per
share are used as influencing factors, it can be seen
that companies No. 24, 26, 38, and 39 have higher
income risks, while companies No. 14 and 15 have
lower income risks. The higher F1 value shows that
the proposed method has superior evaluation ability
and high recognition accuracy. The short running
time shows that the method proposed in the
experiment has excellent data processing
capabilities and strong robustness. In addition,
analyzing the correlation of risk factors reveals the
multi-dimensional situation of each risk factor.
Overall, the results show that the accounting risk
management technology that combines big data
algorithms and non-uniform clustering algorithms
computes faster than traditional algorithms, has
better data processing and data analysis
capabilities, and can provide enterprises with a risk
level assessment function. The study also proposed
specific processes for applying and managing big
data methods based on actual enterprise cases.
However, across different industries and problems,
differences in data acquisition methods and in data
composition and structure mean that specific
problems still require specific analysis rather than
generalization. In addition, for other objective
reasons, it was not possible to include more
companies or a longer period to better reflect the
analysis effect of the data mining algorithm; this
can serve as a follow-up research direction.
References:
[1] W. Haoxiang and S. Smys, Big data analysis
and perturbation using data mining
algorithm, Journal of Soft Computing
Paradigm, Vol. 3, No. 1, 2021, pp. 19-28.
[2] N. S. Amin, P. Shivakumara, T. X. Jun, K.
Y. Chong, D. L. L. Zan, and R. Rahavendra,
An augmented reality-based approach for
designing interactive food menu of restaurant
using android, Artificial Intelligence and
Applications, Vol. 1, No. 1, 2023, pp. 26-34.
[3] M. Barma and U. M. Modibbo,
Multiobjective mathematical optimization
model for municipal solid waste management
with economic analysis of reuse/recycling
recovered waste materials, Journal of
Computational and Cognitive Engineering,
Vol. 1, No. 3, 2022, pp. 122-137.
[4] T. Mahmood and Z. Ali, Analysis of
Maclaurin symmetric mean operators for
managing complex interval-valued q-Rung
orthopair fuzzy setting and their applications,
Journal of Computational and Cognitive
Engineering, Vol. 2, No. 2, 2023, pp. 98-115.
[5] Y. Fang, B. Luo, T. Zhao, D. He, B. B. Jiang,
and Q. L. Liu, ST-SIGMA: Spatio-temporal
semantics and interaction graph aggregation
for multi-agent perception and trajectory
forecasting, CAAI Transactions on
Intelligence Technology, Vol. 7, No. 4, 2022,
pp. 744-757.
[6] M. A. Hanafiah and A. Wanto,
Implementation of data mining algorithms
for grouping poverty lines by district/city in
North Sumatra, IJISTECH (International
Journal of Information System and
Technology), Vol. 3, No. 2, 2020, pp. 315-
322.
[7] E. Bracci, T. Mouhcine, and T. Rana, Risk
management and management accounting
control systems in public sector
organizations: a systematic literature review,
Public Money and Management, Vol. 42,
No. 6, 2022, pp. 395-402.
[8] M. Taqi, A. S. Rusydiana, N. Kustiningsih,
and I. Firmansyah, Environmental
accounting: A scientometric using
biblioshiny, International Journal of Energy
Economics and Policy, Vol. 11, No. 3, 2021,
pp. 369-380.
[9] R. Othman and N. M. Zambi, Social media
as a learning tool in cost and management
accounting, ANP Journal of Social Science
and Humanities, Vol. 2, No. 2, 2021, pp. 39-
46.
[10] A. Sangster, G. Stoner, and B. Flood,
Insights into accounting education in a
COVID-19 world, Accounting Education,
Vol. 29, No. 5, 2020, pp. 431-562.
[11] H. X. Guo, J. R. Wang, G. C. Peng, P. Li,
and M. J. Zhu, A data mining-based study on
medication rules of Chinese herbs to treat
heart failure with preserved ejection fraction,
Chinese Journal of Integrative Medicine,
Vol. 28, No. 9, 2022, pp. 847-854.
[12] H. Dan, L. Jian, X. Ling, J. Xie, Q. Zhu, P.
Chen, Z. Shen, Q. Meng, and H. Wang, Data
mining study on prescription patterns of
different dosage forms of Chinese herbal
medicines for treating and improving
immune-inflammatory indices in patients
with rheumatoid arthritis, Chinese Journal of
Integrative Medicine, Vol. 28, No. 3, 2022,
pp. 1-8.
[13] M. K. Gupta and P. Chandra, A
comprehensive survey of data mining,
International Journal of Information
Technology, Vol. 12, No. 4, 2020, pp. 1243-
1257.
[14] I. L. H. Alsammak, A. H. Mohammed, and I.
S. Nasir, E-learning and COVID-19:
predicting student academic performance
using data mining algorithms, Webology,
Vol. 19, No. 1, 2022, pp. 3419-3432.
[15] H. Si, C. Sun, and H. Qiao, Application of
improved multidimensional spatial data
mining algorithm in agricultural
informationization, Journal of Intelligent and
Fuzzy Systems, Vol. 38, No. 2, 2020, pp.
1359-1369.
[16] S. Shakya, A self-monitoring and analyzing
system for solar power station using IoT and
data mining algorithms, Journal of Soft
Computing Paradigm, Vol. 3, No. 2, 2021,
pp. 96-109.
[17] W. Haoxiang and S. Smys, Big data analysis
and perturbation using data mining
algorithm, Journal of Soft Computing
Paradigm (JSCP), Vol. 3, No. 1, 2021, pp.
19-28.
[18] S. Choudhuri, S. Adeniye, and A. Sen,
Distribution alignment using complement
entropy objective and adaptive consensus-
based label refinement for partial domain
adaptation, Artificial Intelligence and
Applications, Vol. 1, No. 1, 2023, pp. 43-51.
[19] M. A. Hanafiah and A. Wanto,
Implementation of data mining algorithms
for grouping poverty lines by District/City in
North Sumatra, IJISTECH (International
WSEAS TRANSACTIONS on COMPUTER RESEARCH
DOI: 10.37394/232018.2024.12.34
Haiyan Li
E-ISSN: 2415-1521
357
Volume 12, 2024
Journal of Information System and
Technology), Vol. 3, No. 2, 2020, pp. 315-
322.
[20] P. Durana, V. Krastev, and K. Buckner,
Digital twin modeling, multi-sensor fusion
technology, and data mining algorithms in
cloud and edge computing-based Smart city
environments, Geopolitics, History, and
International Relations, Vol. 14, No. 1,
2022, pp. 91-106.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US