HVLV-Motor-KC: Production Efficiency of HVLV Motor
Classification using K-means Clustering
YEJI DO1,2, CHAEGYU LEE1,*, JONGPIL JEONG1,*, JIHO JEONG1, DONGGEUN BAE1,
INKWON YEO1, MINGYU KIM1
1Department of Smart Factory Convergence,
Sungkyunkwan University,
2066 Seobu-ro Jangan-gu, Suwon, 16419,
REPUBLIC OF KOREA
2AI Research Lab,
Hygino,
25 Simin-daero 248beon-gil, Dongan-gu, Anyang-si, Gyeonggi-do,
REPUBLIC OF KOREA
*Corresponding Authors
Abstract: - This paper introduces the K-means clustering algorithm to complement the Group
Technology (GT) methodology in a high-variety, low-volume (HVLV) production system. The aim is
to overcome the limitations of the GT methodology and optimize the production schedule to increase efficiency.
We propose HVLV-Motor-KC, a K-means clustering algorithm that focuses on high-variety, low-volume
motor data. This algorithm helps to optimize
production by placing motors with similar characteristics in the same cluster.
Key-Words: - Group Technology, HVLV Production, K-means Clustering, Hierarchical Clustering, Silhouette
Score.
Received: December 19, 2023. Revised: August 19, 2024. Accepted: September 23, 2024. Published: October 14, 2024.
1 Introduction
The development of intelligent manufacturing and
smart machines in modern industry has been fuelled
by the integration of artificial intelligence (AI) and
big data methodologies, [1]. It is anticipated that this
AI-supported Industry 4.0 revolution will enhance
industrial productivity by at least 30% within a few
years of its implementation, [1]. This innovation
uses AI to reduce the incidence of machine failure,
enhance quality control, boost productivity, and
markedly reduce product costs. Consequently, humanity is
undergoing a significant transformation that will
fundamentally alter the future and our way of life,
[2].
This trend is also bringing significant changes to
modern manufacturing. Firstly, the development of
smart factories capable of collecting and storing
process data in real-time through information
automation technology has advanced, [2]. These
intelligent factory operation technologies can create
value-added data through big data analysis, thereby
improving quality within the production process.
Furthermore, in the era of the Fourth Industrial
Revolution, production methods are shifting from
mass production of a few varieties to small-batch
production of various varieties, [3]. This illustrates
that an era has arrived where it is necessary to
secure not only product quality but also service
quality and brand quality through the evolution of
technologies such as sensors, the Internet of Things,
and big data utilization. Consequently, the speed of
quality transformation must be swiftly adjusted to
meet customer preferences, [3]. In line with these
trends, modern manufacturing is increasingly
transitioning to a system of small-batch production
of various varieties to meet diverse customer
demands and respond quickly to market changes.
The objective of this study is to examine the
factors contributing to the observed decline in
productivity at the Korean motor manufacturer K,
resulting from the practice of small-batch
production for a range of different types of small
motors. In the case of mass production of standard
items, a standardized production flow can be
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.44
Yeji Do, Chaegyu Lee, Jongpil Jeong,
Jiho Jeong, Donggeun Bae,
Inkwon Yeo, Mingyu Kim
E-ISSN: 2224-3402
488
Volume 21, 2024
designed, because the same product is
continuously manufactured on a single piece of
equipment, [4].
However, in small-batch production processes
of various types, products with different
characteristics are mixed and manufactured. In one
case, a company reported that the model is changed
more than 10 times per day on average per
production line, [2]. Each time this occurs, the
preparation and production flow must be adjusted
continuously, resulting in only 5% of the
manufacturing time being spent on actual processing
and preparation, while the remaining 95% is non-
processing time, leading to waste.
Manufacturers producing small-batch, multi-
variety products apply the concept of Group
Technology (GT) to solve the aforementioned
problems and achieve their goals, [5], [6]. By
classifying similar parts based on shape, dimensions,
or processes, and applying optimized part design,
machine allocation, tools, and work methods to each
group, this methodology minimizes the radius of
action and setup time, thereby enhancing
productivity. This methodology reclassifies parts
into part families with similar design or
manufacturing characteristics to maximize
efficiency, [5], [7]. However, there are several
issues with the GT methodology.
Firstly, it should be noted that the results may
vary depending on the subjective judgment,
experience, or preferences of the individual
responsible for classifying the parts, [7]. Secondly,
the process of analysis and grouping is inherently
time-consuming and costly, and it can be
challenging to analyse when the data volume is
large or complex. Finally, the costs associated with
data collection and processing may be considerable.
While the GT methodology is suitable for small
quantities of data, it may be challenging to apply to
large-scale data sets.
To address the shortcomings of the GT
methodology, the K-means clustering algorithm, a
fundamental tool in machine learning, can be
employed.
In this study, the K-means clustering algorithm
was evaluated using the Silhouette Score as a
post-hoc metric. The first-stage clustering yielded a
Silhouette Score of 0.9188, indicating well-separated
clusters. This suggests that the K-means clustering
algorithm is an effective method for classifying the
data and can be used to complement the GT
methodology, as previously demonstrated in [8], [9].
The structure of the paper is as follows: Section
2 reviews related work on high-variety low-volume
production, Group Technology, K-means clustering,
and hierarchical clustering. Section 3 describes the
proposed HVLV-Motor-KC framework in detail.
Section 4 describes the experimental environment,
datasets, evaluation metrics, and experimental
results. Finally, Section 5 presents the conclusions
and future research directions.
2 Related Work
2.1 High-Variety Low-Volume Production
The advent of Industry 4.0 has ushered in a new era
of technological advancement, with market trends
shifting from mass, low-mix production to high-mix,
low-volume (HMLV) production, [8]. Small-batch,
multi-variety production refers to the method of
producing multiple types of products in small
quantities, diversifying the production process to be
flexible and not limited to a single type, [6]. This
approach enables manufacturers to respond flexibly
to fluctuations in market demand for customized
products. In a small-batch, multi-variety production
environment, it is common for order sizes to be
small, the number of orders to be high, and the
variability of products to increase. This enables the
enhancement of production efficiency and
maximization of resource utilization.
Fig. 1: Types of Manufacturing and Service
Processes Based on the Number of Varieties and
Production Volume
Figure 1 illustrates the types of manufacturing
and service processes based on the diversity of
varieties and production volumes in manufacturing.
Four characteristics—Volume, Variety, Variation,
and Visibility—are interrelated, [10]. Volume and
Variety generally have an inverse relationship
within a single operational process, determining the
position of a specific operational process along this
continuum. Both manufacturing and service process
types can be identified along this continuum.
Manufacturing processes are categorized into
Project, Jobbing, Batch, Mass, and Continuous
processes, while service processes are divided into
Professional, Service, and Mass services. These
process types are determined by the inverse
correlation between Volume and Variety. After
determining the appropriate process type, another
crucial factor is the layout. The choice of layout is
closely linked to, and generally determined by,
the process type. The most
common layout types are Fixed Position, Functional,
Cellular, and Product layouts.
As consumer demands become more diverse
and specific, manufacturers are introducing a variety
of products and launching high-performance
customized products. Small-batch, multi-variety
production reflects this trend and is utilized in
various fields such as clothing, jewellery, cosmetics,
ships, and robotic devices. In small-batch, multi-
variety production, customers directly determine the
design, specifications, production volume, and
delivery time of products, making management for
companies very complex and uncertain.
Furthermore, frequent job changes and different
specifications and standards require advanced
technology. These challenges require
manufacturers to adopt more flexible and efficient
management systems.
The study in [8] discusses the importance of
small-batch, multi-variety systems and the technical
requirements to support them, explaining how such
production systems should be designed and operated
to meet the demand for customized products.
Additionally, [11] emphasizes the optimization of
production schedules and the efficiency of
personalized product manufacturing through the
introduction of digital twin technology.
2.2 Group Technology
GT is a concept based on the principle of processing
similar products in a similar manner.
Fig. 2: Group Technology
Figure 2 illustrates two major methodologies in
GT. The first methodology is cluster analysis, which
groups objects into similar clusters based on their
characteristics, [12]. This method is used to
minimize setup times and tool change times
according to the types and quantities of
manufacturing parts, [7]. Through cluster analysis,
machines and workstations can be rearranged. In
frequently changing production environments,
virtual rearrangement can provide a number of
benefits. Formal methods for clustering machines
and parts include matrix, mathematical
programming, and graph methods. The second
methodology classifies parts into groups based on
their design characteristics. This approach includes
visual methods and coding methods. The visual
method groups similar parts based on their
geometric shapes. This method is suitable when the
number of parts is small, but its results can vary
depending on the subjective judgment of the
classifier. The coding
method assigns numerical or alphabetic codes based
on characteristics such as geometric shape,
complexity, and machining precision of parts or
products. The Opitz coding system is a
representative example.
Thus, GT rationalizes design and allocates
appropriate production facilities and tools to each
classified group, thereby reducing setup times, inter-
process transportation, and machining waiting times,
[5]. This increases the lot size compared to a
disordered production method, achieving an effect
similar to mass production and enhancing
productivity. Additionally, in production preparation,
for previously designed and produced repeat or
similar parts, GT allows the calculation of part
design, process planning, manufacturing cell design,
and estimated manufacturing costs based on data
retrieved from part production information, [13].
2.3 K-means Clustering
Although clustering algorithms have been
developed for decades, the K-means algorithm
remains widely used due to its simple principles,
convenience, and high efficiency, [14]. The K-
means clustering algorithm, [15], is one of the
unsupervised learning algorithms used in machine
learning. It groups similar data points by leveraging
their characteristics, [8], [15]. This algorithm
partitions a dataset so that data points with similar
attributes belong to the same cluster, [8]. It provides
a method to divide points in a multi-dimensional
space into k clusters. The algorithm classifies n
points into k clusters, ensuring that points within
each cluster share similar characteristics and exhibit
different attributes from points in other clusters.
This results in the formation of clusters where
similar data points are grouped together.
K-means clustering has the advantage of being
easy to apply to large-scale and high-dimensional
data. However, it has the disadvantage that the
number of clusters must be predetermined by the
analyst. In addition, the randomness of the initial
centroid selection can lead to different results each
time it is run. To mitigate these drawbacks,
initialization methods such as K-means++ and
various modified algorithms have been proposed to
achieve more stable and consistent clustering results.
Fig. 3: Steps in Performing K-means Clustering
Figure 3 shows the four steps involved in
performing K-means clustering. First, the number of
clusters, k, is determined by empirical methods or
domain knowledge. Second, initial centroids are
randomly selected and every data point is assigned
to the nearest centroid. Third, the mean of the data
points assigned to each cluster is calculated and the
centroids are updated accordingly. Finally, the
assignment and update steps (the second and third
steps) are repeated until the centroids stabilize, that
is, until the centroids no longer change, which
confirms convergence. Through this process, the
final cluster for each data point is determined.
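The four steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the implementation used in the experiments; the function name `kmeans`, the `seed` argument, and the toy points are assumptions introduced here.

```python
import math
import random


def kmeans(points, k, max_iter=300, seed=0):
    """Plain K-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1 is the choice of k by the caller.
    # Step 2a: pick k data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2b: assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Step 3: move each centroid to the mean of its assigned points.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # an empty cluster keeps its centroid
        # Step 4: stop once the centroids no longer change.
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
```

Because the initial centroids are drawn at random, the label numbers (0 or 1) may differ between seeds even when the grouping itself is identical.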
2.4 Hierarchical Clustering
Hierarchical clustering is an algorithm that merges
or splits data samples into groups based on their
similarity to form a hierarchical structure, [16].
Fig. 4: Hierarchical Clustering Algorithm
As shown in Figure 4, this algorithm captures
various levels of relationships between clusters and
creates clusters through a step-by-step merging
process of data samples. For example, when there
are n samples, initially each sample starts as an
individual cluster. Subsequently, the most similar
pair of clusters is repeatedly merged to reduce the
number of clusters, continuing this process until
only one cluster remains. This continuous merging
process is central to hierarchical clustering and is
useful for understanding the structure of clusters
based on data similarity, [17], [18].
Fig. 5: Hierarchical Clustering Dendrogram
The aforementioned process allows for the
generation of a dendrogram, as illustrated in Figure
5, which provides a visual representation of the
complete clustering process and offers an intuitive
understanding of the relationships between clusters.
This hierarchical approach is an effective method
for discovering and understanding the inherent
structure of the data.
The field of hierarchical clustering is
characterized by two principal approaches:
agglomerative clustering, as outlined in [19] and
divisive clustering, as detailed in [20]. In
agglomerative clustering, the process starts with
each data point constituting an independent cluster.
Thereafter, the most similar clusters are repeatedly
merged until all points are merged into a single
cluster. In contrast, divisive clustering
commences with all data points in a single cluster
and proceeds to repeatedly divide the most disparate
clusters, continuing until each data point becomes
its own distinct cluster. Figure 6 illustrates the
process of agglomerative hierarchical clustering,
accompanied by a visual representation of the
resulting dendrogram.
Hierarchical clustering forms clusters based on
the Euclidean distance between data points or
clusters without the use of an objective function.
This method circumvents the issue of initial
parameter determination, thereby distinguishing it
from K-means clustering. The outcome of K-means
clustering is contingent upon the initial parameter
settings, which may result in disparate clustering
outcomes for the same dataset. In contrast,
hierarchical clustering is not susceptible to the
influence of initial parameter settings, thereby
ensuring the generation of consistent clustering
results. Nevertheless, hierarchical clustering has
limitations, particularly in its applicability to large
datasets. Because the distance for every pair of data
points must be calculated and stored, memory use
grows quadratically with the number of data points,
and the computation time of standard agglomerative
algorithms grows at least quadratically.
Consequently, the method becomes inefficient for
large datasets.
Fig. 6: Agglomerative Hierarchical Clustering &
Dendrogram
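The agglomerative procedure described in this section can be illustrated with a naive standard-library sketch. The paper does not state which linkage criterion is meant, so single linkage (distance between the closest members of two clusters) is an assumption here, as are the function name `agglomerative` and the toy data.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters=1):
    """Naive single-linkage agglomerative clustering (illustrative sketch).

    Returns the final clusters (lists of point indices) and the merge
    history, which is the information a dendrogram displays.
    """
    clusters = [[i] for i in range(len(points))]  # each sample starts alone
    merges = []

    def linkage(a, b):
        # Single linkage (assumed): distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the most similar (closest) pair of clusters and merge it.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        merges.append((clusters[a], clusters[b], linkage(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters, merges


# Hypothetical 1-D samples.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
clusters, merges = agglomerative(points, n_clusters=2)
n_pairs = math.comb(len(points), 2)  # pairwise distances needed: n(n-1)/2
```

The pair count `n_pairs` shows why memory demand rises quickly: 5 samples need 10 pairwise distances, while 909 samples would need 412,686.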
3 HVLV-Motor-KC
3.1 Methodology
HVLV-Motor-KC proposes a method that can
efficiently and flexibly handle complex datasets,
such as those of high-variety low-volume motor
data. Figure 7 provides an explanation of the
methodology.
Previously, the GT method was used. This
method mainly relies on subjective judgment and
experience, which poses limitations when dealing
with complex patterns or large volumes of data.
Managing data through Excel sheets is time-
consuming and costly, and it is difficult to apply this
method across various environments or fields.
Therefore, classifying high-variety, low-volume
motor items through GT is challenging.
The framework of HVLV-Motor-KC has the
following features. First, it automates the clustering
process to minimize human intervention, improving
processing speed and efficiency. Second, K-means
clustering groups data points more accurately based
on similarity. Third, it performs clustering
considering the characteristics of diverse data,
making it applicable to various types of data.
Fourth, it provides the ability to process large-scale
datasets quickly and effectively. Finally, it is
designed to respond swiftly to changes in data.
This methodology can be particularly useful in
areas such as production line optimization,
inventory management, and product development.
The data insights obtained through clustering can
enhance production efficiency, improve product
customization, and contribute to a more accurate
understanding of customer needs.
Fig. 7: HVLV-Motor-KC Framework
3.2 K-means Clustering
The HVLV-Motor-KC algorithm is a K-means
algorithm that clusters data points through multiple
stages. Figure 8 explains the algorithm.
Figure 8 compares the general K-means
clustering algorithm with the HVLV-Motor-KC
algorithm to help clearly understand the differences
and characteristics of each algorithm. The data input
for the general K-means clustering algorithm is
based on the initial dataset. This algorithm is
executed to divide the given dataset into k clusters.
In Figure 8, the number of clusters is set to 3. As a
result, the data points are grouped into three clusters,
with each cluster represented in a different color.
The data input for the HVLV-Motor-KC
algorithm is based on the same dataset as the
general K-means clustering algorithm. In Figure 8,
the number of clusters is set to 3 for all levels. In the
first level of clustering, the initial dataset is divided
into three clusters. Each cluster formed at the first
level becomes the dataset for the second level.
Therefore, the number of clusters for the next level
is the same as the k value set in the previous
clustering, which is 3 in Figure 8. Within each
cluster, repeated clustering allows for further
refinement in motor classification. To classify these
multi-variety motors, we iteratively train the K-
means algorithm to find patterns in each motor and
classify them by their unique characteristics.
Fig. 8: K-means Clustering Algorithm
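The level-by-level procedure described for Figure 8 can be sketched as follows, assuming the same k at every level and re-clustering each cluster independently. This is a hedged illustration, not the authors' code: `kmeans_labels`, `multilevel_kmeans`, and the 1-D toy data are hypothetical, and in the experiments of Section 4 the second level additionally uses the ITEM_SERIES variable, which this sketch does not model.

```python
import math
import random


def kmeans_labels(points, k, max_iter=300, seed=0):
    """Compact K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def multilevel_kmeans(points, k, levels):
    """Return a tuple of labels per point, one label per clustering level."""
    groups = {(): list(range(len(points)))}  # start with one group of all indices
    for _ in range(levels):
        next_groups = {}
        for path, idxs in groups.items():
            if len(idxs) < k:
                # Too few points to split into k clusters: carry the group over.
                next_groups[path + (0,)] = idxs
                continue
            # Each cluster from the previous level becomes the dataset for this level.
            labels = kmeans_labels([points[i] for i in idxs], k)
            for i, lab in zip(idxs, labels):
                next_groups.setdefault(path + (lab,), []).append(i)
        groups = next_groups
    paths = [None] * len(points)
    for path, idxs in groups.items():
        for i in idxs:
            paths[i] = path
    return paths


# Hypothetical 1-D data: two coarse groups, each with two finer sub-groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,),
          (100.0,), (101.0,), (110.0,), (111.0,)]
paths = multilevel_kmeans(points, k=2, levels=2)
```

Each point ends up with a path such as `(1, 0)`, read as "cluster 1 at the first level, sub-cluster 0 at the second", which mirrors the nested grouping in Figure 8.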
4 Experiment and Results
4.1 Experimental Environments
As shown in Table 1, the hardware configuration is
based on a 12th Gen Intel® Core™ i7-1260P CPU,
an Intel® Iris® Xe Graphics GPU, 16 gigabytes (GB)
of RAM, and an M.2 SSD for storage.
Table 1. Hardware Configuration
Item      Description
CPU       12th Gen Intel(R) Core(TM) i7-1260P
GPU       Intel(R) Iris(R) Xe Graphics
RAM       16.0GB
Storage   M.2 SSD
The software environment was configured as
follows. The operating system used was Windows
11, and Python 3.10.12 was adopted as the
programming language. For data analysis and
visualization, libraries such as Numpy 1.25.2,
Pandas 2.0.3, scikit-learn, UMAP, Matplotlib, and
Seaborn were utilized. The development
environment used was Jupyter Notebook, and the
virtual environment was set up through Anaconda.
This software configuration facilitates the efficient
execution of various data processing and analysis
tasks. Table 2 summarizes the software
configuration environment.
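Table 2. Software Configuration
Item                   Description
Operating System       Windows 11
Programming Language   Python 3.10.12
Libraries              Numpy 1.25.2, Pandas 2.0.3, scikit-learn, UMAP, Matplotlib, Seaborn
Dev Environment        Jupyter Notebook (Anaconda virtual environment)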
All experiments were conducted in the hardware
and software environment described above;
documenting this configuration supports the
reproducibility of the experiments.
4.2 Datasets
The dataset collected for this study is based on the
production data of a Korean small and medium-
sized enterprise (SME) K, a manufacturer of high-
variety, low-volume motors, primarily consisting of
small, medium, and ultra-small motor product
families.
Table 3. Motor Dataset
ITEM_GUBUN    ITEM_SERIES   EQUIP_CD
SMALL         KAFZ          EQ004
SMALL         KAFZ          EQ004
MEDIUM        PSMR          EQ001
ULTRA-SMALL   PSMS          EQ007
MEDIUM        RTSE          EQ015
As shown in Table 3, the items used to
differentiate the data in the actual field include
ITEM_CD (item code), ITEM_REV (item revision),
ITEM_NM (item name), PROD_GUBUN
(production classification), ITEM_GUBUN (item
classification), ITEM_SERIES (item series),
LEAD_TIME (production time), and EQUIP_CD
(equipment code).
A total of 909 motor data records were collected.
Of these, 727 records (80%) were used as training
data, while the remaining 182 records (20%) were
utilized as test data for performance evaluation.
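The reported counts follow from an 80/20 split of 909 records when the test size is rounded up. The sketch below is an assumption about how such a split could be drawn (random shuffling with a fixed seed); the paper does not state the splitting procedure, and `split_indices` is a hypothetical helper.

```python
import math
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle record indices and split them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(n * test_fraction)  # assumed rounding rule: round up
    return idx[n_test:], idx[:n_test]


train_idx, test_idx = split_indices(909)
```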
To train the HVLV-Motor-KC algorithm on the
motor data, we performed several preprocessing
steps. First, we removed rows that contained
missing values for EQUIP_CD and ITEM_GUBUN,
which are essential fields for the analysis.
Then, we encoded the categorical fields
EQUIP_CD, ITEM_GUBUN, and ITEM_SERIES as
numeric codes and applied data scaling.
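The preprocessing steps can be sketched in standard-library Python as follows. The scaling method is not specified in the paper, so min-max scaling to [0, 1] is an assumption, as are the helper name `preprocess` and the two sample rows with missing values (the other rows follow Table 3). Missing ITEM_SERIES values are mapped to "Other", as described for the second classification.

```python
def preprocess(rows,
               required=("EQUIP_CD", "ITEM_GUBUN"),
               categorical=("EQUIP_CD", "ITEM_GUBUN", "ITEM_SERIES")):
    """Drop incomplete rows, label-encode categories, and min-max scale."""
    # 1) Remove rows that miss an essential field.
    kept = [r for r in rows if all(r.get(f) not in (None, "") for f in required)]
    columns = []
    for field in categorical:
        # Missing optional values are grouped under "Other".
        values = [r.get(field) or "Other" for r in kept]
        # 2) Label encoding: sorted unique values -> 0, 1, 2, ...
        codes = {v: i for i, v in enumerate(sorted(set(values)))}
        col = [codes[v] for v in values]
        # 3) Min-max scaling of the codes to [0, 1] (assumed scaling method).
        hi = max(col, default=0) or 1
        columns.append([c / hi for c in col])
    return kept, [tuple(f) for f in zip(*columns)]


rows = [
    {"ITEM_GUBUN": "SMALL", "ITEM_SERIES": "KAFZ", "EQUIP_CD": "EQ004"},
    {"ITEM_GUBUN": "MEDIUM", "ITEM_SERIES": "PSMR", "EQUIP_CD": "EQ001"},
    {"ITEM_GUBUN": "ULTRA-SMALL", "ITEM_SERIES": "PSMS", "EQUIP_CD": "EQ007"},
    {"ITEM_GUBUN": "MEDIUM", "ITEM_SERIES": None, "EQUIP_CD": "EQ015"},  # hypothetical
    {"ITEM_GUBUN": None, "ITEM_SERIES": "RTSE", "EQUIP_CD": "EQ015"},    # hypothetical
]
kept, features = preprocess(rows)
```

Note that label encoding imposes an arbitrary numeric order on nominal categories, so distances between codes should be interpreted with care.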
The main variables used in the experiment are
ITEM_GUBUN, EQUIP_CD, and ITEM_SERIES.
Table 4 shows the variables used for each primary
and secondary clustering.
Table 4. Variables for Clustering 1 and 2 (o: used, x: not used)
Clustering   ItemGubun   EquipCd   ItemSeries
1            o           o         x
2            o           o         o
4.3 Evaluation Metrics
The silhouette coefficient, [21], indicates how
close a data point is to the other data points in its
own cluster and how far it is from the data points in
other clusters. It takes values between -1 and 1, with
values closer to 1 indicating better clustering. The
HVLV-Motor-KC algorithm was evaluated using
the silhouette coefficient, which is defined as
follows.
󰇛󰇜
󰇛󰇜

(1)
󰇛󰇜

󰇛󰇜

(2)
󰇛󰇜 󰇛󰇜󰇛󰇜
󰇛󰇜󰇛󰇜 󰇛󰇜
(3)

󰇛󰇜

(4)
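As a worked example, the silhouette coefficient can be computed directly from its standard definition. The sketch below uses only the standard library; the function name `silhouette_score` and the four toy points are assumptions for illustration, and Euclidean distance is assumed for d(i, j). It requires at least two clusters.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (illustrative sketch)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a(i): mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, hence the division by n - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 1-D example with two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # deliberately mixed grouping
```

For the natural grouping, the point at 0 has a(i) = 1 and b(i) = 10.5, so s(i) = 9.5 / 10.5 ≈ 0.905; the mixed grouping gives negative values because each point is closer to the other cluster than to its own.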
4.4 Visualization of Clustering Numbers
To find the optimal number of clusters for the
HVLV-Motor-KC algorithm, we used the silhouette
and elbow methods.
Fig. 9: Elbow Method
The silhouette method selects the number of
clusters with the highest silhouette score, while the
elbow method selects the number of clusters at the
point where the sharp decrease in the Within-Cluster
Sum of Squares (WCSS) levels off.
Fig. 10: Silhouette
We experimented with finding the optimal
number of clusters using the above method and
determined that seven clusters was the best number,
as shown in Figure 9 and Figure 10.
Table 5 shows the SSE (sum of squared errors,
i.e., the WCSS) and silhouette scores for different
numbers of clusters. When the number of clusters is
7, the SSE decreases to 18.0726, the lowest value
among the tested settings, indicating that data points
are more closely grouped around each cluster center.
The silhouette score is also highest at 0.9188,
indicating the best separation between clusters.
Additionally, the initial cluster centers (init)
were set using the k-means++ algorithm, which
spreads the initial centers apart to improve
clustering performance. The number of runs with
different initializations (n_init) was set to 10, the
default value. The maximum number of iterations
(max_iter) was set to 300 to allow the algorithm
sufficient iterations to converge.
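The k-means++ seeding idea can be sketched as follows: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance from the nearest center already chosen. This is a standard-library illustration of the general technique, not scikit-learn's internal code; `kmeans_pp_init` and the toy points are hypothetical, and the sketch assumes at least k distinct points.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """k-means++ style seeding (illustrative sketch)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Next center: sampled with probability proportional to that distance,
        # so points far from the existing centers are preferred.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Hypothetical 1-D data with a duplicated value.
points = [(0.0,), (0.0,), (5.0,), (9.0,)]
centers = kmeans_pp_init(points, k=3)
```

A point that coincides with a chosen center has weight zero and cannot be drawn again, which is why the three centers here are always the three distinct values.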
Table 5. Comparison of Elbow Method and Silhouette
n-Clusters   SSE                  Silhouette Score
2            315.73772788646966   0.7707042591059704
3            175.85620616762893   0.79852165813859
4            86.75635309788157    0.7439876446148647
5            42.993683685000555   0.7369224511116048
6            29.25106647321429    0.8081641831658453
7            18.072619525625903   0.9188765290433094
In summary, by setting the optimal number of
clusters to 7, the motor data could be grouped most
effectively according to similar characteristics.
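The selection can be reproduced from the values reported in Table 5. The short sketch below picks the number of clusters with the highest silhouette score and computes the relative SSE reduction at each step; the variable names are assumptions introduced for illustration.

```python
# Values copied from Table 5 (number of clusters -> metric).
sse = {2: 315.73772788646966, 3: 175.85620616762893, 4: 86.75635309788157,
       5: 42.993683685000555, 6: 29.25106647321429, 7: 18.072619525625903}
silhouette = {2: 0.7707042591059704, 3: 0.79852165813859, 4: 0.7439876446148647,
              5: 0.7369224511116048, 6: 0.8081641831658453, 7: 0.9188765290433094}

# Silhouette method: choose the k with the highest score.
best_k = max(silhouette, key=silhouette.get)

# Elbow inspection: relative SSE reduction when going from k - 1 to k clusters.
ks = sorted(sse)
drops = {k: (sse[prev] - sse[k]) / sse[prev] for prev, k in zip(ks, ks[1:])}
```

Since the SSE decreases at every step, it cannot select k on its own; the silhouette score supplies the decisive criterion and peaks at 7 clusters.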
4.5 First Classification
Figure 11 presents the results of the initial clustering
process. Each cluster is delineated by a distinct
colour, facilitating the identification of its
distribution and boundaries in relation to other
clusters. The size of the data points is adjusted
proportionally to the number of samples within the
corresponding cluster; that is, the more samples in a
cluster, the larger the size of the data point. This
visualization allows for an immediate understanding
of the relative size and density of each cluster.
Fig. 11: First Classification on ItemGubun and
EquipCd
Table 6 shows the exact number of data points
belonging to each cluster, providing specific
information on the cluster distribution. This helps to
clearly understand the composition and distribution
of each cluster.
Table 6. Number of Samples per Cluster
Cluster   Sample Count
0         102
1         84
2         109
3         21
4         122
5         105
6         185
Cluster 6, consisting of 185 data points, is the
largest cluster and accounts for a significant portion
of the total data. The next largest is Cluster 4, with
122 data points. Cluster 3 is the smallest, with only
21 data points. The first-stage K-means clustering
gives a silhouette score of 0.9188, which is close to
1, indicating that the boundaries between clusters
are clear and the clusters are well separated.
4.6 Second Classification
Based on the actual classification of motors at
Company K, we performed primary clustering by
equipment code and item classification, and then
secondary clustering by item series. Clustering in
multiple stages (primary, secondary, and so on)
yields a more detailed and refined grouping. In the
second stage, the ITEM_SERIES variable was
added to refine the clusters. Any missing values in
the ITEM_SERIES field were assigned to a separate
category, Other.
Fig. 12: Second Classification on ItemGubun,
EquipCd and ItemSeries
In the second stage of the clustering process,
each cluster generated in the initial stage was further
subdivided in order to identify more detailed
patterns within the motor production data. A further
iteration of K-means clustering was conducted
within each cluster, incorporating the
ITEM_SERIES variable. In order to ascertain the
optimal number of clusters, the elbow method and
silhouette analysis were employed, in a manner
analogous to that undertaken in the initial stage of
clustering. The initial value settings and other
optimal algorithm settings were maintained in
accordance with those employed in the initial stage
of clustering.
Figure 12 visualizes the results of the second-
stage clustering, performed within the first-stage
clusters. Each cluster is shown in a different color in
a 3D plot. Together with the silhouette scores in
Table 7, this visualization confirms a more detailed
grouping of the motors.
Table 7. Comparison of Silhouette
Cluster   Silhouette Score
0         0.913
1         0.840
2         0.923
3         0.944
4         0.848
5         0.911
6         0.888
Table 7 shows the silhouette score of the second-
stage clustering within each first-stage cluster. The
number of second-stage clusters was determined to
be between 6 and 9, and every silhouette score was
above 0.8, indicating that the clustering was very
successful. In particular, Cluster 3 recorded the
highest silhouette score of 0.9444, indicating that
the data points within this cluster are very densely
packed and clearly distinguished from other clusters.
4.7 Results
We applied the HVLV-Motor-KC algorithm to
optimize the production of 909 small, medium, and
ultra-small motors by classifying them according to
equipment code, item category, and item series.
As Table 8 shows, the clustering of the training
data, covering both the primary and secondary
stages, achieved a high silhouette score of 0.91742.
The resulting seven clusters are consistent with the
seven groups into which field workers classify the
multi-variety motors in practice, which supports the
applicability of the method in real industry.
Table 8. Comparison of Silhouette Score
Data         Silhouette Score
train_data   0.9174270361483706
test_data    0.9195868772687471
As Figure 13 shows, applying the test data with
the number of clusters kept at 7 resulted in a high
silhouette score of 0.91958, demonstrating that the
proposed HVLV-Motor-KC algorithm can assign
newly introduced varieties to clusters of similar
varieties.
Fig. 13: Clustering Comparison of Training and Test
Data
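One common way to classify new varieties with a fitted K-means model is to assign each new point to the nearest existing centroid. Whether the experiments used this approach or refitted the model on the test data is not stated, so the sketch below is only an assumption; `assign_to_nearest` and the sample values are hypothetical.

```python
import math


def assign_to_nearest(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


# Hypothetical centroids from a fitted model and two new, unseen points.
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_points = [(1.0, 0.5), (9.0, 11.0)]
new_labels = assign_to_nearest(new_points, centroids)
```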
5 Conclusion
The HVLV-Motor-KC algorithm demonstrated high
performance on training and test data from a large
variety of low-volume motors, showing that it can
efficiently classify complex and large volumes of
motor data to optimise production management.
Further research is required to verify the
versatility and stability of the HVLV-Motor-KC
algorithm through comparative analysis with other
data types and clustering techniques. Moreover,
further experiments will be conducted to enhance
the practicality of the algorithm and increase its
applicability in real industrial environments.
Finally, a user-friendly interface will be
developed to facilitate straightforward utilisation in
practical applications. In order to achieve this, tools
that provide a visual representation of the
algorithm's results will be developed, thereby
enabling users to directly control and analyse the
clustering process.
Acknowledgement:
This research was supported by Sungkyunkwan
University and the BK21 FOUR (Graduate School
Innovation) funded by the Ministry of Education
(MOE, Korea) and National Research Foundation of
Korea (NRF). Moreover, this work was supported
by the ICT Creative Consilience Program through the
Institute of Information & Communications
Technology Planning & Evaluation (IITP) grant
funded by the Korea government (MSIT)
(IITP-2024-2020-0-01821).
Declaration of Generative AI and AI-assisted
Technologies in the Writing Process
During the preparation of this work the authors used
ChatGPT in order to improve the readability of the
paper. After using this tool/service, the authors
reviewed and edited the content as needed and take
full responsibility for the content of the publication.
References:
[1] Jagatheesaperumal, S. K., Rahouti, M.,
Ahmad, K., Al-Fuqaha, A., & Guizani, M.,
The duo of artificial intelligence and big data
for industry 4.0: Applications, techniques,
challenges, and future research directions,
IEEE Internet of Things Journal, Vol. 9, No.
15, 2022, pp. 12861-12885.
[2] Riew, M. C., & Lee, M. K., A Case Study of
the Construction of Smart Factory in a Small
Quantity Batch Production System: Focused
on IDIS Company, Journal of Korean Society
for Quality Management, Vol. 46, No. 1, 2018,
pp. 11-26.
[3] Chong, H. R., Bae, K. H., Lee, M. K., Kwon,
H. M., & Hong, S. H., Quality strategy for
building a smart factory in the fourth
industrial revolution, Journal of Korean
Society for Quality Management, Vol. 48, No.
1, 2020, pp. 87-105.
[4] Im, K.-H., Rule-based Process Control
System for multi-product, small-sized
production, Journal of Korea Society of
Industrial Information Systems, Vol. 15, No. 1,
2010, pp. 47-57.
[5] Park, H. K., & Oh, C. J., Integration of design
and process planning using group technology,
Proceedings of the Korean Society for
Intelligent Information Systems Conference,
1997, pp. 107-112.
[6] Gödri, I., Improving Delivery Performance in
High-Mix Low-Volume Manufacturing by
Model-Based and Data-Driven Methods,
Applied Sciences, Vol. 12, No. 11, 2022, pp.
5618.
[7] Park, G. J., & Park, J. W., A Study on the
Application of Group Technology for Naval
Ship Design and Manufacturing, Journal of
the Military Operations Research Society of
Korea, Vol. 32, No. 2, 2006, pp. 78-91.
[8] Ahmed, M., Seraj, R., & Islam, S. M. S., The
k-means Algorithm: A Comprehensive Survey
and Performance Evaluation, Electronics, Vol.
9, No. 8, 2020, pp. 1295.
[9] Capó, M., Pérez, A., & Lozano, J. A., An
efficient K-means clustering algorithm for tall
data, Data Mining and Knowledge Discovery,
Vol. 34, 2020, pp. 776-811.
[10] Brown, S., Blackmon, K., Cousins, P., &
Maylor, H., Operations management: policy,
practice and performance improvement,
Routledge, 2013.
[11] Sit, S. K., & Lee, C. K., Design of a Digital
Twin in Low-Volume, High-Mix Job
Allocation and Scheduling for Achieving
Mass Personalization, Systems, Vol. 11, No. 9,
2023, pp. 454.
[12] Shahin, A., & Janatyan, N., Group
Technology (GT) and Lean Production: A
Conceptual Model for Enhancing Productivity,
International Business Research, Vol. 3, No.
4, 2010, pp. 105-117.
[13] Askin, R. G., & Chiu, K. S., A graph
partitioning procedure for machine
assignment and cell formation in group
technology, The International Journal of
Production Research, Vol. 28, No. 8, 1990,
pp. 1555-1572.
[14] Hu, H., Liu, J., Zhang, X., & Fang, M., An
effective and adaptable K-means algorithm
for big data cluster analysis, Pattern
Recognition, Vol. 139, 2023, pp. 109404.
[15] Ikotun, A. M., Ezugwu, A. E., Abualigah, L.,
Abuhaija, B., & Heming, J., K-means
clustering algorithms: A comprehensive
review, variants analysis, and advances in the
era of big data, Information Sciences, Vol.
622, 2023, pp. 178-210.
[16] Pal, S. S., Mukhopadhyay, J., & Sarkar, S.,
Finding hierarchy of clusters, Pattern
Recognition Letters, Vol. 178, 2024, pp. 7-13.
[17] Nielsen, F., Hierarchical
clustering, Introduction to HPC with MPI for
Data Science, 2016, pp. 195-211.
[18] Campello, R. J., Moulavi, D., & Sander, J.,
Density-based clustering based on
hierarchical density estimates, Pacific-Asia
Conference on Knowledge Discovery and
Data Mining, Berlin, Heidelberg, Springer
Berlin Heidelberg, Vol. 7819, 2013, pp. 160-
172.
[19] Rodriguez, M. Z., Comin, C. H., Casanova, D.,
Bruno, O. M., Amancio, D. R., Costa, L. D. F.,
& Rodrigues, F. A., Clustering algorithms: A
comparative approach, PLoS ONE, Vol. 14, No.
1, 2019, pp. e0210236.
[20] Ran, X., Xi, Y., Lu, Y., Wang, X., & Lu, Z.,
Comprehensive survey on hierarchical
clustering algorithms and the recent
developments, Artificial Intelligence Review,
Vol. 56, No. 8, 2023, pp. 8219-8264.
[21] Minh, H. L., Sang-To, T., Wahab, M. A., &
Cuong-Le, T., A new metaheuristic
optimization based on K-means clustering
algorithm and its application to structural
damage identification, Knowledge-Based
Systems, Vol. 251, 2022, pp. 109189.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors contributed equally to the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
This research was supported by Sungkyunkwan
University and the BK21 FOUR (Graduate School
Innovation) funded by the Ministry of Education
(MOE, Korea) and National Research Foundation of
Korea (NRF). Moreover, this work was supported
by the ICT Creative Consilience Program through the
Institute of Information & Communications
Technology Planning & Evaluation (IITP) grant
funded by the Korea government (MSIT)
(IITP-2024-2020-0-01821).
Conflict of Interest
The authors have no conflicts of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US