DAEHWAN LEE, JONGPIL JEONG
Department of Smart Factory Convergence,
Sungkyunkwan University,
2066 Seobu-ro, Jangan-gu, Suwon 16419,
REPUBLIC OF KOREA
Abstract: - This paper addresses bearing fault diagnosis with limited data through few-shot learning.
Deep learning-based fault diagnosis methods have recently achieved promising results, but previous studies
require numerous training samples. In manufacturing, however, it is not possible to obtain sufficient
training samples for all failure types under all working conditions. In this work, we propose a few-shot
learning-based rolling bearing fault diagnosis method that can learn effectively from limited data. Our model
is based on the Siamese network, which learns from pairs of samples that belong to the same or different classes.
Key-Words: - Few-shot learning, Siamese network, Fault detection, WDCNN (Wide-first-layer kernels CNN),
Bearing
Received: September 19, 2021. Revised: October 28, 2022. Accepted: November 25, 2022. Published: December 31, 2022.
1 Introduction
Manufacturing competitiveness is important in the
era of global competition and the fourth industrial
revolution, and securing it depends on product
quality and facility management. Production
facilities are difficult to manage on-site, and most
companies do not have dedicated facility
maintenance workers. In many cases, production
stops until repair workers arrive, so equipment
failure makes stable production impossible.
Unstable production causes several kinds of loss: it
can significantly degrade product quality and cause
significant losses to the company.
Most equipment failures occur in rotating
equipment and bearing damage is the number one
cause of failure in rotating equipment. As the most
essential component of rotating mechanical
equipment, the condition of rolling bearings
significantly impacts the entire facility and
manufacturing line, [1], [2], [3].
If a bearing is damaged while the rotating
machine is running, the machine or the entire
equipment may jam or malfunction. Bearing
defects are caused by complex working conditions
and long-term operation: microcracks form inside
the bearing, accumulate, and gradually lead to
surface breakage. An initial bearing defect can be
detected by identifying the abnormal condition of
the bearing from its vibration signal, [1], [3].
Previous bearing defect studies have used CNN,
[8], [10], RNN, [11], [12], and auto-encoder, [13],
[14], models, and there have also been GAN-based
studies, [16], [17], [20]. These works apply many
data-driven and deep learning-based techniques to
increase accuracy and reliability, but most of them
require large amounts of training data, such as
vibration, sound, motor, and current signals.
However, it is difficult to obtain enough good-quality
data samples at actual manufacturing sites to train a
classifier for all failure types. Because data at actual
manufacturing sites are limited, an algorithm that
learns effectively from few samples is needed.
This paper proposes a bearing fault diagnosis
method that uses a Siamese network based on
few-shot learning. We compare accuracy and the
number of parameters according to the number of
blocks in the WDCNN model. The method was
evaluated on the Case Western Reserve University
(CWRU) data, [18]. The rest of this paper is
organized as follows. Section 2 describes few-shot
learning, the Siamese network, and bearing fault
detection. Section 3 describes the few-shot
learning-based bearing fault detection method.
Section 4 describes the experimental procedures and
results. Finally, Section 5 presents the conclusion
and future research.
WSEAS TRANSACTIONS on SYSTEMS
DOI: 10.37394/23202.2022.21.30
Daehwan Lee, Jongpil Jeong
E-ISSN: 2224-2678
276
Volume 21, 2022
Bearing Fault Detection based on Few-Shot Learning in Siamese
Network
2 Related Work
2.1 Few-Shot Learning
Few-shot learning was first addressed in the 1980s,
[4]. Recently, few-shot learning has made great
progress in solving the data shortage problem, [5].
It uses only a few samples for classification or
regression and, as the name suggests, can categorize
data well from very few samples. Few-shot learning
differs from conventional supervised learning in that
it does not simply generalize from a training set to a
test set. Instead, the data are divided into training,
support, and query sets. The model is trained on the
training set, and the goal of training is to learn the
similarities and differences between objects.
2.2 Siamese Network
Siamese networks were first introduced in the early
1990s by Bromley and LeCun to solve signature
verification as an image-matching problem, [4]. Fig
1 shows the Siamese network structure. Unlike
ordinary CNN, they consist of two CNN models but
the two models have the same structure, [6], [19].
Fig. 1: Siamese network
That is why it is also called the twin network.
Siamese networks usually use CNN models, but
other models can be used as well.
A Siamese network takes two images as input. The
shared network outputs one feature vector for each
input image, and the difference between the two
feature vectors is computed. Multiple dense layers
process this difference vector, and a sigmoid
activation function finally produces a scalar
between 0 and 1. The output is close to 1 if the two
images are of the same class and close to 0 if they
are of different classes. In other words, the network
embeds both inputs as vectors and returns their
similarity in [0,1].
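The forward pass described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's model: `embed` is a hypothetical one-layer stand-in for the CNN branch, the element-wise absolute difference is one possible choice of "difference", and `head_w` and `head_b` are illustrative parameters of a single dense layer.

```python
import math

def embed(signal, weights):
    """Toy shared feature extractor (stand-in for the CNN): linear layer + ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, signal))) for row in weights]

def similarity(x1, x2, weights, head_w, head_b):
    """Return a score in (0, 1); both inputs pass through the SAME weights."""
    f1 = embed(x1, weights)
    f2 = embed(x2, weights)
    # Assumption: element-wise absolute difference of the two feature vectors.
    diff = [abs(a - b) for a, b in zip(f1, f2)]
    # One dense unit followed by a sigmoid activation.
    z = sum(w * d for w, d in zip(head_w, diff)) + head_b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters chosen by hand for illustration only.
WEIGHTS = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
HEAD_W = [-1.0, -1.0]   # larger feature differences lower the score
HEAD_B = 2.0
```

With these hand-picked parameters, identical inputs give a zero difference vector and therefore the highest score the head can produce, while inputs with different features score lower. Because the two branches share weights and the difference is symmetric, swapping the inputs does not change the result.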
Pairwise similarity is learned by training on positive
and negative sample pairs that are randomly drawn
from the training set. If the two samples belong to
the same class, the pair is positive; if they belong to
different classes, the pair is negative, [2], [7], [8],
[9]. For example, suppose there are Husky,
Elephant, Tiger, Parrot, and Tea classes. To build a
positive pair, select one sample from the Tiger class
and then a different sample from the same class.
Both samples are of the same class, so the pair is
labeled 1. To build a negative pair, select one
sample from the Tiger class and then a sample from
another class. The two samples are of different
classes, so the pair is labeled 0.
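The pair construction described above can be sketched as follows. The function name `make_pairs`, the alternating positive/negative schedule, and the dictionary layout are assumptions made for illustration; the text only states that pairs are sampled randomly and labeled 1 (same class) or 0 (different classes).

```python
import random

def make_pairs(samples_by_class, n_pairs, rng):
    """Build (sample_a, sample_b, label) triples; label 1 = same class, 0 = different."""
    classes = sorted(samples_by_class)
    pairs = []
    for i in range(n_pairs):
        anchor_cls = rng.choice(classes)
        anchor = rng.choice(samples_by_class[anchor_cls])
        if i % 2 == 0:
            # Positive pair: second sample comes from the same class.
            other_cls = anchor_cls
            pool = [s for s in samples_by_class[anchor_cls] if s is not anchor]
            pool = pool or samples_by_class[anchor_cls]  # fall back for 1-sample classes
        else:
            # Negative pair: second sample comes from any other class.
            other_cls = rng.choice([c for c in classes if c != anchor_cls])
            pool = samples_by_class[other_cls]
        pairs.append((anchor, rng.choice(pool), 1 if other_cls == anchor_cls else 0))
    return pairs
```

Alternating positive and negative pairs keeps the two labels balanced, which is a design choice of this sketch rather than something stated in the paper.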
2.3 Fault Detection
Fig 2 shows the bearing components. The basic
components consist of an outer ring, an inner ring, a
ball, and a cage (or retainer). A bearing is the basic
element of the machine that supports the rotating
shaft inside the machine and aids in the rotation of
objects by reducing friction.
Fig. 2: Components of rolling bearing
As the most essential component of rotating
mechanical equipment, the bearing has a significant
impact on the condition of the entire facility.
Bearing defects are caused by complex working
conditions and long-term operation: microcracks
form inside the bearing, accumulate, and gradually
lead to surface damage. Fig 3 shows a bearing crack.
Fig. 3: Bearing crack
Such damage develops when lubricant is lacking,
when bearing surfaces come into contact, or when
an abnormally excessive external force is applied to
the bearing. Defects in bearings mainly appear in
the inner ring, the outer ring, or the ball, [10]. An
initial bearing defect can be detected by identifying
the abnormal condition of the bearing from its
vibration signal, [1], [3].
3 Few Shot Learning based Bearing
Fault Detection
We propose a few-shot learning classification
method that uses a Siamese network built on the
WDCNN model. Fig 4 presents the system
structure, which consists of three stages: data
preparation (top), few-shot learning training and
testing (middle), and the Siamese network structure
based on the WDCNN model (bottom).
Fig 4 shows the system structure. The first stage is
data preparation. To verify the performance, we
selected the 12k drive-end bearing fault data from
the Case Western Reserve University (CWRU)
bearing datasets as the experiment data. In the
experiment, each sample is extracted from two
vibration signals. Half of each vibration signal is
used to generate the training samples and the other
half is used to generate the test samples. Training
samples are generated with a sliding window of
2048 points and a shift step of 80 points, so
consecutive training windows overlap. Test samples
use the same window size but are created without
overlap.
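The windowing scheme above can be sketched as follows. The function names and the choice of which half feeds training are assumptions for illustration; the window size (2048) and training shift (80) come from the text, and a test shift equal to the window size gives non-overlapping test windows.

```python
def split_half(signal):
    """Split a signal into two halves (assumption: first half trains, second tests)."""
    mid = len(signal) // 2
    return signal[:mid], signal[mid:]

def sliding_windows(signal, window=2048, shift=80):
    """Cut fixed-size windows from a signal, moving `shift` points each step."""
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, shift)]

def make_samples(signal, window=2048, train_shift=80):
    """Overlapping training windows and non-overlapping test windows."""
    train_part, test_part = split_half(signal)
    train = sliding_windows(train_part, window, train_shift)
    test = sliding_windows(test_part, window, window)  # shift == window: no overlap
    return train, test
```

Each half of length L yields (L - 2048) // shift + 1 windows, so the small training shift acts as data augmentation while the test windows stay disjoint.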
The second stage is training and testing. During
training, the model is trained on sample pairs whose
two members belong to the same or different
classes. The two WDCNN branches take the two
vibration signals of a pair, prepared as described
above, as inputs.
Fig. 4: System structure
Each branch outputs one feature vector for its input
signal. The difference (distance) between the two
feature vectors is then computed, and a dense layer
processes this difference. Applying the sigmoid
activation function yields a number between 0 and 1
that measures the similarity of the two inputs: the
output is close to 1 if they are of the same class and
close to 0 if they are of different classes. The loss
function measures the difference between the target
value and the predicted scalar.
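One way to write this computation down is given below. The text does not name the distance or the loss, so the element-wise absolute difference and the binary cross-entropy used here are assumptions, as are the symbols: f_θ for the shared WDCNN branch, w and b for the dense-layer parameters, and y for the pair label.

```latex
% Assumptions: f_\theta is the shared WDCNN branch, (w, b) are the dense-layer
% parameters, and the loss is binary cross-entropy (not stated in the text).
\begin{aligned}
d &= \lvert f_\theta(x_1) - f_\theta(x_2) \rvert \\
\hat{y} &= \sigma\!\left(w^{\top} d + b\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
\mathcal{L}(y, \hat{y}) &= -\,y \log \hat{y} \;-\; (1 - y)\,\log\!\left(1 - \hat{y}\right), \qquad y \in \{0, 1\}
\end{aligned}
```

Under this formulation the loss is small when the predicted score is near 1 for a same-class pair (y = 1) and near 0 for a different-class pair (y = 0).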
Testing is carried out with several one-shot K-way
tests. In an N-shot K-way test, the model is given a
support set of K different classes with N samples
each and must determine which support-set class
the test sample belongs to. Because this paper uses
5 shots, the one-shot K-way test is repeated five
times, each time with a support set randomly
selected from the training data. The five trials yield
five sets of probabilities (P1, P2, P3, P4, P5); these
are summed, and the class with the largest sum is
selected, [2].
The third stage is the structure of the WDCNN in
the Siamese network according to the number of
blocks. One block consists of a convolutional layer
and a pooling layer. Each model consists of 6, 5, 4,
or 3 blocks, and we compare the accuracy and the
number of parameters of each model.
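The repeated one-shot K-way test can be sketched as a simple vote: each trial scores the query against one support sample per class, the per-class scores are summed over the trials, and the class with the largest sum wins. The names `n_shot_predict` and `toy_similarity` are hypothetical; `toy_similarity` only stands in for the trained Siamese network so that the example runs on its own.

```python
def n_shot_predict(similarity_fn, query, support_sets):
    """Sum per-class similarity over several one-shot trials and pick the best class.

    support_sets: list of dicts, one per trial, mapping class label -> one sample.
    """
    totals = {}
    for support in support_sets:              # one one-shot K-way trial
        for label, sample in support.items():
            totals[label] = totals.get(label, 0.0) + similarity_fn(query, sample)
    best = max(totals, key=totals.get)        # class with the largest summed score
    return best, totals

def toy_similarity(a, b):
    """Placeholder for the Siamese network: closer numbers give a higher score."""
    return 1.0 / (1.0 + abs(a - b))
```

Passing five support sets to `n_shot_predict` corresponds to the 5-shot setting: a single unlucky support sample has less influence once the scores of all five trials are added together.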
Table 1 shows the structure of the few-shot learning
model based on WDCNN.
Table 1. Structure of few-shot learning model based
on WDCNN
No | Layer type      | Kernel size/stride | Kernel number | Output size (width * depth) | Padding
1  | Convolution1    | 64*1/16*1          | 16            | 128*16                      | same
2  | Pooling1        | 2*1/2*1            | 16            | 64*16                       | valid
3  | Convolution2    | 3*1/1*1            | 32            | 64*32                       | same
4  | Pooling2        | 2*1/2*1            | 32            | 32*32                       | valid
5  | Convolution3    | 3*1/1*1            | 64            | 32*64                       | same
6  | Pooling3        | 2*1/2*1            | 64            | 16*64                       | valid
7  | Convolution4    | 3*1/1*1            | 64            | 16*64                       | same
8  | Pooling4        | 2*1/2*1            | 64            | 8*64                        | valid
9  | Convolution5    | 3*1/1*1            | 64            | 6*64                        | valid
10 | Pooling5        | 2*1/2*1            | 64            | 3*64                        | valid
11 | Fully-connected | 100                | 1             | 100*1                       |
The base model consists of five convolutional
layers, each followed by a pooling layer, and the
kernel size of the first convolutional layer is set to
64. One block is the combination of one
convolutional layer and one pooling layer, and the
configuration of one block follows the kernel size,
stride, and padding of Convolution5 and of
Pooling5 in Table 1.
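The output widths of the layers can be checked with a short calculation. The sketch below assumes a 2048-point input (the window size used for the samples) and the usual TensorFlow-style conventions, which the text does not spell out: "same" padding gives ceil(width / stride) and "valid" padding gives floor((width - kernel) / stride) + 1. The layer list copies the kernel sizes, strides, and padding from Table 1.

```python
import math

def out_width(width, kernel, stride, padding):
    """Output width of a 1-D convolution or pooling layer."""
    if padding == "same":
        return math.ceil(width / stride)
    return (width - kernel) // stride + 1      # "valid" padding

# (name, kernel, stride, padding) for the ten conv/pool layers of Table 1.
LAYERS = [
    ("Convolution1", 64, 16, "same"), ("Pooling1", 2, 2, "valid"),
    ("Convolution2", 3, 1, "same"),   ("Pooling2", 2, 2, "valid"),
    ("Convolution3", 3, 1, "same"),   ("Pooling3", 2, 2, "valid"),
    ("Convolution4", 3, 1, "same"),   ("Pooling4", 2, 2, "valid"),
    ("Convolution5", 3, 1, "valid"),  ("Pooling5", 2, 2, "valid"),
]

def trace_widths(input_width=2048):
    """Return [(layer name, output width), ...] for the whole stack."""
    widths = []
    width = input_width
    for name, kernel, stride, padding in LAYERS:
        width = out_width(width, kernel, stride, padding)
        widths.append((name, width))
    return widths
```

Under these assumptions the widths run 128, 64, 64, 32, 32, 16, 16, 8, 6, 3. In particular, the width after Pooling4 comes out as 8, which is also the input width that Convolution5 needs to produce its 6-wide output with a kernel of 3 and "valid" padding.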
The 3-, 4-, and 6-block models remove blocks from,
or add one block to, the 5-block base model. Blocks
are removed sequentially starting from
Conv5+Pooling5. For example, the 4-block model
ends at Conv4+Pooling4, obtained by removing
Conv5+Pooling5 from Table 1. An added block uses
the same kernel size, kernel number, and stride as
Conv5+Pooling5. For example, the 6-block model
adds Conv6+Pooling6 after Conv5+Pooling5.
4 Experiment and Results
4.1 Experiment Environments
Table 2 shows the experimental environment. The
hardware used in this study consisted of an Intel
Core i7-8700K processor and an NVIDIA GeForce
RTX 3080 Ti GPU. The software environment was
Windows, TensorFlow 2.4, and Python 3.6.
Table 2. System specification
Hardware Environment                              | Software Environment
CPU: Intel Core i7-8700K CPU @ 3.70 GHz, six-core | Windows, TensorFlow 2.4
GPU: NVIDIA GeForce RTX 3080 Ti                   | Python 3.6
This paper uses the CWRU dataset, which contains
vibration data collected for normal bearings and for
bearings with drive-end and fan-end defects.
Drive-end data were collected at 12,000 samples per
second (12k) and at 48,000 samples per second
(48k), and fan-end data were collected at 12,000
samples per second (12k). There are three types of
bearing fault, namely inner race, outer race, and
ball, and an independent dataset exists for each fault
size. The fault sizes are 0.007 inches, 0.014 inches,
and 0.021 inches. For each fault size, data were
recorded at motor loads of 0 to 3 hp. Outer raceway
faults were measured with the fault located at the
3 o'clock, 6 o'clock, and 12 o'clock positions, [12],
[15].
Fig. 5: Bearing simulator of CWRU
Fig 5 shows the bearing simulator of CWRU. The
simulator is composed of a dynamometer, an
electric motor, a drive-end bearing, a fan-end
bearing, and a torque transducer and encoder.
Table 3 describes the rolling bearing datasets. There
are 10 fault labels, as shown in Table 3. Dataset A
combines the 660 training samples and 25 test
samples of each of loads 1, 2, and 3, giving 1980
training samples and 75 test samples.
In this experiment, the test set of dataset A is kept
fixed, and 60, 90, 120, and 200 samples are
randomly drawn from the training samples of
dataset A for the respective experiments.
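Table 3. Description of rolling bearing datasets
Fault label | Fault location | Fault diameter (inch) | Dataset A train | Dataset A test
1           | None           | 0                     | 1980            | 75
2           | Ball           | 0.007                 | 1980            | 75
3           | Ball           | 0.014                 | 1980            | 75
4           | Ball           | 0.021                 | 1980            | 75
5           | Inner Race     | 0.007                 | 1980            | 75
6           | Inner Race     | 0.014                 | 1980            | 75
7           | Inner Race     | 0.021                 | 1980            | 75
8           | Outer Race     | 0.007                 | 1980            | 75
9           | Outer Race     | 0.014                 | 1980            | 75
10          | Outer Race     | 0.021                 | 1980            | 75
Load (Dataset A): 1, 2, 3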
4.2 Evaluation Metric
Accuracy is the most intuitive indicator: the
proportion of all predictions that are correct. Its
weakness is that unbalanced data labels can skew it.
With TP, TN, FP, and FN denoting the sets of true
positives, true negatives, false positives, and false
negatives, accuracy is defined as:
$$\text{Accuracy} = \frac{|TP| + |TN|}{|TP| + |FP| + |FN| + |TN|} \tag{1}$$
Recall is the proportion of the samples that are
actually true that the model predicts as true:
$$\text{Recall (sensitivity)} = \frac{|TP|}{|TP| + |FN|} \tag{2}$$
Precision is the proportion of the samples that the
model classifies as true that are actually true:
$$\text{Precision} = \frac{|TP|}{|TP| + |FP|} \tag{3}$$
The F1-score is the harmonic mean of precision and
recall. It evaluates the model more reliably than
accuracy when the data labels are unbalanced:
$$\text{F1 score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
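Equations (1) to (4) translate directly into code. The sketch below works on raw counts for a single positive class; the function name and the convention of returning 0.0 when a denominator is zero are choices made for this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision, and F1-score from confusion counts, Eqs. (1)-(4)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}

# Unbalanced example: 10 positives and 90 negatives.
example = classification_metrics(tp=5, fp=5, fn=5, tn=85)
```

In this unbalanced example accuracy is 0.90 while recall, precision, and F1-score are all 0.50, which illustrates why accuracy alone can look better than the model actually is.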
4.3 Results
In this paper, we ran experiments with 60, 90, 120,
and 200 samples drawn from the training samples
of dataset A, and each experiment used a batch size
of 32. All experiments report accuracy, the most
important of the evaluation metrics described above.
To compensate for the shortcomings of accuracy, the
F1-score is also reported. The experimental results
are shown in Fig 6 and Fig 7: Fig 6 shows the
change in accuracy according to the number of
blocks, and Fig 7 shows the change in the number of
parameters according to the number of blocks.
Fig 6 plots the change in accuracy with the
number of blocks. It contains one curve for each of
the 3-, 4-, 5-, and 6-block WDCNN models, and the
x-axis shows the number of samples: 60, 90, 120,
and 200. The blue curve represents block 3, the
orange curve block 4, the gray curve block 5, and
the yellow curve block 6. Block 5 has the highest
accuracy with 60 and 90 samples, whereas block 4
has the highest accuracy with 120 and 200 samples.
Fig. 6: Accuracy changes according to the
number of blocks
Fig 7 shows the change in the number of
parameters according to the number of blocks.
Fig. 7: Parameter changes according to the
number of blocks
The number of parameters decreases rapidly from
block 3 to block 4, and there is almost no change
between block 5 and block 6. With 120 samples, the
accuracy of block 4 is about 1% higher than that of
block 5, but block 5 is more efficient because it has
about 20,000 fewer parameters than block 4.
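The parameter difference can be approximated from the layer sizes in Table 1. The sketch below is a rough estimate under several assumptions that the text does not confirm: a model with n blocks keeps the first n Conv+Pooling rows of Table 1 unchanged, the input has 2048 points, only convolution and fully-connected weights and biases are counted (no batch normalization, no layers after the distance), and the flattened widths 16, 8, and 3 follow from the pooling arithmetic. The 6-block model is left out because its output width is not given.

```python
# (kernel size, input channels, output channels) for Convolution1..Convolution5.
CONVS = [(64, 1, 16), (3, 16, 32), (3, 32, 64), (3, 64, 64), (3, 64, 64)]
# Width after the last pooling layer of an n-block model (2048-point input).
FLATTEN_WIDTH = {3: 16, 4: 8, 5: 3}
FC_UNITS = 100

def count_parameters(n_blocks):
    """Approximate trainable parameters of one branch with `n_blocks` blocks."""
    convs = CONVS[:n_blocks]
    # Each 1-D convolution has kernel * in * out weights plus one bias per filter.
    conv_params = sum(k * c_in * c_out + c_out for k, c_in, c_out in convs)
    flat = FLATTEN_WIDTH[n_blocks] * convs[-1][2]   # flattened feature size
    fc_params = flat * FC_UNITS + FC_UNITS          # weights + biases
    return conv_params + fc_params

counts = {n: count_parameters(n) for n in (3, 4, 5)}
```

Under these assumptions the estimate is 111,316 parameters for 3 blocks, 72,468 for 4 blocks, and 52,820 for 5 blocks. The 4-block and 5-block models differ by 19,648 parameters, in line with the roughly 20,000 stated above. The drop comes mainly from the fully-connected layer, whose input shrinks as more pooling layers are stacked.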
Table 4. Block accuracy & F1-score
Sample60 | Block3 | Block4 | Block5 | Block6
Accuracy | 59.01  | 66.75  | 82.80  | 81.02
F1-score | 58.28  | 68.72  | 78.90  | 79.58
Table 5. Block accuracy & F1-score
Sample90 | Block3 | Block4 | Block5 | Block6
Accuracy | 74.47  | 83.22  | 91.37  | 85.42
F1-score | 75.73  | 87.39  | 91.50  | 85.96
Table 6. Block accuracy & F1-score
Sample120 | Block3 | Block4 | Block5 | Block6
Accuracy  | 78.46  | 94.03  | 92.66  | 89.28
F1-score  | 78.10  | 94.25  | 78.28  | 87.00
Table 7. Block accuracy & F1-score
Sample200 | Block3 | Block4 | Block5 | Block6
Accuracy  | 86.16  | 94.63  | 94.32  | 93.69
F1-score  | 85.97  | 94.43  | 90.83  | 88.87
Tables 4, 5, 6, and 7 show the accuracy and
F1-score. With 60, 90, and 200 samples, the
F1-score does not drop significantly relative to
accuracy. With 120 samples, however, block 5
performs poorly, unlike blocks 3, 4, and 6: its
F1-score (78.28) is far below its accuracy (92.66).
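The gap between accuracy and F1-score can be tabulated directly from the values in Tables 4 to 7. The sketch below only rearranges those reported numbers; the variable and function names are illustrative.

```python
BLOCKS = (3, 4, 5, 6)
# Values copied from Tables 4-7, keyed by number of training samples.
ACCURACY = {
    60: (59.01, 66.75, 82.80, 81.02),
    90: (74.47, 83.22, 91.37, 85.42),
    120: (78.46, 94.03, 92.66, 89.28),
    200: (86.16, 94.63, 94.32, 93.69),
}
F1 = {
    60: (58.28, 68.72, 78.90, 79.58),
    90: (75.73, 87.39, 91.50, 85.96),
    120: (78.10, 94.25, 78.28, 87.00),
    200: (85.97, 94.43, 90.83, 88.87),
}

def accuracy_f1_gaps():
    """Return (samples, block, accuracy - F1) for every table entry."""
    gaps = []
    for n in sorted(ACCURACY):
        for block, acc, f1 in zip(BLOCKS, ACCURACY[n], F1[n]):
            gaps.append((n, block, round(acc - f1, 2)))
    return gaps

def largest_gap():
    """Entry where the F1-score falls furthest below accuracy."""
    return max(accuracy_f1_gaps(), key=lambda g: g[2])
```

The largest gap is the block 5 model with 120 samples, where accuracy exceeds the F1-score by about 14.4 points; every other entry stays within about 5 points.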
(a) Sample 60 (b) Sample 90 (c) Sample 120 (d) Sample 200
Fig. 8: Confusion matrix of block 6
The confusion matrix plots the predicted labels of
the samples on the horizontal axis and their actual
labels on the vertical axis. Fig 8(a), (b), (c), and (d)
show the confusion matrices for block 6. Diagnosis
is more difficult with 120 samples than with the
other sample sizes; in particular, every category
except category 4 is difficult to diagnose.
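A confusion matrix with this layout can be built as follows. The sketch uses 0-based integer labels and made-up toy predictions, not the paper's results; rows hold the actual labels (vertical axis) and columns hold the predicted labels (horizontal axis).

```python
def confusion_matrix(actual, predicted, n_classes):
    """matrix[a][p] counts samples with actual label a that were predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def per_class_recall(matrix):
    """Fraction of each actual class (row) that lies on the diagonal."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls

# Toy labels for illustration only.
ACTUAL = [0, 0, 1, 1, 2, 2]
PREDICTED = [0, 1, 1, 1, 2, 0]
MATRIX = confusion_matrix(ACTUAL, PREDICTED, n_classes=3)
```

Off-diagonal entries show which categories are confused with each other, and a row with a small diagonal value marks a category that is difficult to diagnose.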
5 Conclusion
In this paper, we proposed a Siamese network
structure for classifying bearing defects through
few-shot learning on the CWRU dataset and
examined how accuracy and the number of
parameters change with the number of blocks in the
WDCNN.
In future studies, in addition to the CWRU dataset,
we will consider datasets collected in the actual
field with noise added. Additionally, we plan to
conduct research on improving bearing fault
diagnosis accuracy while reducing the number of
parameters by using models other than WDCNN.
Acknowledgement:
This research was supported by the National
Research Foundation of Korea (NRF) grant funded
by the Korea government (MSIT) (No.
2021R1F1A1060054), and by the MSIT (Ministry of
Science and ICT), Korea, under the ITRC
(Information Technology Research Center) support
program (IITP-2022-2018-0-01417) and the ICT
Creative Consilience Program (IITP-2022-2020-0-01821)
supervised by the IITP (Institute for
Information & Communications Technology Planning
& Evaluation). Corresponding author: Professor
Jongpil Jeong.
References:
[1] S. Lee and J. Jeong, “SSA-SL Transformer for
Bearing Fault Diagnosis under Noisy Factory
Environments”, Journal of electronics,
Vol.11, Issue. 9, May. 2022, pp. 1-21.
[2] A. Zhang, S. Li, Y. Cui, W. Yang, R. Dong
and J. Hu, “Limited Data Rolling Bearing
Fault Diagnosis with Few-Shot Learning”,
IEEE Access, Vol.7, Aug. 2019, pp. 110895-110904.
[3] S. Han, S. Oh and J. Jeong, “Bearing Fault
Diagnosis Based on Multiscale Convolutional
Neural Network Using Data Augmentation”,
Journal of Sensors, Feb. 2021, pp. 1-14.
[4] K. Yip and G. Sussman, “Sparse
Representations for Fast, One-Shot Learning”,
National Conference on Artificial Intelligence,
July 1997, pp. 1-29.
[5] G. Koch, R. Zemel, and R. Salakhutdinov,
“Siamese neural networks for one-shot image
recognition”, ICML Deep Learning
Workshop, pp. 130, July 2015.
[6] Y. Wang, Q. Yao, J. Kwok and L. M. Ni,
“Generalizing from a few examples: A survey
on few-shot learning”, Arxiv, Apr. 2019,
pp. 1-33.
[7] Z. Cui, X. Kong and P. Hao, “Few-shot
Learning for Rolling Bearing Fault Diagnosis
Based on Residual Convolution Neural
Network”, 2021 4th International Conference
on Artificial Intelligence and Big Data, May.
2021, pp. 320-324.
[8] Y. Yang, H. Wang, Z. L and Z. Y, “Few Shot
Learning for Rolling Bearing Fault Diagnosis
Via Siamese Two-dimensional Convolutional
Neural Network”, 2020 11th International
Conference on Prognostics and System Health
Management, Oct. 2020, pp. 373-378.
[9] D. Wu, F. Zhu and L. Shao, “One shot learning
gesture recognition from RGBD images”,
2012 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition
Workshops, Jun. 2012, pp. 7-12.
[10] S. Oh, S. Han and J. Jeong, “Multi-Scale
Convolutional Recurrent Neural Network for
Bearing Fault Detection in Noisy
Manufacturing Environments”, Journal of
Applied Sciences, Vol.11, Issue.9, May. 2021,
pp. 1-16.
[11] M. Alrifaey, W. Lim and C. Ang, “A Novel
Deep Learning Framework Based RNN-SAE
for Fault Detection of Electrical Gas
Generator”, IEEE Access, Vol.9, Jan.
2021, pp. 21433-21442.
[12] Q. Yu, Z. Peng, X. Cheng and F. Dong, “RNN
based Method for Fault Diagnosis of
Grinding System”, 2017 IEEE 7th Annual
International Conference on CYBER
Technology in Automation, Control, and
Intelligent Systems (CYBER), Aug. 2018, pp.
673-678.
[13] X. Lin, B. Li, X. Yang and J. Wang, “Fault
Diagnosis of Aero-engine Bearing Using a
Stacked Auto-Encoder Network”, 2018 IEEE
4th Information Technology and
Mechatronics Engineering Conference
(ITOEC), Dec. 2018, pp. 545-548.
[14] C. Liu, B. Chen, H. Zhang and X. Wang,
“Fault Diagnosis Application of Short Wave
Transmitter Based on Stacked Auto-Encoder”,
IEEE 4th International Conference on
Computer and Communications (ICCC), Dec.
2018, pp. 119-123.
[15] D. Neupane and J. Seok, “Bearing Fault
Detection and Diagnosis Using Case Western
Reserve University Dataset With Deep
Learning Approaches: A review”, IEEE
Access, Vol.8, Apr. 2020, pp. 93155-93178.
[16] Q. Guo, Y. Li, Y. Song, D. Wang and W.
Chen, “Intelligent Fault Diagnosis Method
Based on Full 1-D Convolutional Generative
Adversarial Network”, IEEE Transactions on
Industrial Informatics, Vol.16, Issue.3, Aug.
2019, pp.2044-2053.
[17] F. Zhou, S. Yang, H. Fujita, D. Chen and C.
Wen, “Deep learning fault diagnosis method
based on global optimization GAN for
unbalanced data”, Knowledge-Based Systems,
Vol.187, Jan. 2020, pp.1-19.
[18] Case Western Reserve University (CWRU)
(https://engineering.case.edu/bearingdatacenter).
[19] A. Parnami and M. Lee, “Learning from Few
Examples: A Summary of Approaches to
Few-Shot Learning”, Arxiv, Mar. 2022,
pp. 1-32.
[20] C. Chen, Z. Liu, G. Yang, C. Wu and Q. Ye,
“An Improved Fault Diagnosis Using 1D-
Convolutional Neural Network Model”,
Journal of electronics, Vol.10, Issue.1, May.
2022, pp. 1-21.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US