Electrical Machine Bearing Fault Diagnosis Based on Deep Gaussian Process Optimized by Particle Swarm
HAI GUO1,*, HAORAN TANG1, XIN LIU1, JINGYING ZHAO1,2, LIKUN WANG3
1 College of Computer Science and Engineering, Dalian Minzu University, 18 Liaohe West Road, Dalian Development Zone, Dalian, 116600, CHINA
2 Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, 116023, CHINA
3 College of Electronic and Electrical Engineering, Harbin University of Science and Technology, Harbin 150080, CHINA

Abstract—To address the low accuracy and slow diagnosis speed of existing fault diagnosis models for electrical machine bearings, this paper presents a bearing fault diagnosis method based on a Deep Gaussian Process optimized by particle swarm optimization (DGP). A total of 10 bearing conditions, comprising 9 damage states and the fault-free state, are determined; a deep Gaussian process model for electrical machine bearing fault diagnosis is constructed based on expectation propagation and the Monte Carlo method, and the particle swarm optimization algorithm is used to search for the optimal number of inducing points. The experimental results show that the fault recognition rate of the DGP on the CWRU data set reaches 95%, significantly better than other deep learning, ensemble learning and machine learning methods. The DGP method can better diagnose electrical machine bearing faults and provide technical support for the safe operation of electrical machines, which is important for real industrial applications.
Keywords—Deep Gaussian Process; electrical machine fault diagnosis; particle swarm optimization.
Received: June 24, 2021. Revised: March 18, 2022. Accepted: April 21, 2022. Published: May 18, 2022.
1. Introduction
With the deepening of machine learning research in the field of artificial intelligence, machine learning technology is increasingly used in pattern recognition [1]. Traditional recognition tasks mainly apply machine learning models such as the support vector machine (SVM), neural networks, and random forests [2]. In recent years, deep learning has developed rapidly in academia and industry; it has significantly improved recognition accuracy on many traditional recognition tasks, demonstrated a superb ability to handle complex recognition tasks, and attracted a large number of experts and scholars to research its theory and applications [3-5].
Electrical fault diagnosis technology can find faults in electrical equipment at an early stage, so timely targeted maintenance can be carried out, saving a large amount of time and money on repairs while avoiding production stalls and improving economic efficiency. Fault detection is also necessary in memory systems: Eitan Yaakobi et al. [6] proposed a single-error-correcting WOM code construction with a better WOM rate, which can effectively update and store data in memory. Liang Xi et al. [9] proposed a multisource neighborhood immune detector adaptive model for anomaly detection, which solves various problems of the real-valued shape-space in dynamic environments, improves the overall detection performance, and achieves better stability.
In today's production activities and daily life, the electrical machine is the most important source of motive power and drive, and it is widely used in many fields of production and life.
Fault detection for an electrical machine often needs to find the fault within a very short time so that targeted maintenance can be carried out promptly, so it requires fast detection speed and flexible detection methods. The Gaussian process has low computational complexity and fast convergence in small sample spaces. The Gaussian process is named after the German mathematician Carl Friedrich Gauss, in recognition of his proposal of the normal distribution, and was developed on the basis of statistical learning theory and Bayesian theory. In the following decades, rich research results have been obtained. Ori Shental et al. [7] proposed a Gaussian belief propagation solver for systems of linear equations; compared with traditional methods, it has a faster convergence speed. D. Bickson et al. [8] proposed a Gaussian belief propagation based multiuser detection algorithm; compared with the previous formulation, the new
algorithm reduces memory requirements, calculation steps and the number of messages passed. The deep Gaussian process has certain theoretical advantages and is suitable for research on electrical machine fault detection technology. The deep Gaussian process model is a deep model that stacks multiple Gaussian processes; any number of Gaussian processes can be stacked. Gaussian processes control the mappings between layers, so the model retains the advantages of the Gaussian process. Zhao et al. [10] proposed computer modeling of the eddy current losses of metal fasteners in rotor slots of a large nuclear steam turbine generator based on the finite-element method and deep Gaussian process regression; the analysis results show that, compared with independent finite-element analysis, this method reduces the design cycle time and improves the design efficiency for a large-capacity turbine generator. Guo et al. [12] proposed predicting the temperature of a permanent magnet synchronous motor (PMSM) based on a deep neural network; this model can effectively predict the temperature change of the stator winding, provide technical support for temperature early-warning systems, and ensure the safe operation of PMSMs. Wang et al. [11] proposed a cuckoo search algorithm for multi-objective optimization of the transient starting characteristics of a self-starting HVPMSM; experiments show that the optimization speed of this method is significantly faster than that of other methods, and that it converges faster while ensuring accuracy.
Existing deep learning models can already diagnose electrical machine faults well, but many problems remain, such as insufficient accuracy and slow training speed. For example, the semi-supervised training method of the deep belief network suffers from slow training; the autoencoder network has problems such as limited feature expression and difficulty in reconstruction; convolutional neural network training requires a large amount of data and does not perform well on industrial signals; and the RNN suffers from problems such as vanishing gradients [13]. Therefore, this paper exploits the deep Gaussian process's strong robustness to outliers and its strong ability to handle nonlinear problems to construct a deep Gaussian process classification model optimized by particle swarm optimization, applies it to the fault diagnosis of electrical machine rolling bearings, and provides early warning of motor bearing faults based on abnormal changes in the signal before the fault occurs, so as to avoid electrical machine damage and reduce losses.
2. Deep Gaussian Process Classification
Model
2.1 Deep Gaussian process
Given N observed values y = (y_1, \ldots, y_N)^T with D-dimensional input coordinates X, the output of each hidden layer of a DGP model with L layers can be expressed as \{H_l\}_{l=1}^{L-1}. The number of columns of H_l is the number of nodes in layer l; it is also called the dimension of the layer and is written D_l. H_l can be expressed by equation (1):

H_l = \begin{bmatrix} h_{1,1}^{l} & \cdots & h_{1,D_l}^{l} \\ \vdots & \ddots & \vdots \\ h_{N,1}^{l} & \cdots & h_{N,D_l}^{l} \end{bmatrix} \in \mathbb{R}^{N \times D_l}    (1)
where

h_{n,i}^{l} = f_i^{l}(h_n^{l-1})    (2)
The functions f_i^{l} are given Gaussian priors. Usually, for ease of notation, the latent-variable dimensions are omitted, so H_l is written as h_l and f_i^{l}(\cdot) as f_l(\cdot). First, a zero-mean Gaussian process prior p(f_l \mid \theta_l) is set for each layer; for layers with multiple nodes, the prior factorizes into independent Gaussian processes within each layer. Assuming i.i.d. noise parameterized at the output of each layer, the prior and the conditional distributions of the deep Gaussian process can be written as equations (3) and (4):
p(f_l \mid \theta_l) = \mathcal{GP}(f_l \mid 0, K_l), \quad l = 1, \ldots, L    (3)

p(h_l \mid f_l, h_{l-1}, \sigma_l^2) = \prod_{n=1}^{N} \mathcal{N}(h_n^{l} \mid f_l(h_n^{l-1}), \sigma_l^2), \quad h_n^{0} = x_n    (4)
K_l denotes the kernel matrix of layer l evaluated at the outputs of the previous layer, K_l = k_l(h_{l-1}, h_{l-1}). For layers with l > 1, the input is no longer a deterministic value, and the corresponding output no longer follows a normal distribution. When L = 1, the model reduces to a shallow Gaussian process model. Finally, the conditional probability of the given target values at the output layer is shown in equation (5):

p(y \mid f_L, h_{L-1}, \sigma_L^2) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid f_L(h_n^{L-1}), \sigma_L^2)    (5)
Figure 1 shows an example of a two-layer model, in which a hidden layer and an output layer are used for a two-dimensional problem. The number of nodes D_L of the output layer equals the dimension of the observations y_n for a regression problem, or the number of classes for a classification problem. As with the shallow Gaussian process model, adding appropriate sparse approximations to the deep model can effectively reduce the computational complexity of the deep Gaussian process.
By omitting the dimension indices in the notation and placing a Gaussian prior on the inducing points of each layer, the final sparse deep Gaussian process model can be written as equations (6)-(8).
p(u_l \mid z_{l-1}) = \mathcal{N}(u_l \mid 0, K_{u_l, u_l}), \quad l = 1, \ldots, L    (6)

p(h_l \mid u_l, h_{l-1}, \sigma_l^2) = \prod_{n=1}^{N} \mathcal{N}(h_n^{l} \mid A_n^{l} u_l,\; K_{h_n^{l}, h_n^{l}} - Q_n^{l} + \sigma_l^2)    (7)

p(y \mid u_L, h_{L-1}, \sigma_L^2) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid A_n^{L} u_L,\; K_{h_n^{L}, h_n^{L}} - Q_n^{L} + \sigma_L^2)    (8)
Fig. 1 An example of a DGP
Fig. 2 Two-layer sparse deep Gaussian process model
The subscripts of a covariance matrix indicate the outputs to which it corresponds; for example, K_{u_l, u_l} denotes the covariance matrix of the inducing points u_l and takes the inducing locations as arguments, K_{u_l, u_l} = k_l(z_{l-1}, z_{l-1}), while the covariance matrix K_{h_l, h_l} of the nodes in a layer is constructed from the output of the previous layer, K_{h_l, h_l} = k_l(h_{l-1}, h_{l-1}). We also define the matrices A and Q as in equations (9) and (10):
A_n^{l} = K_{h_n^{l}, u_l} K_{u_l, u_l}^{-1}    (9)

Q_n^{l} = K_{h_n^{l}, u_l} K_{u_l, u_l}^{-1} K_{u_l, h_n^{l}}    (10)
Figure 2 shows a two-layer deep sparse Gaussian process model. Each hidden layer depends not only on the output of the previous layer but also on the inducing-point variables.
2.2 Inference techniques for the deep Gaussian process
In the deep Gaussian process model, inference over the inducing outputs \{u_l\}_{l=1}^{L} and the hidden layers \{h_l\}_{l=1}^{L-1} is performed by marginalizing the latent variables, which makes it possible to predict the posterior for the test set and to compute the marginal likelihood used for hyperparameter tuning. However, both sets of variables are difficult to handle exactly. Considering a deep Gaussian process model with L = 2 layers, the joint distribution of the model is given in equation (11):
p(y, h_1, \{u_l\}_{l=1}^{2} \mid X, \{z_{l-1}, \sigma_l^2\}_{l=1}^{2}) = p(y \mid u_2, h_1, \sigma_2^2)\, p(h_1 \mid u_1, X, \sigma_1^2) \prod_{l=1}^{2} p(u_l \mid z_{l-1})    (11)
To simplify the notation, all model parameters are grouped into equation (12):

\theta = \left\{ \{z_l\}_{l=0}^{1}, \{\sigma_l^2, \alpha_l\}_{l=1}^{2} \right\}    (12)

where \alpha_l denotes the kernel hyperparameters of layer l.
The marginal likelihood is obtained by marginalizing \{u_l\}_{l=1}^{2} and the hidden variable h_1 in equation (11), giving equation (13):

p(y \mid X, \theta) = \int p(y, h_1, \{u_l\}_{l=1}^{2} \mid X, \theta)\, du_1\, du_2\, dh_1    (13)
However, some of the integrals in equation (13) are intractable because they involve integrating the covariance function with respect to random variables [14]. In particular, obtaining the distribution of a layer output p(h_{l+1} \mid h_{l-1}) by marginalizing h_l requires integrating a nonlinear kernel function against the density of h_l.
Another quantity that needs to be computed is the posterior distribution over the inducing points, which also requires the integral in equation (13) as the model evidence, as shown in equation (14):

p(\{u_l\}_{l=1}^{2} \mid X, y) = \frac{1}{p(y \mid X)} \int p(y, h_1, \{u_l\}_{l=1}^{2} \mid X)\, dh_1    (14)
This result generalizes to the case L > 2. For simplicity, the layer dependence is dropped from the notation and u = \{u_l\}_{l=1}^{L} and h = \{h_l\}_{l=1}^{L-1} are used as abbreviations; for any number of layers, the (generalized) inducing-point posterior then becomes p(u \mid X, y). In order to compute the marginal likelihood and the posterior, approximate inference techniques are needed.
Some work in the literature on deep Gaussian processes suggests using a general sampling algorithm [15] to evaluate the log probability. The main difference from the method explained in this section is that the sampling algorithm does not place any distribution on the inducing outputs (they are assumed fixed), so they are included as model parameters; in this way, some of the benefits of regularization are lost. The authors also proposed training the model by maximum a posteriori (MAP) estimation, but did not compare the method with other state-of-the-art techniques, and no improvement was observed when the number of layers was increased.
Another method, explained in [16], uses a stochastic feature vector \phi(x) to approximate the kernel function k(x, x') of the Gaussian process. The kernel can then be approximated as an inner product, as shown in equation (15):

k(x, x') \approx \phi(x)^{T} \phi(x')    (15)
The results show that the deep Gaussian process model can then be regarded as a Bayesian neural network whose layer outputs are given by g(wx + b), where g is the activation function, w follows a probability distribution p(w), and b is a random bias term.
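The following sketch illustrates the random feature idea of equation (15) for the squared exponential kernel using random Fourier features; the feature dimension and kernel hyperparameters are arbitrary choices for the illustration, not values taken from [16].

import numpy as np

def rff_features(X, n_features=500, lengthscale=1.0, seed=0):
    # Random Fourier features phi(x) such that phi(x)^T phi(x') approximates
    # the squared exponential kernel exp(-||x - x'||^2 / (2 * lengthscale^2)).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).standard_normal((5, 3))
phi = rff_features(X)
K_approx = phi @ phi.T                      # approximates k(x, x') in equation (15)
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(np.max(np.abs(K_approx - K_exact)))   # approximation error shrinks as n_features grows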
2.3 Deep Gaussian process based on expectation
propagation and Monte Carlo method
The deep Gaussian process based on expectation propagation and the Monte Carlo method follows the approach explained in [17] and uses an approximate expectation propagation algorithm with tied factors to approximate the posterior over the inducing points. Figure 3 shows the process of calculating ln Z_n at a single point in a two-layer deep Gaussian process model. In the first layer, q(h^1) is given by a normal distribution (blue in the figure) from which samples are drawn. In the second layer, the true distribution (blue) is no longer Gaussian; Z_n is instead given by equation (16). The proposed method propagates the samples through the network (green), making the model more flexible and able to approximate complex, non-Gaussian distributions.

The final form of Z_n, approximated with S samples, is given by the Gaussian mixture in equation (16):
Z_n \approx \frac{1}{S} \sum_{s=1}^{S} q(y_n \mid \hat{h}_s^{L-1})    (16)
where \hat{h}_s^{L-1} denotes the s-th sample from the corresponding distribution q(\hat{h}_s^{L-1} \mid \hat{h}_s^{L-2}), which can be computed with the sampling technique described above.
Fig. 3 Example of sample propagation through the deep Gaussian process network
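A minimal sketch of the Monte Carlo estimate in equation (16) for a single data point of a two-layer model is shown below; the Gaussian layer distributions and all numerical values are placeholder assumptions used only to illustrate how samples are propagated.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
S = 100                      # number of Monte Carlo samples
y_n = 0.7                    # observed target for this point (illustrative)

# Layer 1: q(h^1) is Gaussian; draw S samples from it.
h1_samples = rng.normal(loc=0.2, scale=0.5, size=S)

# Layer 2: for each sample, the conditional q(y_n | h^1_s) is Gaussian with a
# mean that depends nonlinearly on h^1_s (a stand-in for the GP predictive mean).
means = np.sin(h1_samples)
stds = 0.3 * np.ones(S)

# Equation (16): Z_n is approximated by a Gaussian mixture over the samples.
Z_n = np.mean(norm.pdf(y_n, loc=means, scale=stds))
log_Z_n = np.log(Z_n)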
In contrast to the method proposed in [14], this method can capture the complex dependencies between DGP layers. As in [18], the method is also compatible with stochastic gradient training, such as mini-batch training. The final form of the marginal likelihood approximation is shown in equation (17):
F(\alpha) \approx \sum_{l=1}^{L} \left[ \left(1 - \tfrac{N}{\alpha}\right)\phi(\theta_q^{l}) + \tfrac{N}{\alpha}\,\phi(\theta_{cav}^{l}) - \phi(\theta_{prior}^{l}) \right] + \frac{N}{\alpha\,|B|} \sum_{b \in B} \ln Z_b    (17)
where α is the AEP parameter tuned together with the model, |B| is the selected mini-batch size, and Z_b is computed for each mini-batch using equation (16).
3. Particle swarm optimized deep Gaussian process classification model for electrical machine rolling bearing fault diagnosis
The deep Gaussian process classification model for electrical machine bearing fault diagnosis constructed in this section uses expectation propagation and Monte Carlo methods to approximate the Gaussian posterior, and the particle swarm optimization algorithm is used to search for the number of inducing points of the deep Gaussian process model within a given range. The model adopts a five-layer network structure, namely an input layer, three hidden layers, and an output layer. The hidden layers all use the squared exponential kernel as the kernel function of the Gaussian mapping, as shown in equation (18):
K_{SE}(x, x') = \exp\!\left(-\frac{d^2}{2 l^2}\right)    (18)

where d = \lVert x - x' \rVert and l is the length scale.
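A direct implementation of the squared exponential kernel in equation (18) is sketched below; the length scale value and the toy data are illustrative assumptions.

import numpy as np

def se_kernel(X1, X2, lengthscale=1.0):
    # Squared exponential kernel of equation (18): exp(-d^2 / (2 * l^2)),
    # where d is the Euclidean distance between the two inputs.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

X = np.random.default_rng(0).standard_normal((4, 10))   # 4 samples, 10 features
K = se_kernel(X, X)    # 4 x 4 kernel matrix used in the hidden-layer GP mappings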
The overall model network structure is shown in Figure 4:
[Figure 4 depicts the layered structure: inputs x1-x5 are mapped through successive hidden-layer Gaussian mappings f(x) and hidden nodes h to the outputs Y1-Y10.]
Fig. 4 Network architecture of the deep Gaussian process model
The particle swarm optimization (PSO) algorithm is a swarm intelligence algorithm whose design is based on the simulation of bird foraging behavior. Assuming that there is only one piece of food in the target area (that is, the optimal solution of the optimization problem), the goal of the flock is to find this food source. Throughout the search process, the birds communicate with each other so that the others know their positions; through this collaboration they can judge whether the optimal solution has been found and pass information about the current best solution to the entire flock. Finally, the whole flock gathers around the food source, that is, the optimal solution has been found and the problem converges [19].
The particle swarm optimization algorithm simulates a bird in the flock with a massless particle that has only two attributes: velocity V and position X, where velocity represents the speed of movement and position represents the direction of movement. Each particle individually searches for the optimal solution in the search space and records it as its current individual extremum Pbest; the individual extrema are shared with the other particles of the swarm, and the best of them is taken as the current global optimum Gbest of the entire swarm. All particles in the swarm then adjust their velocities and positions according to the individual extremum Pbest they found themselves and the global optimum Gbest shared by the whole swarm. The idea of the particle swarm optimization algorithm is relatively simple and mainly consists of the following steps: 1. initialize the particle swarm; 2. evaluate the particles, that is, compute the fitness values; 3. find the individual extrema; 4. find the global optimal solution; 5. update the velocities and positions of the particles. The particle swarm optimization algorithm is used to search for the number of inducing points of the deep Gaussian process model, and the optimal parameter setting of the model is determined to improve the classification accuracy. The specific process is shown in Figure 5, and a simplified code sketch is given after the figure.
[Figure 5 shows the flowchart of the search: initialize the deep Gaussian process model; initialize the population size, positions and velocities; compute the individual extrema of the particles and the current global optimum of the whole swarm; update the velocities and positions of the particles; and repeat until the termination condition is met, then output the optimal solution.]
Fig. 5 Particle swarm optimization flowchart
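The sketch below shows, in simplified form, how such a particle swarm search over the number of inducing points could be wired up; the fitness function train_and_evaluate_dgp, the velocity/position update constants and all numerical values are hypothetical placeholders standing in for the model training described above, not the authors' implementation.

import numpy as np

def train_and_evaluate_dgp(num_inducing):
    # Hypothetical placeholder: train the DGP with `num_inducing` inducing
    # points and return validation accuracy. Replaced here by a toy function.
    return -((num_inducing - 50) ** 2) / 2500.0 + 0.95

rng = np.random.default_rng(0)
n_particles, n_iter = 20, 30
lo, hi = 10, 100                          # search range for the number of inducing points
pos = rng.uniform(lo, hi, n_particles)    # particle positions
vel = np.zeros(n_particles)               # particle velocities
pbest = pos.copy()
pbest_fit = np.array([train_and_evaluate_dgp(int(p)) for p in pos])
gbest = pbest[np.argmax(pbest_fit)]

w, c1, c2 = 0.5, 2.0, 2.0                 # inertia and acceleration coefficients (illustrative)
for _ in range(n_iter):
    r1, r2 = rng.random(n_particles), rng.random(n_particles)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    fit = np.array([train_and_evaluate_dgp(int(p)) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[np.argmax(pbest_fit)]

print("best number of inducing points:", int(round(gbest)))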
4. Results and discussion
4.1 Sample Database
The test data come from the Case Western Reserve University (CWRU) Rolling Bearing Data Center. The CWRU data set is a widely recognized standard data set for bearing fault diagnosis. Since bearing fault diagnosis algorithms are updated quickly, all experiments in this paper use the CWRU bearing data in order to evaluate the superiority of the proposed algorithm.
The CWRU bearing center data acquisition system is shown in Figure 6. The test object is the drive-end bearing shown in the figure. The bearing to be diagnosed is the deep groove ball bearing SKF6205; the faults are seeded by electro-discharge machining (EDM), and data are collected under loads of 0 HP, 1 HP, 2 HP, and 3 HP. The sampling frequency of the system is 12 kHz. There are three types of defects in the diagnosed bearing, namely rolling element damage, outer ring damage, and inner ring damage, with damage diameters of 0.007 inch, 0.014 inch, and 0.021 inch; the specific information on the 9 damage states is shown in Table 4.1. In each diagnosis, 2048 data points are used. To facilitate the training of the deep Gaussian network, each signal segment is normalized; the normalization is given by equation (19):
x' = \frac{x - x_{min}}{x_{max} - x_{min}}    (19)
Fig.6 Data acquisition system of CWRU rolling bearing
10000 samples under 0 HP load are selected as shown in Table 4.1. There are 1000 samples for each condition, comprising 900 training samples and 100 test samples. The training samples are generated with the data augmentation method shown in Figure 7: each training sample is a window of length 2048 collected with an offset of 1 between windows, while the test samples do not overlap.
Fig.7 Data enhancement
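The windowing, augmentation and min-max normalization described above could look like the following sketch; the raw signal array and label handling are illustrative, while the window length of 2048 and the offset of 1 follow the description in the text.

import numpy as np

def minmax_normalize(x):
    # Equation (19): scale each segment to the [0, 1] range.
    return (x - x.min()) / (x.max() - x.min())

def make_windows(signal, length=2048, step=1, n_samples=900):
    # Overlapping windows with a small offset are used for training-set
    # augmentation; test windows would instead use step=length (no overlap).
    starts = np.arange(n_samples) * step
    return np.stack([minmax_normalize(signal[s:s + length]) for s in starts])

raw = np.random.default_rng(0).standard_normal(20000)   # stand-in for one vibration record
train_segments = make_windows(raw, step=1, n_samples=900)
test_segments = make_windows(raw[4000:], step=2048, n_samples=5)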
Table 4.1 Test data set description
*Damage diameter is 0.007 inch, **Damage diameter is 0.014 inch, ***Damage diameter is 0.021 inch
4.2 Experimental conditions and evaluation
indicators
The CPU used in the simulation experiments in this section is an Intel i7-7130U with 4 GB of RAM; the programming language is Python, and the frameworks used are TensorFlow, Keras and scikit-learn. 10,000 signals under zero load in the CWRU data set are used as samples and divided into 9 types of faulty bearings and 1 non-faulty type, 10 categories in total. Each category contains 1000 samples, 90% of which are taken as the training set and the remaining 10% as the test set. Accuracy, precision, recall and F1-score under macro-averaging are used as the evaluation indicators of the model; the specific formulas are as follows.
In a classification problem, evaluation indexes can be used to analyze the behavior of the data and the classifier. In this paper, the following evaluation indexes are used to comprehensively analyze and discuss the experimental results.

Accuracy is the most basic evaluation index in classification problems and is defined as the percentage of correct predictions among all samples, as shown in equation (20):

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (20)
where:
True Positive (TP): Positive samples predicted to be
positive by the model;
False Positive (FP): Negative samples predicted to be
positive by the model;
False Negative (FN): Positive samples predicted to be
negative by the model;
True Negative (TN): Negative samples predicted to be
negative by the model.
Precision is the proportion of actual positive samples among all samples predicted to be positive, expressed by equation (21):

Precision = \frac{TP}{TP + FP}    (21)
Recall is the proportion of samples predicted to be positive among the samples that are actually positive, expressed by equation (22):

Recall = \frac{TP}{TP + FN}    (22)
The F1-score is a weighted average of the precision and recall of the model; the closer the F1-score is to 1, the better the empirical effect. It can be expressed by equation (23):

F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}    (23)
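Since the paper reports macro-averaged metrics and lists scikit-learn among the frameworks used, they could be computed as in the following sketch; the label arrays are placeholders for the true and predicted classes of the 10-category problem.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Placeholder predictions for a 10-class problem (9 fault types + no fault).
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 0, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("F1-score :", f1_score(y_true, y_pred, average="macro", zero_division=0))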
The specific parameters of the deep Gaussian process bearing fault diagnosis model are set as follows: the maximum number of iterations is 500, the mini-batch size is 100, the learning rate is 0.01, the number of samples per node is 15, and the noise level is set to 1e-5. On this basis, the particle swarm optimization algorithm searches for the number of inducing points in the range [10, 100]. The parameters of the particle swarm optimization algorithm are set as follows: the population size is 100, the maximum number of iterations is 150, the inertia factor is set to 2, and the weight factor is set to 0.5.
4.3 Experimental results and discussion
On the 10,000 samples of electrical machine rolling bearing data at 0 HP, the deep Gaussian process model optimized by the particle swarm optimization algorithm is used to classify the faults. When the number of inducing points is 50 and the number of iterations is 40, the fault diagnosis classification accuracy reaches its highest value of 0.95. Under the same experimental conditions, the comparison with deep learning models such as the deep neural network (DNN), the recurrent neural network (RNN), and the long short-term memory network (LSTM) is shown in Figure 8.
Fig. 8 Accuracy of the deep Gaussian process classification model compared with other deep learning models (accuracy, precision, recall and F1-score per model)
Fig. 9 Comparison of accuracy between the deep Gaussian process classification model and other machine learning algorithms (SGD, KNN, LR, DT, SVC, GaussianNB, DGP)
Fig. 10 Accuracy of the deep Gaussian process classification model compared with other ensemble learning models (RF, AdaBoost, Bagging, ET, GB, DGP)
It can be seen from Figure 8 that the accuracy of the deep Gaussian process for bearing fault diagnosis on the samples used in this paper reaches 0.95, while LSTM and RNN also maintain high accuracies of 0.93 and 0.88, respectively, and the accuracy of the deep neural network is 0.74; the accuracy of the deep Gaussian process model is therefore higher than that of the above deep learning models.
In the same experimental environment, machine learning algorithms such as stochastic gradient descent (SGD), k-nearest neighbors (KNN), decision tree (DT), support vector classifier (SVC), Gaussian naive Bayes (GaussianNB) and logistic regression (LR) are compared. The experimental results are shown in Figure 9: the classification accuracy of the deep Gaussian process fault diagnosis model is much higher than that of the other commonly used machine learning algorithms.
Compared with ensemble learning algorithms such as random forest (RF), AdaBoost, Bagging, extra trees (ET) and gradient boosting (GB) in the same experimental environment, the experimental results are shown in Figure 10: the classification accuracy of the deep Gaussian process model for bearing faults is higher than that of the above ensemble learning algorithms, making it more suitable for the fault diagnosis of electrical machine bearings under large samples.
5. Conclusion
A deep Gaussian process fault diagnosis classification model for electrical machine rolling bearings based on particle swarm optimization is proposed. The basic components and structural parameters of the deep Gaussian process model are introduced, and the parameter propagation formulas based on expectation propagation and the Monte Carlo method are derived. The proposed model is trained and tested on the CWRU rolling bearing data set. The fault recognition rate of the trained model on the test set reaches 95%, which is higher than that of other machine learning, ensemble learning and deep learning algorithms. It can better diagnose electrical machine bearing faults and provide technical support for the safe operation of the motor.
Acknowledgment
This work was supported only by the Science Foundation of Ministry
of Education of China (No.18YJCZH040).
References
[1] Zhang X Y, Bengio Y, Liu C L, “Online and offline
handwritten Chinese character recognition: A
comprehensive study and new benchmark,” Pattern
Recognition, vol. 61, no. 61, pp. 348-360, 2017.
[2] Xie Z, Sun Z, Jin L, et al., “Learning Spatial-Semantic
Context with Fully Convolutional Recurrent Network for
Online Handwritten Chinese Text Recognition,”. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
vol. 40, no.8, pp. 1903-1917, 2018.
[3] J. Thiagarajan V B, “Designing Accurate Emulators for
Scientific Processes using Calibration-Driven Deep
Models,” Nature Communications, vol. 11, no.1, pp.
5622–5632, 2020.
[4] Sagheer A K M, “Unsupervised Pre-training of a Deep
LSTM-based Stacked Autoencoder for Multivariate Time
Series Forecasting Problems,” Scientific Reports, vol. 9, pp.
19038, 2019.
[5] Kim R G, Doppa J R, Pande P P, et al, “Machine Learning
and Manycore Systems Design: A Serendipitous
Symbiosis,” Computer, vol. 51, no. 7, pp. 66–77, 2018.
[6] Yaakobi E, Siegel P H, Vardy A, et al., “Multiple error-correcting WOM-codes,” IEEE Transactions on Information Theory, vol. 58, no. 4, pp. 2220-2230, 2012.
[7] O. Shental, P. H. Siegel, J. K. Wolf, D. Bickson and D.
Dolev, “Gaussian belief propagation solver for systems of
linear equations,” 2008 IEEE International Symposium on
Information Theory, pp. 1863-1867, 2008.
[8] D. Bickson, D. Dolev, O. Shental, P. H. Siegel and J. K.
Wolf, “Gaussian belief propagation based multiuser
detection,” 2008 IEEE International Symposium on
Information Theory, pp. 1878-1882, 2008.
[9] L XI*, Rui-Dong Wang, et al., “Multi-source neighborhood
immune detector adaptive model for anomaly detection,”
IEEE Transactions on Evolutionary Computation, vol. 25,
no. 3, pp. 582-594, 2021.
[10] Jingying Zhao, Hai GUO, Likun Wang, Min Han,
“Computer Modeling of the Eddy Current Losses of Metal
Fasteners in Rotor Slots of a Large Nuclear Steam Turbine
Generator Based on Finite Element method and Deep
Gaussian Process Regression,” IEEE Transactions on
Industrial Electronics, vol. 67, no. 7, pp. 5349-5359, 2020.
[11] L. Wang, H. Guo*, F. Marignetti, C. D. Shaver and N.
Bianchi, “Cuckoo Search Algorithm for Multi-Objective
Optimization of Transient Starting Characteristics of a
Self-Starting HVPMSM,” IEEE Transactions on Energy
Conversion, vol. 36, no. 3, pp. 1861-1872, 2021.
[12] Guo H, Ding Q, Song Y, Tang H, Wang L, Zhao J, “Predicting Temperature of Permanent Magnet Synchronous Motor Based on Deep Neural Network,” Energies, vol. 13, no. 18, pp. 4782, 2020.
[13] Jia F, Lei Y, et al., “A neural network constructed by deep
learning technique and its application to intelligent fault
diagnosis of machines,” Neurocomputing, vol. 272, no. 10,
pp. 619–628, 2017.
[14] Damianou A C, Lawrence N D, “Deep Gaussian Processes,”
Computer Science, pp. 207–215, 2012.
[15] Depeweg, S. et al., “Learning and Policy Search in
Stochastic Dynamical Systems with Bayesian Neural
Networks,” Machine Learning, 2016.
[16] Cutajar K M P, Bonilla E V, “Random Feature Expansions
for Deep Gaussian Processes,” Machine Learning, no. 70,
pp. 884-893, 2016.
[17] Bui T D, Hernandez-Lobato D, Li Y, et al., “Deep Gaussian
Processes for Regression using Approximate Expectation
Propagation,” 33rd International Conference on Machine
Learning, vol. 48, pp. 76-85,2016.
[18] Salimbeni H D M, “Doubly Stochastic Variational
Inference for Deep Gaussian Processes,” Neural
information processing systems, no. 30, pp. 4588–4599,
2017.
[19] J. Kennedy and R. Eberhart, “Particle swarm
optimization,” Proceedings of ICNN'95 - International
Conference on Neural Networks, vol.4, pp. 1942-1948,
1995.
Hai Guo received the B.S. in Electronic
Engineering from the Heilongjiang
University, in 2000, the M.S. in Pattern
Recognition and Intelligent Systems
from the Kunming University of Science
and Technology, in 2004, and the Ph.D.
degree in Material Science from the
Harbin University of Science and
Technology (HUST), Harbin, China.
Since 2010, he has been an Associate
professor with the College of Computer Science and
Engineering, Dalian Minzu University. He has authored over 30
articles in international journals and conference proceedings.
His current research interests include pattern recognition and
their applications.
Haoran Tang received the B.S. degree from the College of Mathematics, Shanghai Normal University, Shanghai, China, in 2018, and the M.S. degree in Computer Science and Engineering from Dalian Minzu University in 2020. His current research interests include artificial intelligence, machine learning and electrical engineering.
Xin Liu received the B.S. degree from the College of Computer, Cangzhou Normal University, Hebei, China, in 2020. He is currently working towards the M.S. degree in Computer Science and Engineering at Dalian Minzu University. His current research interests include deep learning and artificial intelligence.
Jingying Zhao received the B.S. and M.S.
degrees from School of Computer Science
and Technology, Changchun University of
Science and Technology, Jilin, China, in
2000 and 2003. Since 2013, she has been
an Associate professor with the College of
Computer Science and Engineering, Dalian
Minzu University. She is currently working
towards the Ph.D. degree at Faculty of
Electronic Information and Electrical Engineering, Dalian
University of Technology, Liaoning, China. Her current
research interests include pattern recognition and machine
learning method and their applications.
Likun Wang(M’17) received the B.Sc.,
M.Sc., and Ph.D. degrees in electrical
machinery and appliance from the Harbin
University of Science and Technology
(HUST), Harbin, China, in 2010, 2013,
and 2015, respectively. Since 2017, he has
been working as a Postdoctoral Fellow
with the Institute of Electromagnetic and
Electronic Technology, Harbin Institute of
Technology, Harbin. Since 2018, he has
been an Associate Professor of Electrical Machinery and
Appliance with the College of Electrical and Electronic
Engineering, HUST. His research interests include synthesis
physical fields and dynamic operation mechanism of electrical
machines and its system. Dr. Wang was the recipient of the first
prize of science and technology progress of colleges and
universities of Heilongjiang province in 2019.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
This work was supported only by the Science Foundation of Ministry
of Education of China (No.18YJCZH040).