Deep Reinforcement Learning Algorithm based PMSM Motor Control
for Energy Management of Hybrid Electric Vehicles
S. MUTHURAJAN
St. Peter’s Institute of Higher Education and Research, Deemed to be University, Chennai, INDIA
RAJAJI LOGANATHAN
ARM College of Engineering and Technology, Chennai, INDIA
R. RANI HEMAMALINI
St. Peter’s Institute of Higher Education and Research, Deemed to be University, Chennai, INDIA
Abstract: Hybrid electric vehicles (HEV) have great potential to reduce emissions and improve fuel economy.
The application of artificial intelligence-based control algorithms for controlling the electric motor speed and
torque yields excellent fuel economy by reducing losses drastically. In this paper, a novel strategy is presented to improve the performance of the Permanent Magnet Synchronous Motor (PMSM) control system using a sensorless vector control method, in which a trained reinforcement learning agent provides corrective signals that are added to the control signals. The control signals referred to here are the direct and quadrature voltage signals together with the reference quadrature current signal. The reinforcement learning agents used are the Deep Deterministic Policy Gradient (DDPG) and Deep Q-Network (DQN) agents. The integration and implementation of these control systems are presented, and the results are reported in this paper.
The advantages of the proposed method over the conventional vector control strategy are validated by
numerical simulation results.
Key-Words: PMSM, Deep Learning, Reinforcement Learning, Intelligent Control, Agent.
Received: June 26, 2022. Revised: January 6, 2023. Accepted: February 13, 2023. Published: March 7, 2023.
1 Introduction
Hybrid electric vehicles have come into vogue due to the limitations of purely IC-engine vehicles and fully electric vehicles. Even today, most of the
vehicles in India are powered by internal
combustion (IC) engines using petrol or diesel fuel
for commercial and long travel purposes, [1]. The
engine power capacity used in these vehicles caters
to the power required for maximum speed. In
hybrid electric vehicles, the Permanent Magnet Synchronous Motor (PMSM) plays a vital role due to its compact size, lower torque ripple, convenient cooling, high torque-to-volume ratio, and high efficiency.
Many motor control algorithms have been developed for PMSM motors in the past, including classic PID control, adaptive control, predictive control, robust control, fuzzy and neuro-fuzzy control, artificial neural networks, and advanced intelligent control algorithms. Reinforcement learning, one of the most important machine learning approaches contributing to the development of intelligent control systems, has been utilized for PMSM control in HEVs, [2], [3], [4], [5], [9], [10]. Reinforcement learning is
characterized by the fact that the accurate
mathematical model of the motor does not need to
be given as input. In this paper, the mathematical model of the PMSM is considered with the set of parameter simplifications used in the classical control algorithms. Signals representing the state of the system are used to generate actions applied to the motor through reward optimization. The reward is composed of signal characteristics and is added as a driving process into the control strategy.
In this research, an adaptive sensorless stator field-oriented control (SFOC) technique for the PMSM, together with the creation, training, and testing of a deep reinforcement learning agent, is analysed and discussed with results. The proposed deep reinforcement learning (DRL) approach contributes to improving the performance of the PMSM control system. DDPG and DQN agents are used in the proposed deep reinforcement algorithm, and an improved variant called the Twin-Delayed Deep Deterministic Policy Gradient (TD3) agent is mainly used for the presented SFOC
control system. The TD3 algorithm is effective, with an optimized process for precise parameter estimation. The advantages of the
PMSM control system using deep reinforcement
learning are analysed with the help of real-time
data in the Matlab/Simulink platform, [6], [7], [8],
[9]. The main contributions presented in this paper are (i) the proposed deep reinforcement learning approach, (ii) methods of optimizing the control signals for the PMSM based on the SFOC control technique with different deep reinforcement learning agents, (iii) the behaviour of the proposed PMSM motor control strategy in a hybrid electric vehicle, and (iv) the analysis of real-time results. The results of real-time simulations are presented, and conclusions and ideas for further work are also given.
2 Deep Reinforcement Learning
Algorithm
A deep reinforcement learning algorithm is mainly
used for a system where only a minimum of
information is available. The DRL algorithm is applied to the closed-loop control of motors to execute tasks without explicit complex programming. The learning process in the DRL algorithm is based on a sequence of decisions made to maximize the cumulative reward. In this algorithm, the deep deterministic policy gradient is used, an off-policy reinforcement learning method that is model-free, online, and flexible. DDPG is an actor-critic agent that computes an optimal policy and maximizes the long-term reward, which makes it highly suitable for HEV applications. Figure 1
shows the generic schematic of the deep
reinforcement learning algorithm where
observation and reward are the input signals to the
policy update, [9], [10], [11], [12].
Fig. 1: Block diagram of Deep Reinforcement
Learning Algorithm
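To make the interaction in Figure 1 concrete, a minimal Python sketch of the generic agent-environment loop is given below; the env and agent objects and their method names (reset, step, act, update) are hypothetical placeholders, not the actual Simulink implementation.

```python
# Minimal sketch of the agent-environment loop from Figure 1 (illustrative only).
# `env` and `agent` are hypothetical objects standing in for the Simulink model
# and the RL agent; their method names are assumptions, not a real API.
def run_episode(env, agent, max_steps=100):
    observation = env.reset()                # initial state observation
    cumulative_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(observation)              # policy maps observation -> action
        observation, reward, done = env.step(action)  # environment returns observation and reward
        agent.update(observation, reward)            # policy update driven by observation and reward
        cumulative_reward += reward
        if done:
            break
    return cumulative_reward
```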
The environment receives the action from the policy and returns the observation, which consists of a set of predefined signals describing the process, and the reward, which is an output of the environment and represents the success rate. The action is
represented by the control variables of the closed-
loop control system. Observations represent signals
visible to the agent and they are found in the form
of measured signals, their rates of change, and
associated errors. Usually, for continuous actions, the reward is formed as a weighted sum of the squared present signal errors and the squared past actions, with the weights of these terms determined by the problem statement. In motor control, the reward
is expressed as a function to reduce the steady-state
error. The policy is the component of an agent that implements the learning algorithm; it maps observations to actions and is described by a function with configurable parameters. In the case of a motor control application, the policy corresponds to the operating mode of the control system. The optimal policy is determined by the configured learning algorithm through continuous tuning of the policy parameters so as to maximize the cumulative reward. The environment consists of
physical devices, reference & actual signals and
steady-state errors, filters, disturbances,
measurement noise, and A/D and D/A converters,
[9],[10].
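As an illustration of such a reward, the following sketch penalizes the squared present errors and the squared past action; the choice of signals and the weights w_err and w_act are assumptions for illustration, not the exact reward used in this work.

```python
import numpy as np

# Hedged sketch of a motor-control reward: penalize squared present errors and
# squared past actions, with weights chosen according to the problem statement.
# The weights w_err and w_act below are illustrative values only.
def reward(speed_error, id_error, iq_error, previous_action,
           w_err=(1.0, 0.1, 0.1), w_act=0.01):
    error_term = (w_err[0] * speed_error**2
                  + w_err[1] * id_error**2
                  + w_err[2] * iq_error**2)
    action_term = w_act * np.sum(np.square(previous_action))
    # Negative sum, so that maximizing the reward drives the steady-state error to zero.
    return -(error_term + action_term)
```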
The important steps of RL programming are:
(i) Problem identification: the learning agents and policy are defined and process integration is initiated.
(ii) Creation of the process model as the environment: the dynamic model of the physical
systems and the interfaces between the subsystems are defined.
(iii) Creation of rewards in DRL: a reward in the form of mathematical equations is defined to measure the output of the assigned task.
(iv) Training the agent: the agent is trained to accomplish a policy according to the reward, the algorithm, and the process followed.
(v) Policy deployment: integration of the agent with the control system of the HEV. In this step, automatic code generation plays an important role, generating executable code for the target embedded platform from the Simulink models.
DDPG is an agent typically used in continuous systems, and TD3 is a variant of DDPG considered for simulation in this research work. It is an actor-critic agent aimed at long-term reward maximization, [9].
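A compressed sketch of steps (ii)-(iv) is given below as a generic Python training loop; the environment is assumed to wrap the PMSM process model, and the object names, buffer size, and batch size are illustrative assumptions rather than the toolbox code used in this work.

```python
import random
from collections import deque

# Hedged sketch of steps (ii)-(iv): wrap the process model as an environment,
# let it return a reward, and train an actor-critic agent from a replay buffer.
# `env` and `agent` are hypothetical placeholders; episode counts follow the text.
def train(env, agent, episodes=300, steps_per_episode=100, batch_size=64):
    replay_buffer = deque(maxlen=100_000)          # experience storage (S, A, R, S')
    for episode in range(episodes):
        state = env.reset()
        for _ in range(steps_per_episode):
            action = agent.act(state, explore=True)       # action with exploration noise
            next_state, reward, done = env.step(action)
            replay_buffer.append((state, action, reward, next_state))
            if len(replay_buffer) >= batch_size:
                batch = random.sample(list(replay_buffer), batch_size)
                agent.update(batch)                       # actor/critic update from mini-batch
            state = next_state
            if done:
                break
```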
The improved variant of the DDPG agent used in this research work is a continuous-action agent that maximizes the long-term reward. The training considered in this work has the following phases:
(i) For the current state observation S, the action is A = μ(S) + N, where N is the stochastic exploration noise.
(ii) After the execution of A, the reward R and the next state observation S' are obtained.
(iii) The experience is formulated as (S, A, R, S') and stored in the experience buffer.
(iv) A mini-batch of experiences (Si, Ai, Ri, Si') is randomly sampled from the buffer, [9].
(v) The target value for each sampled experience is computed as

$y_i = R_i + \gamma \min_{k=1,2} Q'_k\left(S'_i,\ \mu'(S'_i \mid \theta_{\mu'}) \mid \theta_{Q'_k}\right)$          (1)

The target value given in equation (1) is the sum of the experience reward $R_i$ and the minimum discounted future reward predicted by the two target critics.
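A small sketch of how the target value in equation (1) could be computed for a sampled mini-batch is shown below; the target actor and the two target critics are represented as callables, and the discount factor value is an assumption.

```python
import numpy as np

# Hedged sketch of the TD3 target value in equation (1):
# y_i = R_i + gamma * min_k Q'_k(S'_i, mu'(S'_i)),
# i.e. the experience reward plus the minimum discounted future reward
# from the two target critics. gamma = 0.99 is an assumed value.
def td3_targets(rewards, next_states, target_actor, target_critic1, target_critic2,
                gamma=0.99):
    next_actions = target_actor(next_states)               # mu'(S')
    q1 = target_critic1(next_states, next_actions)         # Q'_1(S', mu'(S'))
    q2 = target_critic2(next_states, next_actions)         # Q'_2(S', mu'(S'))
    return rewards + gamma * np.minimum(q1, q2)            # clipped double-Q target
```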
3 Optimization of SFOC Control
Strategy for PMSM with Deep
Reinforcement Learning
SFOC is an efficient method of controlling the
PMSM motor and is effectively integrated with a
deep reinforcement learning algorithm.
The following equations represent the dynamics of the PMSM:

$\dfrac{di_d}{dt} = \dfrac{1}{L_d}\left(v_d - R_s i_d + \omega L_q i_q\right)$          (2)

$\dfrac{di_q}{dt} = \dfrac{1}{L_q}\left(v_q - R_s i_q - \omega L_d i_d - \omega \lambda_m\right)$          (3)

$\dfrac{d\omega}{dt} = \dfrac{3p}{2J}\left(\lambda_m i_q + (L_d - L_q)\, i_d i_q\right) - \dfrac{1}{J} T_L - \dfrac{B}{J}\omega$          (4)

where $v_d$, $v_q$ and $i_d$, $i_q$ are the d- and q-axis stator voltages and currents, $R_s$ is the stator resistance, $L_d$, $L_q$ are the d- and q-axis inductances, $\lambda_m$ is the permanent-magnet flux linkage, $p$ is the number of pole pairs, $J$ is the rotor inertia, $B$ is the viscous friction coefficient, $T_L$ is the load torque, and $\omega$ is the rotor speed.
The above equations represent the dynamics of
PMSM in the ‘d-q’ reference frame. Field-oriented
control of the PMSM motor along with the deep
learning algorithm is shown in Figure 2. The DRL TD3 agent learns the behaviour of the PMSM control system shown in Figure 2, as analysed in this paper. After the training phase, it provides the reference signals for the three control inputs to the cascade control system (iqref, vdref, vqref), so that the improved control system achieves better performance. These three control signals are
reference quadrature current, reference direct
voltage, and reference quadrature voltage
respectively.
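To illustrate equations (2)-(4), a minimal forward-Euler simulation step of the d-q model is sketched below; a single rotor speed variable is used, as in the equations above, and all motor parameter values are placeholders rather than those of the motor studied here.

```python
# Hedged sketch of the PMSM d-q dynamics (equations (2)-(4)) with one forward-Euler
# integration step. All parameter values are illustrative placeholders.
def pmsm_step(state, v_d, v_q, T_L, dt=1e-4,
              R_s=0.5, L_d=8e-3, L_q=8e-3, lam=0.1, p=4, J=1e-3, B=1e-4):
    i_d, i_q, omega = state
    di_d = (v_d - R_s * i_d + omega * L_q * i_q) / L_d                 # eq. (2)
    di_q = (v_q - R_s * i_q - omega * L_d * i_d - omega * lam) / L_q   # eq. (3)
    T_e = 1.5 * p * (lam * i_q + (L_d - L_q) * i_d * i_q)              # electromagnetic torque
    domega = (T_e - T_L - B * omega) / J                                # eq. (4)
    return (i_d + dt * di_d, i_q + dt * di_q, omega + dt * domega)
```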
The TD3 agent is trained with 300 episodes for PMSM control, with 100 steps per episode. The sampling time for every agent is 10⁻⁴ s. The training phase is stopped when the cumulative average reward is greater than -150 for 100 consecutive episodes, or after the initially set 300 training episodes have elapsed. During simulation, to improve learning and obtain the best training, Gaussian exploration noise is added to the action signals transmitted by the agent.
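The stopping rule described above can be sketched as follows; the window length, threshold, and episode limit follow the values stated in the text, while the reward bookkeeping itself is an illustrative assumption.

```python
# Hedged sketch of the training stop criterion described in the text:
# stop when the average episode reward over the last 100 consecutive episodes
# exceeds -150, or after the 300 initially set training episodes have elapsed.
def should_stop(episode_rewards, window=100, threshold=-150.0, max_episodes=300):
    if len(episode_rewards) >= max_episodes:
        return True
    if len(episode_rewards) >= window:
        recent = episode_rewards[-window:]
        return sum(recent) / window > threshold
    return False
```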
Fig. 2: Block diagram of HEV with DRL Algorithm
4 Deep Reinforcement Learning with
Inner Current Control Loop
The inner loop performs current control with the TD3 agent, as shown in Figure 2. Once learning is complete, the TD3 agent provides the command/reference signals for the voltage control signals vd and vq. Figure 3 shows
the Simulink implementation of the proposed deep
reinforcement learning for both the inner current
control loop (Torque Control) and outer voltage
control loop (Speed Control). In the inner current
control loop, the observation signals are id, iq, iderror,
and iqerror. To start with, the deep neural network is
created with two inputs and one output. The total
training time for this case is 7:12:5.
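For the inner current control loop, the following sketch shows how the observation vector could be assembled from id, iq, and their errors, and how the trained agent's output could be added to the FOC voltage commands; the two-component correction and the voltage limit are assumptions for illustration.

```python
import numpy as np

# Hedged sketch of the inner current-control loop correction: the observation is
# built from (id, iq, id_error, iq_error) and the trained agent's action is added
# to the FOC voltage commands vd and vq. The voltage limit is an assumed value.
def corrected_voltages(agent, i_d, i_q, i_d_ref, i_q_ref, v_d_foc, v_q_foc,
                       v_limit=400.0):
    observation = np.array([i_d, i_q, i_d_ref - i_d, i_q_ref - i_q])
    delta_vd, delta_vq = agent.act(observation)      # trained TD3 agent output
    v_d = np.clip(v_d_foc + delta_vd, -v_limit, v_limit)
    v_q = np.clip(v_q_foc + delta_vq, -v_limit, v_limit)
    return v_d, v_q
```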
5 Deep Reinforcement Learning with Outer Speed Control Loop
Figure 3 shows the Matlab/Simulink
implementation diagram of PMSM control for
outer-loop speed/voltage control using a
TD3 agent. In this case, the
command/reference signal from the TD3
agent is added to the control signal iqref.
The observations consist of the signals and error
signals such as ω, ωerror, id, iq, iderror, and iqerror. In this
phase, the total training time taken is 2:54:45.
Fig. 3: PMSM Motor Control with outer Speed loop
In Figure 3, the speed controller and current controller are cascaded: the outer speed controller provides the current reference to the inner current controller, which derives the PWM pulses (uniform duty cycle). Motor torque is estimated from the stator current components id and iq, and these components are compared with the desired components id* and iq*, respectively. Figure 4 shows the rewards r1, r2, r3, and r4 during the
deep learning training process, and episode rewards
are shown in Figure 5. Episode number
information, average results, training options, and
final results are shown in Figure 5.
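The torque estimation from the stator current components mentioned above can be sketched with the standard d-q torque expression; the parameter values below are placeholders.

```python
# Hedged sketch of torque estimation from the stator current components id and iq,
# using the standard d-q torque expression. Parameter values are placeholders.
def estimated_torque(i_d, i_q, lam=0.1, L_d=8e-3, L_q=8e-3, p=4):
    # T_e = 1.5 * p * (lam * iq + (Ld - Lq) * id * iq)
    return 1.5 * p * (lam * i_q + (L_d - L_q) * i_d * i_q)
```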
Fig. 4: Rewards during the training of DRL
Fig. 5: DRL Training Progress
6 Results and Analysis
The TD3 agent approximates the reward from the environment using its critic representation, with Idref and Iqref taken from the PI controller model and the actual Id and Iq as actions, based on the speed and torque values of the closed-loop control system. Using its actor representation, the TD3 agent tunes the feedback current and voltage values delivered from the FOC, which influence the given reference current values. Both the DDPG agent and the TD3 agent use the same structure in the proposed Simulink model. The DDPG agent maximizes the Q value, and the actor network is used to estimate the action, namely the feedback values of current and voltage. Because the policy is updated using the estimated Q value, the resulting policy may be suboptimal, and accumulating training errors may lead to divergent behaviour. The TD3 algorithm is an extension of DDPG with improvements that make it more robust by preventing over-estimation of the Q values, [13].
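The robustness of TD3 over DDPG noted above is commonly attributed to three mechanisms, which are sketched schematically below; the noise scale, clipping range, and update period are illustrative assumptions, and the actor/critic objects are hypothetical placeholders.

```python
import numpy as np

# Hedged sketch of the TD3 mechanisms that curb Q over-estimation relative to DDPG:
# (1) target policy smoothing, (2) clipped double-Q targets, (3) delayed actor updates.
# Noise scale, clip range, and delay period are illustrative values.
def td3_update_step(step, batch, actor, critics, target_actor, target_critics,
                    gamma=0.99, noise_std=0.2, noise_clip=0.5, policy_delay=2):
    states, actions, rewards, next_states = batch
    # (1) smooth the target action with clipped Gaussian noise
    noise = np.clip(np.random.normal(0.0, noise_std, size=actions.shape),
                    -noise_clip, noise_clip)
    next_actions = target_actor(next_states) + noise
    # (2) take the minimum of the two target critics (clipped double-Q)
    q_target = rewards + gamma * np.minimum(
        target_critics[0](next_states, next_actions),
        target_critics[1](next_states, next_actions))
    for critic in critics:
        critic.fit(states, actions, q_target)          # regress both critics to the target
    # (3) update the actor (and targets) only every `policy_delay` critic updates
    if step % policy_delay == 0:
        actor.update(states, critics[0])
        target_actor.soft_update(actor)
        for tc, c in zip(target_critics, critics):
            tc.soft_update(c)
```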
Figure 6 shows the proposed HEV with
PMSM control based on the DRL algorithm. The
main contribution of the DRL algorithm is to
optimize the control loop of PMSM in HEV to
achieve the energy management cycle. Inputs to the DRL block are the actual Id and Iq, the reference values of Id and Iq, and the actual and reference speeds. The action from the DRL algorithm is the voltage, consisting of direct and quadrature components. Based on the training process proposed in this paper, learning takes place and the action is generated.
Figure 7 shows the speed tracking of the PMSM motor in the HEV, where the actual motor speed follows the reference and meets the target value. The target PMSM speed is set to 600 rpm, and the actual speed attains the set value through the combination of the TD3 agent and the SFOC algorithm. Figure 8 shows the current waveforms id and idref, and iq and iqref. Both the id and iq values track their respective reference values and are 90° apart.
Fig. 6: Simulink Diagram of Proposed DRL-based PMSM for HEV
Fig. 7: Speed of PMSM
Figure 9 shows the voltage profiles Vd and Vq with DRL learning; these components also follow their respective reference values closely and are 90° apart in phase. Figure 10 shows the logic analyzer view of the digital and analog signals for the input and output characteristics of the proposed HEV control system. From this figure, it is observed that the analog and digital signals, with reference to the sampling time, time offset, and time span values, are well within the limits defined by the given speed control system parameters.
Fig. 8: Current Control
Fig. 9: Voltage profile
Fig. 10: Logic Analyzer
Figure 11 shows the output results with reference to
the performance of hybrid electric vehicles. The
following waveforms are captured from the HEV
scope:
(i) Drive cycle (mph)
(ii) Engine and Motor Speed
Comparison (RPM)
(iii) Engine and Motor Torque
Comparison (Nm)
(iv) Battery Current (A)
(v) Battery State of Charge (%)
(vi) Fuel Consumption (g/kWh)
Vehicle velocity varies from 0 to 60 mph over the drive cycle, and engine speed is in line with the drive cycle profile. Motor speed also follows the driving schedule, as shown in the figure. Engine speed and motor speed vary from 0 to 3500 rpm, whereas the torque delivered by the motor and engine varies from 0 to 200 Nm. In most cases, the engine and motor share the delivered power by each contributing torque. From the battery current waveform, it is seen that the current varies from -50 A to +50 A according to the drive schedule and torque profile of the HEV; through the DRL control mechanism of the motor, the battery current is also tuned to minimize losses, indicating that the solution presented in this paper is energy efficient. Figure 11 also shows the battery state of charge, which is controlled within a band around 70%. Fuel consumption is also controlled, and a 40% fuel saving is demonstrated by the result.
Fig. 11: HEV Vehicle Parameters
Hence, the benefits of the proposed algorithm with speed control of the PMSM motor are given below:
(i) Motor and engine speed profiles are in line with the HEV speed profile
(ii) The steady-state error of the control system is less than 1% and the control accuracy is more than 99.5%
(iii) Fuel saving is improved by 40%
(iv) Accuracy of estimation of the state of charge (SoC) and state of health (SoH) is improved to 99.9%
(v) High torque-to-speed ratio
(vi) Very high torque/volume ratio
7 Conclusions
In this paper, the SFOC-type control structure for
the PMSM for HEV energy management is
presented which shows improved performance
using the deep reinforcement learning
algorithm. Comparison results are thus presented
for a case where a deep reinforcement learning
agent is properly trained to provide the reference/
command signals that are added to actual control
signals vd, vq, and iqref. The main objective of this
research work is to improve the performance of the
HEV by incorporating the novel control technique
of the PMSM motor in order to save the energy that
is monitored by an energy management system.
Numerical simulations were used to demonstrate the superiority of control systems using deep reinforcement learning, and subsequent work will explore optimization possibilities associated with implementing deep reinforcement learning on PMSM controllers for HEVs. The proposed algorithm is proven successful on the Matlab/Simulink platform but has not yet been implemented in a real-time passenger vehicle; this needs to be done to demonstrate the performance of upcoming versions. Moreover, the proposed algorithm may also be applied to pure electric vehicles (EV) and to building a rugged energy management system.
References:
[1] Hannan, Mahammad A., F. A. Azidin, and
Azah Mohamed. "Hybrid electric vehicles
and their challenges: A review." Renewable
and Sustainable Energy Reviews 29 (2014):
135-150.
[2] Liu, Xiangdong, Hao Chen, Jing Zhao, and
Anouar Belahcen. "Research on the
performances and parameters of interior
PMSM used for electric vehicles." IEEE
Transactions on Industrial Electronics 63,
no. 6 (2016): 3533-3545.
[3] Li, Yaohua, Dieter Gerling, Jian Ma, Jingyu
Liu, and Qiang Yu. "The Comparison of
Control Strategies for the interior PMSM
drive used in the Electric Vehicle." World
Electric Vehicle Journal 4, no. 3 (2010): 648-
654.
[4] Sheela, A., M. Suresh, V. Gowri Shankar,
Hitesh Panchal, V. Priya, M. Atshaya, Kishor
Kumar Sadasivuni, and Swapnil Dharaskar.
"FEA based analysis and design of PMSM
for electric vehicle applications using magnet
software." International Journal of Ambient
Energy 43, no. 1 (2022): 2742-2747.
[5] Luo, Yefeng, Jianqi Qiu, and Cenwei Shi.
"Fault detection of permanent magnet
synchronous motor based on deep learning
method." In 2018 21st International
Conference on Electrical Machines and
Systems (ICEMS), pp. 699-703. IEEE, 2018.
[6] Dhulipati, Himavarsha, Eshaan Ghosh,
Shruthi Mukundan, Philip Korta, Jimi Tjong,
and Narayan C. Kar. "Advanced design
optimization technique for torque profile
improvement in six-phase PMSM using
supervised machine learning for direct-drive
EV." IEEE Transactions on Energy
Conversion 34, no. 4 (2019): 2041-2051.
[7] Bilal, Ahmad, Asad Waheed, and Muhammad
Hassan Shah. "A Comparative Study of
Machine Learning Algorithms for
Controlling Torque of Permanent Magnet
Synchronous Motors through a Closed Loop
System." In 2019 Second International
Conference on Latest trends in Electrical
Engineering and Computing Technologies
(INTELLECT), pp. 1-6. IEEE, 2019.
[8] Song, Zhe, Jun Yang, Xuesong Mei, Tao Tao,
and Muxun Xu. "Deep reinforcement
learning for permanent magnet synchronous
motor speed control systems." Neural
Computing and Applications 33, no. 10
(2021): 5409-5418.
[9] Nicola, Marcel, Claudiu-Ionel Nicola, and
Dan Selișteanu. "Improvement of PMSM
sensorless control based on synergetic and
sliding mode controllers using a
reinforcement learning deep deterministic
policy gradient agent." Energies 15, no. 6
(2022): 2208.
[10] Nicola, Marcel & Nicola, Claudiu. (2021).
Improvement of PMSM Control Using
Reinforcement Learning Deep Deterministic
Policy Gradient Agent. 1-6.
10.1109/Ee53374.2021.9628371.
[11] Savant, Rushank, Abhiram Ajith Kumar, and
Aditya Ghatak. "Prediction and analysis of
permanent magnet synchronous motor
parameters using machine learning
algorithms." In 2020 Third International
Conference on Advances in Electronics,
Computers and Communications (ICAECC),
pp. 1-5. IEEE, 2020.
[12] Traue, Arne, Gerrit Book, Wilhelm
Kirchgässner, and Oliver Wallscheid.
"Toward a reinforcement learning
environment toolbox for intelligent electric
motor control." IEEE Transactions on Neural
Networks and Learning Systems (2020).
[13] MathWorks. (2023). MATLAB -
MathWorks. www.mathworks.com
[Accessed on 03/02/2023].
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US