Abstract
The efficiency and control accuracy of an interior permanent magnet synchronous motor (IPMSM) are the main factors affecting its performance. Manual calibration suffers from high work intensity, a long calibration period and high technical requirements, which leads to low calibration accuracy and low motor efficiency. Thus, a novel calibration method based on Deep Deterministic Policy Gradient (DDPG) and Long Short-Term Memory (LSTM) is proposed. By constructing a deep reinforcement learning network, the optimal working point is found by self-optimization under any working condition, and the MAP for the IPMSM over the full speed-torque range is obtained. The method can quickly realize the optimal matching of d-q axis currents for an arbitrary stator current. It focuses on the problem of motor overheating caused by the long adjustment time of a manually calibrated MAP when the motor is overloaded, so as to realize fast calibration in the overload area. Moreover, the method reduces the dependence on motor parameters and increases the adaptability of the calibrated MAP data to the operating conditions. Simulation and bench tests indicate that the method meets the response requirements of motor torque and that the motor efficiency is greatly improved.
Introduction
With the development of new energy vehicles, interior permanent magnet synchronous motors (IPMSMs) have been widely used owing to their high efficiency, high power density and good torque characteristics. The motor controller of an electric vehicle requires high-precision torque control [1]. The inner loop of the control system is a current loop: the target torque received from the accelerator pedal is translated into a current command in the synchronous rotating coordinate system. The core of drive motor control is to output greater torque and reach higher speed within the constraints of the DC voltage and voltage utilization [2, 3]. Therefore, the d-q axis currents must be matched reasonably [4]. The commonly used analytical principle is to minimize the stator current for a given torque; the current command corresponding to each torque command at different speeds is thereby obtained, realizing maximum torque per ampere (MTPA) control [5, 6].
Methods for realizing MTPA mainly include model-based online calculation [7, 8], curve fitting based on experimental data [9] or finite element analysis (FEA) simulation results [10], and manual calibration based on bench experiments. The traditional MTPA calibration method constructs a mathematical model of the motor from motor parameters such as the maximum current, the maximum output voltage and the maximum torque. The current values corresponding to each torque at different speeds are then generated from the mathematical model. This calibration method places high demands on the accuracy of the mathematical model, so the influence of parameter changes on model accuracy must be considered comprehensively to obtain the optimal MTPA curve [10–12]. However, the uncertainty of motor parameters and inverter nonlinearity [13, 14] make the model more difficult to establish. To obtain a high-precision mathematical model, some parameters need to be calibrated and then repeatedly recalibrated through tests. Even after lengthy trials the accuracy may not improve, which limits practical engineering applications. Moreover, the IPMSM is strongly coupled and nonlinear, which makes accurate modeling difficult. Therefore, the model-based calibration algorithm is very complex, and it is difficult to meet the real-time needs of automotive electric motors [15]. Curve fitting based on test data requires a certain amount of data to establish a functional relationship between input variables and responses, which places higher requirements on the source and accuracy of the experimental data [16]. Otherwise, the fitting results cannot be applied to all working conditions.
Manual calibration based on bench experiments tests the motor and its matching controller before the product is put into service, in order to obtain the torque-speed-current relationship. Through matching calibration, some control parameters are fixed to form a MAP. Motor calibration is of great significance for improving the torque control accuracy and efficiency of the motor. Traditional manual calibration involves high work intensity and a long calibration cycle, which seriously affects development efficiency and project progress [17, 18]. At the same time, manual calibration is likely to cause a large calibration offset error. Even the calibration results of different operators differ, so the accuracy of the results cannot be guaranteed. In addition, the traditional calibration table method is static, following the principle of “one-time calibration and permanent use”. However, the parameters of the IPMSM vary greatly with the working conditions and the environment, and this parameter uncertainty is an important factor affecting the external characteristic output of the motor. When the motor parameters change with the operating conditions [19], the difference between the curve obtained by static calibration and the actual curve grows, which reduces the overall efficiency of the motor. More importantly, because of parameter uncertainty, using a manually calibrated MAP in the overload area can cause the motor to overheat if the adjustment time is too long, which degrades normal performance and may even damage the motor. Since only one or a few prototypes are calibrated, the traditional calibration method requires high consistency among mass-produced motors. When this consistency is lacking, there is a large gap between the calibration curve and the actual curve, which limits the improvement of motor efficiency in actual applications.
It is well known that intelligent control does not need an exact dynamic model of the system, is simple, requires less intensive mathematical design, and is suitable for dealing with nonlinearities and uncertainties [20–22]. Examples include fuzzy logic control (FLC), artificial neural network control (ANNC) and neuro-fuzzy control (NFC) [23, 24]. Theoretically, these methods can handle any nonlinear model of the power control system [25, 26] and are more robust than conventional controllers [27]. Some intelligent controllers that imitate human reasoning have been applied to IPMSM process control with good performance. In recent years, researchers have tried to apply intelligent controllers to IPMSM drives [28, 29]. Among the many types of intelligent controllers, FLC is one of the easiest to implement for high-performance IPMSM drives.
In contrast to the conventional FLC of the IPMSM drive with zero d-axis current, a simplified fuzzy speed controller that incorporates MTPA obtains better performance. It simplifies the d-axis current around an operating point to obtain the electromagnetic torque. An online loss-minimization MTPA algorithm is further integrated with an FLC-based IPMSM drive to yield high efficiency and high dynamic performance over a wide speed range [30–32].
However, each of these controllers has its own performance or implementation shortcomings. On the one hand, FLC needs considerable manual adjustment by trial and error to obtain high performance. For most three-phase IPMSMs, the FLC controller becomes ineffective because L d is not equal to L q, which adds to the coupling nonlinearity of the IPMSM and makes the control design difficult [33]. On the other hand, it is extremely difficult to create a set of training data for ANNC that covers all operating modes, and the same holds for other intelligent control algorithms. To the best of the authors’ knowledge, no work has been reported on a stand-alone online adaptive intelligent controller for IPMSM drives.
With the development of artificial intelligence technology, reinforcement learning (RL) has been widely used in industrial control [34–36]. RL is a trial-and-error learning method based on the Markov decision process; the RL task is solved by constructing and optimizing the Bellman equation. An RL system with the Markov property includes at least a state set, an action set, a strategy set and a reward and punishment function [37]. RL does not rely heavily on the mathematical model or the physical parameters of the controlled object. Moreover, this control method ensures that the current optimal output characteristics are reflected truly and accurately in each state [38, 39]. An RL-based control method does not need an accurate motor model to estimate the relationship between the target torque and the d-q axis currents in advance; it obtains the best torque-speed-current MAP purely through continuous trial-and-error learning. In this paper, an RL-based method is adopted to realize automatic calibration of the IPMSM, which overcomes the problems of manual calibration, such as a long calibration period, low efficiency, poor accuracy and poor robustness. The method transforms the static mode of manual calibration into a dynamic search mode with self-learning and self-optimization, optimizes in real time the d-q axis currents that give maximum torque for an arbitrary stator current, and assembles the search results into a MAP.
The rest of this paper is organized as follows. In Section 2, the principle of IPMSM calibration based on constant magnetic field control is analyzed theoretically, and the importance of calibration for improving motor control is clarified. In Section 3, an automatic calibration method for automotive electric motors based on Deep Deterministic Policy Gradient and Long Short-Term Memory (DDPG-LSTM) is proposed to obtain the MAP over the full speed-torque range of the IPMSM automatically. In Sections 4 and 5, the effectiveness and feasibility of the method are verified by simulation and experiment.
Analysis of calibration control for IPMSM
The mathematical model of IPMSM in the d-q reference frame can be expressed as follows [40]:
By using Equations (1)–(5), the time phase voltage equation can be expressed as:
The phasor diagram corresponding to Equation (6) is shown in Fig. 1. Generally, the motor torque is controlled by adjusting the current in a closed loop. Fig. 1 depicts the relationship between the stator voltage U s and the stator current I s, with phase difference φ, and the relationship between the stator current I s and the permanent magnet flux linkage Ψ f, with phase difference β. From Fig. 1, different values of β yield different d-q current pairs (i.e. Id1, Iq1, Id2, Iq2, ...) under a constant stator current I s (i.e. Is1 = Is2 = ...). Therefore, the motor torque can be controlled directly by adjusting the power angle β 1 or β 2.

The phasor diagram of composite voltage.
Fig. 1 shows the phase difference β between the stator current phasor and the permanent magnet flux phasor, the phase difference φ between the stator voltage phasor and the stator current phasor, and the phase difference γ = β – 90° between the stator current phasor and the electromotive force phasor. The correspondence between the stator current phasor and the d-q axis currents with respect to β or γ can also be obtained [41], as shown in Equation (7).
With the stator current held constant, varying the phase difference β yields countless combinations of d-q axis currents and torque values. Among all combinations, a maximum torque point and a minimum torque point appear, as shown in Fig. 2. The power angles β corresponding to the maximum and minimum torque points are unique in Fig. 2(b). Combined with Equation (7), the unique d-q axis current pair can be obtained [42–44]. It can be seen from Equation (7) that the optimal β is closely related to the motor parameters [45].

The relationship between the power angle and the combined torque.
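The search for the torque-maximizing power angle can be illustrated numerically. The following Python sketch assumes the textbook relations I d = I s cos β, I q = I s sin β and T e = 1.5 p [Ψ f I q + (L d - L q) I d I q], which may differ in notation from Equations (1)–(7) of this paper, and it treats L d, L q and Ψ f as constants. The function names and parameter values are illustrative assumptions, not data from the tested motor.

```python
import math

def dq_currents(i_s, beta):
    """Split a stator current of amplitude i_s into (i_d, i_q) for power angle beta (rad)."""
    return i_s * math.cos(beta), i_s * math.sin(beta)

def torque(i_d, i_q, psi_f, l_d, l_q, pole_pairs):
    """Textbook IPMSM torque: magnet torque plus reluctance torque."""
    return 1.5 * pole_pairs * (psi_f * i_q + (l_d - l_q) * i_d * i_q)

def mtpa_beta_closed_form(i_s, psi_f, l_d, l_q):
    """Angle that maximizes torque for fixed i_s, assuming constant parameters and l_q > l_d."""
    d_l = l_q - l_d
    i_d = (psi_f - math.sqrt(psi_f ** 2 + 8.0 * d_l ** 2 * i_s ** 2)) / (4.0 * d_l)
    return math.acos(i_d / i_s)

def mtpa_beta_grid(i_s, psi_f, l_d, l_q, pole_pairs, steps=18000):
    """Brute-force search over beta in [0, pi], mimicking a trial-and-error sweep."""
    betas = (math.pi * k / steps for k in range(steps + 1))
    return max(betas, key=lambda b: torque(*dq_currents(i_s, b), psi_f, l_d, l_q, pole_pairs))

# Illustrative (assumed) parameters: 100 A stator current, 4 pole pairs.
PARAMS = dict(psi_f=0.1, l_d=0.3e-3, l_q=0.6e-3)
beta_cf = mtpa_beta_closed_form(100.0, **PARAMS)
beta_grid = mtpa_beta_grid(100.0, pole_pairs=4, **PARAMS)
```

Under constant parameters the two angles agree to within the grid resolution. When L d and L q vary with current, the closed-form angle is no longer reliable, which is the situation the calibration search is meant to handle.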
However, the strong coupling of the IPMSM makes the d-axis and q-axis currents interact: when the d-q axis voltage changes, both currents are affected simultaneously, and the degree of change differs markedly with speed. Moreover, as the working conditions change, the theoretically calculated optimal working point shows a large error, resulting in poor control performance [46, 47].
On the one hand, experiments were performed to measure the d-axis and q-axis inductances, and the results are shown in Fig. 3. L d and L q vary with I d and I q, and they follow different variation patterns. Fig. 3 also shows that L q decreases as I q increases, whereas L d exhibits a more complicated variation pattern. On the other hand, these nonlinear parameters may also be obtained via finite element studies and then validated experimentally [5]. As seen in Fig. 3, the parameters of the tested IPMSM become extremely nonlinear in the discontinuous operation subregion. L q also varies much more than L d if the current phase is not considered. Importantly, L d can be far from its correct value. Therefore, the impact of the variations of L d and L q on IPMSM control performance must be considered [48, 49].

Motor parameters (L d and L q ) change with the working conditions.
To sum up, the optimal operating point of motor based on traditional vector control calculation cannot overcome the influence of parameters uncertainty and the strong coupling characteristics, so the optimal d-q axis current calculated cannot truly reflect all the actual requirements. Therefore, in order to improve the control accuracy of motor torque, it is necessary to conduct a series of tests between the motor and the matching motor controller before the product is put into service. Accurate calibration of the corresponding relationship between d-q axis current and torque is of great significance to improve the working efficiency of the motor [50].
In order to obtain the optimal d-q axis current matching for an arbitrary stator current and improve the working efficiency of the motor, this paper proposes a bench calibration method based on deep reinforcement learning to quickly obtain the MAP over the full speed-torque range of automotive electric motors. The block diagram of the novel d-q axis current optimization approach based on reinforcement learning theory is shown in Fig. 4. The diagram mainly contains three parts: the Agent, which generates the reference d-q axis voltages; the Environment, which generates a series of target signals such as torque and d-q axis currents; and the Reward, which generates the immediate reward according to the feedback signals from the Environment.

Block diagram of d-q axis current optimization control based on DDPG-LSTM.
A: State set
A complete state space should contain all possible states of the controlled object. Generally, the key to motor control is to control the torque, which is determined by I d and I q. Under any conditions, there are four states in the relationship between the target value and the reference value, denoted S1, S2, S3 and S4 in Table 1.
Four states of d-q axis current and corresponding control method (CM)
There are corresponding adjustment measures for the four different working states, as shown in Table 1. In order to obtain high performance control effect, the best control method should be applied to the motor under different working states. By establishing a regular control scheme based on vector control and expert experience, the efficient control of the motor can be realized by establishing corresponding control laws for four working states, such as CM_S1, CM_S2, CM_S3 and CM_S4 in the Table 1. The detailed control flow of CM_S1 is shown in Fig. 5, which briefly describes the idea of establishing the control law. In Fig. 5, ω indicates the real-time electrical angular velocity, and ω L , ω M , and ω H respectively indicate the low speed interval, intermediate speed, and high speed interval set according to the actual control requirements. u dmax and u qmax are the maximum of the d-q axis voltages. ɛ represents the minimum amount of d-q axis voltage adjustment. The setting of this value is related to the standard value range of the overall variable when the motor control algorithm is designed.

The control flowchart of CM_S1 control method.
B: Control strategy π
As shown in Fig. 3, the agent generates the actions u d and u q. The limits on u d and u q are expressed as Equation (9):
C: Reward and punishment function
Because the signal that the motor control needs to observe from the environment is not a single variable, information such as I q, I d and T e must be considered simultaneously to formulate the next action. The immediate reward signal for the d-q axes is defined as:

Calculation processes of case1d, case2d, and case3d.

Calculation processes of case1q, case2q, and case3q.
where errorLimit is the limit value of the d-q axis current error, which can be set according to the actual control requirements.
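The case definitions themselves are given only in the figures. As a purely hypothetical illustration of how a three-case immediate reward bounded by errorLimit might be structured, the Python sketch below uses invented case logic; only the names errorLimit and the preset value of 500 mentioned in the simulation section come from this paper, and the function should not be read as the authors' reward.

```python
def immediate_reward(error, prev_error, error_limit, r_max=500.0):
    """Hypothetical three-case reward for one axis (d or q).

    error       current-tracking error at the present step
    prev_error  current-tracking error at the previous step
    error_limit positive tolerance on the error (errorLimit in the text)
    r_max       reward granted once the error is inside the tolerance (assumed)
    """
    abs_e, abs_prev = abs(error), abs(prev_error)
    if abs_e <= error_limit:
        # Assumed case 1: error within tolerance, full reward.
        return r_max
    if abs_e < abs_prev:
        # Assumed case 2: error shrinking, partial reward that grows toward r_max.
        return r_max * error_limit / abs_e
    # Assumed case 3: error not shrinking, penalty proportional to the excess.
    return -(abs_e - error_limit)
```

A shaping of this kind rewards actions that move the current toward its target and lets the reward saturate once the error is inside the tolerance, so the learning signal does not keep pushing after the target has been reached.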
D: Improved DDPG-LSTM algorithm and parameter updates
The traditional DDPG structure contains an actor network and a critic network (AC). The actor network generates the control strategy π(s), and the critic network evaluates and corrects the future performance of the control strategy. To further improve the effectiveness of the data generated when the actor interacts with the environment and to optimize the control strategy output, an LSTM layer is added to the output of both the actor network and the critic network. This reduces the deviation produced by value function estimation and thus addresses the problems of data correlation and non-stationary distribution. A novel optimal control approach based on DDPG-LSTM is therefore proposed, as shown in Fig. 8.

Block diagram of IPMSM control based on DDPG-LSTM.
The relationship between the control variables u d and u q at two consecutive time steps can be expressed by the following equation.
(1) Basic DDPG network
The basic framework of DDPG algorithm is AC network framework. In the process of building policy function π and value function Q, this text adopts a single neural network model to build actor network and critic network, respectively.
1) Objective function construction of DDPG algorithm
In DDPG algorithm, the input is the current state, the output is the deterministic action value.
Combining Equations (1)-(4), we can obtain the Equation (11):
As is well known, for discrete-time systems we can define
The control objective is to find the sequence of optimal control signal, denoted with yk, k = 1, 2, . . . , ∞. Suppose that the objective function associated with this system is:
The optimal control
It is possible to write Equation (12) in iteration form, as shown in Equation (16).
In summary, in this iterative algorithm, the value function sequence Ji +1 and control law sequence y ik are updated by implementing the recurrent iteration equation in (16) with the iteration number i increasing from 0 to ∞.
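The structure of such an iteration can be sketched on a toy problem. The Python example below assumes the common value-iteration form J i+1 (x) = min over y of [U(x, y) + J i (f(x, y))] with J 0 = 0; the integer state grid, the dynamics f and the utility U are hypothetical stand-ins chosen only to keep the example small, and they are not the motor model of this paper.

```python
def value_iteration(states, controls, step, cost, max_iters=200, tol=1e-9):
    """Iterate J_{i+1}(x) = min_y [cost(x, y) + J_i(step(x, y))], starting from J_0 = 0."""
    value = {x: 0.0 for x in states}
    policy = {x: controls[0] for x in states}
    for _ in range(max_iters):
        new_value = {}
        for x in states:
            best_y, best_val = None, float("inf")
            for y in controls:
                val = cost(x, y) + value[step(x, y)]
                if val < best_val:
                    best_y, best_val = y, val
            new_value[x] = best_val
            policy[x] = best_y
        delta = max(abs(new_value[x] - value[x]) for x in states)
        value = new_value
        if delta < tol:  # the value sequence has stopped changing
            break
    return value, policy

# Hypothetical toy system: integer state in [-5, 5], control in {-1, 0, 1}.
STATES = list(range(-5, 6))
CONTROLS = [-1, 0, 1]

def toy_step(x, y):
    return max(-5, min(5, x + y))  # next state, clamped to the grid

def toy_cost(x, y):
    return float(x * x + y * y)    # quadratic utility

J, law = value_iteration(STATES, CONTROLS, toy_step, toy_cost)
```

In this toy case the value sequence is non-decreasing from J 0 = 0 and stops changing after a few sweeps, and the resulting control law drives the state toward zero.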
2) Convergence analysis
According to (16), we can derive
The initial value function
It can be seen that during the iteration process, the control actions for different control steps obey different control laws. After the iteration number i + 1, the obtained control law sequence is (y i , y i –1,..., y0). With the iteration number i increasing to ∞, the obtained control law sequence has a length of ∞. For the infinite-horizon problem, both the optimal value function and the optimal control law are unique. Therefore, it is desired that the control law sequence will converge when the iteration number i⟶ ∞.
3) Actor network frame and parameter update
The input state variable of the action network is defined as s, and the role of the actor network is to generate the controller’s control strategy π. During the process of motor control, the control strategy is the d-q axis voltage. Therefore, the actual output of the actor network can be described as Equation (19):
Let
The feedback error signal used for tuning action Neural Network (NN) is defined as:
A quadratic function based on the error of the Bellman equation is defined as the minimum objective function, as in Equation (22):
The weight update law for the action NN is designed based on the gradient descent algorithm, which is given by
4) Critic network frame and parameter update
In order to stabilize the closed-loop system and minimize the cost function, the critic network is used to implement the long-term minimization cost function J(x), thereby constructing the error function and cost function based on the Bellman equation:
Firstly, a neural network is utilized to approximate J(x) as follows:
The derivation of the cost function J(x) with respect to x is:
We can let
A quadratic function based on the error of the Bellman equation is selected as the minimum objective function. Given any admissible control law, it is desired to select
The weight update law for the critic NN is presented based on a gradient descent algorithm, which is given by
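As an illustration of this kind of update, the Python sketch below performs one gradient-descent step on the quadratic Bellman-error objective E = e²/2 for a critic that is linear in its features. The linear approximator, the discount factor gamma, the learning rate and the choice to hold the bootstrapped target fixed when differentiating (a semi-gradient step) are all assumptions made for the example, not details taken from the paper.

```python
def critic_update(weights, phi_k, phi_next, cost_k, lr=0.1, gamma=0.9):
    """One semi-gradient step on E = 0.5 * e**2 for a linear critic J(x) = w . phi(x).

    weights   current critic weights
    phi_k     feature vector of the current state x_k
    phi_next  feature vector of the next state x_{k+1}
    cost_k    one-step cost (utility) observed at step k
    """
    j_k = sum(w * p for w, p in zip(weights, phi_k))
    j_next = sum(w * p for w, p in zip(weights, phi_next))
    # Bellman error: estimate minus one-step bootstrapped target.
    error = j_k - (cost_k + gamma * j_next)
    # dE/dw = error * phi_k when the target is treated as a constant.
    new_weights = [w - lr * error * p for w, p in zip(weights, phi_k)]
    return new_weights, error

w0 = [0.0, 0.0]
w1, e1 = critic_update(w0, phi_k=[1.0, 0.0], phi_next=[0.0, 1.0], cost_k=1.0)
w2, e2 = critic_update(w1, phi_k=[1.0, 0.0], phi_next=[0.0, 1.0], cost_k=1.0)
```

Repeating the step on the same transition shrinks the magnitude of the Bellman error, which is the behaviour the weight update law is designed to produce.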
(2) DDPG-LSTM network
It should be mentioned that most of the above algorithms require the model of the controlled plant to be known, or at least partially known. However, in practical applications, most models cannot be obtained accurately in time. Therefore, it is necessary to reconstruct the nonlinear system with function approximators. In this paper, we present a specific method for modeling the IPMSM control system with LSTM, which is widely used in the dynamical analysis of nonlinear systems. Based on the LSTM model, the DDPG algorithm can be introduced to deal with the optimal control problem of unknown nonlinear systems in time.
The framework of the network design is shown in Fig. 6. The input parameters of the LSTM layer are X a = [xa1, xa2, xa3] and X c = [xc1, xc2, xc3], where xa1, xa2, xa3 and xc1, xc2, xc3 are the outputs of three convolutional neural networks (CNN), respectively. The inputs of each convolutional neural network are defined in Equations (29) and (30), respectively. After the LSTM layer, the outputs are the control strategy u ddpglstm and the expected value of the control strategy J ddpglstm, both with enhanced correlation between the current time and the control target. The schematic diagram is shown in Fig. 9(a) and (b).

Detailed block diagram of the proposed LSTM network.
In Equations (29) and (30), CNN represents a convolutional neural network; u ddpg represents the d-q axis control strategies u d and u q generated by the DDPG agent, and J ddpg represents the expected value of the control strategy; Δe(k-2), Δe(k-1), and Δe(k) represent the error of the d-q axis current at k-2, k-1, and k, respectively; u(k-2), u(k-1), and u(k) represent the k-2, k-1, and k time control strategy, respectively.
The main consideration of network design is that the magnitude of the control variable is determined by the error of the target variable. Therefore, the data input pair between the control voltage and the target current is established, and the empirical prediction based on LSTM is performed to increase the reliability of the output control strategy.
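The pairing of recent errors and control actions can be illustrated with a small sliding-window buffer. The Python sketch below keeps the samples at steps k-2, k-1 and k and returns them oldest first, in the order a sequence model would consume them. The class name, the zero initialization and the tuple layout are assumptions for illustration; the CNN and LSTM layers themselves are not modelled here.

```python
from collections import deque

class HistoryWindow:
    """Keeps the last `length` (error, control) pairs, oldest first."""

    def __init__(self, length=3):
        # Start from zeros so a full-length sequence is available from the first step.
        self.errors = deque([0.0] * length, maxlen=length)
        self.controls = deque([0.0] * length, maxlen=length)

    def push(self, error, control):
        """Record the current error Δe(k) and control u(k); the oldest pair drops out."""
        self.errors.append(error)
        self.controls.append(control)

    def sequence(self):
        """Return [(Δe(k-2), u(k-2)), (Δe(k-1), u(k-1)), (Δe(k), u(k))]."""
        return list(zip(self.errors, self.controls))

window = HistoryWindow()
for err, u in [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]:
    window.push(err, u)
```

After the four pushes above, the window holds only the three most recent pairs, so the sequence always has a fixed length regardless of how long the controller has been running.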
The new bench calibration method proposed in this text can automatically realize the optimal d-q axis current matching for arbitrary stator current, obtain the optimal d-q axis voltage and maximum torque output in the full speed-torque range only by traversing stator current and speed, respectively, and ensure uniform calibration accuracy. The basic calibration process is shown in Fig. 10.

Bench automatic calibration process based on DDPG-LSTM.
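The traversal that produces the MAP can be sketched as a double loop over speed and stator current, with the optimal working point at each grid node supplied by a search routine. In the Python sketch below, `stand_in_search` is a hypothetical placeholder for the learned agent: it sweeps the power angle using a textbook torque expression with constant, assumed parameters and ignores voltage limits, so its result does not depend on speed. On the bench, the DDPG-LSTM search would take its place.

```python
import math

def stand_in_search(speed, i_s, psi_f=0.1, l_d=0.3e-3, l_q=0.6e-3, pole_pairs=4, steps=900):
    """Placeholder optimizer: sweep beta from 90 to 180 degrees and keep the best torque.

    `speed` is accepted for interface compatibility but unused, because this
    simplified stand-in has no voltage limit or parameter variation.
    """
    best = (0.0, 0.0, float("-inf"))
    for k in range(steps + 1):
        beta = math.pi / 2 + (math.pi / 2) * k / steps
        i_d, i_q = i_s * math.cos(beta), i_s * math.sin(beta)
        t_e = 1.5 * pole_pairs * (psi_f * i_q + (l_d - l_q) * i_d * i_q)
        if t_e > best[2]:
            best = (i_d, i_q, t_e)
    return best

def build_map(speeds, stator_currents, find_best):
    """Traverse the speed / stator-current grid and store the best point at each node."""
    calib_map = {}
    for speed in speeds:
        for i_s in stator_currents:
            i_d, i_q, t_e = find_best(speed, i_s)
            calib_map[(speed, i_s)] = {"i_d": i_d, "i_q": i_q, "torque": t_e}
    return calib_map

# Illustrative grid: speeds in r/min, stator currents in A (assumed values).
calibration = build_map([1000, 2000], [50.0, 100.0], stand_in_search)
```

Passing the search routine as an argument keeps the traversal independent of how the optimal point is found, so the same loop serves both a simulated motor and a bench setup.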
To verify the effectiveness of the proposed optimal control method theoretically, a simulation model is built in the Simulink environment. Its structural framework is shown in Fig. 11, and the motor parameters are listed in Table 2. The reference signal unit (➀) generates the required I d and I q from a given stator current I s, and the DDPG-LSTM agent (➁) outputs the control voltages u d and u q.

Block diagram of simulation model based on DDPG-LSTM.
Parameters of IPMSM
A. Nominal simulation of DDPG-LSTM
Figure 12 shows the simulation results of the d-q axis currents and torque based on the DDPG-LSTM algorithm. Comparing the d-q axis currents and the torque shows that there are two dynamic adjustment processes during the whole simulation of about 0.4 s: a training process and a stable process. It can be seen from Fig. 9 that, during the training process, the d-q axis currents move dynamically toward their optimal matching values and the torque responds accordingly. At t = 0.15 s, I d and I q reach a stable stage, and so does the torque. At the same time, the torque reaches its peak value over the whole simulation time. The simulation results verify the trial-and-error optimization behavior and show that the torque output is maximized while the stator current is constant.

Simulation results of d-q axis current dynamic regulating process.
Figures 13 and 14 show that the immediate reward values generated by the d-axis agent and the q-axis agent gradually increase and approach the preset value of 500 during the learning process. An increase in the immediate reward means that the action generated by the agent contributes positively to the maximum torque output, which is in line with the intent of the reward function design and realizes the optimization of d-q axis current matching for maximum torque output.

The immediate reward of d axis agent.

The immediate reward of q axis agent.
Simulation results verify the feasibility of the optimal matching method of d-q axis current. It further verifies that this method has the ability to derive the best d-q axis current under any given working condition through the traversal of multiple working conditions, which lays a good foundation for the automatic calibration of the bench.
B. Comparative simulation
In this section, DDPG-LSTM is compared with feed-forward compensation decoupling control (FFCDC) and direct torque control (DTC). The motor parameters are identical for all three control algorithms, so that differences in performance are due only to differences in their torque generation capability and current optimization capability. Fig. 15 shows the responses of the d-q axis currents and torque for the three algorithms. It can be seen from the figure that the DDPG-based system response lags behind those of the FOC and DTC algorithms. In the initial response phase, DDPG needs more time to adapt to the environment through self-learning. Once the system is stable, DDPG has the same adjustment capacity as FOC and DTC, as shown in Fig. 15(a). At t = 2.5 s, the system load changes, and all three algorithms respond almost equally fast. The main difference is that DDPG can generate greater torque output by optimizing the d-q axis currents. As shown in Fig. 15(b), the torque output capability based on DDPG is significantly higher than that of FOC and DTC.

The response of d-q axis current and torque under three different controllers.
To further verify the advantages of the proposed method in achieving the optimal matching of d-q axis currents, the proposed method is implemented in the Simulink model and compared with the traditional decoupling method. Under the same stator current command, multiple sets of working conditions are traversed, and the maximum torque output and the corresponding d-q axis currents are observed. The statistics of the multiple sets of simulation data are shown in Table 3. The relative error is calculated as (T eDDPG - T eFFCDC) / T eFFCDC. Changes in parameters such as inductance and flux linkage are not considered in the simulation, although these parameters change to varying degrees during actual motor operation.
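The relative error used in the tables is a simple ratio, shown here as a short Python helper for clarity. The function name and the sample torque values are illustrative assumptions and are not taken from Table 3 or Table 4.

```python
def relative_torque_error(t_ddpg, t_ffcdc):
    """Relative error (T_eDDPG - T_eFFCDC) / T_eFFCDC, as a fraction of the FFCDC torque."""
    if t_ffcdc == 0:
        raise ValueError("reference torque must be non-zero")
    return (t_ddpg - t_ffcdc) / t_ffcdc

# Made-up example: 110 N·m against a 100 N·m reference is a 10% relative gain.
example_gain = relative_torque_error(110.0, 100.0)
```

A positive value means the DDPG-based method produced more torque than the reference method at the same stator current.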
Simulation results of DDPG-LSTM-NDC and FFCDC and DTC
It can be seen from Table 3 that the maximum torque output capability of the DDPG-LSTM-NDC method is significantly higher than that of the FFCDC and DTC algorithms. The relative error among the three algorithms varies between 3.9% and 23.8%, and the torque is on average about 20% higher than with FFCDC and DTC control. The absolute value of the optimal I d based on DDPG-LSTM-NDC is smaller than that based on FFCDC, whereas the optimal I q based on DDPG-LSTM-NDC is higher than that based on FFCDC and DTC. This further shows that DDPG-LSTM-NDC directs more of the stator current into torque generation, reducing the excitation current component and increasing the torque current component. The torque current component reflects the ability of the method to match the optimal d-q axis currents.
To further verify the effectiveness and feasibility of the proposed calibration method, the bench test verification is divided into three parts. Firstly, a comparative experiment under nominal conditions is completed. Secondly, the feasibility of the proposed method in optimizing d-q axis current matching is verified. Thirdly, the calibration process over the whole range is verified. The bench experimental system consists of the IPMSM, the motor controller, an eddy current dynamometer and its supporting equipment; the structure of the motor test bench is shown in Fig. 16. CANape, a measurement and calibration tool that supports the optimization, testing and validation of software-based control systems, is the main tool used to complete the whole calibration process.

The structure of the motor test bench.
A. Experiment with nominal and disturbance condition based on DDPG-LSTM
First, a dynamic response experiment based on DDPG-LSTM is carried out. Set the motor to work in the rated state, add load disturbances, monitor the current and torque response of the motor through CANape, and record the experimental results.
The three-phase currents and d-q axis currents are shown in Fig. 17 and Fig. 18, respectively. The figures show that the three-phase currents and the d-q axis currents stabilize after a period of adaptation. When the external load changes at 0.25 s (A1) and 0.55 s (A2), the system quickly returns to a stable state, converging within 0.3 s. Meanwhile, the non-decoupling method based on the DDPG-LSTM algorithm adjusts quickly according to the error and is robust to motor disturbances.

Diagram for the d-q axis current waveform.

Diagram for the three-phase current waveform.
The torque response results are summarized in Fig. 19, which indicates that the torque of the system has smooth response characteristics. As learning proceeds, the torque error gradually decreases until it reaches a stable value.

Variation of the torque error and the change point error with the iterative process.
B. Comparative experiment with FFCDC, DTC and DDPG-LSTM
Firstly, in order to better verify the experimental results, all three control methods use the same external environment and only change the control method to ensure the authenticity of the experimental data. Among them, the initial parameters required in the neural network based on the DDPG algorithm are obtained by simulation, and then self-optimized through experiments. Comparative experiments mainly verify the ability of the three algorithms to generate torque at the same stator current.
It is observed from Fig. 20 that DDPG-LSTM generates a higher maximum torque than FFCDC and DTC after the speed is applied. In addition, the DTC algorithm shows high torque ripple during steady state. In terms of generated torque, FFCDC is almost equal to DTC, and both are lower than DDPG-LSTM. Overall, DDPG-LSTM has the best capability to generate maximum torque output.

Comparison of maximum torque generation based on DDPG-LSTM, FFCDC, DTC.
Secondly, multiple groups of stator currents are applied to the three motor controllers to observe the maximum torque output, and the resulting maximum torque and d-q axis currents are summarized. Meanwhile, a control experiment using traditional feedforward decoupling control is set up, in which the maximum torque and d-q axis current are obtained by adjusting the PI regulator. The two sets of test data are shown in Table 4, where the relative error is calculated as $(T_{e,\mathrm{DDPG}} - T_{e,\mathrm{FFCDC}})/T_{e,\mathrm{FFCDC}}$. The data in Table 4 show that the torque output in the bench test is smaller than in the simulation results, and that the maximum torque output capacity of DDPG-LSTM-NDC is about 10% higher than that of conventional FFCDC control. Similarly, DDPG-LSTM-NDC ensures that a larger share of the stator current is used to generate torque, thereby optimizing the d-q axis current matching.
Experimental results of DDPG-LSTM-NDC and FFCDC
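The relative error used for Table 4 is a simple ratio, and a short sketch makes the sign convention explicit: a positive value means the DDPG-based controller produced more torque than the FFCDC reference. The function name `relative_torque_error` and the torque values below are hypothetical and are not data from Table 4.

```python
def relative_torque_error(t_ddpg, t_ffcdc):
    """Relative error (T_e,DDPG - T_e,FFCDC) / T_e,FFCDC."""
    if t_ffcdc == 0:
        raise ValueError("reference torque must be non-zero")
    return (t_ddpg - t_ffcdc) / t_ffcdc

# Hypothetical maximum-torque pairs in N*m: (DDPG-based, FFCDC).
pairs = [(110.0, 100.0), (52.0, 50.0)]
for t_ddpg, t_ffcdc in pairs:
    err = relative_torque_error(t_ddpg, t_ffcdc)
    print(f"T_DDPG={t_ddpg:.1f}  T_FFCDC={t_ffcdc:.1f}  relative error={err:.1%}")
```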
The maximum torque and d-q axis currents generated by multiple groups of stator currents are summarized. Meanwhile, two control experiments using FFCDC and DTC control are set up. The analysis of the data is shown in Fig. 21. The maximum torque output capacity of DDPG-LSTM is about 10% higher than that of FFCDC and DTC control, and DDPG-LSTM ensures that a larger share of the stator current is used to generate torque, thereby optimizing the d-q axis current matching.

Comparative analysis of experimental results based on DDPG-LSTM and FFCDC and DTC control.
In summary, simulation analysis and bench experiments verify that the proposed method optimizes the d-q axis current matching by optimizing the control voltage through empirical trial and error. The results show that DDPG-LSTM has advantages in optimizing the d-q axis current matching and the maximum torque-to-current ratio output, which makes full use of the motor's capability and generates a larger torque output.
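To illustrate what "d-q axis current matching at the same stator current" means, the sketch below uses the standard textbook IPMSM torque equation, $T_e = 1.5\,p\,[\psi_f i_q + (L_d - L_q) i_d i_q]$, and finds the current split that maximizes torque for a fixed stator current magnitude. This is not the DDPG-LSTM method of the paper; it is a brute-force scan compared against the classical analytical MTPA solution, and the motor parameters (pole pairs, flux linkage, inductances, current) are hypothetical values chosen for illustration.

```python
import math

def torque(i_d, i_q, pole_pairs, psi_f, l_d, l_q):
    """Standard IPMSM electromagnetic torque in the d-q frame."""
    return 1.5 * pole_pairs * (psi_f * i_q + (l_d - l_q) * i_d * i_q)

def mtpa_closed_form(i_s, psi_f, l_d, l_q):
    """Analytical MTPA split of a stator current magnitude (needs l_q > l_d)."""
    d_l = l_q - l_d
    i_d = (psi_f - math.sqrt(psi_f**2 + 8.0 * d_l**2 * i_s**2)) / (4.0 * d_l)
    i_q = math.sqrt(i_s**2 - i_d**2)
    return i_d, i_q

def mtpa_grid_search(i_s, pole_pairs, psi_f, l_d, l_q, steps=9000):
    """Scan the current angle from 0 to 90 degrees and keep the best torque."""
    best = (0.0, i_s, torque(0.0, i_s, pole_pairs, psi_f, l_d, l_q))
    for k in range(steps + 1):
        beta = (math.pi / 2.0) * k / steps
        i_d, i_q = -i_s * math.sin(beta), i_s * math.cos(beta)
        t_e = torque(i_d, i_q, pole_pairs, psi_f, l_d, l_q)
        if t_e > best[2]:
            best = (i_d, i_q, t_e)
    return best

# Hypothetical motor parameters (not taken from the paper).
P, PSI_F, L_D, L_Q, I_S = 4, 0.1, 0.3e-3, 0.9e-3, 200.0

id_cf, iq_cf = mtpa_closed_form(I_S, PSI_F, L_D, L_Q)
id_gs, iq_gs, te_gs = mtpa_grid_search(I_S, P, PSI_F, L_D, L_Q)
te_id0 = torque(0.0, I_S, P, PSI_F, L_D, L_Q)  # all current on the q axis

print(f"closed form: i_d={id_cf:.2f} A, i_q={iq_cf:.2f} A")
print(f"grid search: i_d={id_gs:.2f} A, i_q={iq_gs:.2f} A, T_e={te_gs:.2f} N*m")
print(f"i_d = 0 control: T_e={te_id0:.2f} N*m")
```

Both approaches depend on accurate values of the flux linkage and inductances, which vary with saturation and temperature. That parameter dependence is the motivation the paper gives for a learning-based calibration.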
C. Experimental analysis of the efficiency test results of manual calibration MAP and automatic calibration MAP on bench
To verify how much the automatically calibrated MAP improves motor efficiency over the full speed and torque range, motor efficiency was measured on the bench under vector control with the automatically calibrated MAP and with the manually calibrated MAP, and the experimental data were compared and analyzed.
Fig. 22 shows that vector control based on DDPG-LSTM is generally more efficient than vector control based on manual calibration over the full speed and torque range. Over the full speed range, the global MAP shows clear advantages. Especially in the medium and high speed range, the traditional vector control efficiency is 82%∼89%, while the global MAP vector control efficiency is 93%∼97%, with a maximum difference of 8%. This is mainly because the current distribution under global MAP vector control is more reasonable in the medium and high speed range.

The efficiency test results of manual calibration and automatic calibration on bench.
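For readers reproducing such a comparison, efficiency at one operating point can be computed as mechanical shaft power divided by electrical input power. The sketch below assumes the input power is measured on the DC side as voltage times current; the text does not state where power was measured on the bench, so this is an assumption, as are the function name and the operating-point values.

```python
import math

def motor_efficiency(torque_nm, speed_rpm, dc_voltage_v, dc_current_a):
    """Shaft power over DC input power for one motoring operating point."""
    p_mech = torque_nm * speed_rpm * 2.0 * math.pi / 60.0  # W
    p_elec = dc_voltage_v * dc_current_a                   # W
    if p_elec <= 0:
        raise ValueError("electrical input power must be positive")
    return p_mech / p_elec

# Hypothetical operating point: same torque and speed, two calibrations
# drawing different DC currents.
eta_manual = motor_efficiency(100.0, 3000.0, 350.0, 104.0)
eta_auto = motor_efficiency(100.0, 3000.0, 350.0, 95.0)
print(f"manual MAP: {eta_manual:.1%}, automatic MAP: {eta_auto:.1%}")
print(f"difference: {(eta_auto - eta_manual) * 100:.1f} percentage points")
```

Evaluating this over a grid of speed and torque set points yields an efficiency map for each calibration, and the point-wise difference between the two maps gives the comparison.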
In this paper, the principle of IPMSM was analyzed, and it was concluded that high-precision calibration results are of great significance for improving motor efficiency in motor control. To achieve optimal d-q axis current matching and improve the working efficiency of IPMSM, this paper proposes an automatic calibration method for automotive electric motor platforms based on DDPG-LSTM, which can quickly obtain the optimal control MAP over the full speed-torque range of IPMSM. Firstly, a DDPG network based on the actor-critic (AC) framework was built, and a reward function for optimizing the performance of the control strategy output by the actor network was constructed. Secondly, to further optimize the control strategy output, an LSTM layer was added to the outputs of the actor network and critic network, so that the control strategy and the expected evaluation generated by DDPG have associative memory across successive outputs. Simulation results show that the DDPG-LSTM based control method has significant advantages in optimizing d-q axis current matching and maximum torque-to-current ratio control. To further verify the effectiveness and feasibility of the method, experimental verification was carried out on a bench with a dynamometer and other equipment. Analysis of multiple experimental results shows that DDPG-LSTM can generate larger torque output: at the same stator current, the torque output increases by about 10% compared with the PI control scheme, which greatly improves the efficiency of the motor. Finally, the proposed calibration method was verified by bench test, and the results show that it can realize MAP calibration over the full speed-torque range, with the precision of the calibration data greatly improved compared with manual calibration. The motor efficiency was improved by about 8% in the medium and high speed range.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant Nos. 51775082, 61976039) and the China Fundamental Research Funds for the Central Universities (Grant Nos. DUT19LAB36, DUT20GJ207), and Science and Technology Innovation Fund of Dalian (2018J12GX061).
