Lyapunov fuzzy Markov game controller for two link robotic manipulator

Abstract

Markov game based controllers are robust but offer no guarantee on the stability of the designed controller. In this work, we address this shortcoming by proposing a Lyapunov fuzzy Markov game controller for safe and stable tracking control of two link robotic manipulators. Lyapunov theory is used to generate fuzzy linguistic rules for implementing a reinforcement learning (RL) based Markov game controller. We employ a fuzzy inference system as a generic function approximator to deal with the “curse of dimensionality”. The proposed RL based Markov game controller is self-learning, adaptive and optimal. We implement the proposed control paradigm on: a) a two link robot manipulator and b) a SCARA manipulator, for two cases: i) disturbances with parameter variations, and ii) disturbances without parameter variations. We give a comparative evaluation of our approach against: a) a fuzzy Q learning controller, and b) a fuzzy Markov game controller. Simulation results illustrate stable and superior tracking performance and an advantage in terms of lower control torque requirements.

Keywords

Reinforcement learning, fuzzy Q learning, fuzzy Markov game control, Lyapunov fuzzy control, two link robotic manipulator, SCARA

1 Introduction

Robotic manipulator control has been of keen interest to researchers for decades and continues to be of focal interest due to the manipulator’s highly nonlinear and coupled dynamics [1, 2]; e.g., a recent approach employs a nonlinear adaptive fractional order fuzzy PID technique for two link robot control [3]. RL is an online technique with roots in machine learning and has been used quite effectively for robot manipulator control [4]. However, the majority of studies have attempted manipulator control either by assuming a priori knowledge of the manipulator model [3], by assuming an approximate model of the manipulator [5, 6], or by linearizing its dynamics [7]. Major drawbacks of these approaches are that an approximate model invariably has modeling errors, while linearization does not work effectively over a wide operating range [2, 8]. An observer based control scheme for uncertain kinematics and dynamics has been proposed in [9]. However, the authors use two sliding mode observers to handle uncertainty in kinematics, leading to slow convergence.

In model free RL or Q learning [4] based control, one does not need a model of the system to be controlled [10], and nonlinearity is handled by an appropriate function approximator, e.g., neural networks or fuzzy systems [4]. The flip side is that no convergence guarantees are available when a function approximator is employed for generalization. However, the advantages of function approximation outweigh the disadvantages, and model free RL has been increasingly employed for control of robot manipulators [4, 11]. A single link flexible robot manipulator controller has been designed using RL in [12], where the authors try to nullify the effect of vibrations. In [13], the authors propose an RL based adaptive neural controller for robotic manipulators considering a bounded dead zone. Of late, RL has been applied in several other fields as well, e.g., for stabilizing real-time power systems and controlling transient voltages [14], and for decentralized control of large scale nonlinear systems [15].

Typically, the RL framework assumes a Markov decision process (MDP) as the underlying model. A Markov game (MG) based controller is a generalized and robust form of an MDP based RL controller [16]. In a recent variation of MG based control on a two link robotic manipulator [17], the authors use the kernel recursive least squares algorithm to approximate the value function. Although MG based controllers are adaptive, optimal and robust, they lack any stability guarantee on the designed controller. Our present attempt is to fill this lacuna in the MG based controller formulation [18].

Lyapunov theory [1] can be used to design controllers for robot manipulators with guaranteed tracking stability. In [5, 19], the authors use Lyapunov theory to design neural network based controllers for robot manipulators. An actor-critic RL configuration using Lyapunov theory with two neural networks has also been employed in [12, 13]: one network approximates the action and the other approximates the value function, with the consequent drawback of slow convergence. A recently proposed game theoretic controller [17] for robotic manipulators achieves better tracking but without a stability guarantee.

MG controllers possess proven robustness to disturbances, as they attempt to find the best possible controller response to the worst possible disturbances. However, ensuring stability of this otherwise robust controller is a research gap, which we intend to fill in our current work. This is sought to be achieved by playing a “Lyapunov constrained Markov game” at each state rather than a “pure Markov game” as in [20]. Specifically, the controller is only allowed to choose actions dictated by Lyapunov theory. With this Lyapunov constrained action set, it plays out a zero sum game against the disturber, eventually giving a stable and robust MG controller. A key difference between the proposed approach and the fuzzy Markov game approach [20] is that the rule consequents in our approach are linguistic [21] rather than crisp. We apply our proposed control scheme on two benchmark two link manipulator problems: a) the two link robot arm manipulator (TLRAM), and b) SCARA. We also consider payload variations along with external disturbances for a more realistic implementation.

The rest of the paper is structured as follows: Section 2 gives brief but relevant details on fuzzy Q learning, fuzzy Markov games and our proposed linguistic Lyapunov fuzzy Markov game control. Details of the manipulators used to demonstrate the effectiveness of the proposed scheme, i.e., a) TLRAM and b) SCARA, along with their parameters, are given in Section 3. The proposed approach is compared against the fuzzy Q learning and fuzzy Markov game approaches in Section 4, and Section 5 concludes the paper.

2 Theoretical background

We give a brief background on fuzzy Q learning (FQL) and fuzzy Markov games as a prelude to our approach.

2.1 Fuzzy Q learning

FQL envisages an agent receiving a reward r^k on choosing an action a^k ∈ A in state s^k. The aim of the controller/agent is to discover an approximately optimal policy that optimizes the long term cumulative reward. The principal idea is to use a fuzzy inference system (FIS) as an approximator for extending Q learning based RL to problems with large or continuous state-action spaces. For further details on FQL, we refer the interested reader to [22].

2.2 Fuzzy Markov games

Fuzzy Markov games generalize FQL to a two player scenario with controller and disturbances as opponents [20]. Disturber and controller are engaged in a two player zero sum dynamic game at each state. Fuzzy Markov game rule base is:

$R_{j}: \text{If } s_{1}^{k} \text{ is } L_{1}^{j} \text{ and } \ldots \text{ and } s_{n}^{k} \text{ is } L_{n}^{j} \text{ then}$

$\begin{matrix} & a = a_{1} \text{ and } d = d_{1} \text{ with } q(j, 1, 1) \\ \text{or} & a = a_{1} \text{ and } d = d_{2} \text{ with } q(j, 1, 2) \\ & \vdots \\ \text{or} & a = a_{m} \text{ and } d = d_{o} \text{ with } q(j, m, o) \end{matrix}$ (1)

R_j is the j^th rule of the rule base; the action sets for the controller and the disturber are denoted by A = {a₁, a₂, …, a_m} and D = {d₁, d₂, …, d_o}, respectively. The q value corresponding to a controller-disturber action pair for each rule R_j is denoted by q (j, a_j, d_j). For instance, if the cardinality of both the disturber action space D and the controller action space A is 3 (as in our case), a 3 × 3 game matrix can be constructed for each rule R_j (Fig. 1).

Fig.1

Game matrix.

At each state, the controller and disturber play out a game defined by the game matrix (Fig. 1). The solution of this game gives the best possible controller strategy against the worst possible disturber. For further details on Markov games, please see [20].
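As a rough illustration of what solving such a game matrix involves, the sketch below approximates the controller's min-max mixed strategy for a small cost matrix. It uses fictitious play, a simple iterative scheme swapped in here to keep the example dependency-free; it is not the solver used in [20]. The function name `fictitious_play`, the iteration count and the example matrices are assumptions made purely for illustration.

```python
def fictitious_play(q, iters=20000):
    """Approximate the controller's min-max mixed policy for cost matrix q[a][d].

    The controller (rows) minimizes cost, the disturber (columns) maximizes it.
    Fictitious play is an illustrative stand-in for an exact game solver.
    """
    m, o = len(q), len(q[0])
    row_counts = [0] * m          # how often each controller action was played
    row_cost = [0.0] * m          # cumulative cost of each row vs. disturber history
    col_cost = [0.0] * o          # cumulative cost of each column vs. controller history
    a, d = 0, 0                   # arbitrary initial pure actions
    for _ in range(iters):
        row_counts[a] += 1
        for i in range(m):
            row_cost[i] += q[i][d]
        for j in range(o):
            col_cost[j] += q[a][j]
        # Each player best-responds to the opponent's empirical play so far.
        a = min(range(m), key=lambda i: row_cost[i])
        d = max(range(o), key=lambda j: col_cost[j])
    policy = [c / iters for c in row_counts]
    # Cost guaranteed by this policy against the worst disturber action.
    worst_case = max(sum(policy[i] * q[i][j] for i in range(m)) for j in range(o))
    return policy, worst_case


# Game with a saddle point: the controller always prefers the first row.
saddle_policy, saddle_value = fictitious_play([[2.0, 3.0], [4.0, 5.0]])
# Game without a saddle point: the controller has to randomize.
mixed_policy, mixed_value = fictitious_play([[1.0, -1.0], [-1.0, 1.0]])
```

In the first matrix a pure action suffices, while in the second the controller has to mix its actions roughly evenly, which is why the game solution is expressed as a probability distribution over actions rather than a single action.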

2.3 Linguistic lyapunov fuzzy Markov game control

Lyapunov theory has been used extensively in stability analysis of nonlinear systems, and is based on defining a Lyapunov function (LF) [21]. A Lyapunov function P(s) should satisfy:

$\begin{matrix} \text{LC1. } P(0) = 0; \quad \text{LC2. } P(s) > 0,\ \forall s \neq 0; \\ \text{LC3. } \dot{P}(s) < 0,\ \forall s \neq 0 \end{matrix}$ (2)

s = 0 is assumed to be the equilibrium state. Controller design involves choosing a positive definite LF (conditions LC1 and LC2), with the controller choosing actions that make the first derivative of the LF negative definite (condition LC3). In this work, we use Lyapunov theory to formulate a fuzzy rule base and apply the MG controller design philosophy to this Lyapunov constrained fuzzy rule base.

The “computing with words” approach [21, 23] has been used for designing the Lyapunov fuzzy Markov game controller (LFMGC). Let $φ$, $\dot{φ}$, $φ_{d}$ and $\dot{φ}_{d}$ denote the actual position, actual velocity, desired position and desired velocity of the manipulator. We define the position tracking error as $e = φ_{d} - φ$ and its rate of change, i.e., the velocity tracking error, as $\dot{e} = \dot{φ}_{d} - \dot{φ}$, with $(e, \dot{e}) = (0, 0)$ being the equilibrium point.

A Lyapunov constrained linguistic control can be generated by defining a cost function as: $P = P(e, \dot{e}) = \frac{1}{2}(e^{2} + \dot{e}^{2})$ (3)

Conditions LC1 and LC2 of (2) are satisfied by this function P, so for P to be an LF it only needs to satisfy condition LC3 of (2), i.e., $\dot{P} < 0$: $\dot{P} = e\dot{e} + \dot{e}\ddot{e}$ (4) or, equivalently, P can be considered a Lyapunov function if: $e\dot{e} + \dot{e}\ddot{e} < 0$ (5)

From the physics of manipulator like systems [20], we know that $\ddot{e}$ depends on the applied force f. Assuming that the controller has access to the position and velocity of the system, we can write (5) as $e\dot{e} + \dot{e}f < 0$ (6)

From (6) it can be concluded that when e and $\dot{e}$ have different signs, f = 0 or an f with a sign opposite to that of $\dot{e}$ is enough to satisfy (6), and no exact value of f is required. Furthermore, a) when both e and $\dot{e}$ are positive, f < -e satisfies (6), and b) when both e and $\dot{e}$ are negative, f > -e satisfies (6).
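The sign analysis of (6) can be checked numerically with a minimal sketch. The helper names `p_dot` and `satisfies_lc3` and the sample values are hypothetical, chosen only to exercise each case mentioned above.

```python
def p_dot(e, e_dot, f):
    """Left-hand side of condition (6): e * e_dot + e_dot * f."""
    return e * e_dot + e_dot * f


def satisfies_lc3(e, e_dot, f):
    """True when the chosen force f makes the Lyapunov derivative negative."""
    return p_dot(e, e_dot, f) < 0.0


# (e, e_dot, f) samples, one per case discussed in the text (illustrative values).
opposite_signs_zero_force = satisfies_lc3(1.0, -2.0, 0.0)    # f = 0 suffices
both_positive_ok = satisfies_lc3(1.0, 2.0, -1.5)             # f < -e
both_negative_ok = satisfies_lc3(-1.0, -2.0, 1.5)            # f > -e
both_positive_violated = satisfies_lc3(1.0, 2.0, -0.5)       # f not below -e
```

Only the threshold -e matters in the same-sign cases, which is what allows f to be specified linguistically rather than by an exact value.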

Using the “computing with words” paradigm [23], specifying f as “Big Positive” (for negative e) makes f > -e, and assigning “Big Negative” as the linguistic value of f makes f < -e (for positive e). Thus the Lyapunov condition (2) can be interpreted linguistically ($e\dot{e} + \dot{e}f < 0$) as:

$\text{IF } e \text{ is PS AND } \dot{e} \text{ is PS THEN } f \text{ is Big\_NS}$ (7a)

$\text{IF } e \text{ is PS AND } \dot{e} \text{ is NS THEN } f \text{ is Zero or PS}$ (7b)

$\text{IF } e \text{ is NS AND } \dot{e} \text{ is PS THEN } f \text{ is Zero or NS}$ (7c)

$\text{IF } e \text{ is NS AND } \dot{e} \text{ is NS THEN } f \text{ is Big\_PS}$ (7d)

Here Zero, PS, Big_PS, NS and Big_NS denote linguistic variables representing Zero, Positive, Big Positive, Negative and Big Negative, respectively. We combine the above linguistic rules (7) to generate simplified linguistic rules (8):

We first combine (7a) and (7c) to get an aggregated linguistic rule (8a): $\text{IF } \dot{e} \text{ is PS THEN } f \text{ is NS or Big\_NS}$ (8a)

Next, combining (7b) and (7d), we get rule (8b): $\text{IF } \dot{e} \text{ is NS THEN } f \text{ is PS or Big\_PS}$ (8b)

We give a brief explanation as to how we get rule (8a) from rules (7a) and (7c):

Considering (7a) (when e is PS) and (7c) (when e is NS), it is clear that $\dot{e}$ is PS in both rules. For $\dot{P} = e\dot{e} + \dot{e}f < 0$, we have the linguistic evaluations:

$\text{(7a) } e \text{ is PS}: \dot{P} = (\text{PS})(\text{PS}) + (\text{PS})(\text{Big\_NS}) = \text{NS or Zero}$ (9a)

$\text{(7c) } e \text{ is NS}: \dot{P} = (\text{NS})(\text{PS}) + (\text{PS})(\text{Zero or NS}) = \text{NS}$ (9b)

Similarly, combining (7b) and (7d), we get:

$\text{(7b) } e \text{ is PS}: \dot{P} = (\text{PS})(\text{NS}) + (\text{NS})(\text{Zero or PS}) = \text{NS}$ (9c)

$\text{(7d) } e \text{ is NS}: \dot{P} = (\text{NS})(\text{NS}) + (\text{NS})(\text{Big\_PS}) = \text{NS or Zero}$ (9d)

From the interpretation of linguistic rules (7a) and (7c) discussed above, we can infer that for any e (i.e., “PS” or “NS”) and $\dot{e} = \text{PS}$, we can have $\dot{P} < 0$ by properly choosing the control action f from the linguistic values “Zero”, “NS” or “Big_NS”. For selecting this Lyapunov linguistic action f for the controller, we define a negative linguistic action set A_NL: {Zero, NS, Big_NS}, which is made available to the controller when $\dot{e} \geq 0$.

The next step is to employ the MG framework, wherein a game is played between the (Lyapunov constrained) controller and the disturber that simultaneously ensures that $\dot{P}$ is most negative and that the controller action is a best response to the worst disturbances, i.e., robust and stable MG control.

Using the procedure outlined above, when $\dot{e} = \text{NS}$ we can ensure the most negative $\dot{P}$ by selecting the action from a positive linguistic action set A_PL: {Zero, PS, Big_PS}, which is made available to the controller when $\dot{e} < 0$. These linguistic action sets A_NL and A_PL have been derived from Lyapunov theory and are appropriately termed “linguistic Lyapunov action sets”. The MG formulation is used to select the optimal controller action in the face of the worst possible disturber (opponent) action by framing rules of the following form:

$\begin{matrix} R_{j}: \text{If } s_{1}^{k} \text{ is } L_{1}^{j} \text{ and } \ldots \text{ and } s_{n}^{k} \text{ is } L_{n}^{j} \text{ then } A = A_{jL} \\ \text{and } a = a_{1L} \text{ and } d = d_{1} \text{ with } q_{L}(j, 1, 1) \\ \text{or } a = a_{1L} \text{ and } d = d_{2} \text{ with } q_{L}(j, 1, 2) \\ \vdots \\ \text{or } a = a_{mL} \text{ and } d = d_{o} \text{ with } q_{L}(j, m, o) \end{matrix}$ (10)

Here (a_1L, a_2L, …, a_mL) are actions belonging to the linguistic Lyapunov action set A_L at state s^k corresponding to rule R_j. q_L denotes the q value corresponding to a linguistic Lyapunov consequent.

Rule base (10) corresponds to the linguistic Lyapunov fuzzy MG. This linguistic Lyapunov rule base (10) makes $\dot{P}$ most “negative”, resulting in a controller that may be termed the Lyapunov fuzzy Markov game controller (LFMGC).
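The switch between the two linguistic Lyapunov action sets of this subsection can be sketched as follows. The numeric torques attached to each linguistic label are an assumption: the values 0, ±10 and ±20 Nm are taken from the action sets reported in Section 4, but the label-to-value pairing and the function name `lyapunov_action_set` are illustrative.

```python
# Assumed pairing of linguistic labels with the torque values (Nm) of Section 4.
A_NL = {"Zero": 0.0, "NS": -10.0, "Big_NS": -20.0}
A_PL = {"Zero": 0.0, "PS": 10.0, "Big_PS": 20.0}


def lyapunov_action_set(e_dot):
    """Pick the action set per rules (8a)/(8b): A_NL for e_dot >= 0, A_PL otherwise."""
    return A_NL if e_dot >= 0 else A_PL


def p_dot(e, e_dot, f):
    """Lyapunov derivative of (6) for a given force f."""
    return e * e_dot + e_dot * f


# With a positive velocity error the strongest negative action drives P_dot below zero.
actions = lyapunov_action_set(0.3)
rate = p_dot(1.0, 0.3, actions["Big_NS"])
```

The Markov game then only has to choose among the three actions of the selected set, rather than among all five linguistic values.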

In the proposed linguistic Lyapunov Markov game, the input state vector $s^{k} = \{s_{1}^{k}, s_{2}^{k}, \ldots, s_{n}^{k}\}$ is matched with the rule antecedents of (10), giving the rule firing strength vector $α(s^{k}) = [α^{1}(s^{k})\ α^{2}(s^{k})\ \ldots\ α^{N}(s^{k})]$. The global control policy $π_{La^{*}}(s^{k})$ is calculated from the per-rule policies as: $π_{La^{*}}(s^{k}) = \frac{\sum_{j=1}^{N} π_{La^{*}}^{j}(s^{k})\, α^{j}(s^{k})}{\sum_{j=1}^{N} α^{j}(s^{k})}$ (11) where the policy $π_{La^{*}}^{j}(s^{k})$ of rule R_j is calculated as:

$π_{La^{*}}^{j}(s^{k}) = \arg\min_{π_{La}^{j} \in PD(A_{L})} \max_{d_{j} \in D} \sum_{a_{jL} \in A_{L}} q_{L}(j, a_{jL}, d_{j})\, π_{La}^{j}$ (12)

PD(A_L) denotes the set of probability distributions over the action set A_L of the controller.

Next, an ε-Minimax policy is used to implement the exploration/exploitation policy (EEP): $π_{La^{†}}^{j}(s^{k}) = ε\text{-Minimax}\ π_{La}^{j}(s^{k})$ (13)
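The text does not spell out the ε-Minimax rule, so the following is only one plausible reading, stated as an assumption: with probability ε the controller explores uniformly over the action set, otherwise it follows the minimax policy, which is equivalent to mixing the two distributions. The linear decay is likewise assumed; only the end points 0.4 and 0.01 come from Section 4. Both function names are hypothetical.

```python
def epsilon_minimax(minimax_policy, epsilon):
    """Mix a minimax policy with uniform exploration (assumed form of eps-Minimax)."""
    n = len(minimax_policy)
    return [(1.0 - epsilon) * p + epsilon / n for p in minimax_policy]


def linear_epsilon(step, total_steps, eps_start=0.4, eps_end=0.01):
    """Exploration rate decaying from eps_start to eps_end (linear decay is assumed)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


# A rule whose minimax policy puts all mass on the first action, explored with eps = 0.3.
explored = epsilon_minimax([1.0, 0.0, 0.0], 0.3)
eps_first = linear_epsilon(0, 1000)
eps_last = linear_epsilon(1000, 1000)
```

Under this reading the explored policy stays a valid probability distribution over the Lyapunov action set, so exploration never leaves the set of actions permitted by the rules.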

The global EEP $π_{La^{†}}(s^{k})$ is generated as: $π_{La^{†}}(s^{k}) = \frac{\sum_{j=1}^{N} π_{La^{†}}^{j}(s^{k})\, α^{j}(s^{k})}{\sum_{j=1}^{N} α^{j}(s^{k})}$ (14)

The global continuous control action at each state s^k is computed as: $a_{LGC}(s^{k}) = \sum_{k' \in A_{L}} π_{La_{k'}^{†}}(s^{k})\, a_{k'}$ (15)
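The aggregation steps (11), (14) and (15) amount to a firing-strength weighted average of per-rule policies followed by an expectation over the action values. A minimal sketch is given below; the function names, the two-rule example and the numeric action values are illustrative assumptions.

```python
def global_policy(rule_policies, alpha):
    """Firing-strength weighted average of per-rule policies, as in (11) and (14)."""
    total = sum(alpha)
    n_actions = len(rule_policies[0])
    return [
        sum(policy[k] * a_j for policy, a_j in zip(rule_policies, alpha)) / total
        for k in range(n_actions)
    ]


def global_action(policy, action_values):
    """Continuous control action as the policy-weighted sum of actions, as in (15)."""
    return sum(p * a for p, a in zip(policy, action_values))


# Two fired rules with firing strengths 0.75 and 0.25 (hypothetical numbers).
rule_policies = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
alpha = [0.75, 0.25]
policy = global_policy(rule_policies, alpha)
torque = global_action(policy, [-20.0, -10.0, 0.0])
```

Although each rule selects among a few discrete actions, the weighted combination yields a continuous torque, here lying between the two extreme actions.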

Next, a game solver (linear program) is used to get $V_{L}^{j}(s^{k+1})$, the desired value function at state $s^{k+1}$:

$V_{L}^{j}(s^{k+1}) = \min_{π_{La}^{j} \in PD(A_{L})} \max_{d_{j} \in D} \sum_{a_{jL} \in A_{L}} q_{L}(j, a_{jL}, d_{j})\, π_{La}^{j}; \quad \forall a_{jL} \in A_{L},\ d_{j} \in D$ (16)

The global desired value function $V_{L}(s^{k+1})$ is obtained as: $V_{L}(s^{k+1}) = \frac{\sum_{j=1}^{N} V_{L}^{j}(s^{k+1})\, α^{j}(s^{k+1})}{\sum_{j=1}^{N} α^{j}(s^{k+1})}$ (17)

The global Q value is calculated as: $Q_{L}(s^{k}) = \frac{\sum_{j=1}^{N} q_{L}(j, a_{jL}^{†}, d_{j}^{†})\, α^{j}(s^{k})}{\sum_{j=1}^{N} α^{j}(s^{k})}$ (18) where $d_{j}^{†}$ denotes the disturber action in rule R_j, giving the inferred disturbance action as: $d^{†}(s^{k}) = \frac{\sum_{j=1}^{N} d_{j}^{†}\, α^{j}(s^{k})}{\sum_{j=1}^{N} α^{j}(s^{k})}$ (19)

The temporal difference (TD) error (ΔQ_L) is calculated as: $ΔQ_{L} = r^{k} + γ V_{L}(s^{k+1}) - Q_{L}(s^{k})$ (20)

Finally, the TD error is used for the q-update as:

$q_{L}(j, a_{jL}^{†}, d_{j}^{†}) \leftarrow q_{L}(j, a_{jL}^{†}, d_{j}^{†}) + η\, ΔQ_{L} \frac{α^{j}(s^{k})}{\sum_{j=1}^{N} α^{j}(s^{k})}$ (21)

This q update, under favorable conditions, makes the q values converge to the optimal values dictated by the Lyapunov game, i.e., ΔQ_L → 0, and the corresponding controller is a linguistic Lyapunov Markov game controller.
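The learning step defined by (18), (20) and (21) can be sketched as below. The table layout `q[j][a][d]`, the function name `td_update` and the example numbers are assumptions for illustration; the default learning rate and discount factor follow the values reported in Section 4.

```python
def td_update(q, chosen, alpha, reward, v_next, eta=0.05, gamma=0.9):
    """One q-update over all rules.

    q       : nested list, q[j][a][d] for rule j, controller action a, disturber action d
    chosen  : per-rule (a, d) index pairs actually selected at this step
    alpha   : per-rule firing strengths at the current state
    v_next  : global value of the next state, as in (17)
    Returns the TD error of (20); q is modified in place as in (21).
    """
    total = sum(alpha)
    q_global = sum(q[j][a][d] * alpha[j] for j, (a, d) in enumerate(chosen)) / total
    delta = reward + gamma * v_next - q_global
    for j, (a, d) in enumerate(chosen):
        q[j][a][d] += eta * delta * alpha[j] / total
    return delta


# Two rules, three controller actions and three disturber actions, all q values zero.
q_table = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
delta = td_update(q_table, chosen=[(0, 1), (2, 2)], alpha=[0.5, 0.5],
                  reward=1.0, v_next=0.0)
```

Each rule's entry is moved in proportion to its normalized firing strength, so rules that barely fire at the current state are barely changed.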

3 Linguistic lyapunov fuzzy Markov game controller formulation for robot manipulators

3.1 Robotic manipulator modeling

The dynamics of a robot manipulator having n joints may be written as [20]: $M(φ)\ddot{φ} + C(φ, \dot{φ})\dot{φ} + g(φ) + f(\dot{φ}) + τ_{d} = τ$ (22) where φ is a vector of size (n × 1) representing the joint angles, and the vectors τ and τ_d represent the input torques and external disturbances, respectively, each of size (n × 1). $M(φ)$ represents the inertia matrix of size (n × n). The centrifugal and Coriolis matrix is represented by $C(φ, \dot{φ})$. $f(\dot{φ})$ and $g(φ)$ give the frictional force and gravitational force vectors, each of size (n × 1), respectively.
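For simulation, (22) is solved for the joint accelerations at each step. The sketch below does this for n = 2, assuming the matrices and vectors have already been evaluated at the current state; the helper names and the numeric example are hypothetical and do not correspond to the manipulator parameters of the appendix.

```python
def solve_2x2(m, b):
    """Solve m x = b for a 2 x 2 matrix m using Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("inertia matrix is singular")
    x0 = (m[1][1] * b[0] - m[0][1] * b[1]) / det
    x1 = (m[0][0] * b[1] - m[1][0] * b[0]) / det
    return [x0, x1]


def joint_accelerations(M, C, g, f, tau, tau_d, phi_dot):
    """Rearrange (22): M * phi_ddot = tau - tau_d - g - f - C * phi_dot."""
    rhs = [
        tau[i] - tau_d[i] - g[i] - f[i] - sum(C[i][k] * phi_dot[k] for k in range(2))
        for i in range(2)
    ]
    return solve_2x2(M, rhs)


# Hypothetical values: diagonal inertia, no Coriolis or friction terms.
acc = joint_accelerations(
    M=[[2.0, 0.0], [0.0, 1.0]],
    C=[[0.0, 0.0], [0.0, 0.0]],
    g=[1.0, 0.0],
    f=[0.0, 0.0],
    tau=[5.0, 3.0],
    tau_d=[0.0, 1.0],
    phi_dot=[0.0, 0.0],
)
```

The learning controller never sees these matrices; they are needed only by the simulator that plays the role of the plant.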

The proposed LFMGC has been implemented as a decentralized controller wherein each link of the manipulator has a dedicated controller. The pair $[φ(j), \dot{φ}(j)]$, representing angle and velocity, is the input for the j^th link of the manipulator. Each input variable has been fuzzified into 3 partitions, producing 9 rules for each link and 18 rules overall for the two link robot. We use the membership function of [20]:

$μ_{l_{p}}(x_{j}(i)) = e^{\frac{-(x_{j}(i) - x_{j}^{l_{p}}(i))}{σ_{j}(i)}}; \quad l_{p} = 1, 2, 3; \quad j = 1, 2; \quad i = 1, 2$ (23)

Here $l_{p}$ denotes the fuzzy label of the j^th input variable of the i^th link, with $x_{1}(1) = φ(1)$, $x_{1}(2) = φ(2)$, $x_{2}(1) = \dot{φ}(1)$ and $x_{2}(2) = \dot{φ}(2)$.

The centers of the above fuzzy labels are defined as $x_{j}^{l_{p}}(i) = a_{j}(i) + b_{j}(i)(l_{p} - 1)$, where a₁ (1) = a₁ (2) = -2, a₂ (1) = a₂ (2) = -10, b₁ (1) = b₁ (2) = 2 and b₂ (1) = b₂ (2) = 10, and the widths of the fuzzy labels are σ₁ (1) = σ₁ (2) = 1.5 and σ₂ (1) = σ₂ (2) = 8. The tracking errors (e_t) for the links can be written as:

$e_{t}(1) = φ_{d}(1) - φ(1)$, $e_{t}(2) = φ_{d}(2) - φ(2)$, and the cost function used is $r^{k} = \frac{1}{2}\left(e_{t}(1)^{2} + e_{t}(2)^{2} + \dot{e}_{t}(1)^{2} + \dot{e}_{t}(2)^{2}\right)$.

The desired trajectory for the robot manipulator is $φ_{d} = [φ_{d}(1)\ φ_{d}(2)]$.
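The fuzzy partition of this subsection can be sketched for one link as follows. The label centers and widths follow the values given above. Two points are assumptions: the membership function is taken to be a squared-exponential (Gaussian) shape, which differs from the exponent as printed in (23), and the rule firing strength is taken as the product of the two memberships. All names are hypothetical.

```python
import math
from itertools import product


def label_centers(a, b):
    """Centers a + b * (l_p - 1) for the three fuzzy labels l_p = 1, 2, 3."""
    return [a + b * (lp - 1) for lp in (1, 2, 3)]


ANGLE_CENTERS = label_centers(-2.0, 2.0)       # a_1 = -2, b_1 = 2
VELOCITY_CENTERS = label_centers(-10.0, 10.0)  # a_2 = -10, b_2 = 10
SIGMA_ANGLE, SIGMA_VELOCITY = 1.5, 8.0


def membership(x, center, sigma):
    # Assumption: Gaussian shape; the exponent in (23) is printed without a square.
    return math.exp(-((x - center) / sigma) ** 2)


def firing_strengths(angle, velocity):
    """Firing strengths of the 9 rules of one link (product inference is assumed)."""
    return [
        membership(angle, ca, SIGMA_ANGLE) * membership(velocity, cv, SIGMA_VELOCITY)
        for ca, cv in product(ANGLE_CENTERS, VELOCITY_CENTERS)
    ]


strengths = firing_strengths(0.0, 0.0)
```

With both inputs at zero, the rule built from the two middle labels fires fully and the remaining eight rules fire only partially.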

3.2 Linguistic lyapunov controller for robot manipulator

Consider a function $F = \frac{1}{2}\left(e_{t}(1)^{2} + e_{t}(2)^{2} + \dot{e}_{t}(1)^{2} + \dot{e}_{t}(2)^{2}\right)$ (24)

Clearly, conditions LC1 and LC2 of (2) are satisfied by the function F defined in (24). For F to be a Lyapunov function, condition LC3 has to be satisfied, i.e., $\dot{F} < 0$

$\begin{matrix} \Rightarrow e_{t}(1)\dot{e}_{t}(1) + e_{t}(2)\dot{e}_{t}(2) + \dot{e}_{t}(1)\ddot{e}_{t}(1) + \dot{e}_{t}(2)\ddot{e}_{t}(2) < 0 \\ \Rightarrow e_{t}(1)\dot{e}_{t}(1) + e_{t}(2)\dot{e}_{t}(2) + \dot{e}_{t}(1)(\ddot{φ}_{d}(1) - \ddot{φ}(1)) + \dot{e}_{t}(2)(\ddot{φ}_{d}(2) - \ddot{φ}(2)) < 0 \\ \Rightarrow e_{t}(1)\dot{e}_{t}(1) + e_{t}(2)\dot{e}_{t}(2) + \dot{e}_{t}(1)\ddot{φ}_{d}(1) + \dot{e}_{t}(2)\ddot{φ}_{d}(2) - \dot{e}_{t}(1)\ddot{φ}(1) - \dot{e}_{t}(2)\ddot{φ}(2) < 0 \end{matrix}$ (25)

Although the exact model of the manipulator is not known, the dynamics of the manipulator suggest that $\ddot{φ}(1)$ and $\ddot{φ}(2)$ depend directly on τ (1) - τ (2) and τ (2) - τ (1), respectively (the dynamic model of the robot manipulator and the required relations are given in Appendices A, B and C). We can write (25) as:

$\begin{matrix} e_{t}(1)\dot{e}_{t}(1) + e_{t}(2)\dot{e}_{t}(2) + \dot{e}_{t}(1)\ddot{φ}_{d}(1) + \dot{e}_{t}(2)\ddot{φ}_{d}(2) - \dot{e}_{t}(1)(τ(1) - τ(2)) - \dot{e}_{t}(2)(τ(2) - τ(1)) < 0 \\ \Rightarrow e_{t}(1)\dot{e}_{t}(1) + e_{t}(2)\dot{e}_{t}(2) + \dot{e}_{t}(1)\ddot{φ}_{d}(1) + \dot{e}_{t}(2)\ddot{φ}_{d}(2) + τ(1)(\dot{e}_{t}(2) - \dot{e}_{t}(1)) + τ(2)(\dot{e}_{t}(1) - \dot{e}_{t}(2)) < 0 \end{matrix}$ (26)

Defining $c_{1} = e_{t}(1)\dot{e}_{t}(1) + e_{t}(2)\dot{e}_{t}(2) + \dot{e}_{t}(1)\ddot{φ}_{d}(1) + \dot{e}_{t}(2)\ddot{φ}_{d}(2)$ and $c_{2} = τ(1)(\dot{e}_{t}(2) - \dot{e}_{t}(1)) + τ(2)(\dot{e}_{t}(1) - \dot{e}_{t}(2))$, (26) can be written as: $c_{1} + c_{2} < 0$ (27)

Condition (27) can be formulated in the form of linguistic rules:

$\begin{matrix} \text{IF } c_{1} \text{ is NS THEN } c_{2} \text{ is Zero or NS or Big\_NS} \\ \text{IF } c_{1} \text{ is PS THEN } c_{2} \text{ is Big\_NS} \\ \text{IF } c_{1} \text{ is Zero THEN } c_{2} \text{ is NS or Big\_NS} \end{matrix}$ (28)

Rule base (28) indicates that making $c_{2}$ Big_NS will satisfy (27). Thus, to make F in (24) an LF, it is sufficient to ensure that $c_{2}$ is Big_NS (irrespective of the value of $c_{1}$), i.e., $c_{2} = τ(1)(\dot{e}_{t}(2) - \dot{e}_{t}(1)) + τ(2)(\dot{e}_{t}(1) - \dot{e}_{t}(2))$ should be a large negative number, giving the linguistic rules of Table 1.

Table 1

Linguistic rules for stable manipulator control

$\dot{e}_{t}(1)$	$\dot{e}_{t}(2)$	Condition	τ (1)	τ (2)
PS	PS	$\dot{e}_{t}(1) \geq \dot{e}_{t}(2)$	Big_PS	Big_NS
PS	PS	$\dot{e}_{t}(1) < \dot{e}_{t}(2)$	Big_NS	Big_PS
NS	PS	any	NS	PS
Zero	PS	any	NS	PS
PS	NS	any	PS	NS
NS	NS	$\dot{e}_{t}(1) \geq \dot{e}_{t}(2)$	Big_PS	Big_NS
NS	NS	$\dot{e}_{t}(1) < \dot{e}_{t}(2)$	Big_NS	Big_PS
Zero	NS	any	PS	NS
PS	Zero	any	PS	NS
NS	Zero	any	NS	PS
Zero	Zero	any	Zero	Zero

To implement these Lyapunov linguistic rules, two Lyapunov linguistic action sets have been defined: $A_{NL} = \{\text{Zero}, \text{NS}, \text{Big\_NS}\}$ and $A_{PL} = \{\text{Zero}, \text{PS}, \text{Big\_PS}\}$.

Next, we use the Markov game to select the most appropriate stochastic action from A_NL or A_PL (by using the game matrix) according to the case, i.e., if $\dot{e}_{t}(1)$ is NS and $\dot{e}_{t}(2)$ is PS, then the controller must use action set A_NL for τ (1) and action set A_PL for τ (2) in the game matrix. Likewise, if $\dot{e}_{t}(1)$ is PS, $\dot{e}_{t}(2)$ is PS and $\dot{e}_{t}(1) \geq \dot{e}_{t}(2)$, then τ (1) and τ (2) must be selected from A_PL and A_NL, respectively. Using the Markov game in a Lyapunov constrained setting, we are able to ensure that $\dot{F}$ is as negative as possible.
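The rule lookup of Table 1 and its effect on the torque-dependent part of the Lyapunov derivative can be sketched as follows. The zero tolerance, the function names and the numeric torque assigned to each linguistic label (0, ±10 and ±20 Nm, borrowed from the action sets of Section 4) are assumptions for illustration.

```python
# Assumed numeric value (Nm) of each linguistic torque label.
TORQUE = {"Zero": 0.0, "PS": 10.0, "Big_PS": 20.0, "NS": -10.0, "Big_NS": -20.0}


def sign_label(x, tol=1e-9):
    """Classify a velocity error as PS, NS or Zero (tolerance is an assumption)."""
    if x > tol:
        return "PS"
    if x < -tol:
        return "NS"
    return "Zero"


def table1_torques(ed1, ed2):
    """Linguistic torques (tau1, tau2) for velocity errors ed1, ed2 as in Table 1."""
    s1, s2 = sign_label(ed1), sign_label(ed2)
    if s1 == "Zero" and s2 == "Zero":
        return "Zero", "Zero"
    if s1 == s2:
        # Both PS or both NS: big torques, direction decided by comparing the errors.
        return ("Big_PS", "Big_NS") if ed1 >= ed2 else ("Big_NS", "Big_PS")
    # Mixed signs, or exactly one error at zero.
    return ("PS", "NS") if ed1 > ed2 else ("NS", "PS")


def c2(tau1, tau2, ed1, ed2):
    """Torque-dependent term of (26): tau1*(ed2 - ed1) + tau2*(ed1 - ed2)."""
    return tau1 * (ed2 - ed1) + tau2 * (ed1 - ed2)


labels = table1_torques(0.2, 0.7)
value = c2(TORQUE[labels[0]], TORQUE[labels[1]], 0.2, 0.7)
```

For unequal velocity errors the selected torque pair always drives this term negative, which is the property the linguistic rules are designed to secure.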

4 Simulating linguistic lyapunov fuzzy Markov game control

The proposed controller has been simulated on two benchmark robot manipulator problems: i) TLRAM and ii) SCARA. The parameters used in the simulations are: sampling time = 20 ms, learning rate = 0.05 and discount factor = 0.9, with the exploration rate decreasing gradually from 0.4 to 0.01. The action sets used are A_NL = [-20 -10 0] Nm and A_PL = [20 10 0] Nm. These parameters remain the same for both manipulator problems.

Details for the two link robot arm manipulator and SCARA are provided in Appendix A. The manipulators are simulated with the 4th order Runge–Kutta method for 20 s. We apply the proposed controller on TLRAM for two different cases: i) with fixed payload and external disturbances, and ii) with a varying payload and external disturbances, and compare its performance against the baseline FQL and FMGC controllers. ±20% of the applied torque (Gaussian distributed around the mean) has been used as external disturbance. The payload $m_{2}$ variations for TLRAM are: (a) $t \leq 3$ s, $m_{2} = 1$ kg; (b) $3 < t \leq 6$ s, $m_{2} = 3$ kg; (c) $6 < t \leq 10$ s, $m_{2} = 1$ kg; (d) $10 < t \leq 15$ s, $m_{2} = 2$ kg; and (e) $15 < t \leq 20$ s, $m_{2} = 1$ kg. For SCARA, the variations of the parameters (equivalent to payload) $p = \{p_{1}, p_{2}, p_{3}\}$ are: (a) $t \leq 3$ s, $p_{1} = 5$, $p_{2} = 0.9$, $p_{3} = 0.3$; (b) $3 < t \leq 6$ s, $p_{1} = 15$, $p_{2} = 2.7$, $p_{3} = 0.9$; (c) $6 < t \leq 10$ s, $p_{1} = 5$, $p_{2} = 0.9$, $p_{3} = 0.3$; (d) $10 < t \leq 15$ s, $p_{1} = 10$, $p_{2} = 1.8$, $p_{3} = 0.6$; and (e) $15 < t \leq 20$ s, $p_{1} = 5$, $p_{2} = 0.9$, $p_{3} = 0.3$.
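The simulation setup above can be sketched with a piecewise payload schedule and a generic 4th order Runge–Kutta step. The function names are hypothetical, and the scalar test system used to exercise the integrator is illustrative only; it is not the manipulator model of Appendix A.

```python
import math


def payload_m2(t):
    """TLRAM payload m2 in kg as a function of time t in seconds."""
    if t <= 3.0:
        return 1.0
    if t <= 6.0:
        return 3.0
    if t <= 10.0:
        return 1.0
    if t <= 15.0:
        return 2.0
    return 1.0


def rk4_step(deriv, t, x, h):
    """Advance state list x by one step h for dx/dt = deriv(t, x)."""
    k1 = deriv(t, x)
    k2 = deriv(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)])
    k3 = deriv(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)])
    k4 = deriv(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
    return [
        xi + h / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


# One 20 ms step of the illustrative test system dx/dt = -x, starting from x = 1.
h = 0.02
state = rk4_step(lambda t, x: [-x[0]], 0.0, [1.0], h)
# Number of sampling intervals in the 20 s simulation horizon.
n_steps = round(20.0 / h)
```

In a full simulation, `deriv` would evaluate the manipulator dynamics with the payload returned by `payload_m2(t)` at the current time.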

4.1 Two link robotic arm manipulator (disturbance only)

Figures 2 and 3 show the trajectory tracking error (in degrees) for link1 and link2, respectively, with disturbances only. It is evident from the figures that the trajectory tracking error is the least with the proposed controller as compared to the fuzzy Q learning controller (FQLC) and the fuzzy Markov game controller (FMGC). Moreover, the performance of LFMGC is significantly superior to that of FMGC. Figures 4 and 5 give the torque required to attain the desired trajectory for link1 and link2. The torque requirement for our proposed controller is almost half of that required by FQLC. The torque required by FMGC is higher than that of LFMGC; in addition, there is more chattering in FMGC than in LFMGC. We conclude that LFMGC has superior trajectory tracking and requires less torque to achieve the required trajectory.

Fig.2

Trajectory tracking error for theta1 (TLRAM) with disturbances only.

Fig.3

Trajectory tracking error for theta2 (TLRAM) with disturbances only.

Fig.4

Torque required by link1 (TLRAM) with disturbances only.

Fig.5

Torque required by link2 (TLRAM) with disturbances only.

4.2 Two link robotic arm manipulator (disturbances and payload variations)

Tracking errors for link1 and link2 with external disturbances and payload variations are shown in Figs. 6 and 7. Here also LFMGC exhibits superiority over the baseline controllers, i.e., FQLC and FMGC. Figures 8 and 9 show the torque requirements for link1 and link2. The least torque is required by LFMGC, while FQLC and FMGC both exhibit more chattering than LFMGC. This showcases the superior handling of disturbances and payload variations by LFMGC.

Fig.6

Trajectory tracking error for theta1 (TLRAM) with disturbances and payload variations.

Fig.7

Trajectory tracking error for theta2 (TLRAM) with disturbances and payload variations.

Fig.8

Torque required by link1 (TLRAM) with disturbances and payload variations.

Fig.9

Torque required by link2 (TLRAM) with disturbances and payload variations.

4.3 SCARA (disturbances only)

Figures 10 and 11 give the trajectory tracking errors for SCARA (with external disturbances only). We observe that the tracking error achieved by LFMGC is the least. Moreover, this error is not as abrupt as that observed for FQLC and FMGC. Figures 12 and 13 show the torque requirements for link1 and link2 and indicate that LFMGC has more chattering than FQLC and FMGC. The required torque is comparable to (and for link2 even greater than) that of FQLC and FMGC. We conclude that although our controller tracks the trajectory well, its performance is jerkier than that of the others in the case of SCARA.

Fig.10

Trajectory tracking error for theta1 (SCARA) with disturbances only.

Fig.11

Trajectory tracking error for theta2 (SCARA) with disturbances only.

Fig.12

Torque required by link1 (SCARA) with disturbances only.

Fig.13

Torque required by link2 (SCARA) with disturbances only.

4.4 SCARA (disturbances and payload variations)

Tracking errors for link1 and link2 with varying payload and external disturbances are the least and fairly smooth for LFMGC (Figs. 14 and 15). The torque requirement for link1 is minimum with LFMGC (Fig. 16). Figure 17 gives the torque required by link2, which for LFMGC is comparable to the others but exhibits chattering, particularly when the payload is minimal. Overall, with payload variations, LFMGC performs better in terms of torque requirement and its trajectory tracking is the best amongst the three controllers. Chattering can be overcome or reduced by the use of appropriate filters in the control mechanism.

Fig.14

Trajectory tracking error for theta1 (SCARA) with disturbances and payload variations.

Fig.15

Trajectory tracking error for theta2 (SCARA) with disturbances and payload variation.

Fig.16

Torque required by link1 (SCARA) with disturbances and payload variations.

Fig.17

Torque required by link2 (SCARA) with disturbances and payload variations.

5 Conclusions and future scope

We proposed a fuzzy Markov game controller for robot manipulators wherein controller actions are Lyapunov constrained. The Lyapunov theory based action generation mechanism lends much needed stability to the robust Markov game framework. To the best of our knowledge, ours is the first attempt at framing a linguistic version of Markov game control for robot manipulators. Our proposed MG control achieves superior tracking, albeit with higher but comparable torque relative to the baseline FQLC and FMGC controllers. In future, we intend to incorporate fuzzy sliding-mode control (FSMC) [24] in our proposed scheme to eliminate torque chattering, and to employ it on two link manipulators in rotating coordinate systems [25] and for managing intelligent buildings [26].

APPENDIX

References

1. Dixon W.E., Behal, Dawson D.M. and Nagarkatti S.P., Nonlinear control of engineering systems: A Lyapunov-based approach, Springer Science & Business Media, 2013.

2. Fradkov A.L.V., Miroshnik I.I.A.V.E. and Nikiforov V.O., Nonlinear and adaptive control of complex systems, Springer Science & Business Media, 2013.

3. Kumar and Rana, Nonlinear adaptive fractional order fuzzy PID control of a 2-link planar rigid manipulator with payload, Journal of the Franklin Institute 354(2) (2017), 993–1022.

4. Kober, Bagnell J.A. and Peters, Reinforcement learning in robotics: A survey, The International Journal of Robotics Research 32(11) (2013), 1238–1274.

5. , David A.O., Yin and Sun, Neural network control of a robotic manipulator with input deadzone and output constraint, IEEE Transactions on Systems, Man, and Cybernetics: Systems 46(6) (2016), 759–770.

6. , Ouyang and Hong, Vibration control of a flexible robotic manipulator in the presence of input deadzone, IEEE Transactions on Industrial Informatics 13(1) (2017), 48–59.

7. Mahil S.M. and Al-Durra, Modeling analysis and simulation of 2-DOF robotic manipulator, IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS) (2016), 1–4.

8. Wang and Yang, Dynamic learning from adaptive neural control of robot manipulators with prescribed performance, IEEE Transactions on Systems, Man, and Cybernetics: Systems PP(99) (2017), 1–12.

9. Xiao, Yin and Kaynak, Tracking control of robotic manipulators with uncertain kinematics and dynamics, IEEE Transactions on Industrial Electronics 63(10) (2016), 6439–6449.

10. Lewis F.L. and Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine 9(3) (2009), 32–50.

11. Khan S.G., Herrmann, Lewis F.L., Pipe and Melhuish, Reinforcement learning and optimal adaptive control: An overview and implementation examples, Annual Reviews in Control 36(1) (2012), 42–59.

12. Ouyang, He and Li, Reinforcement learning control of a single-link flexible robotic manipulator, IET Control Theory & Applications 11(9) (2017), 1426–1433.

13. Tang, Liu Y.-J. and Tong, Adaptive neural control using reinforcement learning for a class of robot manipulator, Neural Computing and Applications 25(1) (2014), 135–141.

14. Yousefian and Kamalasadan, Design and real-time implementation of optimal power system wide-area system-centric controller based on temporal difference learning, IEEE Transactions on Industry Applications 52(1) (2016), 395–406.

15. Zhao, Wang, Shi, Liu and Li, Decentralized control for large-scale nonlinear systems with unknown mismatched interconnections via policy iteration, IEEE Transactions on Systems, Man, and Cybernetics: Systems PP(99) (2017), 1–11.

16. Yang and Gu, Multiagent reinforcement learning for multi-robot systems: A survey, Tech. Rep., 2004.

17. Shah and Gopal, Kernel recursive least squares function approximation in game theory based control, Procedia Technology 23 (2016), 264–271.

18. Busoniu, Babuska and De Schutter, Multi-agent reinforcement learning: A survey, IEEE 9th International Conference on Control, Automation, Robotics and Vision (ICARCV) (2006), 1–6.

19. , Dong and Sun, Adaptive neural impedance control of a robotic manipulator with input saturation, IEEE Transactions on Systems, Man, and Cybernetics: Systems 46(3) (2016), 334–344.

20. Sharma and Gopal, A Markov game-adaptive fuzzy controller for robot manipulators, IEEE Transactions on Fuzzy Systems 16(1) (2008), 171–186.

21. Kumar and Sharma, Lyapunov theory based intelligent fuzzy controller for inverted pendulum, IEEE International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES) (2016), 1–5.

22. Kumar and Sharma, Fuzzy Lyapunov reinforcement learning for non linear systems, ISA Transactions 67 (2017), 151–159.

23. Margaliot and Langholz, Fuzzy control of a benchmark problem: A computing with words approach, IEEE Transactions on Fuzzy Systems 12(2) (2004), 230–235.

24. Saghafinia, Ping H.W., Uddin M.N. and Gaeid K.S., Adaptive fuzzy sliding-mode control into chattering-free IM drive, IEEE Transactions on Industry Applications 51(1) (2015), 692–701.

25. and Kong, Two-degree-of-freedom control of a two-link manipulator in the rotating coordinate system, IEEE Transactions on Industrial Electronics 62(9) (2015), 5598–5607.

26. Martirano, Parise and Manganelli, A fuzzy-based building automation control system: Optimizing the level of energy performance and comfort in an office space by taking advantage of building automation systems and solar energy, IEEE Industry Applications Magazine 22(2) (2016), 10–17.