Solving nonconvex economic thermal power dispatch problem with multiple fuel system and valve point loading effect using fuzzy reinforcement learning

Abstract

We propose a fuzzy reinforcement learning (FRL) framework for an efficient solution to the economic thermal power dispatch (ETPD) problem, considering multiple fuel options together with the valve point loading effect of thermal power generating units. The objective of ETPD is to minimize the operating cost while meeting a specified power demand and satisfying the generation capacity limits of each unit. In the presented work, we cast the ETPD as a multi agent FRL (MAFRL) problem wherein individual thermal generators act as players that minimize the operational cost while satisfying the generation limits of each unit and meeting the specified power demand. To validate the proposed multi agent fuzzy reinforcement learning technique, two benchmark test systems of 10 and 40 units with multiple fuel systems and valve point loading effect have been simulated. Simulation results and comparison against several existing solution approaches demonstrate the efficacy of the MAFRL technique in solving the ETPD problem.

Keywords

Economic thermal power dispatch, fuzzy reinforcement learning, multiple fuel system, valve point loading

1 Introduction

Economic thermal power dispatch (ETPD) refers to a constrained load optimization and energy management task for power scheduling, operation and control. The main objective of solving ETPD is to find the best possible allocation of power generation among the units so that the total fuel operating cost is minimized while various equality and inequality constraints are satisfied, namely the power balance, transmission loss and power demand constraints. The ETPD problem shows highly non-linear characteristics because modern power generating units are operated with multiple fuel systems. Therefore, ETPD is considered a non-continuous, non-convex and non-differentiable optimization task of finding the optimal power dispatch through each generating unit.

In the last two decades, researchers have implemented several conventional, meta heuristic and hybrid techniques for solving ETPD with the valve point effect combined with multiple fuel options, e.g., gradient technique [1], dynamic programming (DP) [2], linear programming [3], quadratic programming [4], Lagrange relaxation (LR) [5], hybrid GA (HGA) [6], real coded GA (RCGA) [7], advanced real coded GA (ARCGA) [8], evolutionary programming (EP) [9], ant colony optimization (ACO) [10], biogeography based optimization (BBO) [11], differential evolution (DE) [12], differential evolution integrated biogeography based optimizer (DEBBO) [13], bacteria foraging optimization (BFO) [14], modified BFO [15], group search optimization (GSO) [16], differential evolution based PSO (DEPSO) [8], craziness based PSO (CRPSO) [17], seeker based optimization (SOA) [18], Taguchi algorithm (TSA) [19], krill herd algorithm (KHA) [20], grey wolf optimization algorithm (GWO) [21], cuckoo search algorithm (CSA) [22], one rank cuckoo search algorithm (ORCSA) [23], improved GA (IGA) [24], enhanced augmented Lagrange Hopfield network [25], crisscross optimization (CSO) [26], dynamic search space strategy (DSSS) [27], and social spider method (SSM) [28].

This work adapts a multi agent version of the fuzzy reinforcement learning framework [29] to obtain a competent solution to the ETPD problem. In our view, it is the first attempt at the ETPD problem via FRL. Here every player (generating unit) attempts to optimize its output based on an RL signal emitted by the system while satisfying several system constraints. In other words, we solve a constrained optimization task using FRL. Reinforcement learning is a paradigm which aims to optimize the behavior of a single agent or several agents operating in a Markov decision process (MDP) [30] environment. The agents/players seek to find the optimal policy based on a reinforcement learning signal emitted by the environment. This RL signal is a heuristic signal signifying the profitability or otherwise of an action taken by an agent. Thus, RL concerns the optimization of a sequential decision making problem.

In [30], FRL has been used to generate approximately optimal controllers for non-linear systems/plants. In this work, we extend FRL to a multi generator optimization scenario. Recently, we implemented FRL to solve the complex constrained thermal unit commitment problem [31]. Our attempt here is to look at ETPD as a sequential decision making problem in a self learning framework. Here the different generators learn to output the correct power level by a trial and error mechanism. The aim of each generator is to output power such that the total cost of generation to meet a given load is minimized. In an RL framework, this is achieved by minimizing a cumulative cost function, called the Q function, for each generator. Reinforcement learning basically rewards actions that lead to a reduction in generation cost while discouraging higher cost actions.

The rest of the manuscript is organized as follows: Section 2 describes the formulation of the ETPD problem, Section 3 briefly describes the fuzzy reinforcement learning approach, Section 4 explains implementation details of the MAFRL approach as applied to the ETPD problem, Section 5 presents the experimental results and a comparison with various contemporary approaches, and Section 6 concludes the paper.

2 Mathematical modeling of economic thermal power dispatch problem

The aim of the ETPD solution is to minimize the total fuel cost of the units while meeting the power balance constraint and the total load demand.

2.1 Fuel cost function

The fuel cost function for solving ETPD is given in the form of a quadratic expression as: $F_{total} = \sum_{j = 1}^{NG} F_{j} (U_{p_{j}}) = \sum_{j = 1}^{NG} ( a_{j} U_{p_{j}}^{2} + b_{j} U_{p_{j}} + c_{j} )$ (1)

where a_j, b_j and c_j represent the fuel cost coefficients of the j^th generating unit, F_total represents the total fuel cost, U_{p_j} is the power generated by the j^th generating unit, F_j (U_{p_j}) represents the fuel cost function of the j^th unit, and NG represents the total number of generating units.

2.1.1 Objective cost function with multiple fuel systems

Modern thermal power generating units have multiple fuel options (MFO) from several fuel sources. Each power generating unit may have a different fuel cost for each of its fuel options. The choice of fuel depends on the load demand and the power generation availability. The objective is to find an appropriate fuel option for each unit so that the total operating cost is minimized while fulfilling the equality constraint due to load balance and the inequality constraints due to generating capacity. More realistic and accurate ETPD solutions can be obtained by incorporating MFO together with the valve point effect (VPE), which is mathematically represented as:

$F_{total} = \sum_{j = 1}^{NG} F_{j} (U_{p_{j}}) = \sum_{j = 1}^{NG} ( a_{j, h} U_{p_{j}}^{2} + b_{j, h} U_{p_{j}} + c_{j, h} + | e_{j, h} \sin \{ f_{j, h} (U_{p_{j}, min} - U_{p_{j}}) \} | )$ (2)

where a_{j,h}, b_{j,h}, c_{j,h}, e_{j,h} and f_{j,h} are the fuel cost coefficients of the h^th fuel system for the j^th generating unit.

We formulate MFO in a manner different from earlier formulations [32], in which the selection of fuel options is predefined at different power levels as:

$F_{p_{j}} (U_{p_{j}}) = \left\{ \begin{matrix} a_{p_{j}, 1} U_{p_{j}}^{2} + b_{p_{j}, 1} U_{p_{j}} + c_{p_{j}, 1} + | e_{p_{j}, 1} \sin \{ f_{p_{j}, 1} (U_{p_{j}, min} - U_{p_{j}}) \} |; & \text{if } U_{p_{j}} \in [U_{p_{j}, min}, U_{1}] \\ a_{p_{j}, 2} U_{p_{j}}^{2} + b_{p_{j}, 2} U_{p_{j}} + c_{p_{j}, 2} + | e_{p_{j}, 2} \sin \{ f_{p_{j}, 2} (U_{p_{j}, min} - U_{p_{j}}) \} |; & \text{if } U_{p_{j}} \in [U_{1}, U_{2}] \\ \vdots \\ a_{p_{j}, h} U_{p_{j}}^{2} + b_{p_{j}, h} U_{p_{j}} + c_{p_{j}, h} + | e_{p_{j}, h} \sin \{ f_{p_{j}, h} (U_{p_{j}, min} - U_{p_{j}}) \} |; & \text{if } U_{p_{j}} \in [U_{h - 1}, U_{p_{j}, max}] \end{matrix} \right.$ (3)

where U₁, U₂, …, U_{h-1} are predefined power levels for selecting fuel options, and U_{p_j,max} represents the maximum power generation limit of the j^th unit. We formulate MFO in a more practical manner, so that the selection of fuel options for the generating units is free from any predefined power levels. For accurate results, we compare the fuel cost of each unit over all available fuel options and keep the cheapest. For illustration, if the power generated by unit 1 is U₁ and three fuel options are available, then the fuel cost of the unit is calculated as: $\begin{matrix} F_{p_{j}} (U_{p_{j}}) = \min ( a_{p_{j}, 1} U_{1}^{2} + b_{p_{j}, 1} U_{1} + c_{p_{j}, 1} + | e_{p_{j}, 1} \sin \{ f_{p_{j}, 1} (U_{p_{j}, min} - U_{1}) \} |, \\ a_{p_{j}, 2} U_{1}^{2} + b_{p_{j}, 2} U_{1} + c_{p_{j}, 2} + | e_{p_{j}, 2} \sin \{ f_{p_{j}, 2} (U_{p_{j}, min} - U_{1}) \} |, \\ a_{p_{j}, 3} U_{1}^{2} + b_{p_{j}, 3} U_{1} + c_{p_{j}, 3} + | e_{p_{j}, 3} \sin \{ f_{p_{j}, 3} (U_{p_{j}, min} - U_{1}) \} | ) \end{matrix}$ (4)
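The selection rule in (4) can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this work: the function names `fuel_cost` and `cheapest_fuel` are hypothetical, the sine argument is assumed to be in radians, and the coefficients are those of unit 1 in Table 1 with its lower limit of 100 MW from Table 2.

```python
import math

def fuel_cost(u, u_min, coeffs):
    """Cost of one fuel option at output u (MW), following Eq. (2)."""
    a, b, c, e, f = coeffs
    # Quadratic fuel cost plus the rectified-sine valve point term.
    return a * u**2 + b * u + c + abs(e * math.sin(f * (u_min - u)))

def cheapest_fuel(u, u_min, options):
    """Eq. (4): evaluate every fuel option at u and keep the cheapest.

    Returns (fuel_type, cost) with fuel types numbered from 1.
    """
    costs = [fuel_cost(u, u_min, coeffs) for coeffs in options]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best + 1, costs[best]

# Coefficients (a, b, c, e, f) of unit 1 from Table 1.
UNIT1_OPTIONS = [
    (0.002176, -0.3975, 26.97, 0.02697, -3.9750),  # fuel type 1
    (0.001861, -0.3059, 21.13, 0.02113, -3.0590),  # fuel type 2
]
UNIT1_MIN = 100.0  # lower generation limit of unit 1 (MW), Table 2

fuel_type, cost = cheapest_fuel(218.1050, UNIT1_MIN, UNIT1_OPTIONS)
print(fuel_type, round(cost, 4))
```

At the unit 1 output reported in Table 2 (218.1050 MW), this sketch selects fuel type 2, which agrees with the fuel type listed there.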

2.2 Constraints

2.2.1 Unit capacity constraint

Each thermal power generating unit has to generate active power within boundary of its generating capacity. Mathematically: $U_{p_{j}, min} \leq U_{p_{j}} \leq U_{p_{j}, max}$ (5)

2.2.2 Power equilibrium constraint

The total power generated by all units should equal the sum of the power demand and the total transmission losses. Mathematically:

$\sum_{j = 1}^{NG} U_{p_{j}} = P_{demand} + P_{loss}$ (6)

where P_demand represents the power demand and P_loss represents the transmission loss, given by Kron's formula [33] as:

$P_{loss} = B_{00} + \sum_{j = 1}^{NG} B_{p_{j} 0} U_{p_{j}} + \sum_{j = 1}^{NG} \sum_{g = 1}^{NG} U_{p_{j}} B_{p_{j} g} U_{g}$ (7)

where B_{p_j g}, B_{p_j 0}, and B₀₀ are the power loss coefficients.
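As a worked illustration of (7), the loss can be evaluated with plain Python lists. This is only a sketch: the function name `kron_loss` is hypothetical, and the loss coefficients in the example are made-up placeholder values, since no B coefficients are listed in this paper.

```python
def kron_loss(u, b, b0, b00):
    """Transmission loss per Eq. (7).

    u   : list of unit outputs (MW)
    b   : square matrix of quadratic loss coefficients (list of lists)
    b0  : list of linear loss coefficients
    b00 : constant loss coefficient
    """
    n = len(u)
    linear = sum(b0[j] * u[j] for j in range(n))
    quadratic = sum(u[j] * b[j][g] * u[g] for j in range(n) for g in range(n))
    return b00 + linear + quadratic

# Illustrative two-unit example with placeholder coefficients (not from the paper).
outputs = [100.0, 200.0]
b_matrix = [[1e-4, 0.0], [0.0, 2e-4]]
loss = kron_loss(outputs, b_matrix, [0.0, 0.0], 0.0)
print(loss)
```

With all coefficients set to zero the function returns zero loss, which corresponds to the lossless balance used when the slack generator is computed from the demand alone.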

2.2.3 Power balance constraint handling

The power balance constraint is handled by eliminating one variable, as is usual for equality constraints in optimization tasks. The eliminated variable is called the slack variable. Here the total number of committed units is NG, of which only NG - 1 units are controlled by the MAFRL approach. The power of the slack generator is computed by rearranging the equality constraint as: $U_{p_{NG}} = P_{demand} - \sum_{j = 1}^{NG - 1} U_{p_{j}}^{*}$ (8)

If the slack generator output U_{p_NG} violates its limits, an external penalty is added to the objective cost function to account for the violation as:

${Pen}_{G} = \left\{ \begin{matrix} ζ \times (U_{p_{NG}} - U_{p_{NG}, max})^{2}; & \text{if } U_{p_{NG}} \geq U_{p_{NG}, max} \\ ζ \times (U_{p_{NG}, min} - U_{p_{NG}})^{2}; & \text{if } U_{p_{NG}} \leq U_{p_{NG}, min} \\ 0; & \text{if } U_{p_{NG}, min} \leq U_{p_{NG}} \leq U_{p_{NG}, max} \end{matrix} \right.$ (9)

where ζ is a penalty factor with a large value.
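The slack generator computation of (8) and the penalty of (9) can be sketched as below. The function names and the value chosen for ζ are illustrative assumptions; the outputs of the first nine units, the demand and the limits of unit 10 are taken from Table 2, with transmission losses neglected as in (8).

```python
def slack_output(p_demand, controlled_outputs):
    """Eq. (8): the slack unit supplies whatever the controlled units do not."""
    return p_demand - sum(controlled_outputs)

def slack_penalty(u_slack, u_min, u_max, zeta):
    """Eq. (9): quadratic penalty when the slack unit leaves its limits."""
    if u_slack > u_max:
        return zeta * (u_slack - u_max) ** 2
    if u_slack < u_min:
        return zeta * (u_min - u_slack) ** 2
    return 0.0

# Outputs of units 1 to 9 (MW) from Table 2; unit 10 is treated as the slack unit.
controlled = [218.1050, 211.9070, 280.6570, 239.2830, 279.9347,
              239.6607, 287.7273, 239.6864, 427.0351]
ZETA = 1000.0  # hypothetical penalty factor; the text only requires a large value

u_slack = slack_output(2700.0, controlled)
penalty = slack_penalty(u_slack, 200.0, 490.0, ZETA)
print(round(u_slack, 4), penalty)
```

For the Table 2 dispatch the slack output falls inside the limits of unit 10, so no penalty is incurred; an output of 500 MW, by contrast, would exceed the 490 MW limit by 10 MW and be penalized by ζ × 100.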

3 Multi agent fuzzy reinforcement learning approach

We look at the ETPD problem as a constrained optimization task in an RL setup. The RL procedure tries to find the optimal power output of each generator that minimizes the total fuel cost while satisfying the power balance and load demand constraints. The basic algorithm on which our work is based is Q learning.

3.1 Q learning

The Q learning framework optimizes decision making in a sequential decision task modeled as a Markov decision process. Q learning involves repeated action generation at each stage and an evaluation of the action choice to figure out the best possible action at each stage. We have implemented a multi agent adaptation of Q learning for solving ETPD.

In the Q learning procedure, each generating unit produces a particular power within its constraints, and the power generated by all generators must satisfy the system constraints and the total power demand. The power outputs of the generators are represented by y^k ∈ Y (k = 0, 1, 2, …). The goal is to meet the load demand while satisfying the constraints. If the combined action of the generators is satisfactory, all generators are rewarded; otherwise they are penalized for making wrong choices. Thus, it is a learning problem of discovering the correct power outputs. We judge this using a Q value estimate Q (y^k, a (y^k)), where a (y^k) ∈ A (y^k) is the power output of the generators. These Q values are tuned incrementally in an online manner to identify the optimal power generation. Next, we digress briefly to describe the fuzzy Q learning approach.

3.2 Fuzzy Q Learning

The standard Q learning algorithm uses a look-up table for storing the (state, action) values, i.e., the Q (y^k, a (y^k)) values. However, as the state-space dimensionality increases, look-up table based storage of values becomes computationally intractable. To overcome this hurdle, we can use a function approximator to store the Q values.

Function approximation could be carried out using neural networks or fuzzy inference systems (FIS). In this work, we have used fuzzy systems for implementing multi agent Q learning. At each instant, the state vector $y^{k} = \{y_{1}^{k}, y_{2}^{k}, \dots, y_{n}^{k}\}$ (the generator outputs at instant k) is matched against fuzzy labels generated over each unit’s power capacity. Fuzzification creates a rule firing strength vector F_i : α_i (y^k). In each rule, a generator can choose how much to generate from the set of available actions A = {u₁, u₂, …, u_m}. The quality of each action is defined by a q_g value associated with it. The rule base of the FIS is defined by rules Rl_g: $\begin{matrix} {Rl}_{g} : \text{If } y_{1}^{k} \text{ is } T_{1}^{g} \text{ and } \dots \text{ and } y_{n}^{k} \text{ is } T_{n}^{g} \\ \text{then } u = u_{1} \text{ for } q_{g} (g, 1) \\ \text{or } u = u_{2} \text{ for } q_{g} (g, 2) \\ \vdots \\ \text{or } u = u_{m} \text{ for } q_{g} (g, m) \end{matrix}$ (10)

where $T_{s}^{g}$ = linguistic expression for $y_{s}^{k}$ under rule Rl_g and $α_{T_{s}^{g}}$ represents membership function.

The agent/generator has to select the appropriate action, corresponding to the maximum q_g value, from amongst the m actions. The overall (combined) action of the generator is obtained as the weighted average of the actions chosen under each rule as: $u (y^{k}) = \frac{\sum_{g = 1}^{NG} α_{g} (y^{k}) u_{g}}{\sum_{g = 1}^{NG} α_{g} (y^{k})}; u_{g} \in A$ (11)

u_g being the optimal action in rule Rl_g.

In RL, we need to explore for more profitable actions from the action set. This is achieved by following an exploration-exploitation policy (EEP) which selects a random action from the set of available actions with a small probability ε and the greedy action otherwise. This EEP action is designated as ε-greedy and is obtained as $u_{g}^{†} = ε\text{-greedy} (u_{g}^{*})$, where $u_{g}^{*}$ represents the greedy action, i.e., the action with the maximum q value, $q_{g} (g, u_{g}^{*}) = \max_{b \leq m} q_{g} (g, b)$. Finally, the Q value of the action u (y^k) is given by: $Q_{g} (y^{k}, u (y^{k})) = \frac{\sum_{g = 1}^{NG} α_{g} (y^{k}) q_{g} (g, u_{g}^{†})}{\sum_{g = 1}^{NG} α_{g} (y^{k})}$ (12)

Next, we calculate state value as: $V_{g} (y^{k}) = \frac{\sum_{g = 1}^{NG} α_{g} (y^{k}) q_{g} (g, u_{g}^{*})}{\sum_{g = 1}^{NG} α_{g} (y^{k})}$ (13)

The generated action u (y^k) is the initial output of the generating unit, and a corresponding reinforcement signal r is generated. Using this signal we obtain a temporal difference (TD), and using the TD the q values of each generating unit are updated as follows: $TD = r + γ V_{g} (y^{k + 1}) - Q_{g} (y^{k}, u (y^{k}))$ (14)

$q_{g} (g, u_{g}^{†}) \leftarrow q_{g} (g, u_{g}^{†}) + η TD \frac{α_{g} (y^{k})}{\sum_{g = 1}^{NG} α_{g} (y^{k})}$ (15)

Here 0 ≤ γ < 1 is the discount factor, which weights future cost relative to present cost, and η (the learning-rate factor) is used to blend old estimates with new ones.
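One update of the fuzzy Q learning scheme in (11) to (15) can be sketched as follows. This is a minimal illustration under stated assumptions: the q values are held in a list of lists indexed by rule and action, the function names are hypothetical, and the firing strengths, reward and parameter values in the example are placeholders, not values used in this work.

```python
import random

def select_actions(q, epsilon, rng):
    """Per rule: a random action index with probability epsilon, else the greedy one."""
    chosen = []
    for row in q:
        if rng.random() < epsilon:
            chosen.append(rng.randrange(len(row)))
        else:
            chosen.append(max(range(len(row)), key=row.__getitem__))
    return chosen

def weighted(alpha, values):
    """Firing-strength weighted average used in Eqs. (11) to (13)."""
    return sum(a * v for a, v in zip(alpha, values)) / sum(alpha)

def fql_step(q, alpha, alpha_next, chosen, reward, gamma, eta):
    """One update following Eqs. (12) to (15); q is modified in place."""
    q_val = weighted(alpha, [q[g][b] for g, b in enumerate(chosen)])  # Eq. (12)
    v_next = weighted(alpha_next, [max(row) for row in q])            # Eq. (13)
    td = reward + gamma * v_next - q_val                              # Eq. (14)
    total = sum(alpha)
    for g, b in enumerate(chosen):
        q[g][b] += eta * td * alpha[g] / total                        # Eq. (15)
    return td

# Two rules with two actions each; all numbers are illustrative placeholders.
q_table = [[0.0, 0.0], [0.0, 0.0]]
td = fql_step(q_table, alpha=[0.75, 0.25], alpha_next=[0.5, 0.5],
              chosen=[0, 1], reward=1.0, gamma=0.9, eta=0.5)
greedy = select_actions([[1.0, 3.0], [2.0, 0.0]], 0.0, random.Random(0))
print(td, q_table, greedy)
```

Each rule's chosen q value moves by the TD error scaled by η and by that rule's share of the total firing strength, so strongly fired rules learn faster than weakly fired ones.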

4 Multiagent fuzzy reinforcement learning framework for ETPD problem

We attempt a multi agent FRL framework for solving the ETPD problem. The first step is to place fuzzy sets over each unit's power limits, partitioning its operating range. We impose Gaussian membership functions over the universe of discourse. For example, if the power limit of a unit is [U_{p_j,min}, U_{p_j,max}], we fix 3 fuzzy sets having centers at $[U_{p_{j}, min}, \frac{U_{p_{j}, min} + U_{p_{j}, max}}{2}, U_{p_{j}, max}]$ and whose standard deviation is calculated as $σ = \frac{U_{p_{j}, max} - U_{p_{j}, min}}{5}$. The total number of units in the system is NG, so we match their power outputs within their operating ranges, and a rule firing strength vector (α₁, α₂, …, α_N) is obtained, where $N = \prod_{j = 1}^{NG} N_{L} (j)$ and N_L (j) is the number of linguistic variables of the j^th unit. The power output vector of the units, $j^{k} = \{j_{1}^{k}, j_{2}^{k}, \dots, j_{NG}^{k}\}$, is taken as the input vector to the fuzzy inference system, where $j_{1}^{k}, j_{2}^{k}, \dots, j_{NG}^{k}$ are the powers generated by units 1, 2, …, NG at time k. This input vector is labeled through the fuzzy sets over each unit’s power limits. Every rule Rl_g, with firing strength α_g (j^k), has n possible actions u = {u₁, u₂, …, u_n}, and a q_g value is associated with every action.
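The fuzzy partition and the rule firing strengths described above can be sketched as follows. Two points are assumptions of this sketch and are not stated in the text: the Gaussian membership is taken in its usual form exp(-(u - c)² / (2σ²)), and the AND of the rule antecedents is taken as a product. The function names are hypothetical; the limits in the example are those of units 1 and 2 in Table 2.

```python
import math
from itertools import product

def fuzzy_sets(u_min, u_max):
    """Three Gaussian sets: centers at min, midpoint and max; sigma = range / 5."""
    centers = [u_min, (u_min + u_max) / 2.0, u_max]
    sigma = (u_max - u_min) / 5.0
    return centers, sigma

def memberships(u, centers, sigma):
    """Membership degree of output u in each fuzzy set (assumed Gaussian form)."""
    return [math.exp(-0.5 * ((u - c) / sigma) ** 2) for c in centers]

def firing_strengths(outputs, limits):
    """One strength per rule; a rule picks one label per unit (product AND assumed)."""
    per_unit = [memberships(u, *fuzzy_sets(lo, hi))
                for u, (lo, hi) in zip(outputs, limits)]
    return [math.prod(combo) for combo in product(*per_unit)]

# Limits of units 1 and 2 from Table 2; the outputs are illustrative.
LIMITS = [(100, 250), (50, 230)]
alpha = firing_strengths([175.0, 50.0], LIMITS)
print(len(alpha), max(alpha))
```

With three labels per unit, two units already give N = 3 × 3 = 9 rules, which shows how quickly the rule count N grows with the number of units NG.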

In MAFRL, we organize every rule Rl_g in the following form:

If j₁ is $G_{1}^{g}$ AND j₂ is $G_{2}^{g}$ AND … AND j_NG is $G_{NG}^{g}$ THEN

$\begin{matrix} ( gen\_out = j_{1} \text{ and } u = u_{1} \text{ for } q_{g} (j_{1}, g, u_{1}) \text{ OR } u = u_{2} \text{ for } q_{g} (j_{1}, g, u_{2}) \text{ OR } \dots \text{ OR } u = u_{n} \text{ for } q_{g} (j_{1}, g, u_{n}) ) \\ \text{AND } ( gen\_out = j_{2} \text{ and } u = u_{1} \text{ for } q_{g} (j_{2}, g, u_{1}) \text{ OR } u = u_{2} \text{ for } q_{g} (j_{2}, g, u_{2}) \text{ OR } \dots \text{ OR } u = u_{n} \text{ for } q_{g} (j_{2}, g, u_{n}) ) \\ \vdots \\ \text{AND } ( gen\_out = j_{NG} \text{ and } u = u_{1} \text{ for } q_{g} (j_{NG}, g, u_{1}) \text{ OR } u = u_{2} \text{ for } q_{g} (j_{NG}, g, u_{2}) \text{ OR } \dots \text{ OR } u = u_{n} \text{ for } q_{g} (j_{NG}, g, u_{n}) ) \end{matrix}$ (16)

where $G_{s}^{g}$ is the linguistic term for unit s with output $j_{s}^{k}$ (s = 1, …, NG) in rule Rl_g. Its membership function is represented by $μ_{G_{s}^{g}}$. Therefore every rule in MAFRL gives a q_g value for each unit action.

Thus at each state g^k, we obtain an optimal action vector ${\bar{u}}_{g_{p}}^{g} = (u_{g_{1}^{*}}^{g}, u_{g_{2}^{*}}^{g}, \dots, u_{g_{NG}^{*}}^{g})$ where $u_{g_{p}^{*}}^{g}$ is the generating power of unit g_p (p = 1, …, NG) for rule Rl_g. Finally, the optimal power of each unit is obtained by minimizing the total production cost as: $u_{g_{p}^{*}}^{g} = \arg \min_{u \in U} q_{g} (g_{p}, g, u)$ (17)

A Takagi-Sugeno FIS rule aggregator is used to create the simplified optimal power for each unit as: $U_{g_{p}}^{*} = \frac{\sum_{g = 1}^{NG} α_{g} (g^{k}) u_{g_{p}^{*}}^{g}}{\sum_{g = 1}^{NG} α_{g} (g^{k})} \forall g_{p} \in \{g_{1}, \dots, g_{NG}\}$ (18)
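The per-rule minimization of (17) followed by the weighted aggregation of (18) can be sketched for a single unit as below. The function name, the candidate power levels, the q values and the firing strengths are all illustrative assumptions, not values from the reported experiments.

```python
def greedy_power(q_unit, actions, alpha):
    """Aggregate the per-rule greedy actions of one unit.

    q_unit[g][b] is the q value (a cost) of action b under rule g.
    Eq. (17): in each rule, pick the action with the minimum q value.
    Eq. (18): blend the per-rule choices using the firing strengths.
    """
    per_rule = [actions[min(range(len(row)), key=row.__getitem__)]
                for row in q_unit]
    return sum(a * u for a, u in zip(alpha, per_rule)) / sum(alpha)

# Two rules, three candidate power levels (MW); placeholder numbers.
ACTIONS = [100.0, 175.0, 250.0]
Q_UNIT = [[5.0, 2.0, 9.0],   # rule 1 prefers 175 MW
          [7.0, 8.0, 1.0]]   # rule 2 prefers 250 MW
power = greedy_power(Q_UNIT, ACTIONS, alpha=[0.5, 0.5])
print(power)
```

Because the q values here represent cost, the greedy choice is the minimum rather than the maximum, and equal firing strengths place the aggregated output midway between the two per-rule choices.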

where $U_{g_{p}}^{*}$ is the simplified optimal power of unit g_p. An exploration-exploitation policy (EEP) is implemented for exploring better values of the optimal power as: $u_{g_{p}^{†}}^{g} = ε\text{-greedy} (u_{g_{p}^{*}}^{g})$ (19)

where $u_{g_{p}^{†}}^{g}$ is the ε-greedy action of each unit g_p under the EEP technique.

The EEP power of each unit is given by: $U_{g_{p}}^{†} = \frac{\sum_{g = 1}^{NG} α_{g} (g^{k}) u_{g_{p}^{†}}^{g}}{\sum_{g = 1}^{NG} α_{g} (g^{k})} \forall g_{p} \in \{g_{1}, \dots, g_{NG}\}$ (20)

The global $Q_{g_{p}}$ value corresponding to each unit g_p is given by:

$\begin{matrix} Q_{g_{p}} (g_{p}, U_{g_{p}}) \\ = \frac{\sum_{g = 1}^{NG} α_{g} (g^{k}) q_{g} (g_{p}, g, u_{g_{p}^{†}}^{g})}{\sum_{g = 1}^{NG} α_{g} (g^{k})} \forall g_{p} \in \{g_{1}, \dots, g_{NG}\} \end{matrix}$ (21)

The generating units produce a power vector g^{k+1} for the corresponding optimal action as per (18). Next, we calculate the target value $V_{g_{p}} (g^{k + 1})$ (for stage k + 1) corresponding to each generator as:

$\begin{matrix} V_{g_{p}} (g^{k + 1}) \\ = \frac{\sum_{g = 1}^{NG} α_{g} (g^{k + 1}) q_{g} (g_{p}, g, u_{g_{p}^{*}}^{g})}{\sum_{g = 1}^{NG} α_{g} (g^{k + 1})} \forall g_{p} \in \{g_{1}, \dots, g_{NG}\} \end{matrix}$ (22)

Finally, the overall global cost for the generator outputs g^k is calculated as:

$\begin{matrix} F_{t} = \sum_{g_{p} = 1}^{NG} ( a_{g_{p}} U_{g_{p}}^{2} + b_{g_{p}} U_{g_{p}} + c_{g_{p}} \\ + | e_{g_{p}} \sin \{ f_{g_{p}} (U_{g_{p}, min} - U_{g_{p}}) \} | ) \\ + λ \times (P_{demand} - \sum_{g_{p} = 1}^{NG} U_{g_{p}}) \end{matrix}$ (23)

where λ is the penalty factor (10000 $/MW in our case).

Next, the temporal difference (TD) is calculated for each unit as: $\begin{matrix} {TD}_{g_{p}} = COST + γ V_{g_{p}} (g^{k + 1}) \\ - Q_{g_{p}} (g_{p}, U_{g_{p}}) \forall g_{p} \in \{g_{1}, \dots, g_{NG}\} \end{matrix}$ (24)

where COST is the global cost F_t of (23).

Finally, the updated q_g for all units is obtained as:

$\begin{matrix} q_{g} (g_{p}, g, u_{g_{p}^{†}}^{g}) \leftarrow q_{g} (g_{p}, g, u_{g_{p}^{†}}^{g}) \\ + \frac{η {TD}_{g_{p}} α_{g} (g^{k})}{\sum α_{g} (g^{k})} \forall g_{p} \in \{g_{1}, \dots, g_{NG}\} \end{matrix}$ (25)

Therefore, we get the updated q_g values that are used for discovering the optimal power of each unit.

5 Test system results and analysis

To evaluate the efficiency of the proposed MAFRL, two standard test systems incorporating the valve point effect along with multiple fuel systems have been solved. We simulated MAFRL for the ETPD problem using MATLAB 7.12.0 (R2011a) on a 3.50 GHz Intel Core i5 processor. Comparison has been made with various other contemporary methods for the following test cases:

10-unit generator system with multiple fuel system and valve point loading effect for a power demand of 2700 MW.

40-unit generator system with multiple fuel system and valve point loading effect for a power demand of 10800 MW.

5.1 Flowchart for MAFRL technique for ETPD problem

5.2 Test case-1

In Test system 1, we consider a 10-unit generator system having multiple fuel selection options. Generator fuel coefficients have been taken from [34] and are provided in Table 1, and the load demand is 2700 MW. Table 2 shows the fuel type, optimal power schedule and fuel cost obtained by the proposed MAFRL technique. The proposed approach satisfies all system constraints and provides feasible results. In Table 3, we compare MAFRL with various optimization techniques reported in the literature. When the other methods are judged by their actual generation costs, recalculated from the reported power outputs where available (values marked a in Table 3), MAFRL attains the lowest operating cost of the listed techniques. The fuzzy sets laid over the generating range of the 10-generator system are shown in Fig. 2, and the convergence characteristics of the MAFRL approach for the 10-generator system are presented in Fig. 3.

Table 1
Fuel coefficients for 10-generator system with MFO and VPE

Unit no.	Fuel type	a_g	b_g	c_g	e_g	f_g
1	1	0.002176	–0.3975	26.97	0.02697	–3.9750
1	2	0.001861	–0.3059	21.13	0.02113	–3.0590
2	1	0.004194	–1.2690	118.4	0.11840	–12.690
2	2	0.001138	–0.0399	1.865	0.00187	–0.3988
2	3	0.001620	–0.1980	13.65	0.01365	–1.9800
3	1	0.001457	–0.3116	39.79	0.03979	–3.1160
3	2	0.0001176	0.4864	–59.14	–0.05914	4.8640
3	3	0.0008035	0.0339	–2.876	–0.00288	0.3389
4	1	0.001049	–0.0311	1.983	0.00198	–0.3114
4	2	0.002758	–0.6348	52.85	0.05285	–6.3480
4	3	0.005935	–2.3380	266.8	0.26680	–23.380
5	1	0.001066	–0.0873	13.92	0.01392	–0.8733
5	2	0.001597	–0.5206	99.76	0.09976	–5.2060
5	3	0.0001498	0.4462	–53.99	–0.05399	4.4620
6	1	0.002758	–0.6348	52.15	0.05285	–6.3480
6	2	0.001049	–0.0311	1.983	0.00198	–0.3114
6	3	0.005935	–2.3380	266.6	0.26680	–23.380
7	1	0.001107	–0.1325	18.93	0.01893	–1.3250
7	2	0.001165	–0.2267	43.77	0.04377	–2.2670
7	3	0.0002454	0.3559	43.55	–0.04335	3.5590
8	1	0.001049	–0.0311	1.983	0.00198	–0.3114
8	2	0.002758	–0.6348	52.85	0.05285	–6.3480
8	3	0.005935	–2.3380	266.8	0.266680	–23.380
9	1	0.001554	–0.5675	88.53	0.08853	–5.6750
9	2	0.007033	–0.0451	15.32	0.01423	–0.1817
9	3	0.0006121	–0.0182	14.23	0.01423	–0.1817
10	1	0.0011102	–0.0994	13.97	0.01397	–0.9938
10	2	0.00004164	0.5084	–61.13	–0.06113	5.0840
10	3	0.001137	–0.2024	46.71	0.04671	–2.0240

Table 2

Best power generation for 10-generator system with power demand of 2700 MW

Generator no.	U_{g_p,min} (MW)	U_{g_p,max} (MW)	Optimal power generation (MW)	Fuel type
1	100	250	218.1050	2
2	50	230	211.9070	1
3	200	500	280.6570	1
4	99	265	239.2830	3
5	190	490	279.9347	1
6	85	265	239.6607	3
7	200	500	287.7273	1
8	99	265	239.6864	3
9	130	440	427.0351	3
10	200	490	276.0038	1
Total operating cost ($/hr)			623.8191
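The feasibility of the schedule in Table 2 can be checked directly against the capacity constraint (5) and the power balance constraint (6). The short sketch below does this with the values copied from Table 2; the function name `check_dispatch` and the tolerance are illustrative choices, and transmission losses are neglected, as in (8).

```python
# (U_min, U_max, dispatched power) in MW for units 1 to 10, copied from Table 2.
TABLE2 = [
    (100, 250, 218.1050), (50, 230, 211.9070), (200, 500, 280.6570),
    (99, 265, 239.2830), (190, 490, 279.9347), (85, 265, 239.6607),
    (200, 500, 287.7273), (99, 265, 239.6864), (130, 440, 427.0351),
    (200, 490, 276.0038),
]

def check_dispatch(rows, demand, tol=1e-3):
    """Return (total power, all units within limits, balance met within tol)."""
    total = sum(power for _, _, power in rows)
    within_limits = all(lo <= power <= hi for lo, hi, power in rows)
    balanced = abs(total - demand) <= tol
    return total, within_limits, balanced

total, within_limits, balanced = check_dispatch(TABLE2, 2700.0)
print(round(total, 4), within_limits, balanced)
```

The ten outputs sum to the 2700 MW demand and every unit lies inside its limits, which is consistent with the statement that the schedule satisfies the system constraints.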

Table 3

Comparison of numerical simulation using various techniques for 10-generator (power demand 2700 MW)

Key	Algorithm name	Reference	Operating cost ($/hr)
KHA	Krill herd algorithm	[20]	605.7582 (628.30260)^a
BBO	Biogeography based optimization	[35]	605.6387 (628.74880)^a
DE/BBO	Hybrid differential evolution integrated BBO	[13]	605.6230 (725.07140)^a
IGA-MU	Integrated genetic algorithm	[27]	624.5178
RCGA	Real coded genetic algorithm	[36]	623.83 (624.52)^a
SBOA	Seeker based optimization algorithm	[18]	598.16 (686.273)^a
GWO	Grey wolf optimization algorithm	[21]	605.6263 (628.8205)^a
MAFRL	Multiagent Fuzzy Reinforcement learning		623.8191

^aActual generation cost calculated with given power.

Fig.1

MAFRL architecture for economic load dispatch problem.

Fig.2

Fuzzy sets through power range of 10-generator system.

Fig.3

Convergence characteristics for 10-generator system.

5.3 Test case-2

In Test system 2, we consider a 40-unit generator system having multiple fuel options. The load demand is taken as 10800 MW. Increasing the number of units in the ETPD problem requires a powerful global search ability to discover the optimal power output and to overcome the problem of premature convergence. The best power generation schedule, fuel type and fuel cost calculated by the MAFRL technique are given in Table 4. The proposed MAFRL is able to fulfill the equality and inequality constraints and provides a feasible solution for this large power system of 40 units having multiple fuel systems. These results suggest that MAFRL is effective in local optimum search. In Table 5, we compare MAFRL with various other techniques reported in the literature, and we see that MAFRL achieves the lowest production cost among the listed methods. The convergence characteristic of the MAFRL approach for the 40-generator system is presented in Fig. 4.

Table 4
Best power generation for 40-generator system with power demand of 10800 MW

Unit no	Power generation	Fuel type	Unit no	Power generation	Fuel type	Unit no	Power generation	Fuel type	Unit no	Power generation	Fuel type
1	219.1296	2	2	211.1619	1	3	278.6420	1	4	239.2829	3
5	280.3452	1	6	238.9891	3	7	289.7162	1	8	240.7609	3
9	428.9146	3	10	276.0060	1	11	218.1289	2	12	212.1551	1
13	278.7852	1	14	239.6860	3	15	280.6090	1	16	238.3181	3
17	288.5156	1	18	238.4754	3	19	430.3806	3	20	273.4746	1
21	217.6981	2	22	212.6620	1	23	281.7161	1	24	240.2229	3
25	282.5441	1	26	239.1230	3	27	288.5895	1	28	238.7590	3
29	429.9306	3	30	272.7525	1	31	219.1302	2	32	211.6623	1
33	280.5679	1	34	240.0905	3	35	275.0805	1	36	239.1228	3
37	285.8837	1	38	241.0300	3	39	427.7581	3	40	274.1845	1
Total power generation: 10800 MW	Total fuel cost: 2495.6942 $/h

Table 5

Comparison of results of various methods for 40-generator system with power demand of 10800 MW

Key	Algorithm name	Reference	Operating cost ($/hr)
CSA	Cuckoo search algorithm	[22]	2495.9664
CGA-MU	Hybrid real coded genetic algorithm	[24]	2500.9220
IGA-MU	Integrated genetic algorithm	[24]	2499.8243
ORCSA	One rank cuckoo search algorithm	[23]	2495.9573
MAFRL	Multiagent Fuzzy Reinforcement learning		2495.6942

Fig.4

Convergence characteristics for 40-generator system with power demand of 10800 MW.

6 Conclusion

In this research, we applied a fuzzy reinforcement learning framework to solve the economic thermal power dispatch problem. In this technique, we treat every thermal generating unit as a participant in a reinforcement learning setup that optimizes the power dispatch task while satisfying various operational constraints. To evaluate the efficiency of the proposed MAFRL approach, we applied it to 10 and 40 generator test systems with valve point effect and multiple fuel systems. Numerical simulations indicate the accuracy of the solution and the mature convergence ability of the proposed MAFRL approach. Overall, MAFRL emerges as a superior alternative solution approach to the economic thermal power dispatch problem.

References

1. Dodu J.C., Martin, Merlin, Pouget, An optimal formulation and solution of short-range operating problems for a power system with flow constraints, Proc IEEE 60 (1972), 54–63. doi: 10.1109/PROC.1972.8557.

2. Liang Z.-X., Glover J.D., A zoom feature for a dynamic programming solution to economic dispatch including transmission losses, IEEE Trans Power Syst 7 (1992), 544–550. doi: 10.1109/59.141757.

3. Parikh, Chattopadhyay, A multi-area linear programming approach for analysis of economic operation of the Indian power system, IEEE Trans Power Syst 11 (1996), 52–58. doi: 10.1109/59.485985.

4. Fan Ji-Yuan and Zhang Lan, Real-time economic dispatch with line flow and emission constraints using quadratic programming, IEEE Trans Power Syst 13 (1998), 320–325. doi: 10.1109/59.667345.

5. El-Keib A.A., Ma, Hart J.L., Environmentally constrained economic dispatch using the LaGrangian relaxation method, IEEE Trans Power Syst 9 (1994), 1723–1729. doi: 10.1109/59.331423.

6. …, Wang, Mao, A hybrid genetic algorithm approach based on differential evolution for economic dispatch with valve-point effect, Int J Electr Power Energy Syst 30 (2008), 31–38. doi: 10.1016/j.ijepes.2007.06.023.

7. Amjady, Nasiri-Rad, Solution of nonconvex and nonsmooth economic dispatch by a new Adaptive Real Coded Genetic Algorithm, Expert Syst Appl 37 (2010), 5239–5245. doi: 10.1016/j.eswa.2009.12.084.

8. Selvakumar A.I., Thanushkodi, Anti-predatory particle swarm optimization: Solution to nonconvex economic dispatch problems, Electr Power Syst Res 78 (2008), 2–10. doi: 10.1016/j.epsr.2006.12.001.

9. Somasundaram, Kuppusamy, Application of evolutionary programming to security constrained economic dispatch, Int J Electr Power Energy Syst 27 (2005), 343–351. doi: 10.1016/j.ijepes.2004.12.006.

10. Pothiya, Ngamroo, Kongprawechnon, Ant colony optimisation for economic dispatch problem with non-smooth cost functions, Int J Electr Power Energy Syst 32 (2010), 478–487. doi: 10.1016/j.ijepes.2009.09.016.

11. Roy P.K., Ghoshal S.P., Thakur S.S., Biogeography-based Optimization for Economic Load Dispatch Problems, Electr Power Components Syst 38 (2009), 166–181. doi: 10.1080/15325000903273379.

12. Varadarajan, Swarup K.S., Solving multi-objective optimal power flow using differential evolution, Gener Transm Distrib IET 1 (2007), 324. doi: 10.1049/iet-gtd.

13. Bhattacharya, Chattopadhyay P.K., Hybrid Differential Evolution With Biogeography-Based Optimization for Solution of Economic Load Dispatch, IEEE Trans Power Syst 25 (2010), 1955–1964. doi: 10.1109/TPWRS.2010.2043270.

14. Tang W.J., Li M.S., Wu Q.H., Saunders J.R., Bacterial Foraging Algorithm for Optimal Power Flow in Dynamic Environments, IEEE Trans Circuits Syst I Regul Pap 55 (2008), 2433–2442. doi: 10.1109/TCSI.2008.918131.

15. Hota P.K., Barisal A.K., Chakrabarti, Economic emission load dispatch through fuzzy based bacterial foraging algorithm, Int J Electr Power Energy Syst 32 (2010), 794–803. doi: 10.1016/j.ijepes.2010.01.016.

16. Moradi-Dalvand, Mohammadi-Ivatloo, Najafi, Rabiee, Continuous quick group search optimizer for solving non-convex economic dispatch problems, Electr Power Syst Res 93 (2012), 93–105. doi: 10.1016/j.epsr.2012.07.009.

17. Chaturvedi K.T., Pandit, Srivastava, Particle swarm optimization with crazy particles for nonconvex economic dispatch, Appl Soft Comput 9 (2009), 962–969. doi: 10.1016/j.asoc.2008.11.012.

18. Shaw, Mukherjee, Ghoshal S.P., Seeker optimisation algorithm: Application to the solution of economic load dispatch problems, IET Gener Transm Distrib 5 (2011), 81. doi: 10.1049/iet-gtd.2010.0405.

19. Khamsawang, Jiriwibhakorn, DSPSO–TSA for economic dispatch problem with nonsmooth and noncontinuous cost functions, Energy Convers Manag 51 (2010), 365–375. doi: 10.1016/j.enconman.2009.09.034.

20. Mandal, Roy P.K., Mandal, Economic load dispatch using krill herd algorithm, Int J Electr Power Energy Syst 57 (2014). doi: 10.1016/j.ijepes.2013.11.016.

21.

Pradhan

, Roy

P.K.

, Pal

Grey wolf optimization applied to economic load dispatch problems, Int J Electr Power Energy Syst83 (2016), 325–334doi: 10.1016/j.ijepes.2016.04.034.

22.

D.N.

, Schegner

, Ongsakul

Cuckoo search algorithm for non-convex economic dispatch, IET Gener Transm Distrib7 (2013), 645–654doi: 10.1049/iet-gtd.2012.0142.

23.

Nguyen

T.T.

, Vo

D.N.

The application of one rank cuckoo search algorithm for solving economic load dispatch problems, Appl Soft Comput37 (2015), 763–773doi: 10.1016/j.asoc.2015.09.010.

24.

Chiang

C.-L.

Improved Genetic Algorithm for Power Economic Dispatch of Units With Valve-Point Effects and Multiple Fuels, IEEE Trans Power Syst20 (2005), 1690–1699doi: 10.1109/TPWRS.2005.857924.

25.

Shanti Swarup

and Simi

P.V.

, Neural computation using discrete and continuous Hopfield networks for power system economic dispatch and unit commitment, Neurocomputing70 (2006), 119–129doi: 10.1016/j.neucom.2006.05.002.

26.

Meng

, Li

, Yin

An efficient crisscross optimization solution to large-scale non-convex economic load dispatch with multiple fuel types and valve-point effects, Energy113 (2016), 1147–1161doi: 10.1016/j.energy.2016.07.138.

27.

Barisal

A.K.

Dynamic search space squeezing strategy based intelligent algorithm solutions to economic dispatch with multiple fuels, Int J Electr Power Energy Syst45 (2013), 50–59doi: 10.1016/j.ijepes.2012.08.049.

28.

J.J.Q.

, Li

V.O.K.

A social spider algorithm for solving the non-convex economic load dispatch problem, Neurocomputing171 (2016), 955–965doi: 10.1016/j.neucom.2015.07.037.

29.

Wiering

, van Otterlo

Reinforcement learning: State-of-the-art, Springer2012.

30.

Buoniu

, Babušska

, De Schutter

and Ernst

, Reinforcement learning and dynamic programming using function approximators, (2010).

31.

Navin

N.K.

, Sharma

A fuzzy reinforcement learning approach to thermal unit commitment problem, Neural Comput Appl (2017) doi: 10.1007/s00521-017-3106-5.

32.

Roy

P.K.

, Bhui

, Paul

Solution of economic load dispatch using hybrid chemical reaction optimization approach, Appl Soft Comput24 (2014), 109–125doi: 10.1016/j.asoc.2014.07.013.

33.

Srinivasa Reddy

and Vaisakh

, Shuffled differential evolution for economic dispatch with valve point loading effects, Int J Electr Power Energy Syst46 (2013), 342–352doi: 10.1016/j.ijepes.2012.10.012.

34.

Sinha

, Chakrabarti

, Chattopadhyay

P.K.

Evolutionary programming techniques for economic load dispatch, IEEE Trans Evol Comput7 (2003), 83–94doi: 10.1109/TEVC.2002.806788.

35.

Bhattacharya

, Chattopadhyay

P.K.

Solution of Economic Power Dispatch Problems Using Oppositional Biogeography-based Optimization, Electr Power Components Syst38 (2010), 1139–1160doi: 10.1080/15325001003652934.

36.

Sayah

, Hamouda

A hybrid differential evolution algorithm based on particle swarm optimization for nonconvex economic dispatch problems, Appl Soft Comput13 (2013), 1608–1619doi: 10.1016/j.asoc.2012.12.014.