Adaptive action-prediction cortical learning algorithm under uncertain environments

Abstract

The cortical learning algorithm (CLA) is a time-series prediction algorithm. Memory elements called columns and cells represent data discretely through combinations of their states, whereas linking elements called synapses change those state combinations. For tasks that require taking actions, the action-prediction CLA (ACLA) has the advantage of complementing missing state values with their predictions. However, an increase in the number of missing state values (i) generates excess synapses that negatively affect the action predictions and (ii) decreases the stability of the data representation, which makes it difficult to output action values. This paper proposes an adaptive ACLA that uses (i) adaptive synapse adjustment and (ii) adaptive action-separated decoding in an uncertain environment in which multiple input state values are missing probabilistically. (i) The proposed adaptive synapse adjustment suppresses unnecessary synapses. (ii) The proposed adaptive action-separated decoding adaptively outputs an action prediction separately for each action value. Experimental results on uncertain two- and three-dimensional mountain car tasks show that the proposed adaptive ACLA achieves more robust action prediction performance than the conventional ACLA, DDPG, and three LSTM-assisted reinforcement learning algorithms (LSTM-DDPG, LSTM-TD3, and LSTM-SAC), even as the number of missing state values and their frequencies increase. These results imply that the proposed adaptive ACLA is a way to make decisions for the future even when information about the surrounding situation is partially lacking.

Keywords

Cortical learning algorithm, action prediction, uncertain environment, adaptive synapse adjustment, adaptive action-separated decoding

1. Introduction

Time-series prediction is an important information technology that attracts considerable attention, especially in the computational intelligence domain, because it supports valuable decision-making for the future. The cortical learning algorithm (CLA) [1, 2, 3] is a time-series prediction algorithm based on hierarchical temporal memory (HTM) [4, 5], which is inspired by the structure and behavior of the human neocortex. The CLA predictor is composed of two memory elements, column and cell, and one linking element, synapse. The algorithm uses discrete data representations inside the predictor. The input value at each time step is represented by a column combination. In a time-series context, this input value is represented by a cell combination; a predictive input value at the next time step is represented by another cell combination. CLA has been reported to achieve higher prediction accuracy than long short-term memory (LSTM) [6], a variant of the recurrent neural network, on a taxi-demand prediction task [7] and an electricity load prediction task [8].

The action-prediction CLA (ACLA) [9] is an extension of CLA for action prediction tasks. ACLA has two CLA predictors: the state predictor and the action predictor. The state predictor receives the state values from the environment at each time step and predicts the state values at the next time step. The action predictor receives (self-recognizes) the action values made at each time step and predicts the action values to be made at the next time step. The state and action predictors are connected by synapses, and the action values to be obtained at the next time step are determined by both the state and action predictors. Even in an uncertain environment where the state values are missing probabilistically, ACLA predicts the action values because the missing state values can be complemented by their predictions in the state predictor. ACLA has been reported to achieve better performance than deep deterministic policy gradient (DDPG) [10] on a two-dimensional mountain car task with probabilistically missing state values [9]. Real-world tasks often require action, even in uncertain environments with missing state values. For such situations, action decision methods have been proposed that include not only ACLA but also reinforcement learning algorithms such as DDPG, TD3 (twin delayed DDPG), and SAC (soft actor-critic) combined with LSTM, which handle the uncertainty due to missing state values by using their predictions [11, 12, 13, 14, 15].

This study focused on two issues in conventional ACLA. (i) Synaptic adjustment: Conventional ACLA repeatedly adds synapses to establish the successful predictions of the predictors. However, in an uncertain environment, missing state values generate excess synapses that negatively affect appropriate action prediction. (ii) Action output: Conventional ACLA uses a fixed threshold to extract action prediction values from the action predictor. However, the uncertainty of missing state values decreases the stability of the data representation in the action predictor, and the fixed threshold is not always appropriate for extracting action prediction values. In addition, conventional ACLA cannot effectively treat multiple action values. Because conventional ACLA processes multiple actions simultaneously, some of the actions cannot be extracted when the data representation in the action predictor is biased owing to uncertainty.

To address the above two issues and improve the action prediction accuracy of ACLA in an uncertain environment with missing multiple state values, in this work, we propose an adaptive ACLA introducing an adaptive synapse adjustment (ASA) [17] and an adaptive action-separated decoder (AAD). To suppress excess synapses that negatively affect action prediction in uncertain environments, ASA assesses the partial prediction accuracy of each synapse set and adjusts the number of synapses based on the partial prediction accuracy. To output appropriate action values even in uncertain environments, AAD determines each action value separately and adaptively in the action predictor. For the experiments, we used uncertain two-dimensional mountain car tasks providing two state values probabilistically and requiring one action value, and uncertain three-dimensional mountain car tasks providing four state values probabilistically and requiring two action values. We compared the proposed adaptive ACLA using ASA and AAD with conventional ACLA [9], DDPG [10], LSTM-DDPG [14], LSTM-TD3, and LSTM-SAC [15] including an LSTM assist that deals with uncertainty due to missing state values.

This paper is an extended version of our previous work [18] presented at the NaBIC 2022 conference for a special issue. The differences between this study and the previous one are as follows. (1) The previous study used uncertain two-dimensional mountain car tasks with two state values and one action value. This study used three-dimensional mountain car tasks with four state values and two action values. (2) For tasks requiring multiple action values, this work generalizes the action decoder to treat multiple action values, whereas the previous paper treated a single action value. (3) For performance comparison, the previous study employed conventional DDPG [10] and LSTM-DDPG [14]. In addition, this study employs conventional LSTM-TD3 [15] and LSTM-SAC [15].

2. Cortical learning algorithm

2.1 Predictor

Figure 1.

CLA predictor.

Figure 1 shows the CLA predictor [3]. At each time step $t$ , the predictor takes an input bitstring ${\bm{x}}=(x_{1},x_{2},\dots,x_{n_{x}})\in\{0,1\}^{n_{x}}$ converted from an input real value $X(t)$ . The predictor has $n_{c}$ columns, $c_{1},c_{2},\dots,c_{n_{c}}$ . Each column transitions between two states: normal and active. Each column $c_{i}$ contains $n_{r}$ cells $r_{i,1},r_{i,2},\dots,r_{i,n_{r}}$ . Each cell transitions between three states: normal, active, and predictive. Figure 1 shows an example with $n_{x}=11$ bits, $n_{c}=5$ columns, and $n_{r}=4$ cells. Each column synapse $y$ links a column and a bit. Each cell synapse $y$ links one cell to another. Each synapse $y$ has a permanence value $p\in[0,1]$ . For a given connection threshold $\theta_{p}$ , a synapse $y$ with $p\geqslant\theta_{p}$ is connected and one with $p<\theta_{p}$ is disconnected. Figure 1 shows connected synapses as solid lines and disconnected synapses as dashed lines. A set of synapses is called a segment $Y$ . Each segment $Y$ transitions between two states: normal and active. The state of a column synapse segment affects the state of its column, and the state of a cell synapse segment affects the state of its cell.

2.2 Algorithm

Figure 2 shows an example of the CLA procedure. At each time step $t$ , the CLA receives a real value ${X}(t)$ and predicts a real value $\tilde{X}(t+1)$ incoming at the next time step $t+1$ using the following four procedures [3].

Figure 2.

CLA Procedure.

(1) Encode converts a real input value $X(t)$ at time step $t$ into a bitstring ${\bm{x}}=(x_{1},x_{2},\dots,x_{n_{x}})\in\{0,1\}^{n_{x}}$ . In this study, chunked encoding [8] converts a real value $X(t)\in[X^{\text{min}},X^{\text{max}}]$ into a bitstring ${\bm{x}}$ with $\omega$ consecutive bits of one (a chunk). Each data bit $x_{i}$ ( $i=1,2,\dots,n_{x}$ ) is given by

$\displaystyle x_{i}=\begin{cases}1,&\text{if }h(X(t))<i\leqslant h(X(t))+\omega,\\ 0,&\text{otherwise,}\end{cases}\quad(i=1,2,\dots,n_{x}),$ (1)

where $h(X(t))=0.5+\frac{(X(t)-X^{\text{min}})\cdot(n_{x}-\omega)}{X^{\text{max}}-X^{\text{min}}}$ . Figure 2a shows an example with $\omega=3$ chunked bits of one.
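As an illustration of Eq. (1), the following minimal Python sketch encodes a real value into a chunked bitstring. The function name `chunked_encode` and the value range used in the example are assumptions made for illustration; only the formula itself is taken from the text.

```python
def chunked_encode(value, x_min, x_max, n_x, omega):
    """Eq. (1): bitstring of length n_x with omega consecutive ones."""
    # h(X(t)) shifts the chunk linearly from the left end to the right end.
    h = 0.5 + (value - x_min) * (n_x - omega) / (x_max - x_min)
    # Bit indices are 1-based in the text, so i runs over 1..n_x.
    return [1 if h < i <= h + omega else 0 for i in range(1, n_x + 1)]


# Hypothetical example: n_x = 11 bits, omega = 3, values in [0, 1].
bits = chunked_encode(0.125, 0.0, 1.0, n_x=11, omega=3)
# h = 1.5, so bits 2, 3, and 4 are set.
```

Because $h$ always lies halfway between two integers at the range ends, the minimum value sets the first $\omega$ bits and the maximum value sets the last $\omega$ bits.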

(2) Spatial Pooling represents the input bitstring ${\bm{x}}$ as a combination of columns in the active state and attempts to establish it in the predictor by column synapses.

First, we create a set of columns in the active state. For each column $c_{i}$ , we count the number of connected column synapses linked to bits of one as the overlap value $c^{ol}_{i}$ . Subsequently, the top $\theta_{c}$ columns in descending order of $c^{ol}_{i}$ ( $i=1,2,\ldots,n_{c}$ ) transition to the active state. Figure 2a considers $\theta_{c}=2$ active columns and shows that the two columns $c_{1}$ and $c_{2}$ transition to the active state. Specifically, column $c_{1}$ has $c^{ol}_{1}=2$ connected synapses linked to $x_{2}=1$ and $x_{3}=1$ ; column $c_{2}$ also has $c^{ol}_{2}=2$ connected synapses linked to $x_{3}=1$ and $x_{4}=1$ ; and the other columns $c_{i}$ ( $i=3,4,5$ ) have $c^{ol}_{i}=0$ such synapses. The active-column combination represents the input bitstring ${\bm{x}}$ inside the predictor.

Next, we attempt to establish the active column combination by adjusting the column synapses. For each column synapse of an active column, we increase its permanence value $p$ by $\Delta p^{+}_{c}$ if it is linked to a bit of one and decrease it by $\Delta p^{-}_{c}$ if it is linked to a bit of zero. Figure 2a shows that the permanence values of the column synapses of the active columns $c_{1}$ and $c_{2}$ linked to bits $x_{2}=x_{3}=x_{4}=1$ are increased by $\Delta p^{+}_{c}$ , and those linked to bits $x_{1}=x_{5}=0$ are decreased by $\Delta p^{-}_{c}$ .
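The two spatial pooling steps can be sketched with NumPy as follows. This is a simplified illustration, not the reference implementation: it assumes that every column holds a synapse to every input bit (a dense permanence matrix), that ties in the overlap value are broken by column index, and that permanence values are clipped to $[0,1]$. The function name `spatial_pooling` is hypothetical.

```python
import numpy as np


def spatial_pooling(x, perm, theta_p, theta_c, dp_plus, dp_minus):
    """Select active columns and update their column synapses in place.

    x    : input bitstring, shape (n_x,)
    perm : permanence values of column synapses, shape (n_c, n_x) (assumed dense)
    """
    x = np.asarray(x, dtype=bool)
    connected = perm >= theta_p                 # connected synapses
    overlap = (connected & x).sum(axis=1)       # overlap value of each column
    # Top theta_c columns in descending order of overlap (stable tie-breaking).
    active = np.argsort(-overlap, kind="stable")[:theta_c]
    # Reinforce synapses on bits of one, weaken synapses on bits of zero.
    delta = np.where(x, dp_plus, -dp_minus)
    perm[active] = np.clip(perm[active] + delta, 0.0, 1.0)
    return active, overlap
```

Only the synapses of the active columns are modified; the permanence values of all other columns are left unchanged.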

(3) Temporal Pooling represents the input bitstring ${\bm{x}}$ at time step $t$ in a time-series context as a combination of cells in the active state and attempts to establish it in the predictor by adjusting cell synapses. In addition, a combination of cells in the predictive state is used for the prediction.

First, we create a set of cells in the active state. Each active column has at least one cell in the active state. Each active column that includes predictive cells makes one of them active. Each active column that includes no predictive cells makes all of its cells active. In Fig. 2a, two cells $r_{1,4}$ and $r_{2,1}$ transition to the active state because they were previously in the predictive state and are currently involved in active columns $c_{1}$ and $c_{2}$ , respectively. The active cell combination represents the input bitstring ${\bm{x}}$ at time step $t$ in the current time-series context.

Next, we create a set of cells in the predictive state. For each cell synapse segment $Y$ , we count the number of connected synapses linked to the active cells as the overlap value $Y^{ol}$ . For a given active threshold $\theta_{Y}$ , each segment $Y$ transitions to the active state if $Y^{ol}\geqslant\theta_{Y}$ . The state of a cell synapse segment affects the state of its cell: each cell with at least one active segment $Y$ transitions to the predictive state. The predictive cell combination represents the prediction at time step $t+1$ in the time-series context. Figure 2a considers the active threshold $\theta_{Y}=1$ and shows that cells $r_{4,3}$ and $r_{5,1}$ transition to the predictive state because segment $Y$ on $r_{4,3}$ has $Y^{ol}=2$ $(\geqslant\theta_{Y})$ and segment $Y$ on $r_{5,1}$ has $Y^{ol}=1$ $(\geqslant\theta_{Y})$ .
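The selection of predictive cells can be sketched as below. The data layout is an assumption made for illustration: each segment is a pair of its owner cell and a list of `(presynaptic_cell, permanence)` synapses, and cells are identified by `(column, index)` tuples. The toy data is hypothetical and does not reproduce Fig. 2a.

```python
def predictive_cells(segments, active_cells, theta_p, theta_y):
    """Return the cells that own at least one active segment."""
    predictive = set()
    for cell, synapses in segments:
        # Overlap value: connected synapses linked to currently active cells.
        overlap = sum(
            1 for pre, p in synapses if p >= theta_p and pre in active_cells
        )
        if overlap >= theta_y:          # the segment becomes active
            predictive.add(cell)        # so its cell becomes predictive
    return predictive


# Hypothetical toy data: three segments, two active cells.
segments = [
    ((4, 3), [((1, 4), 0.5), ((2, 1), 0.4)]),
    ((5, 1), [((2, 1), 0.5)]),
    ((3, 2), [((2, 1), 0.1)]),          # disconnected synapse, never counted
]
active = {(1, 4), (2, 1)}
```

With `theta_p = 0.3`, lowering `theta_y` from 2 to 1 admits the cell `(5, 1)` whose segment has only one connected synapse to an active cell.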

For each predictive cell at time step $t$ , the prediction succeeds if the cell becomes an active cell at the next time step $t+1$ and fails otherwise. For each cell whose prediction succeeded, in the segment $Y$ that made the cell predictive, we increase the permanence values of the cell synapses connected to the active cells by $\Delta p^{+}_{r}$ and decrease the others by $\Delta p^{-}_{r}$ . For each cell whose prediction failed, we decrease the permanence values of the cell synapses in the segment $Y$ that made the cell predictive by $\Delta p^{p}$ . In Fig. 2b, we focus on the segment of cell $r_{4,3}$ , whose prediction succeeds. The permanence value of the cell synapse connected to the active cell $r_{1,4}$ , which made cell $r_{4,3}$ predictive, is increased by $\Delta p^{+}_{r}$ , and that of the synapse connected to the normal (non-active) cell $r_{2,2}$ is decreased by $\Delta p^{-}_{r}$ . For the segment of cell $r_{5,1}$ , whose prediction fails, the permanence value of the cell synapse connected to the active cell $r_{2,1}$ , which made cell $r_{5,1}$ predictive, is decreased by $\Delta p^{p}$ .
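The permanence update rule described above can be sketched as a small function. This is a hedged illustration: the dictionary layout, the function name `update_segment`, and the clipping of permanence values to $[0,1]$ are assumptions, and `active_cells` denotes the cells that were active when the prediction was made.

```python
def update_segment(synapses, active_cells, success, dp_plus, dp_minus, dp_pen):
    """Return updated permanence values for one cell synapse segment.

    synapses : dict mapping presynaptic cell -> permanence value
    success  : True if the cell predicted by this segment became active
    """
    updated = {}
    for pre, p in synapses.items():
        if success:
            # Reward synapses to active cells, weaken the others.
            p = p + dp_plus if pre in active_cells else p - dp_minus
        else:
            # Failed prediction: penalize every synapse in the segment.
            p = p - dp_pen
        updated[pre] = min(1.0, max(0.0, p))
    return updated
```

For example, a segment `{(1, 4): 0.5, (2, 2): 0.5}` with only `(1, 4)` active becomes approximately `{(1, 4): 0.6, (2, 2): 0.4}` after a successful prediction when both increments are 0.1.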

To further establish the successful predictive cell combination in the predictor, we added cell synapses between a segment $Y$ of successful predictive cells and randomly selected $n_{\text{add}}=\max\{n_{\text{max}}-n_{\text{cur}},0\}$ cells in the active state, where $n_{\text{max}}$ is the given maximum number of synapse additions and $n_{\text{cur}}$ is the current number of connected cell synapses linked to active cells. In Fig. 2b, we observe that segment $Y$ of the successful predictive cell $r_{4,3}$ adds a new synapse linked to cell $r_{2,1}$ in the active state.

(4) Decode converts the predictive cell combination in the predictor into a predictive value, $\tilde{X}(t+1)$ , which is a real value. This study used a column-based decoder [19].

For each data bit $x_{i}$ ( $i=1,2,\ldots,n_{x}$ ), we count the number of predictive cells in the columns linked to it by connected column synapses as $d_{i}$ ( $i=1,2,\ldots,n_{x}$ ). Then, we determine the $\omega$ consecutive bit indices $j,j+1,\ldots,j+\omega-1$ that maximize $\sum_{k=j}^{j+\omega-1}{d_{k}}$ . As a result, we obtain the predictive bitstring $\tilde{{\bm{x}}}$ with $\omega$ consecutive chunked bits $\tilde{x}_{k}=1$ ( $k=j,j+1,\ldots,j+\omega-1$ ) and all other bits zero. The predictive bitstring $\tilde{\bm{x}}$ can be inversely converted to a predictive value $\tilde{X}(t+1)$ based on the position of the $\omega$ consecutive chunked bits using the chunked encoding described previously. In Fig. 2c, the 7th, 10th, and 11th bits have $d_{7}=d_{10}=d_{11}=1$ predictive cells, and the 9th bit has $d_{9}=2$ predictive cells. The consecutive 9th, 10th, and 11th bits maximize the sum, $d_{9}+d_{10}+d_{11}=4$ , and the predictive bitstring $\tilde{\bm{x}}$ with $\tilde{x}_{9}=\tilde{x}_{10}=\tilde{x}_{11}=1$ and all other bits zero is obtained.
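The window search of the column-based decoder can be sketched as follows. Two points are assumptions rather than statements from the text: ties are resolved in favor of the leftmost window, and the inverse conversion maps a chunk starting at (1-based) bit $j$ to $X^{\text{min}}+(j-1)(X^{\text{max}}-X^{\text{min}})/(n_{x}-\omega)$, which is the value whose encoding under Eq. (1) is centered on that chunk. The function name `column_decode` is hypothetical.

```python
def column_decode(d, omega, x_min, x_max):
    """Decode per-bit predictive-cell counts d into a bitstring and a value."""
    n_x = len(d)
    # Sum of counts for every window of omega consecutive bits (0-based start).
    sums = [sum(d[j:j + omega]) for j in range(n_x - omega + 1)]
    j0 = max(range(len(sums)), key=sums.__getitem__)   # first maximal window
    bits = [1 if j0 <= k < j0 + omega else 0 for k in range(n_x)]
    # Assumed inverse of the chunked encoding (see the lead-in).
    value = x_min + j0 * (x_max - x_min) / (n_x - omega)
    return bits, value


# Counts from the Fig. 2c description: d7 = d10 = d11 = 1 and d9 = 2.
d = [0, 0, 0, 0, 0, 0, 1, 0, 2, 1, 1]
bits, value = column_decode(d, omega=3, x_min=0.0, x_max=1.0)
# The window over bits 9-11 has the largest sum (4), so those bits are set.
```

The value range $[0,1]$ in the example is hypothetical; with it, the rightmost chunk decodes to the upper end of the range.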

3. Action-prediction CLA (ACLA)

ACLA is an extension of CLA for action prediction tasks. The time-series prediction of CLA is used to predict the action to be taken at the next time step and to complement state values that are missing owing to uncertainty with their predicted values.

Figure 3.

ACLA Procedure.

3.1 Predictor

The ACLA predictor is composed of the state predictor and action predictor. ACLA receives $n_{\text{st}}$ state values $s_{1}(t),s_{2}(t),\ldots,s_{n_{\text{st}}}(t)$ from the environment and outputs $n_{\text{act}}$ action prediction values $\tilde{a}_{1}(t),\tilde{a}_{2}(t),\ldots,\tilde{a}_{n_{\text{act}}}(t)$ at each time step $t$ . Figure 3a shows an ACLA predictor, which receives $n_{\text{st}}=2$ state values from the environment and outputs $n_{\text{act}}=2$ action values to the environment.

3.2 Algorithm

For each time step $t$ , the state predictor receives $n_{\text{st}}$ bitstrings of ${\bm{x}}$ converted from the input state values $s_{i}(t)$ ( $i=1,2,\ldots,n_{\text{st}}$ ) and creates a set of columns and a set of cells in the active state in the same manner as in the conventional CLA. In Fig. 3a, cells $r_{1,1}$ , $r_{2,2}$ , $r_{5,3}$ , and $r_{6,1}$ in the state predictor transition to the active state. In addition, cells $r_{3,3}$ and $r_{5,2}$ in the action predictor transition to the predictive state.

For action decoding, the conventional ACLA uses predictive cells in the action predictor. Note that the action predictor treats multiple action values simultaneously. The conventional ACLA uses predictive cells distributed in the entire action predictor for action decoding without considering multiple actions separately. To obtain the $i$ -th action value element $\tilde{a}_{i}(t)$ , we select a column subset $C_{i}$ associated with the data bits of action value $a_{i}(t)$ . The column-based decoder uses the predictive cells in the column subset $C_{i}$ and obtains the action prediction value $\tilde{a}_{i}(t)$ . In Fig. 3a, for the action prediction value $\tilde{a}_{1}(t)$ , the conventional ACLA selects column subset $C_{1}=\{c_{1},c_{2},c_{3}\}$ associated with data bits of action value $a_{1}(t)$ , and the column-based decoder uses predictive cell $r_{3,3}$ in $C_{1}$ to obtain the action prediction value $\tilde{a}_{1}(t)$ . In addition, for the action prediction value $\tilde{a}_{2}(t)$ , the conventional ACLA selects column subset $C_{2}=\{c_{4},c_{5},c_{6}\}$ associated with the data bits of action value $a_{2}(t)$ , and the column-based decoder uses predictive cell $r_{5,2}$ in $C_{2}$ to obtain the action prediction value $\tilde{a}_{2}(t)$ .

ACLA needs training based on imitation learning, which requires expert action. The learning process updates the permanence values of the cell synapses such that each action prediction value $\tilde{a}_{i}(t)$ approaches the corresponding expert action value $a_{i}(t)$ ( $i=1,2,\ldots,n_{\text{act}}$ ). ACLA makes a set of columns in the action predictor the active state based on expert action values $a_{i}(t)$ ( $i=1,2,\ldots,n_{\text{act}}$ ) at time step $t$ . Figure 3b shows columns $c_{3}$ and $c_{6}$ in the action predictor transit to the active state. ACLA then updates the permanence values of the cell synapses and adds new cell synapses in the action predictor, as with the conventional CLA. Figure 3b shows two predictive cells $r_{3,3}$ and $r_{5,2}$ in the action predictor. Column $c_{3}$ , which includes the predictive cell $r_{3,3}$ becomes the active state. Then, the permanence values of cell synapses from the active cells $r_{1,1}$ , $r_{2,2}$ , $r_{5,3}$ in the state predictor, and $r_{1,3}$ in the action predictor at time step $t-1$ are increased by $\Delta p_{r}^{+}$ . In addition, two segments of the predictive cell $r_{3,3}$ add two synapses to active cell $r_{1,1}$ at time step $t$ . On the other hand, column $c_{5}$ including the predictive cell $r_{5,2}$ in the action predictor, becomes the normal (non-active) state. Then, the permanence values of cell synapses from active cells $r_{5,3}$ and $r_{6,1}$ in the state predictor at time step $t$ are decreased by $\Delta p^{p}$ .

When state values $s_{i}(t)$ ( $i=1,2,\ldots,n_{\text{st}}$ ) are missing because of uncertainty, ACLA complements them with the predicted state values $\tilde{s}_{i}(t)$ .

3.3 Issue focus

This study focused on two issues in conventional ACLA.

(i) The first issue is cell synapse adjustment. Excess cell synapses caused by the uncertainty of missing state values make appropriate action prediction difficult. The conventional ACLA repeatedly adds cell synapses up to the maximum number $n_{\text{max}}$ at each time step $t$ . Because the linked cells are chosen randomly among the active cells, excess cell synapses tend to increase. In uncertain environments, particularly those with multiple missing state values, excess cell synapses harm appropriate action prediction.

(ii) The second issue is action decoding. Conventional ACLA uses a fixed threshold $\theta_{Y}$ to extract the predictive cells considered in action decoding. However, because the uncertainty of missing state values decreases the stability of the data representation in the action predictor, the fixed threshold $\theta_{Y}$ is not always appropriate for extracting these predictive cells. The cells considered in action decoding should be determined adaptively. In addition, action decoding in conventional ACLA can treat multiple action variables, but its handling of them leaves much to be desired. Action decoding in conventional ACLA is action-mixed decoding, which processes all columns at once even though they treat multiple actions. Because the predictive cells considered in action decoding may be unevenly distributed in the action predictor, some of the actions cannot be extracted. The cells considered in action decoding should be selected separately for each action value.

4. Proposed adaptive ACLA

To address the above two issues in the conventional ACLA, particularly when multiple state values are missing owing to uncertainty, we propose an adaptive ACLA that uses (i) an adaptive synapse adjustment (ASA) based on partial prediction accuracy to suppress excess cell synapses and (ii) an adaptive action-separated decoder (AAD) that robustly outputs each action prediction value separately.

Figure 4.

Adaptive synapse adjustment (ASA) in the proposed adaptive ACLA.

4.1 Adaptive synapse adjustment (ASA)

ASA assesses the partial prediction accuracy $\rho$ of each cell synapse segment $Y$ and adjusts the number of synapses in segment $Y$ based on its partial prediction accuracy $\rho$ . Figure 4 shows a conceptual figure of ASA.

Each cell synapse segment $Y$ has a partial prediction accuracy $\rho\in[0,1]$ , as shown by the yellow tag in Fig. 4. A higher $\rho$ indicates a higher partial prediction accuracy. It is initially set to $\rho=1$ and is updated every time segment $Y$ transitions to the active state as

$\displaystyle\rho^{\prime}=\begin{cases}\frac{(\tau-1)\cdot\rho+\kappa}{\tau},&\text{if}\;\tau\leqslant\theta_{\tau},\\ \frac{(\theta_{\tau}-1)\cdot\rho+\kappa}{\theta_{\tau}},&\text{otherwise},\end{cases}$ (2)

where $\rho$ is the partial prediction accuracy before the update and $\rho^{\prime}$ is the accuracy after the update. $\tau$ is the number of times segment $Y$ has transitioned to the active state. We set $\kappa=1$ when segment $Y$ successfully predicted its cell and $\kappa=0$ when its prediction failed. The threshold $\theta_{\tau}$ prevents the influence of $\kappa$ from vanishing as $\tau$ grows.
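Equation (2) can be written compactly by noting that both cases share the same form with an effective window $\min\{\tau,\theta_{\tau}\}$. The sketch below assumes that $\tau$ already counts the current activation (so $\tau\geqslant 1$); the function name `update_rho` is hypothetical.

```python
def update_rho(rho, tau, success, theta_tau):
    """Eq. (2): update the partial prediction accuracy of one segment."""
    kappa = 1.0 if success else 0.0
    # Running mean while tau <= theta_tau, fixed-weight average afterwards.
    window = min(tau, theta_tau)
    return ((window - 1) * rho + kappa) / window
```

For $\tau\leqslant\theta_{\tau}$ this is the running mean of the outcomes $\kappa$; beyond that, each new outcome keeps a constant weight of $1/\theta_{\tau}$, so recent successes and failures continue to move $\rho$.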

ASA adjusts the number of synapses to be added $n_{\text{add}}$ and the number of synapses to be deleted $n_{\text{del}}$ based on the partial prediction accuracy $\rho$ .

The synapse is added to each segment $Y$ of the cells that made successful predictions. In Fig. 4b, two segments of cell $r_{3,3}$ become the target because cell $r_{3,3}$ in the predictive state is involved in column $c_{3}$ in the active state. That is, the prediction of cell $r_{3,3}$ is successful. The number of synapses to be added $n_{\text{add}}$ is given by:

$\displaystyle n_{\text{add}}=\lfloor(1-\varphi\cdot\rho^{w})\cdot\max\{n_{\text{max}}-n_{\text{cur}},0\}\rfloor,$ (3)

where $\varphi=\min\{|Y|/n_{\text{max}},1\}$ is the synapse-holding ratio of segment $Y$ , $|Y|$ is the number of synapses in segment $Y$ , $n_{\text{max}}$ is the maximum number of synapse additions, $n_{\text{cur}}$ is the number of synapses connected to cells in the active state, and $w\in[0,1]$ is the exponential weight parameter. The smaller the synapse-holding ratio $\varphi$ , the larger the number of additional synapses $n_{\text{add}}$ . The lower the partial prediction accuracy $\rho$ , the larger the number of additional synapses $n_{\text{add}}$ . The exponential weight parameter $w$ controls the impact of the partial prediction accuracy $\rho\in[0,1]$ on $n_{\text{add}}$ .
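A direct transcription of Eq. (3) may help to see how the two factors interact. The function and argument names below are hypothetical; `seg_size` stands for $|Y|$.

```python
import math


def n_add(rho, seg_size, n_cur, n_max, w):
    """Eq. (3): number of synapses to add to a successful segment."""
    phi = min(seg_size / n_max, 1.0)            # synapse-holding ratio
    base = max(n_max - n_cur, 0)                # conventional addition count
    return math.floor((1.0 - phi * rho ** w) * base)


# Hypothetical values with n_max = 40 and w = 0.5:
full_accurate = n_add(rho=1.0, seg_size=40, n_cur=10, n_max=40, w=0.5)   # 0
half_poor = n_add(rho=0.25, seg_size=20, n_cur=10, n_max=40, w=0.5)      # 22
empty = n_add(rho=1.0, seg_size=0, n_cur=0, n_max=40, w=0.5)             # 40
```

A full and accurate segment receives no new synapses, whereas an empty segment receives the same number as in the conventional rule $\max\{n_{\text{max}}-n_{\text{cur}},0\}$. With $w=0$, $\rho^{w}=1$ and only the synapse-holding ratio matters.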

Synapse deletion is performed on each segment $Y$ of cells that make failure predictions. In Fig. 4b, the segment on cell $r_{5,2}$ is the target, because cell $r_{5,2}$ in the predictive state is involved in column $c_{5}$ in the normal state. In other words, the prediction of cell $r_{5,2}$ fails. The number of synapses to be deleted $n_{\text{del}}$ is given by

$\displaystyle n_{\text{del}}=\lfloor(1-\rho^{w})\cdot|Y|\rfloor.$ (4)

The lower the partial prediction accuracy $\rho$ , the larger the number of deleted synapses $n_{\text{del}}$ . The exponential weight parameter $w\in[0,1]$ controls the impact of the partial prediction accuracy $\rho\in[0,1]$ on $n_{\text{del}}$ . In ASA, we delete the $n_{\text{del}}$ cell synapses with the lowest permanence values $p$ in segment $Y$ .
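Synapse deletion by Eq. (4) can be sketched as follows. The dictionary layout and the name `prune_segment` are assumptions for illustration; how ties between equal permanence values are resolved is not specified in the text, and here Python's stable sort keeps the earlier-inserted synapse.

```python
import math


def prune_segment(synapses, rho, w):
    """Eq. (4): drop the n_del lowest-permanence synapses of a failed segment.

    synapses : dict mapping presynaptic cell -> permanence value
    """
    n_del = math.floor((1.0 - rho ** w) * len(synapses))
    # Sort by permanence, highest first, and keep all but the last n_del.
    ranked = sorted(synapses.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:len(synapses) - n_del])


# Hypothetical segment with four synapses, rho = 0.25 and w = 0.5:
# n_del = floor((1 - 0.5) * 4) = 2, so the two weakest synapses are removed.
kept = prune_segment({"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.1}, rho=0.25, w=0.5)
```

A segment with $\rho=1$ is never pruned, because $1-\rho^{w}=0$ for any $w$.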

4.2 Adaptive action-separated decoding (AAD)

Figure 5.

Adaptive action-separated decoding (AAD) in the proposed adaptive ACLA.

The proposed adaptive action-separated decoding (AAD) uses adaptive segment selection using relative segment ranking instead of segment selection using the fixed threshold $\theta_{Y}$ in the conventional ACLA. In addition, AAD uses action-separated decoding instead of action-mixed decoding in the conventional ACLA.

For action decoding, AAD does not simply use the predictive cells in the action predictor, because the columns treating a given action value may not contain any predictive cells. Figure 5 shows a conceptual figure of AAD. To obtain the $i$ -th action prediction value $\tilde{a}_{i}(t)$ , we select the column subset $C_{i}$ associated with the data bits of action value $a_{i}(t)$ . For each segment $Y$ in the selected column subset $C_{i}$ , we count the number of synapses connected to the active cells as $Y_{c}$ . We then select the top $n_{a}$ segments in descending order of $Y_{c}$ and use them for column-based decoding to obtain the $i$ -th action prediction value $\tilde{a}_{i}(t)$ . In this manner, the cells for column-based decoding are obtained by relative segment ranking within column subset $C_{i}$ . This process is performed separately for each $i=1,2,\ldots,n_{\text{act}}$ and yields the action vector $\tilde{\vec{a}}(t)=(\tilde{a}_{1}(t),\tilde{a}_{2}(t),\ldots,\tilde{a}_{n_{\text{act}}}(t))$ .
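The relative segment ranking of AAD can be sketched as follows. The data layout is an assumption for illustration: each segment is a pair of its owner cell `(column, index)` and a list of `(presynaptic_cell, permanence)` synapses, only synapses with permanence at or above a connection threshold are counted, and ties in $Y_{c}$ keep their original order. The function name `aad_select` and the toy data are hypothetical.

```python
def aad_select(segments, column_subsets, active_cells, theta_p, n_a):
    """For each action, pick the cells owning the top-n_a segments by Y_c."""
    selected = []
    for columns in column_subsets:              # one column subset C_i per action
        scored = []
        for cell, synapses in segments:
            if cell[0] in columns:
                # Y_c: connected synapses linked to active cells.
                y_c = sum(
                    1 for pre, p in synapses
                    if p >= theta_p and pre in active_cells
                )
                scored.append((y_c, cell))
        scored.sort(key=lambda s: s[0], reverse=True)   # stable ranking
        selected.append([cell for _, cell in scored[:n_a]])
    return selected


# Toy data with two actions, columns 1-3 and 4-6.
segments = [
    ((3, 3), [("A", 0.5), ("B", 0.5)]),   # Y_c = 2
    ((3, 3), [("A", 0.5)]),               # Y_c = 1
    ((5, 2), [("A", 0.5), ("B", 0.5)]),   # Y_c = 2
    ((6, 3), [("B", 0.5)]),               # Y_c = 1
    ((4, 1), [("C", 0.5)]),               # Y_c = 0
]
cells = aad_select(segments, [{1, 2, 3}, {4, 5, 6}], {"A", "B"}, 0.3, n_a=2)
```

The selected cells of each subset would then be passed to the column-based decoder in place of the predictive cells. A cell appears once per selected segment, so a cell with two highly ranked segments contributes twice in this sketch.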

Figure 5 considers $n_{\text{act}}=2$ action values, threshold $\theta_{Y}=2$ , and segment selection size $n_{a}=2$ . In the case of conventional ACLA, the action-mixed decoder treats all columns $c_{1},c_{2},\ldots,c_{6}$ ( $n_{c}=6$ ) in the action predictor simultaneously, even though they treat $n_{\text{act}}=2$ different action values. Cells $r_{3,3}$ and $r_{5,2}$ with green segments, each including $Y_{c}=2$ synapses, are considered in the column-based decoder because $Y_{c}\geqslant\theta_{Y}$ . In the case of the proposed adaptive ACLA, AAD separately treats the columns associated with each action value. For the 1st action prediction value $\tilde{a}_{1}(t)$ , we select the column subset $C_{1}=\{c_{1},c_{2},c_{3}\}$ associated with the data bits of action value $a_{1}(t)$ . For each segment $Y$ in the column subset $C_{1}=\{c_{1},c_{2},c_{3}\}$ , we count the number of synapses connected to the active cells as $Y_{c}$ . Cell $r_{3,3}$ has two segments with $Y_{c}=2$ and $Y_{c}=1$ ; both are considered in the column-based decoder, whereas the segment with $Y_{c}=1$ is not considered in the conventional ACLA because $(Y_{c}=1)<(\theta_{Y}=2)$ . Next, for the 2nd action prediction value $\tilde{a}_{2}(t)$ , we select the column subset $C_{2}=\{c_{4},c_{5},c_{6}\}$ associated with the data bits of action value $a_{2}(t)$ . For each segment $Y$ in the column subset $C_{2}=\{c_{4},c_{5},c_{6}\}$ , we count the number of synapses connected to the active cells as $Y_{c}$ . Cell $r_{5,2}$ has a segment with $Y_{c}=2$ and cell $r_{6,3}$ has a segment with $Y_{c}=1$ ; both cells are considered in the column-based decoder, whereas cell $r_{6,3}$ with a segment of $Y_{c}=1$ is not considered in the conventional ACLA because $(Y_{c}=1)<(\theta_{Y}=2)$ .

5. Experimental setting

5.1 Benchmark tasks

Table 1
Mountain car task setting

Task	Type	Variable	Description	Value range	Initial value	Missing probability
2D	State	$s_{1}$	Position on x-axis	$[-1.2,0.6]$	$[-0.4,-0.6]$	$P_{1}\in\{0.0,0.2,0.4,0.6,0.8,1.0\}$
		$s_{2}$	Velocity for x-axis	$[-0.07,0.07]$	0	$P_{2}\in\{0.0,0.2,0.4,0.6,0.8,1.0\}$
	Action	$a_{1}$	Pushing force for x-axis	$[-1,1]$	–	–
3D	State	$s_{1}$	Position on x-axis	$[-1.2,0.6]$	$[-0.4,-0.6]$	$P_{1}\in\{0.0,0.2,0.4,0.6,0.8,1.0\}$
		$s_{2}$	Velocity for x-axis	$[-0.07,0.07]$	0	$P_{2}\in\{0.0,0.2,0.4,0.6,0.8,1.0\}$
		$s_{3}$	Position on y-axis	$[-1.2,0.6]$	$[-0.4,-0.6]$	$P_{3}\in\{0.0,0.2,0.4,0.6,0.8,1.0\}$
		$s_{4}$	Velocity for y-axis	$[-0.07,0.07]$	0	$P_{4}\in\{0.0,0.2,0.4,0.6,0.8,1.0\}$
	Action	$a_{1}$	Pushing force for x-axis	$[-1,1]$	–	–
		$a_{2}$	Pushing force for y-axis	$[-1,1]$	–	–

Figure 6.

2D mountain car task.

Figure 7.

3D mountain car task.

We used two- and three-dimensional mountain car tasks under uncertainty with missing state values as benchmarks. Figures 6 and 7 illustrate these mountain car tasks. Table 1 shows the specifications used in this work. We used the implementation of OpenAI Gym [20]. The task is for the car to reach the top of the mountain from the bottom of the valley by recognizing its position and velocity as the state and pushing the car as the action. The 2D mountain car task provides $n_{\text{st}}=2$ state values $s_{1}(t)$ and $s_{2}(t)$ and requires $n_{\text{act}}=1$ action value $a_{1}(t)$ at each time step $t$ . $a_{1}(t)<0$ pushes the car to the left and $a_{1}(t)>0$ pushes it to the right along the x-axis. The 3D mountain car task [21] provides $n_{\text{st}}=4$ state values $s_{1}(t),s_{2}(t),s_{3}(t)$ , and $s_{4}(t)$ and requires $n_{\text{act}}=2$ action values $a_{1}(t)$ and $a_{2}(t)$ at each time step $t$ . $a_{2}(t)<0$ pushes the car downward and $a_{2}(t)>0$ pushes it upward along the y-axis. We refer to a single execution from start to end as one epoch. Each state value $s_{i}(t)$ is withheld with missing probability $P_{i}$ ( $i=1,2,\ldots,n_{\text{st}}$ ). The task becomes more difficult as $P_{i}$ ( $i=1,2,\ldots,n_{\text{st}}$ ) increases owing to uncertainty. Note that the start position and velocity are given even if $P_{i}=1$ . For a single task setting with a missing-state probability combination $P_{i}$ $(i=1,2,\ldots,n_{\text{st}})$ , we generated 50 instances with 50 different initial position values randomly chosen in $[-0.4,-0.6]$ .
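The probabilistic withholding of state values, together with the complementing of missing values by predictions described in Section 3.2, can be sketched as follows. This is an illustrative wrapper only: the function names, the use of `None` as the marker for a missing value, and the explicit `step` argument are assumptions and are not part of the OpenAI Gym interface.

```python
import random


def observe(state, missing_probs, rng, step):
    """Withhold each state value s_i with probability P_i (None = missing)."""
    if step == 0:
        # The start position and velocity are always given.
        return list(state)
    return [None if rng.random() < p else s for s, p in zip(state, missing_probs)]


def complement(observed, predicted):
    """Replace missing state values with the state predictor's predictions."""
    return [p if o is None else o for o, p in zip(observed, predicted)]


rng = random.Random(0)                      # seeded for reproducibility
obs = observe([-0.5, 0.01], [0.0, 1.0], rng, step=1)
filled = complement(obs, predicted=[-0.48, 0.012])
```

With $P_{1}=0$ and $P_{2}=1$, the position is always observed and the velocity is always replaced by its predicted value after the first step.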

5.2 Algorithms

Table 2
Eight algorithms

Algorithms		Adaptive synapse adjustment (ASA)	Adaptive action-separated decoding (AAD)
Conventional	DDPG [10]	–	–
	LSTM-DDPG [14]	–	–
	LSTM-TD3 [15]	–	–
	LSTM-SAC [15]	–	–
	ACLA [9]	–	–
Proposed	Adaptive ACLA (ASA)	$\surd$	–
	Adaptive ACLA (AAD)	–	$\surd$
	Adaptive ACLA (ASA $+$ AAD)	$\surd$	$\surd$

Table 3

Parameter values for ACLA-based algorithms

Name	Value
Number of columns for each variable $n_{c}$	2,048
Number of cells in each column $n_{r}$	32
Length of bitstrings for each variable $n_{x}$	421
Number of active columns for each variable $\theta_{c}$	40
Connection threshold of column synapses $\theta_{p}$	0.1
Increase in permanence value for column synapses $\Delta p_{c}^{+}$	0.1
Decrease in permanence value for column synapses $\Delta p_{c}^{-}$	0.00025225
Connection threshold of cell synapses $\theta_{p}$	0.3
Increase in permanence value for cell synapses $\Delta p_{r}^{+}$	0.1
Decrease in permanence values for cell synapses $\Delta p_{r}^{-}$ , $\Delta p^{p}$	{0.1, 0.005}
Maximum number of cell synapse additions $n_{\text{max}}$	50

Table 2 shows the eight algorithms compared in this work. DDPG, TD3, and SAC are reinforcement learning algorithms; LSTM-DDPG, LSTM-TD3, and LSTM-SAC combine them with LSTM to address the uncertainty due to missing state values.

To verify the individual and combined contributions of ASA and AAD, we used three adaptive ACLAs: one with only ASA, one with only AAD, and one with both ASA and AAD. These are denoted as the proposed adaptive ACLA (ASA), adaptive ACLA (AAD), and adaptive ACLA (ASA $+$ AAD), respectively.

5.3 Parameters

Table 3 shows the parameter specifications for the conventional ACLA. Table 4 shows the parameter specifications for the proposed adaptive ACLA. As the expert actions for training, we sampled 1,000 epochs from the results of a DDPG trained in the environment without missing state values, excluding poor epochs that took 80 steps or more to reach the goal.
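The selection of expert epochs can be illustrated with the following hedged sketch. The text does not state whether the exclusion is applied before or after sampling; this example assumes filtering first and then sampling without replacement. The function name `select_expert_epochs` and the representation of an epoch as a list of (state, action) pairs are hypothetical.

```python
import random

def select_expert_epochs(epochs, max_steps=80, n_samples=1000, seed=0):
    """Keep epochs shorter than `max_steps` steps, then sample up to
    `n_samples` of them without replacement (filter-then-sample is an
    assumption of this sketch)."""
    good = [epoch for epoch in epochs if len(epoch) < max_steps]
    rng = random.Random(seed)
    return rng.sample(good, min(n_samples, len(good)))

# Toy data: four epochs of 60, 79, 80, and 120 steps, each step a (state, action) pair.
demo_epochs = [[((0.0, 0.0), 0.5)] * n for n in (60, 79, 80, 120)]
experts = select_expert_epochs(demo_epochs, n_samples=2)
```

With the toy data, only the 60-step and 79-step epochs pass the filter, since epochs taking 80 steps or more are excluded.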

Table 4
Parameter values for the proposed adaptive ACLA

Name	Value
Threshold in ASA $\theta_{\tau}$	1,000
Maximum number of cell synapse additions $n_{\text{max}}$ in ASA	40
Number of synapse segments $n_{a}$ in AAD	50

5.4 Metric

Each algorithm was executed 31 times on each of the 50 task instances with different initial positions in a single task setting with a missing-probability combination of $P_{i}$ ($i=1,2,\ldots,n_{\text{st}}$). For each task setting, we calculated the mean number of steps required to reach the goal as the performance metric. That is, for a single task setting, we calculated the mean step number over 50 instances $\times$ 31 runs $=$ 1,550 results. The smaller the step number, the better the performance.
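As a rough illustration of this metric, the sketch below averages a 50 $\times$ 31 array of step counts for one task setting. The function name `summarize_steps` and the synthetic step counts are hypothetical. Computing the standard deviation over all 1,550 results is an assumption, since the text does not say how the error bars are aggregated or how epochs that never reach the goal are counted.

```python
import numpy as np

def summarize_steps(step_counts):
    """Return (mean, standard deviation) of the step counts over all
    instances and runs of one task setting."""
    steps = np.asarray(step_counts, dtype=float)
    return float(steps.mean()), float(steps.std())

# Synthetic step counts: 50 task instances (rows) x 31 runs (columns).
rng = np.random.default_rng(0)
demo_steps = rng.integers(60, 200, size=(50, 31))
mean_steps, std_steps = summarize_steps(demo_steps)
```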

6. Experimental results and discussion

6.1 2D mountain car problem

Figure 8.

Step numbers to reach the goal when exponential weight $w$ in ASA varies in the 2D mountain car task with missing probabilities $P_{1}=P_{2}=0.8$ .

Figure 8 shows the mean step numbers to reach the goal in the 2D mountain car task with missing probabilities $P_{1}=P_{2}=0.8$ when we vary the exponential weight parameter $w$ in the proposed ASA. Error bars indicate the standard deviation. The horizontal lines represent the results of the conventional algorithms without ASA for reference. The results show that the proposed adaptive ACLA (ASA) and adaptive ACLA (ASA $+$ AAD) achieve shorter step numbers than the others for any $w$. This reveals that ASA decreases the step number in this task. Although $w$ affects the number of steps, the adaptive ACLA (ASA $+$ AAD) is better than the adaptive ACLA (ASA) for any $w$. This reveals that ASA decreases the step numbers on its own and works even better when combined with AAD. In the following experiments using 2D mountain car tasks, we used $w=0.4$ for the adaptive ACLA (ASA $+$ AAD).

Figure 9.

Mean step numbers to reach the goal in the 2D mountain car tasks with different position missing probability $P_{1}$ and velocity missing probability $P_{2}$ .

Figure 10.

Mean step numbers to reach the goal in the 2D mountain car tasks with different missing probabilities of state values.

Figure 9 shows heatmaps of the mean step numbers achieved by the conventional ACLA and the proposed adaptive ACLA (ASA $+$ AAD) using $w=0.4$ in the 2D mountain car tasks with combinations of the missing probability $P_{1}$ of the position $s_{1}$ and the missing probability $P_{2}$ of the velocity $s_{2}$ . Note that the initial position $s_{1}(t=0)$ and velocity $s_{2}(t=0)$ are given for all task settings with different missing probability combinations of $P_{i}$ ( $i=1,2,\ldots,n_{\text{st}}$ ). In other words, the task in the extreme case of $P_{1}=P_{2}=1.0$ does not provide any state values during the task, and each algorithm conducts the task only with the initial position $s_{1}(t=0)$ and velocity $s_{2}(t=0)$ . From Fig. 9a, we see that the step numbers of the conventional ACLA increase, particularly as the position missing probability $P_{1}$ increases. From Fig. 9b, we can see that the proposed adaptive ACLA (ASA $+$ AAD) achieves lower step numbers than the conventional ACLA in all missing probability combinations. These results reveal that ASA and AAD in the proposed adaptive ACLA work robustly when the number of missing state values increases and their probabilities increase.

Figure 10a–c show the mean step numbers when only the position, only the velocity, and both the position and the velocity are missing, respectively. Error bars indicate the standard deviation. From Fig. 10a, we can see that the step numbers of all algorithms tend to deteriorate as the missing probability $P_{1}$ of the position state $s_{1}$ increases. The proposed adaptive ACLA (AAD) achieves shorter step numbers than the conventional ACLA without AAD. In addition, the proposed adaptive ACLA (ASA) achieves shorter step numbers than the conventional ACLA without ASA. That is, AAD and ASA each contribute to improving task achievement. The proposed adaptive ACLA (ASA $+$ AAD) achieves the best performance among all algorithms. This tendency is also observed in Fig. 10b and c.

Figure 11.

State trajectories in the 2D mountain car task with the position missing probability $P_{1}=0.8$ and the velocity missing probability $P_{2}=0.8$ .

Figure 11 shows the state trajectories in the 2D mountain car task with missing probabilities $P_{1}=P_{2}=0.8$. The horizontal axis represents the position, $s_{1}\in[-1.2,0.6]$. The vertical axis represents the velocity, $s_{2}\in[-0.07,0.07]$. The goal position is $s_{1}=0.45$ on the horizontal axis. The lines in each two-dimensional space represent the state trajectories of 50 randomly chosen epochs. The green lines represent successful trajectories that reached the goal. The red lines represent failed trajectories that did not reach the goal. From Fig. 11a and e, we can see that the conventional DDPG and ACLA fail to reach the goal in many epochs. From Fig. 11b and d, we can see that the conventional LSTM-DDPG and LSTM-SAC succeed in reaching the goal in many epochs. From Fig. 11c and f, we can see that the conventional LSTM-TD3 and the proposed adaptive ACLA (ASA $+$ AAD) succeed in reaching the goal in all epochs. However, the proposed adaptive ACLA (ASA $+$ AAD) shows more compact state trajectories and succeeds with shorter step numbers than the conventional LSTM-TD3.
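The success/failure labeling of trajectories described above can be sketched as follows. This is an illustrative example only: the helper names are hypothetical, and treating "position greater than or equal to 0.45" as reaching the goal is an assumption based on the stated goal position.

```python
GOAL_POSITION = 0.45  # goal position s_1 stated in the text

def steps_to_goal(positions, goal=GOAL_POSITION):
    """Return the first time step at which the position reaches the goal,
    or None if the trajectory never reaches it (>= is an assumption)."""
    for t, position in enumerate(positions):
        if position >= goal:
            return t
    return None

def split_trajectories(trajectories, goal=GOAL_POSITION):
    """Split position trajectories into (successes, failures)."""
    successes, failures = [], []
    for positions in trajectories:
        if steps_to_goal(positions, goal) is not None:
            successes.append(positions)
        else:
            failures.append(positions)
    return successes, failures

# Toy trajectories: the first reaches the goal at step 3, the second never does.
demo = [[-0.5, -0.2, 0.3, 0.46], [-0.5, -0.7, -0.4]]
successes, failures = split_trajectories(demo)
```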

These results revealed that the proposed ACLA (ASA $+$ AAD) could predict accurate actions leading to the goal in uncertain 2D mountain car tasks with missing state values.

6.2 3D mountain car tasks

Figure 12.

Step numbers to reach the goal when exponential weight $w$ in ASA varies in the 3D mountain car task with missing probabilities $P_{1}=P_{2}=P_{3}=P_{4}=0.8$ .

Figure 13.

Mean step numbers to reach the goal in the 3D mountain car tasks with a single missing state value.

Figure 12 shows the mean step numbers to reach the goal in the 3D mountain car task with missing probabilities $P_{i}=0.8$ ($i=1,2,\ldots,n_{\text{st}}=4$) when we vary the exponential weight parameter $w$ in the proposed ASA. Figure 12 is presented in the same form as Fig. 8 for the 2D mountain car task. From the results, we can see that the step numbers of the adaptive ACLA (ASA) deteriorate with increasing exponential weight parameter $w$. However, the proposed adaptive ACLA (ASA $+$ AAD) maintains short step numbers even when the exponential weight parameter $w$ changes. We see that the adaptive ACLA (ASA $+$ AAD) with $w=0.8$ achieves the best mean step numbers, while $w=0.4$, used in the 2D mountain car tasks, is also promising. In the following experiments using 3D mountain car tasks, we used $w=0.0$ for the adaptive ACLA (ASA) and $w=0.8$ for the adaptive ACLA (ASA $+$ AAD).

Figure 14.

Mean step numbers to reach the goal in the 3D mountain car tasks with two missing state values.

Figure 15.

Mean step numbers to reach the goal in the 3D mountain car tasks with three missing state values.

Figure 16.

Mean step numbers to reach the goal in the 3D mountain car tasks with four missing state values.

Figure 13a–d show the mean step numbers to reach the goal when a single state value is missing. Error bars indicate the standard deviation. The results show that the step numbers of both the conventional DDPG and LSTM-DDPG deteriorate as each of the missing probabilities $P_{i}$ ($i=1,2,\ldots,n_{\text{st}}=4$) increases. However, for the conventional LSTM-TD3, LSTM-SAC, ACLA, and the three proposed adaptive ACLAs, we see that the mean step numbers do not deteriorate significantly even when each missing probability $P_{i}$ ($i=1,2,\ldots,n_{\text{st}}=4$) increases. Specifically, we see that the adaptive ACLA (ASA) and adaptive ACLA (ASA $+$ AAD) maintain short step numbers to reach the goal as each missing probability increases. Furthermore, Fig. 13b shows that the proposed adaptive ACLA (ASA $+$ AAD) maintains lower step numbers than the adaptive ACLA (ASA) in tasks with a high missing probability $P_{2}$.

Figure 14a–f show the mean step numbers to reach the goal when two state values are missing. These results show that the step numbers of the conventional DDPG, LSTM-DDPG, LSTM-TD3, and LSTM-SAC deteriorate as the two missing probabilities increase. In contrast, the conventional ACLA and the three proposed adaptive ACLAs tend to maintain their step numbers even when the two missing probabilities increase. From Fig. 14e, we can see that the step numbers of the proposed adaptive ACLA (ASA $+$ AAD) tend to decrease as the two missing probabilities increase. This tendency is affected by the combination of missing state values, their importance for the task, and their predictions. From these results, we can see that the advantage of the proposed adaptive ACLA (ASA $+$ AAD) is its ability to perform tasks with high missing probabilities.

Figure 17.

Trajectories in the 3D mountain car task with missing probabilities $P_{1}=P_{2}=P_{3}=P_{4}=0.8$ .

Figure 15a–d show the mean step numbers to reach the goal when three state variables are missing. From these results, we can see that the conventional ACLA and the three proposed adaptive ACLAs suppress the increase in the number of steps required to reach the goal when the three missing probabilities increase. However, the adaptive ACLA (ASA $+$ AAD) is worse than the adaptive ACLA (ASA) in tasks where the three missing probabilities are 1.0 in Fig. 15a and b. This suggests that, for AAD, some state variables have a negative effect on successful action prediction.

Finally, Fig. 16 shows the mean number of steps required to reach the goal when all four state variables are missing. We see that the proposed adaptive ACLA (ASA $+$ AAD) achieves lower step numbers than the adaptive ACLA (ASA) in tasks where the four missing probabilities take values in $\{0.2,0.4,0.6,0.8\}$.

Figure 17 shows the trajectories in the 3D mountain car task with all missing probabilities $P_{i}=0.8$ ($i=1,2,\ldots,n_{\text{st}}=4$). In each panel, the horizontal axis is the x-axis and the vertical axis is the y-axis. The goal area is the region greater than $0.5$ on both the x- and y-axes. The lines in each two-dimensional space represent the trajectories of 50 randomly chosen epochs. The green lines represent successful trajectories that reached the goal. The red lines represent failed trajectories that did not reach the goal. From Fig. 17a and e, we can see that the conventional DDPG and ACLA fail to reach the goal in many epochs. From Fig. 17b, c, and d, we can see that the conventional LSTM-DDPG, LSTM-TD3, and LSTM-SAC succeed in reaching the goal in many epochs. However, they require many steps, and most of their trajectories stray far from the goal. In contrast, from Fig. 17f, we see that the proposed adaptive ACLA (ASA $+$ AAD) reaches the goal in few steps, although some epochs fail.

These results reveal that the proposed ACLA (ASA $+$ AAD) could output accurate action predictions to reach the goal in 3D mountain car tasks with uncertainty.

7. Conclusions

To improve the action prediction accuracy of the action-prediction cortical learning algorithm (ACLA) in an uncertain environment in which multiple state variables are missing simultaneously and probabilistically, we proposed an adaptive ACLA that introduces adaptive synapse adjustment (ASA) and adaptive action-separated decoding (AAD). Experimental results showed that the combination of ASA and AAD worked well in the 2D and 3D mountain car tasks, and the proposed adaptive ACLA with both achieved lower step numbers to reach the goal, particularly when the missing probabilities of the state variables were high. These results imply that the proposed adaptive ACLA contributes to making decisions for the future, even when information about the surrounding situation is partially lacking. However, this work assumed that input state values were probabilistically missing while the non-missing state values were accurate (noise-free).

In future work, we will verify the proposed adaptive ACLA in other uncertain environments with noisy state values.

References

1. Hawkins, Subutai and Dubinsky, Hierarchical Temporal Memory Including HTM Cortical Learning Algorithms, Technical report, Numenta Inc., 2010.

2. Hawkins and Subutai, Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex, Frontiers in Neural Circuits 10 (2016), 1–13.

3. HTM.core, 2020 April 21. Available from: https://github.com/htm-community/htm.core.

4. Hawkins and Blakeslee, On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines, Times Books, 2005.

5. Ahmad and Hawkins, Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory, Technical report, Numenta Inc., 2015, pp. 1–18.

6. Hochreiter and Schmidhuber, Long Short-Term Memory, Neural Computation 9(8) (1997), 1735–1780.

7. Cui, Ahmad and Hawkins, Continuous Online Sequence Learning with an Unsupervised Neural Network Model, Neural Computation 28(11) (2016), 2474–2504.

8. Aoki, Takadama and Sato, Adaptive Synapse Arrangement in Cortical Learning Algorithm, Journal of Advanced Computational Intelligence and Intelligent Informatics (JACIII) 25(4) (2021), 450–466.

9. Fujino, Aoki, Takadama and Sato, A Preliminary Study on Cortical Learning Algorithm for Action Decision Using Forecast, in: The 17th Computational Intelligence Workshop, The Society of Instrument and Control Engineers, 2021, pp. 63–67.

10. Lillicrap, Hunt, Pritzel, Heess, Erez, Tassa, Silver and Wierstra, Continuous Control with Deep Reinforcement Learning, in: Proc. in the International Conference on Learning Representation (ICLR 2016), 2016.

11. Yang and Nguyen, Recurrent Off-policy Baselines for Memory-based Continuous Control, arXiv preprint arXiv:2110.12628, 2021.

12. Heess, Hunt, Lillicrap and Silver, Memory-based Control with Recurrent Neural Networks, arXiv preprint arXiv:1512.04455, 2015.

13. Meng, Gorbet and Kulič, Memory-based Deep Reinforcement Learning for POMDPs, in: Proc. in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021), 2021, pp. 5619–5626.

14. Jia, Gao and Peng, LSTM-DDPG for Trading with Variable Positions, Sensors 21(19) (2021), 1–12.

15. Eysenbach and Salakhutdinov, Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs, in: Proc. in the 39th International Conference on Machine Learning (ICML 2022), vol. 162, 2022, pp. 16691–16723.

16. POMDP Baselines, 2022 July 26. Available from: https://github.com/twni2016/pomdp-baselines.

17. Fujino, Aoki, Takadama and Sato, Adaptive Synapse Adjustment for Multivariate Cortical Learning Algorithm, in: Proc. in 2022 Joint 12th International Conference on Soft Computing and Intelligent Systems and 23rd International Symposium on Advanced Intelligent Systems (SCIS & ISIS 2022), 2022.

18. Fujino, Aoki, Takadama and Sato, Adaptive Synapse Adjustment and Decoding in Action-prediction Cortical Learning Algorithm, in: Proc. in the 12th World Congress on Nature and Biologically Inspired Computing (NaBIC 2022), 2022.

19. Aoki, Takadama and Sato, Column-based Decoder of Internal Prediction Representation in Cortical Learning Algorithms, in: Proc. in 2020 Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems (SCIS & ISIS 2020), 2020, pp. 1–7.

20. OpenAI Gym, 2020 August 18. Available from: https://github.com/openai/gym.

21. Taylor, Jong and Stone, Transferring Instances for Model-Based Reinforcement Learning, in: Machine Learning and Knowledge Discovery in Databases 2008, pp. 488–505.

22. Fujimoto, Hoof and Meger, Addressing Function Approximation Error in Actor-critic Methods, in: Proc. in the International Conference on Machine Learning (ICML 2018), 2018, pp. 1587–1596.

23. Haarnoja, Zhou, Hartikainen, Tucker, Tan, Kumar, Zhu, Gupta, Abbeel and Levine, Soft Actor-Critic Algorithms and Applications, ArXiv abs/1812.05905, 2018.

24. Stable Baselines, 2022 July 26. Available from: https://github.com/Stable-Baselines-Team/stable-baselines.
