Understanding joint range of motion development in robotic learning

Abstract

Joint Range of Motion (JROM) development has been shown to facilitate learning motor control in human beings. This developmental strategy has been applied in robotics to improve learning performance with different outcomes: sometimes it is favourable, others irrelevant, and others, even detrimental. The reasons that underpin this variability in the results are still not well understood. In this paper, we seek to better understand the principles underlying the application of JROM based morphological development to make its use more straightforward. To this end, empirical studies were carried out over two representative use cases: quadruped and bipedal robot morphologies learning to walk. Different parameters of the application of JROM development (morphological configuration, JROM developmental strategy, etc.) have been evaluated to elucidate their effects over learning. The results show that there are significant connections between the reduction of the motor space induced by JROM and the way the exploration and exploitation of the solution space is carried out by the learning algorithm, and the performance achieved. Through these connections, we have identified a set of conditions that must be satisfied for JROM development to be effective as a tool for learning improvement.

Keywords

cognitive robotics; developmental robotics; morphological development; motor development; joint range of motion

1. Introduction

Most of the work in developmental robotics focuses on cognitive development. That is, a robot autonomously learns complex skills by interacting with its environment, using developmental principles extracted from the developmental psychology literature but, unlike what happens in human beings, using a fixed morphology.^1–6 Only a few authors have started to include the development of the morphology in ontogenetic time scales as a parameter to consider during learning to take advantage of the relevance of the morphology for learning.^7–10

Different classes of strategies can be found in the robotics literature that seek developmental variations in the morphology to improve learning. According to Naya et al.,¹¹ these can be grouped into three categories: physical body development,^12,13 sensor development,^14,15 and motor development.^16–18 Out of these, in this paper we concentrate on motor development.

The application of motor development is inspired by Bernstein’s studies on the “Degrees of Freedom problem”.¹⁹ He postulates that in the early stages of motor control, the central nervous system reduces the level of involvement of some of the elements that contribute to body motion (muscles, tendons, etc.) in order to facilitate a tighter control over the remaining ones.¹⁸ Progressively, these limitations become less restrictive through an increase in the range of motion of specific joints or by releasing them completely, until all the constraints disappear.^20,21 This developmental strategy generally follows a proximodistal trend,²² as reported by several studies in the literature on human motor control and coordination.^23–26

In robotics, motor development is usually achieved in one of two ways. The first reduces the maximum range of motion available to a joint (Joint Range of Motion, JROM) in the early stages of development and increases it up to the “adult” or “mature” range of motion as development progresses. The second drastically reduces the JROM available at the beginning of learning to 0 and then releases it abruptly at some point in time. The latter is usually called development through “Freezing and freeing Degrees of Freedom (DOF)”.^25,27,28
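As a minimal illustrative sketch (not the implementation of any of the cited works), the two strategies can be written as schedules that return the JROM available at a given time. The linear ramp, the function names and all parameters (`t_dev`, `t_release`, `initial`, `mature`) are assumptions introduced here for illustration only.

```python
def gradual_jrom(t: float, t_dev: float, initial: float, mature: float) -> float:
    """Gradual strategy: widen the range from `initial` to `mature` over [0, t_dev].

    A linear ramp is assumed here; other monotonic schedules are equally possible.
    """
    if t_dev <= 0:
        return mature
    progress = min(max(t / t_dev, 0.0), 1.0)  # clamp progress to [0, 1]
    return initial + progress * (mature - initial)


def freeze_and_free_jrom(t: float, t_release: float, mature: float) -> float:
    """Freezing and freeing: range is 0 until t_release, then the full mature range."""
    return 0.0 if t < t_release else mature
```

With these hypothetical values, `gradual_jrom(50, 100, 30.0, 180.0)` gives an intermediate range, whereas `freeze_and_free_jrom` only ever returns 0 or the mature range.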

The robotics literature that adapts these ideas to improve learning in robots contains relatively little work on JROM-based development, and that work addresses two main tasks: walking, as an example of a clearly dynamic control process, and reaching (or combined reaching and grasping), which is less critical in temporal terms.

Considering the walking task, Bongard²⁹ showed that gradual morphological changes in quadrupeds and hexapods outperform a fixed morphology in terms of learning performance when the task is challenging (e.g. learning to walk diagonally instead of in a straight line). He observed that an early limitation of the robot DOFs and their gradual release accelerates the acquisition of optimal behaviours, but it is not clear why this happens. In addition, if the morphology changes abruptly, development degrades learning performance relative to not using it. This suggests that the abrupt change in the controller-morphology relationship may be a distorting factor.

In a quadruped morphology, Baranes et al.³⁰ show how motor development plays an important role in maximizing learning efficiency. This is attributed to the simplification of the learning process caused by the reduction of the motor space, which is gradually increased when a certain level of mastery is achieved. Addressing a bipedal morphology in a swinging task, which is more dynamically challenging, Lungarella and Berthouze¹⁷ suggested that starting with frozen DOFs helps to stabilize the system and allows finding more robust behaviours due to “a more efficient exploration of the sensorimotor space”. The authors suggest that this happens because of the physical entrainment of the body, which helps to find a high-yield area in the parameter space. Nevertheless, this more efficient exploration is not enough to find stable behaviours under system perturbations, which require a process of alternating freezing and freeing DOFs.²⁸

Lapeyre et al.³¹ showed how motor development “leads to a faster and safer way for learning” in a bipedal walking task helped by a trolley. This was particularly relevant in cases where the motor space was strongly constrained and the DOFs were slowly released, seemingly because in less restricted scenarios the motor space was too large and the optimization method could not find good solutions without many iterations.

Other authors considered use cases that did not contemplate walking. For instance, the work done by Qiang Shen and his colleagues in different experiments involving robotic arms and vision systems (learning to write Chinese strokes,³² object perception and recognition,³³ or feature perception³⁴) has shown how morphological development, based on the “Lift-Constraint Act and Saturate” (LCAS) algorithm,³⁵ can improve learning. The LCAS algorithm imposes maturational constraints not only on the motor system but also on the sensor system and the computational one (controller or learning algorithm). These constraints are lifted once a certain level of mastery is achieved. These experiments were inspired by the developmental psychology literature and pursued the goal of creating an infant-like learning mechanism, rather than that of understanding the mechanisms and implications of the developmental sequences. Little information is provided about why these infant-like learning mechanisms have favoured learning. In this line, Campos-Alfaro et al.³⁶ propose an approach to integrate open-ended learning in modular robotics. This work emphasizes the importance of equipping robots with morphological adaptability and the ability to autonomously learn utility models specific to each morphology through a motivational system designed for open-ended learning.^5,37 This contribution is relevant because it suggests that morphological adaptability not only enhances learning performance, but also aligns with the need to understand how different parameters interact in the learning process in dynamic environments.

The objective of developing an infant-like learning system was shared by Savastano and Nolfi¹⁴ while learning to reach and grasp using an iCub robot. The authors addressed the development of the sensor and motor system separately and jointly. They found that the development of the sensor system was irrelevant, while the development of the motor system, by reducing the number of DOFs, improved learning. In this case, the robot initially starts with basic primitive movements whose complexity increases as development progresses and DOFs are released, which is aligned with Bernstein’s hypothesis about motor control: learning is simplified at the beginning through constrained motion, and once a good level of proficiency is attained, the constraints can be freed.¹⁹

Bernstein’s hypotheses are also supported by Ivanchenko and Jacobs,³⁸ in a simulated three-joint arm learning specific trajectories. An early freezing of the elbow and wrist joints (the furthest ones) contributes to obtaining better results than in the no development case. The authors argue that “the knowledge gained during these early training stages would provide a useful foundation for further learning at later training stages”. In addition, they also found that when the task is simple (the reference trajectory is easy to learn) development is irrelevant, and it only makes sense when the trajectory is hard to learn (complex task). Nevertheless, the authors did not focus on identifying the parameters or variables that cause this improvement, and no clear information about that is presented.

In this line, there are articles in the literature that report positive outcomes for motor development in the reaching task,^27,39 but as these appear to be preliminary results, they do not study in any depth the causes of this advantage. Table 1 summarizes the different authors’ claims, what they considered, and how their experiments were carried out. Based on the literature and what is shown in Table 1, we observe that motor development has favoured learning through two related factors: 1) a reduction in the motor space^30,31,38 and 2) the greater stability provided by such a reduction.^28,31 However, certain limitations are also observed in the conclusions provided: 1) stability can affect walking tasks, but not tasks that only address reaching, where stability is intrinsic to the morphology; 2) the implementation of motor development in the literature has been quite heterogeneous and there does not seem to be a unified framework for the problem: the definitions of relevant parameters and aspects differ, as do the experimental conditions, and, in most cases, there is no clear explanation of why JROM development has influenced learning except in very general terms. In addition, some experiments address motor development based on time constraints^14,29,38 and others on performance.^17,30,33 All of this makes it difficult to identify which parameters are really important for the successful implementation of motor development.

Table 1.
Summary of the representative articles found in literature.

Article	Morphology	Task	Type of developmental experiment	Opportunities
Bongard²⁹	Simulation. Quadruped (10DOF) and hexapod (16DOF)	Learning to walk	Summary: Comparison among 1 no-dev. and 3 dev. experiments. Each experiment has 4 learning phases (phylogenetic time) in which learning performance is evaluated (ontogenetic time). Dev. implementation: The joint initial position and JROM change in phylogenetic time (exp1), during ontogenetic time (exp2) and the joint position and the JROM change between phases and the legs also grow (exp3)	Development is not useful in phylogenetic time due to abrupt sensorimotor changes (exp1), but it is favourable in the case of gradual changes (ontogenetic time, exp2 and exp3)
Baranes et al.³⁰	Simulation. Quadruped (12DOF)	Visiting positions in the 2D space	Summary: Comparison between different exploratory algorithms (5 no-dev experiments) vs. 1 dev. experiment, with an initial reduction and gradual increase of the motor space. Dev. implementation: The motor space is defined by the phase and amplitude of CPG of the controller, whose value increases as development progresses	Learning is favoured thanks to an initial reduction of the motor space, which simplifies the exploration of the Solution Space
Lungarella and Berthouze¹⁷	Real robot. Biped (12DOF)	Finding physical entrainment	Summary: 3 experiments. Comparing 2DOF independent control (no-development) vs. 1DOF and 2DOF bootstrapped control, in a non-perturbed and perturbed system. Dev. implementation: Freezing (maintaining the joints stiff) and abruptly freeing DOFs of the knee (1DOF) and those of hip and knee (2DOF)	Bootstrapping DOF favours physical entrainment, providing higher stability in the case without perturbation. An extra sequence of freezing and freeing DOFs is needed when the system is perturbed
Lapeyre et al.³¹	Simulation. Biped (17DOF)	Learning bipedal locomotion supported by a trolley	Summary: Comparison of 3 dev. experiments and the control one (no-dev). Dev. implementation: The motor space is defined by the phase and amplitude of the CPG of the controller. Different degrees of limitation of the motor space, by reducing the degree of movement (exp1), the DOFs (exp2), and both (exp3)	Learning is favoured thanks to the initial reduction of the motor space. Strong constraints lead to a faster and safer learning (exp3) compared to other dev. cases.
Qiang Shen et al.^32–35	Real robot. Arm robots. Several configurations	Learning multiple tasks based on reaching	Summary: Different articles that solve tasks based on infant-like developmental systems. Dev. implementation: Several constraints implemented (sensor, motor) but not clearly specified	Infant like learning systems are deployed, but no performance comparison with a no-developmental system is provided
Savastano and Nolfi¹⁴	Simulation. iCub (53DOF)	Learning an infant-like model to reach	Summary: Incremental learning process that modelled the reaching and grasping capabilities of infants. Dev. implementation: Sensor and motor constraints. Motor constraint implemented by locking DOFs (freezing weight connections of a neural network to 0)	Infant like learning system implemented. Motor development favours learning, being sensor development irrelevant
Ivanchenko and Jacobs³⁸	Mathematical model. Robot arm (3DOF)	Learning trajectories in a 3D space	Summary: Comparison among 2 no-dev experiments and 4 dev ones. Dev. implementation: Motor development by limiting and progressively increasing the development of the trajectory (exp1), the feedback gain (exp2) and both (exp3)	Development favours learning, especially when combining both developmental strategies (exp3), suggesting that it simplifies learning at the beginning

At this point, it is important to note that in the cases where intrinsic dynamicity was not the key aspect of the task, such as in reaching, most of the work considered the simultaneous development of the controller, the motor and the sensory system, thus making the causes of any improvement even harder to trace.^14,15,40 This was not the case in walking tasks, where in some cases, and given the fact that open-loop solutions to walking exist, no sensory system is even considered, just an external observer to measure the final performance. This nicely decouples motor development from sensory-motor development, allowing for a more nuanced study of the problem at hand.

Hence, in this article, our goal is to understand the implications of JROM-based development, decoupled from sensing, as an aid for learning in robotic systems by exploiting the embodied characteristics of the morphology through the different developmental stages. Thus, we seek to identify those parameters or approaches that allow a finer control of the learning process when applying JROM morphological development to explore learning paths that traditional learning algorithms ignore. This objective complements our previous work,^13,41 where we studied the influence that morphological development based on physical changes in the robot’s body has on learning, and analyzed the reasons behind it. In that work, experiments were carried out on 2-, 4-, 6-, and 8-legged robots.

The remainder of the article is organized as follows. Section 2 provides a formalization of the morphological development we use in this article, and explains how JROM-based development is applied. Section 3 presents the methodology we have followed. It describes in detail the characteristics of the selected morphologies and the experimental framework. Section 4 provides the structure and logic of the different experiments performed. Sections 5 and 6 are devoted to the presentation of the results of the experiments that were carried out over the quadruped and biped morphologies, related to the hypotheses established in the methodology. A discussion of these results is provided in Section 7. Finally, Section 8 is concerned with the main conclusions extracted from this work.

2. Morphological development framework

For coherence with our previous work, we consider the formalization of general morphological development that we introduced in Naya-Varela et al.¹¹ and particularized for growth in Naya-Varela et al.¹³ In this case, the formalization is particularized to JROM development as this is the main topic of this paper.

To formalize morphological development, it is first necessary to define what we understand by morphology. In this line, we consider that a robot is made up of a set of $l$ links $L = \{l_{1}, l_{2}, \dots, l_{l}\}$ , a set of $j$ joints $J = \{j_{1}, j_{2}, \dots, j_{j}\}$ , which can be actuated or not, and a set of $s$ sensors $S = \{s_{1}, s_{2}, \dots, s_{s}\}$ , each one with their corresponding properties, $^{L} P$ for the links, $^{J} P$ for the joints, and $^{S} P$ for the sensors. Therefore, a robot morphology can be defined as the set of links, joints and sensors that make up the robot together with their properties: $M = \{L, J, S, {}^{L} P, {}^{J} P, {}^{S} P\}$ .
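One possible way to hold this definition in code is sketched below. It is only an illustration of the sets $L$, $J$, $S$ and their property sets; the class names, field names and the `jrom_deg` key are hypothetical and do not correspond to any implementation described in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    name: str
    properties: Dict[str, float] = field(default_factory=dict)  # entries of ^L P


@dataclass
class Joint:
    name: str
    actuated: bool = True
    properties: Dict[str, float] = field(default_factory=dict)  # entries of ^J P


@dataclass
class Sensor:
    name: str
    properties: Dict[str, float] = field(default_factory=dict)  # entries of ^S P


@dataclass
class Morphology:
    """M = {L, J, S, ^L P, ^J P, ^S P}, with properties stored on each element."""
    links: List[Link]
    joints: List[Joint]
    sensors: List[Sensor] = field(default_factory=list)

    def joint_properties(self) -> Dict[str, Dict[str, float]]:
        # The property set ^J P, indexed by joint name.
        return {joint.name: joint.properties for joint in self.joints}
```

For example, `Morphology(links=[Link("upper")], joints=[Joint("hip", properties={"jrom_deg": 90.0})])` describes a sensorless morphology with one actuated joint.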

Additionally, regarding robot operation, the robot functions over the time interval $t \in [0, T]$ , where $t = 0$ marks the beginning of its lifetime and $t = T$ represents its end. In the literature on the development of the morphology, we assume that morphological development will take place during part of that lifetime. This gives rise to the general definition of morphological development as a function $M D (t)$ . This function describes how the values of the properties in the property sets defining the robot’s morphology evolve over time. In this article, as we are specifically considering the development of the joint range of motion, only the properties of the joints change ( $^{J} P$ ). Thus, development can be formally defined as a function $M D (t)$ that describes the values of these properties in time for the lifetime of the robot as: $M D (t) = \{^{J} P_{t}\} \; \forall t \in [0, T]$ .

From the perspective of robot operation and learning, as shown in Figure 1, the morphology is managed by a control system.^42,43 This control system is defined by a set of parameters (in our experiments, by the structure and weight values of an artificial neural network, but they can be any other parameters depending on the implementation of the controller) that are optimized along the learning process to achieve the goals given to the robot using any learning algorithm (e.g., neuroevolution).

Figure 1.

Flow control in a typical robotic learning problem. In white, the environment. In green, the fitness value that represents the performance of a given solution. In orange, the physical elements that constitute the robot morphology. In yellow, transfer functions that map a set of parameters (e.g. perceptions in the form of images to inputs to the controller). In blue, the set of elements that constitute the controller of the robot. In grey the learning algorithm.

The control system receives data (Inputs, $I$ ) through the sensors. This data can be preprocessed through a function $u : R S V \to I$ . When $u$ is the identity function, the controller works directly with Raw Sensor Values, $R S V$ . On the other hand, the controller produces a series of values (raw outputs, $O$ ) that control the joints of the robot. In some controllers, such as NN controllers, the outputs are generally normalized between 0 and 1, and the values sent to the joints (Joint Commands, $J C$ ) need to be denormalized to adapt them to the specific range of motion (e.g., from −90° to 90°) or actuator characteristics. This denormalization/adaptation function is called here $v$ $(v : O \to J C)$ , and in most applications that do not involve JROM development, it is constant during the whole learning period.
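As a hedged illustration of a constant $v$, the sketch below maps a normalized controller output in $[0, 1]$ linearly onto a fixed joint range. The linear form, the clipping of out-of-range outputs and the function name `denormalize` are assumptions; the default bounds simply reuse the −90° to 90° example mentioned above.

```python
def denormalize(output: float, lower: float = -90.0, upper: float = 90.0) -> float:
    """Constant v: map a normalized output in [0, 1] to a joint command in degrees."""
    clipped = min(max(output, 0.0), 1.0)  # guard against outputs outside [0, 1]
    return lower + clipped * (upper - lower)
```

Under these assumptions an output of 0.5 corresponds to the centre of the range, and the mapping does not change during learning.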

Limiting the range of motion of a joint implies changing the mapping between the output of the controller and the command sent to the joint, that is, changing $v$ . In this context, changing the JROM over time means that the joint commands also change over time, showing a temporal dependence of $v$ $(v (t))$ . Thus, the set of available joint commands (Joint Command Space or $J C S$ ) is not constant in time. It is also time dependent $(J C S (t))$ . In other words, this implies a time variation in the joint properties of the morphology $(^{J} P_{t})$ as previously stated. Finally, it must be mentioned that, based on our definition, the motor space that is often mentioned in the literature, is equivalent to our $J C S$ , because both represent all the possible movements of the joints (or motors) that the robot can perform at a given point in time.
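The time dependence of $v$ and of the $J C S$ can be sketched as follows, assuming a range centred at 0° and a user-supplied schedule that returns the JROM at time $t$. The helper names (`make_v`, `jcs_bounds`, `example_schedule`) and the numbers in the example schedule are hypothetical and chosen only to make the idea concrete.

```python
from typing import Callable, Tuple


def make_v(jrom_schedule: Callable[[float], float]) -> Callable[[float, float], float]:
    """Build v(output, t) from a schedule giving the JROM (degrees) at time t."""
    def v(output: float, t: float) -> float:
        half = jrom_schedule(t) / 2.0
        clipped = min(max(output, 0.0), 1.0)
        return -half + clipped * 2.0 * half  # same output, different command over time
    return v


def jcs_bounds(jrom_schedule: Callable[[float], float], t: float) -> Tuple[float, float]:
    """Bounds of the Joint Command Space of one joint at time t."""
    half = jrom_schedule(t) / 2.0
    return (-half, half)


def example_schedule(t: float) -> float:
    # Assumed schedule: JROM grows linearly from 30 degrees to 180 degrees at t = 100.
    return min(180.0, 30.0 + 1.5 * t)
```

With this example schedule, the same controller output of 1.0 yields a command of 15° at $t = 0$ and 90° at $t = 100$, which is the sense in which $J C S (t)$ widens during development.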

To round off these definitions of terms, the combined effect of the joint commands, the robot morphology, and the environment in which the robot operates, results in a value or set of values assigned by the designer/user representing the fitness $(F)$ of the controller for achieving the designer/user goal through a function $f$ $(f : J, E, L \to F)$ .

Hence, in this formalization, the solution, $s$ , to the learning problem is given by the controller of the robot that optimizes function $f$ for a given morphology and environment, being the Solution Space $(S S)$ , the set of all possible solutions (controllers) that can be obtained $(S S = \forall s)$ . The whole set of fitness values over the $S S$ represents the Fitness Landscape $(F L)$ of the problem.

In most optimization and learning problems, the $F L$ remains constant during learning as there is no time component in the relationship between solutions and fitness. However, in JROM development this is not the case. The relationship between the solutions (controllers) and their fitness values changes during development due to the variations in the relationships between the outputs of the controllers and the joint commands induced by $v (t)$ during development. This implies that the $F L$ changes during learning, providing an opportunity to shape the temporal evolution of $F L$ through the design of appropriate $v (t)$ functions to make learning a globally simpler and more successful process.
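A toy example may help to see why the $F L$ moves when $v (t)$ changes. In the sketch below, a "controller" is reduced to a single normalized output, and fitness is an invented function that rewards joint commands close to an arbitrary target angle. Nothing here reproduces the walking task or the fitness used in the experiments; every name and value is an assumption.

```python
from typing import List


def joint_command(output: float, jrom_deg: float) -> float:
    """Map a normalized output in [0, 1] to a command centred at 0 degrees."""
    return (output - 0.5) * jrom_deg


def toy_fitness(command_deg: float, best_deg: float = 40.0) -> float:
    """Invented fitness: higher (closer to 0) when the command is near best_deg."""
    return -abs(command_deg - best_deg)


def landscape(jrom_deg: float, outputs: List[float]) -> List[float]:
    """Fitness of the same set of controllers under a given JROM."""
    return [toy_fitness(joint_command(o, jrom_deg)) for o in outputs]
```

Evaluating the same three controllers `[0.0, 0.5, 1.0]` with a JROM of 60° and then 180° gives different fitness values, and the best of the three changes from the output 1.0 to the output 0.5, even though the controllers themselves are untouched.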

Reaching this point, the question now is how to design $v (t)$ and what factors should be considered in order to facilitate this design. To address this issue, we will carry out an empirical study of the effects of different factors or parameters, such as the developmental speed or the developmental strategy selected based on different designs of the $v (t)$ function, with the objective of ascertaining how they should be used to improve learning. In the next sections we will first describe the structure and logic of the empirical study, the experimental setup, and the particular experiments that were run. This will be followed by an analysis and discussion of the results obtained.

3. Experimental setup

3.1. Methodology

As mentioned in previous sections, we want to study the implications of implementing JROM-based morphological development during learning. This implies addressing some initial questions that arise from the work of previous authors. Looking at the literature (see Table 1 for a summary), some authors claim that “learning is simplified through a reduction of the motor space”.^30,31 This hints at the fact that controlling the size of the motor space, which in our case is given by the Joint Command Space, $J C S$ , should be beneficial. However, this is not always the case.^28,29,38 Thus, the first question to address is: What are the mechanisms behind the supposed learning simplification? Under what circumstances do they work? Other authors speculate that “development helps to find an area of high performance in the parameter space”.¹⁷ Thus, are there some rules on how this reduction should be carried out so that areas of high performance are easier to find?

In addition, it must be made clear here that the final goal is to obtain a controller that, when learning is completed, controls the robot without any constraints on the JROM apart from those given by its morphology. This means that whatever reduction in the JROM is initially imposed, it must have been somehow removed by the end of learning. However, the question here that has not been really answered in the literature is how they should be removed. That is, how must JROM change in time during learning? Should it vary progressively towards the final JROM? In one shot? Does this depend on the specific problem? In fact, if progressively were the answer, it would beg the question of how fast. In other words, is there any factor that determines this speed?

To address these questions, and guide the experimental work, we are going to establish a series of hypotheses based on information extracted from the literature and our previous work on the relevance of the application of morphological development for learning. These hypotheses are:

A limitation of the available JROM implies a reduction of the motor space. This limitation means a reduction in the available $J C S$ , which in turn, only makes available some regions of the fitness landscape, compared to the fitness landscape of the final morphology. These reductions should simplify the task of the learning algorithm to find optimal solutions, providing a more efficient exploration of the Solution Space ( $S S$ ).

The developmental speed influences learning: the higher the speed of development, the less relevant development is, because in these cases, the learning algorithm does not have enough time to explore the $S S$ adequately. On the other hand, the learning algorithm may find itself trapped in an area of the $S S$ to which it may have converged when development speed is too slow.

A proper synergy among the various components of the development and learning process⁴¹ is needed for JROM-based development to be relevant for learning. This means that it is not enough to simply reduce the motor space to improve the $S S$ exploration capacity; the reduced space must also include informative solutions that help to enhance learning throughout development. Otherwise, a reduction of the motor space to areas that contain invalid solutions will not lead to a learning improvement.

To experimentally study these hypotheses, collecting statistical data to compare and analyze the results of JROM development, we chose two use cases related to walking: quadruped and bipedal walking. The main reason for this choice is that the basic walking task can be achieved without any sensory feedback, as other authors have done.^30,31,41 This way, it should be possible to discriminate the effects of JROM development from, for instance, sensor development or even sensor choice for the feedback, which is the problem of tasks such as grasping or reaching. In addition, these two morphologies have a different number of joints and different stability characteristics, which makes the same task (learning to walk) more challenging for one morphology than for the other. Finally, several authors make use of these cases, and, in fact, we have studied them in a growth-based development case, and thus the results can be compared among developmental strategies, to exploit the relevance of morphological variation at the same time as learning occurs.

Taking these hypotheses into account, we have designed a series of experiments in which we compare the process of learning controllers with and without JROM development considering different parameters, with the aim of generalizing as much as possible the conclusions obtained from the results. We start with the quadruped morphology, as it is the more stable one and thus, theoretically, presents the lowest difficulty for learning. It allows us to test different configurations of the morphology, leading to different configurations of the motor space.

In these settings we evaluate different developmental strategies, such as progressive JROM development, abrupt JROM change (freezing and freeing DOF), or a mixture of both. To support these experiments, a group of experiments in which different configurations of the morphology learn without morphological development has also been performed. Additionally, we present the results of two experiments with different morphological configurations but the same developmental speeds.

Figure 2.

Representation of the quadruped with 8 DOF and an angle offset for the joints of 0° with respect to the vertical plane.

On the other hand, the experiments carried out with the biped have been designed to complement the information obtained using the quadruped morphology and thus provide a more general view of the results. To this end, JROM development experiments similar to those performed with the quadruped have been carried out (such as gradual or abrupt JROM development), but with some variations (among other reasons, because the morphologies are different), thus expanding the use cases studied.

In the following subsections, we provide more details on the morphologies that were used and the configuration and execution of the experiments themselves.

3.2. Morphologies

The quadruped morphology (Figure 2) is made up of a central body and four limbs attached to it. Each limb consists of an upper link and a lower link, connected by revolute joints. The upper link measures 5 $\times$ 2.5 $\times$ 0.5 cm with a mass of 250 g, while the lower link includes two 250 g segments connected by a prismatic joint, giving it a total length of 17.5 cm. The prismatic joint was used in experiments carried out in previous papers, but in this paper the joint is not actuated (it is always in the same position). It is left in place so that the results of the growth-related development of the previous paper and the current JROM-related development are morphologically comparable. Additionally, each limb has two revolute joints: one connecting the body to the upper link and another connecting the upper link to the lower link. The revolute joints are actuated, providing a maximum torque of 2.5 N $\cdot$ m, with a proportional parameter set to 0.1 for each joint. Throughout this article, a joint is defined by two parameters: its range of motion (the already mentioned JROM) and the angle offset, which defines where that range of motion is centred (we refer to it as Joint Angle Offset, JAO). An example of two morphological configurations (including JAO and JROM) for a quadruped robot is displayed in Figure 3. In the quadruped morphology, we keep the same JAO for the joints of the upper limb (the robot needs to move it backwards and forwards), but we carried out experiments with different JAOs for the lower limbs’ joints. The angle offset of the joints of the lower limb is fixed for each type of experiment, but the JROM varies depending on the type of experiment and the developmental stage under study. The maximum ranges of motion for the joints of the upper limb and lower limb are displayed in Table 2.

Figure 3.

Top: Quadruped with a joint angle offset (JAO) of 60° and with a JROM of [ $-$ 150°, 30°]. The light green area represents the JROM available until reaching the lower bound ( $-$ 150°). The light blue area represents the JROM available until reaching the upper bound (30°). Bottom: Quadruped with a JAO of 30° and with a JROM of [ $-$ 120°, 60°].

Table 2.

Maximum JROM for the upper limb and lower limb of the quadruped in each joint angle offset.

Joint Angle Offset of the lower limbs	0°	30°	60°
Upper limbs	$[-90^{\circ}, 90^{\circ}]$	$[-90^{\circ}, 90^{\circ}]$	$[-90^{\circ}, 90^{\circ}]$
Lower limbs	$[-90^{\circ}, 90^{\circ}]$	$[-120^{\circ}, 60^{\circ}]$	$[-150^{\circ}, 30^{\circ}]$

The morphology of the bipedal robot is based on a real NAO robot model created in the CoppeliaSim simulator (Figure 4). For coherence with our previous work, the NAO model has the legs modified to change its morphology during learning (allowing it to grow),^13,30,44 although in this article the length of the legs will be fixed: –

Upper link: The upper section consists of three joints (hip yaw-pitch, hip roll, and hip pitch) and two identical cuboids, each with dimensions of 8 $\times$ 8 $\times$ 7.2 cm and a mass of 458 g, resulting in a total of 916 g. These cuboids are connected by a prismatic joint, which has a maximum force of 50 N and an extension range of 4.0 cm. The prismatic joint remains in a fixed position.

–

Lower link: The lower leg section includes one joint (knee pitch) and two distinct cuboids. The upper cuboid measures 8 $\times$ 3 $\times$ 8 cm and weighs 192 g, while the lower cuboid measures 9 $\times$ 8 $\times$ 3 cm and has a mass of 216 g. The properties and geometric orientations of these cuboids were chosen to retain the NAO robot’s original design. The prismatic joint characteristics here are identical to those in the upper link.

–

Foot: The foot model was simplified by reducing the number of cuboids from the original NAO design. Each foot is now sized at 18.4 $\times$ 10 $\times$ 1.5 cm and weighs 276 g. The foot is controlled by two joints: ankle roll and ankle pitch.

Figure 4.

Left: Frontal view of the bipedal (NAO) robot. In green, the default meshes of the original robot. In grey, the modified parts. Rotational and prismatic joints are indicated in red. Right: Side view of the bipedal robot with various parts labelled.

In addition, the shoulders of the NAO are also actuated, being able to move forward and backwards by means of the shoulder pitch joint. That is, there are a total of 14 joints and their ranges of motion are presented in Table 3.

Table 3.

Maximum available JROM values for each joint (right and left sides) of the bipedal robot.

Joint	Shoulder Pitch	Ankle Roll	Ankle Pitch	Knee Pitch	Hip Pitch	Hip Roll	Hip Yaw Pitch
JROM	$[-40^{\circ}, 40^{\circ}]$	$[-30^{\circ}, 30^{\circ}]$	$[-65^{\circ}, -5^{\circ}]$	$[25^{\circ}, 85^{\circ}]$	$[-50^{\circ}, 10^{\circ}]$	$[-10^{\circ}, 10^{\circ}]$	$[-10^{\circ}, 10^{\circ}]$

3.3. Controller, learning algorithm and simulation

The robots’ controller is based on a Neural Network (NN) structure using sigmoid activation functions. The inputs and outputs of the NN for each morphology, as well as other parameters of the controller are summarized in Table 4.

Table 4.
Summary of the NN controller parameters for each morphology.

Parameter	Quadruped Robot	NAO Robot
NEAT Initialization	NN fully connected (no hidden layers)
NN Inputs	1 input $+$ bias	3 inputs $+$ bias
NN Outputs	8 outputs (one for each joint)	14 outputs (for legs and shoulders)
Input Type for NN	Sinusoidal signals	Sinusoidal signals
Input Amplitude	2	2
Input Angular Velocity	0.1 rad/s	0.1 $\cdot$ $π$ rad/s
Input Phases	0	0, $\frac{π}{3}$ , $\frac{π}{5}$ rad

Learning is achieved through neuroevolution using the NEAT algorithm. NEAT was selected because of its capability to simultaneously optimize both the topology and the connection weights of the NN,⁴⁵ thus reducing the influence of the human designer in the learning process.

To simplify the study of joint range of motion-based development, the complexity of the controller has been reduced to a minimum. The inputs to the NNs are sinusoidal signals, used as pattern generators,^46–49 1 for the quadruped and 3 for the biped. The difference is motivated by the complexity of the morphology. A biped requires a larger number of pattern generators to produce different types of gaits (hence the phase change between each of the input signals shown in Table 4). The amplitude of the pattern generators is set to 2 to avoid normalization of the input values. Furthermore, the frequency is also different for each morphology and was obtained experimentally. The outputs are given by the number of joints available for control in each morphology (8 for the quadruped and 14 for the NAO) and they are scaled from [0, 1] to align with the specific range of motion available for each joint ( $v$ function in our formalization). The corresponding data is presented in Table 2 for the quadruped and Table 3 for the NAO. An example of the NN structure for the quadruped morphology at the beginning of learning is shown in Figure 5. In addition, the pseudocode of the learning algorithm is displayed in Algorithm 1.
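
The input and output handling described above can be illustrated with a minimal sketch. It assumes that each pattern generator has the form $A \sin(ω t + φ)$ and that the output scaling is a linear map from [0, 1] onto the joint range; the function names `pattern_inputs` and `denormalize`, and the unit of `t`, are hypothetical and are not taken from the published source code.

```python
import math


def pattern_inputs(t, amplitude=2.0, omega=0.1, phases=(0.0,)):
    """Sinusoidal pattern-generator inputs at time t.

    Defaults follow the quadruped column of Table 4. The functional form
    amplitude * sin(omega * t + phase) is an assumption.
    """
    return [amplitude * math.sin(omega * t + phase) for phase in phases]


def denormalize(output, jrom):
    """Linearly map a sigmoid output in [0, 1] onto the joint range (degrees)."""
    lower, upper = jrom
    return lower + output * (upper - lower)


# NAO input settings from Table 4: three generators with different phases.
nao_inputs = pattern_inputs(
    t=1.0, omega=0.1 * math.pi, phases=(0.0, math.pi / 3, math.pi / 5)
)

# Quadruped lower-limb joint with a JAO of 30 degrees (Table 2).
command = denormalize(0.5, (-120.0, 60.0))
```

Under these assumptions, a network output of 0.5 on a joint with a JROM of [ $-$ 120°, 60°] yields a command at the centre of the range, and outputs of 0 and 1 yield the lower and upper bounds respectively.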

Figure 5.

Example of the NN structure at the beginning of the learning process for the quadruped, which has 8 degrees of freedom (DOF) and includes 1 sinusoidal input plus a bias. As learning progresses, the NEAT algorithm can add extra neurons and connections (weights) to the NN.

The experiments were conducted using the CoppeliaSim simulator with the Open Dynamics Engine. For each independent run, the NEAT algorithm optimizes a population of 50 individuals for 300 generations. These parameters were deliberately set low, because JROM development allows computation to be “offloaded” from the algorithm to the morphology, helping to find an optimal solution with fewer computational resources. The simulation configuration is the default one, with a time step of 50 ms. In addition, as the movements to perform for each morphology are different, the joints of the quadruped are updated every two simulation time steps. This allows enough time to perform the movements, as they are larger than in the biped case. Finally, in the simulator, each individual is evaluated for 9 seconds, which is long enough to properly learn gait patterns without excessively extending the experimental phase. These parameters are summarized in Table 5.

Table 5.

Simulation experimental parameters.

Parameter	Details
Population Size	50 individuals
Generations	300
Independent Runs per Experiment	50
Test Duration per Individual	9 seconds
Simulation Time Step	50 ms
Physics Engine Time Step	5 ms
Quadruped NN Update Frequency	Every 100 ms (every 2 time steps)
NAO NN Update Frequency	Every 50 ms (every time step)
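
The quantities implied by Table 5 can be derived with a short sketch. The dictionary keys and the function `derived_counts` are hypothetical names, and the number of evaluations per run assumes that every individual is evaluated exactly once per generation, which the text does not state explicitly.

```python
# Values copied from Table 5; durations are kept in integer milliseconds
# to avoid floating-point rounding in the divisions below.
SIM = {
    "population_size": 50,
    "generations": 300,
    "runs_per_experiment": 50,
    "test_duration_ms": 9000,
    "sim_step_ms": 50,
    "physics_step_ms": 5,
    "nn_update_every_steps": {"quadruped": 2, "nao": 1},
}


def derived_counts(cfg):
    """Derive step and evaluation counts from the simulation parameters."""
    steps = cfg["test_duration_ms"] // cfg["sim_step_ms"]
    return {
        "sim_steps_per_evaluation": steps,
        "physics_steps_per_sim_step": cfg["sim_step_ms"] // cfg["physics_step_ms"],
        "nn_updates_per_evaluation": {
            robot: steps // every
            for robot, every in cfg["nn_update_every_steps"].items()
        },
        # Assumption: one evaluation per individual per generation.
        "evaluations_per_run": cfg["population_size"] * cfg["generations"],
    }


derived = derived_counts(SIM)
```

With these values, a 9-second evaluation corresponds to 180 simulation time steps, which is consistent with the value of $T T S$ used in the fitness function.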

3.4. Fitness function

The fitness value of each morphology is related to the distance travelled in a straight line and the possibility of falling:

$$\mathit{Fitness} = \frac{D T}{M D} \cdot (1 - α) + \frac{T S T - T F P}{T T S} \cdot α \qquad (1)$$

where:

–

$D T$ : Distance Traveled by each morphology in a straight line in meters.

–

$M D$ : Maximum Distance travelled by an individual for each morphology, empirically obtained from a series of previous experiments: 2.75 m for the NAO and 5.25 m for the quadruped.

–

$α$ : Coefficient to weigh the relevance of each term in the fitness value: 1) The Distance Traveled, and 2) The time steps travelled without falling. This term only affects the NAO, because the chance of the quadruped falling is very low, and when it happens it can easily recover. The NAO, on the other hand falls easily and it is extremely difficult for it to recover from falls. By default, for the NAO, we consider a value of $α = 0.5$ and for the quadruped a value of $α = 0$ .

Thus, given the $α$ value of $0$ for the quadruped, the following parameters only affect the fitness value of the NAO:

–

$T S T$ : Time Step Traveled. Number of time steps in simulation travelled by the robot without falling.

–

$T F P$ : Time Steps Falling Penalization. This is a penalization applicable if the robot falls before ending the evaluation period. If it falls, $T F P = 15$ time-steps, otherwise, $T F P = 0$ .

–

$T T S$ : Total Time Steps. Maximum time steps for evaluation (180 time-steps).

In this context, we assume that the NAO has fallen when its head is at a height of less than 0.4 m.
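
As an illustration, equation (1) can be written as a small function. This is a sketch under the definitions given above; the function and argument names (`fitness`, `has_fallen`, `steps_without_falling`, and so on) are hypothetical and do not come from the published source code.

```python
MAX_DISTANCE_M = {"nao": 2.75, "quadruped": 5.25}  # MD, from the text
ALPHA = {"nao": 0.5, "quadruped": 0.0}             # default alpha values
TOTAL_TIME_STEPS = 180                             # TTS
FALL_PENALTY_STEPS = 15                            # TFP when the robot falls
FALL_HEAD_HEIGHT_M = 0.4                           # fall threshold for the NAO


def has_fallen(head_height_m):
    """The NAO is considered fallen when its head is below 0.4 m."""
    return head_height_m < FALL_HEAD_HEIGHT_M


def fitness(robot, distance_m, steps_without_falling=0, fell=False):
    """Equation (1): weighted sum of a distance term and a survival term."""
    alpha = ALPHA[robot]
    distance_term = distance_m / MAX_DISTANCE_M[robot]
    tfp = FALL_PENALTY_STEPS if fell else 0
    survival_term = (steps_without_falling - tfp) / TOTAL_TIME_STEPS
    return distance_term * (1 - alpha) + survival_term * alpha


# Quadruped: only the distance term counts because alpha = 0.
quadruped_score = fitness("quadruped", distance_m=5.25)

# NAO walking 1.375 m without falling for the whole evaluation.
nao_score = fitness("nao", distance_m=1.375, steps_without_falling=180)

# NAO covering the same distance but falling after 90 time steps.
nao_fall_score = fitness("nao", distance_m=1.375, steps_without_falling=90, fell=True)
```

In this sketch a NAO that falls halfway through the evaluation loses both the remaining survival time and the 15-step penalty, so its score is clearly lower than that of a NAO covering the same distance without falling.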

4. Experiments

The following experiments have been carried out over the quadruped and NAO morphology:

–
No Development (ND): This is the baseline reference experiment. At generation 0, the robot starts with all the DOFs and JROM available, those displayed in Tables 2 and 3, and they do not change during the whole learning process (the $v$ function is fixed and does not change during learning).

To study the influence of motor development, two groups of additional experiments have been designed: The first one involves experiments without variation in the number of DOF and the JROM during the learning process (i.e. the $v$ function is invariable), but that present different values to the ones displayed in Tables 2 and 3. These experiments try to address the implications of learning with a reduced (but constant) JROM or number of DOF. They are:

–
No Development with Reduced JROM (RND): These experiments are similar to the ND one but with reduced values for the JROM. They are conducted with the aim of finding out whether a reduction in JROM alone, without development, could achieve an improvement in learning compared to the standard case of ND or whether it would be detrimental or irrelevant (and to elucidate why). The value of the JROM for each joint depends on the type of experiment. For example, for the quadruped morphology, these experiments consist of multiple experimental runs with various Joint Angle Offsets (JAO) and total JROM configurations for the joints of the lower limbs of the quadruped (Table 6). The term “total JROM” is understood as the sum of the absolute values of the upper bound and lower bound (e.g., a total JROM of 100° means a JROM of [ $-$ 50°, 50°]). The idea behind checking different JAO and JROM configurations is to perform a sweep of those experimental parameter values to check whether certain morphological configurations favour learning over others. In these experiments, the upper and lower bounds of the JROM have the same magnitude. Combinations whose JROM values would not be the same for the upper and lower limits were left blank. For simplicity, the experiments are executed up to generation 150 instead of 300. This experiment is not carried out for the biped morphology, as using a different initial joint angle offset would make the robot fall.

Table 6.
JROMs tested for each joint angle offset (JAO) of the lower limbs (rows) and each total JROM (columns).

	Maximum JROM available for the Lower Limb (LL) of the quadruped
JAO	20°	40°	60°	80°	100°	120°	140°	160°	180°
80°	[ $-$ 10°, 10°]
70°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]
60°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]
50°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]	[ $-$ 40°, 40°]
40°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]	[ $-$ 40°, 40°]	[ $-$ 50°, 50°]
30°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]	[ $-$ 40°, 40°]	[ $-$ 50°, 50°]	[ $-$ 60°, 60°]
20°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]	[ $-$ 40°, 40°]	[ $-$ 50°, 50°]	[ $-$ 60°, 60°]	[ $-$ 70°, 70°]
10°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]	[ $-$ 40°, 40°]	[ $-$ 50°, 50°]	[ $-$ 60°, 60°]	[ $-$ 70°, 70°]	[ $-$ 80°, 80°]
0°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]	[ $-$ 40°, 40°]	[ $-$ 50°, 50°]	[ $-$ 60°, 60°]	[ $-$ 70°, 70°]	[ $-$ 80°, 80°]	[ $-$ 90°, 90°]
$-$ 10°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]	[ $-$ 40°, 40°]	[ $-$ 50°, 50°]	[ $-$ 60°, 60°]	[ $-$ 70°, 70°]	[ $-$ 80°, 80°]
$-$ 20°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]	[ $-$ 40°, 40°]	[ $-$ 50°, 50°]	[ $-$ 60°, 60°]	[ $-$ 70°, 70°]
$-$ 30°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]	[ $-$ 40°, 40°]	[ $-$ 50°, 50°]	[ $-$ 60°, 60°]
$-$ 40°	[ $-$ 10°, 10°]	[ $-$ 20°, 20°]	[ $-$ 30°, 30°]	[ $-$ 40°, 40°]	[ $-$ 50°, 50°]
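
The set of combinations in Table 6 can be reproduced with a short sketch. It rests on an inference rather than on a statement in the text: the filled cells of Table 6 and the bounds of Table 2 are both consistent with a physical limit of ±90° around the 0° offset position, so a symmetric range of half-width $h$ centred on a given JAO is kept only when $|JAO| + h \leq 90°$. The names `PHYSICAL_LIMIT_DEG`, `max_jrom` and `sweep_configurations` are hypothetical.

```python
PHYSICAL_LIMIT_DEG = 90  # assumption inferred from Tables 2 and 6


def max_jrom(jao_deg):
    """Maximum JROM relative to the offset position (reproduces Table 2)."""
    return (-PHYSICAL_LIMIT_DEG - jao_deg, PHYSICAL_LIMIT_DEG - jao_deg)


def sweep_configurations(jaos=range(-40, 81, 10), totals=range(20, 181, 20)):
    """Enumerate (JAO, total JROM, symmetric bounds) for the filled cells."""
    configs = []
    for jao in jaos:
        for total in totals:
            half = total // 2
            # Keep only symmetric ranges that fit inside the physical limit.
            if abs(jao) + half <= PHYSICAL_LIMIT_DEG:
                configs.append((jao, total, (-half, half)))
    return configs


configs = sweep_configurations()
```

Under this assumption the sweep contains 71 combinations, matching the number of filled cells in Table 6, and `max_jrom` returns the lower-limb bounds listed in Table 2 for offsets of 0°, 30° and 60°.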

The other group of experiments involves variations in the JROM or in the number of DOFs available during learning. These experiments imply a modification of the parameters of function $v$ (Figure 1) in each generation that development takes place. These modifications result in a variation in the set of Joint Commands $(J C)$ sent to the joints by modifying the denormalization limits of the outputs of the NN. Once development ends, $v$ remains constant, and it is the same as in the ND case. These experiments seek to study the effects of an initial reduction of the motor space and its increase until the final one (abruptly, gradually, at different speeds, etc.) and its relationship with the walking learning performance.

–
Proximodistal JROM Development (PJD): These experiments are characterized by keeping the JROM limits of the joints closest to the body invariable whilst reducing the JROM of those farthest from it. Generally, the initial JROM available is 1/2 of the final one, but in some cases the joints start completely locked at generation 0 (in the case of the NAO) or with 1/8 of the final JROM (in the case of the quadruped). These reduced JROMs increase linearly until they reach the configuration of the no-development case. After that, learning continues as in the reference case. For the quadruped, development is applied to the joints of the lower link, while in the NAO, it is applied to the joints of the ankles, knees, and shoulders.
–
Freezing and Freeing DOF development (DOFD): These experiments are characterized by starting learning with the farthest joints completely locked. At generation 30 (1/10 of the total learning period), these DOFs are released, either abruptly or gradually up to generation 90 (quadruped case) or 150 (NAO case). After that generation, the experiment continues as an ND one. Again, for the NAO, this type of developmental strategy is applied to the joints of the ankles, knees and shoulders.
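
The two developmental strategies can be sketched as schedules that return the JROM available at a given generation. The sketch assumes that both bounds are scaled by the same fraction of their final value (which matches the [ $-$ 15°, 7.5°] example given for 1/8 of [ $-$ 120°, 60°]), that a locked joint is represented by a zero-width range, and that the abrupt release happens at generation 30. All function names are hypothetical.

```python
def jrom_fraction(generation, start_fraction, end_generation, start_generation=0):
    """Fraction of the final JROM available at a generation (linear growth)."""
    if generation <= start_generation:
        return start_fraction
    if generation >= end_generation:
        return 1.0
    progress = (generation - start_generation) / (end_generation - start_generation)
    return start_fraction + (1.0 - start_fraction) * progress


def scaled_jrom(final_jrom, fraction):
    """Scale both bounds of the final JROM by the same fraction."""
    lower, upper = final_jrom
    return (lower * fraction, upper * fraction)


def pjd(generation, final_jrom, start_fraction=0.5, end_generation=90):
    """Proximodistal JROM development: linear growth from generation 0."""
    return scaled_jrom(final_jrom, jrom_fraction(generation, start_fraction, end_generation))


def dofd_gradual(generation, final_jrom, release_generation=30, end_generation=90):
    """Joints locked until release_generation, then released linearly."""
    fraction = jrom_fraction(generation, 0.0, end_generation, release_generation)
    return scaled_jrom(final_jrom, fraction)


def dofd_abrupt(generation, final_jrom, release_generation=30):
    """Joints locked until release_generation, then fully released at once."""
    return final_jrom if generation >= release_generation else (0.0, 0.0)
```

For instance, `pjd(0, (-120, 60), start_fraction=1/8)` gives (−15.0, 7.5), and passing `end_generation=150` to `dofd_gradual` would correspond to the NAO case described above.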

A summary of these experiments with a brief description of their characteristics is presented in Table 7. In addition, the source code of each experiment is available for the quadruped¹ and for the NAO².

Table 7.
Summary of experiments and characteristics for quadruped and biped morphologies.

Experiments		Characteristics	Quadruped morphology	Biped morphology
Fixed Morphology	No Development (ND)	Reference Experiment: $J R O M_{R E F}, D O F_{R E F}$	Figures 6 and 8	Figures 9 and 10
	Reduced No Development (RND)	Lower $J R O M$ : $J R O M_{R N D} < J R O M_{R E F}$	Figure 7	Figure 10
Morphological Development	Proximodistal Joint Development (PJD)	Linear development in proximodistal joints; while development: $J R O M_{P J D} < J R O M_{R E F}$; after development: $J R O M_{P J D} = J R O M_{R E F}$	Figures 6 and 8	Figures 9 and 10
	DOF Development Abrupt (DOFDA)	Freezing some joints, then abrupt release; while development: $D O F_{D O F D A} < D O F_{R E F}$; after development: $D O F_{D O F D A} = D O F_{R E F}$	–	Figure 9
	DOF Development Gradual (DOFDG)	Freezing some joints, then gradual release; 1st developmental stage: $D O F_{D O F D G} < D O F_{R E F}$; 2nd developmental stage: $D O F_{D O F D G} = D O F_{R E F}$, $J R O M_{D O F D G} < J R O M_{R E F}$; after development: $D O F_{D O F D G} = D O F_{R E F}$, $J R O M_{D O F D G} = J R O M_{R E F}$	Figure 6	Figure 10

5. Quadruped morphology

We begin our study of the influence that the application of JROM development has on learning by addressing the validity of each of the hypotheses that were established considering a quadruped morphology.

5.1. H1: A reduction of the motor space facilitates learning

To start, we select different initial configurations of the quadruped (3 different joint angle offsets: 0°, 30°, and 60°) and evaluate the results in three cases: 1) when there is no variation in the JROM (ND); 2) when learning begins at generation 0 with a limited JROM of the lower limbs available (half of the final one) and it increases gradually until reaching the maximum JROM available at generation 90 (PJD); and 3) when the joints of the lower limbs start completely blocked and remain so up to generation 30, after which the JROM is progressively released until reaching its maximum range at generation 90, and from there on learning continues with the adult morphology (DOFD). In all cases, ND, PJD, and DOFD, learning continues from generation 90 onwards with a fixed morphology and the JROM available is the same for all of them. The comparative results of these experiments are presented in Figure 6.

The characteristics of the boxplots are the same for all the boxplots in the article. Each boxplot shows the median and the 75th and 25th percentiles of the results obtained at the end of learning over the 50 independent runs. The whiskers extend to 1.5 times the interquartile range (IQR). Single points are values that lie beyond the whiskers. The statistical analysis has been carried out using the two-tailed Mann-Whitney U test.⁵⁰ We consider a p-value of 0.05 as the significance level for accepting or rejecting the null hypothesis (the compared samples are equal). A Bonferroni correction⁵¹ has been applied to the statistical analysis. For clarity, the numerical results of the statistical analysis have been replaced with asterisks in the figures.
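
The statistical procedure described above can be sketched with the standard library alone. This is a simplified illustration, not the analysis code used in the paper: the p-value uses the normal approximation without tie or continuity corrections, and a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be preferred. The function names and the number of comparisons in the usage line are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_two_tailed(sample_a, sample_b):
    """Return (U statistic of sample_a, two-tailed p-value).

    Normal approximation, no tie or continuity correction (simplification).
    """
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, min(1.0, p_value)


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical usage: three pairwise comparisons at a family-wise level of 0.05.
threshold = bonferroni_threshold(0.05, 3)
```

In use, the final distances of the 50 runs of two experiments would be passed as `sample_a` and `sample_b`, and a difference would be reported as significant only if the returned p-value fell below `threshold`.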

Figure 6.

Results for the quadruped, considering a Joint Angle Offset (JAO) of the lower limbs’ joints of 0°, 30° and 60° for the Proximodistal JROM Development (PJD), the Freezing and Freeing Degrees of Freedom (DOFD) and the No Development (ND) cases.

Results of the learning process with the joints of the lower limbs starting at 0° (JAO: 0°) are shown in Figure 6-left. In it, there is no difference in learning between the ND and the developmental experiments (PJD and DOFD). When the JAO changes to 30° (Figure 6-middle), results partially change. PJD surpasses the distance achieved by the ND case (p-value of 1.3 $\cdot 10^{- 4}$ ), but the DOFD case does not present any difference when compared to ND. These results are in line with those of the joint angle offset of 60°: PJD improves learning (p-value lower than $10^{- 4}$ ) while DOFD does not imply any learning improvement. This shows that as the angle offset of the joints of the lower limbs increases, the learning performance of the JROM development strategies improves, not only with respect to the corresponding ND experiment, but also in absolute terms. However, it seems that for the ND cases, the JAO of the lower limbs is irrelevant, because similar performance is achieved in the three cases.

Secondly, having observed that the configuration of morphology can affect the influence of JROM-based development on learning (in this case, the initial position of the joints of the lower limb) we now carry out a series of experiments in which we try to identify the causes of this effect. They consist in a sweep of multiple morphological configurations with a fixed morphology (no development), combining different values of the JAO and the available JROM of the joints of the lower limbs. The results are shown in Figure 7. The colour for each grid cell represents the median of the distance travelled, in meters, of 50 independent experiments for each combination of JAO and JROM displayed in Table 6. The best-valued combinations (those with a strong orange-red colour) are grouped around the JAO range between 20° and 80° and with total JROMs between 40° and 80°. Furthermore, according to Figure 7, a JAO of 30° and total JROM of 80° (leading to a JROM range of [ $-$ 40°, 40°]) is the combination that produces the best results (median distance achieved of 4.817 m), indicating that, for the no development cases, learning is favoured when the available JROM is small (Figure 7) and not as large as the ones of Table 2 (for example, in Figure 7 the JAO of 0° has a total JROM of 180°). These results are partly related to what some authors had indicated when they mentioned that JROM favoured learning thanks to an initial reduction of the motor space.^30,31 However, this does not occur in all cases and under the same conditions. Thus, it seems that the learning improvement associated with a reduction of the motor space is related to the ability of the learning algorithm to explore optimal areas of the Solution Space ( $S S$ ) associated with the available motor space of the morphology, defined in our case by the JAO and the JROM.

Figure 7.

Joint Command Space Sweep (without development) addressing different joint angle offsets from $-$ 40° to 80°, as well as different total JROM of the lower links of the quadruped, according to Table 6.

5.2. H2: Developmental speed influences learning

To further explore this hypothesis, we have carried out a series of experiments in which the motor space is initially strongly reduced (to 1/8 of the final JROM) and is progressively released until reaching the final JROM. This strong limitation has been tested over two JAO cases: one with a JAO of 30°, and another one with a JAO of 60°. On the one hand, in the case of the JAO of 30°, an initial JROM of 1/8 of the final one (an initial JROM of [ $-$ 15°, 7.5°]) means a strong reduction of the motor space that still includes the optimum shown in Figure 7 (JAO of 30° and JROM of [ $-$ 40°, 40°]). On the other hand, as displayed in Figure 7, starting learning with a JAO of 60° and a JROM of [ $-$ 15°, 7.5°] implies a motor space that does not include the optimum. It is characterized by suboptimal solutions that are close to the optimum, and this should make it more difficult to find an optimal solution. In addition, different developmental speeds have been considered for each case (development can finish from generation 30 to 180, depending on the type of experiment) to evaluate how the changes in the available motor space affect the capacity of the learning algorithm to find optimal solutions. These results are displayed in Figure 8. Figure 8-left presents the results for the configuration of the morphology with a JAO of 30°. It shows how these experiments clearly outperform the results of the ND case (fixed JROM of [ $-$ 120°, 60°]). However, the learning improvement arising from such a drastic reduction of the initial JROM available does not happen in all morphological configurations. For example, performing such a reduction in experiments with a JAO of 60° (Figure 8-right), instead of 30°, with an initial JROM of [ $-$ 22.5°, 3.75°], we found that there are instances that produce better results than the ND one (those with the slower developmental speeds) and others for which development is irrelevant (those with the faster ones). In addition, only the PJD up to generation 180 can achieve a performance similar to that displayed in Figure 6-right, obtaining 4 stars of significance.

Figure 8.

Results for initial JROM at 1/8 of the final value in Proximodistal JROM Development (PJD) experiments with Joint Angle Offset (JAO) of 30° (left) and 60° (right). The number with “PJD” indicates the generation where development ends.

Consequently, it seems that developmental speed is quite important when the optimum is not contained within the initial motor space, hinting that how fast development should be is related to the exploration capabilities of the algorithm used to explore the Solution Space.

5.3. H3: A suitable synergy between the learning algorithm and the JROM developmental strategy is needed

From Figures 6 to 8, we can conclude that some morphological configurations are more adequate for learning to walk than others, where a morphological configuration comprises both the JAO and the JROM available (the experiments presented in Figure 7 are a clear example of that). In addition, reducing the JROM available for each joint implies a reduction of the $J C S$, or in other words, the actual motor space, which simplifies the task of the learning algorithm (NEAT in this case) of finding optimal commands.

This is the key point of JROM development: It helps to improve learning performance by reducing the size of the $J C S$ and thus facilitates finding the optimal solutions. However, because of what was shown, it is not enough to reduce the $J C S$ to improve learning efficiency. Other factors must accompany it. These are:

The optimal joint commands (the area where the optimum is located in the motor space) must be available in the initial reduced $J C S$ or it should be relatively close to it. Otherwise, the learning algorithm will be exploring suboptimal commands.

The effects of the JAO can be observed in Figure 6, where JROM development is irrelevant for a JAO of 0° (far from the optimum), but beneficial at 30° and 60°, both of which, given an initial JROM of one half of the final one, include the optimum in the initial $J C S$. In fact, when the initial JROM is even smaller but still contains the optimum, the results are better. This is observed for the JAO of 30° where, although the development experiments outperform ND in both cases, the results achieved with an initial JROM of 1/8 (Figure 8-left) are better than those obtained with an initial JROM of 1/2 (Figure 6-middle). In other words, these figures show that the more we initially focus on the area of the optimum, the better. However, the area of the optimum is usually not known beforehand, this being one of the reasons behind the heterogeneous results of different developmental strategies: some of them include the optimal solutions and others do not. Consequently, more open initial JROMs help to make sure that the optimum is within the search area or very close to it, so that as the JROM increases it will soon include it. These comments are confirmed by reducing the initial JROM to 1/8 of the final one in the case with a JAO of 60° (Figure 8-right). This effectively removes the optimum from the initial motor space, although it is still close to it, allowing the development process to enlarge the motor space until it includes the optimum in a short development time. The results for this case are not as good as before and actually seem to depend on development speed, that is, on how long it takes for the learning algorithm to find the optimum.

The developmental speed must be suitable for learning. As indicated before, Figure 8-right shows how, for the same JAO (60°) and available initial JROM (1/8), different developmental speeds can lead to completely different results. This may be explained by the speed of change of the $J C S$ , and by the ability of the learning algorithm to maintain the relationship between optimal solutions of the $S S$ and optimal $J C$ from the $J C S$ . If the $J C S$ changes too fast, the changed fitness landscapes are like new problems for NEAT and there is no developmental continuity in the exploration of the $S S$ to find controllers that achieve the walking task. It is this continuity among developmental stages that provides development with its shaping power, simplifying the task of the learning algorithm in finding optimal solutions. That is, if development is too fast, it becomes difficult to find the optimal solution (controller) that matches the optimal joint command, because the morphology changes too quickly for the learning algorithm to make use of the sequence of morphologies, which makes development irrelevant. However, if the $J C S$ changes slowly, or at least slowly enough for the given problem, NEAT can find a solution that produces $J C s$ in the area of the optimal one in the initial version of the $J C S$ , or at least in an early enough version of the $J C S$ so that it is not thrown off by the changes in the morphology. Then, as development progresses and the size of the $J C S$ increases to the final one, the learning algorithm adapts the controller to the optimal commands. That is, it moves through its solution space so as to preserve these optimal $J C s$ , thus achieving a suitable synergy between the learning algorithm and development. To a lesser degree, this is also shown in Figure 8-left, where the highest medians and lowest dispersions are found in the PJD150 and PJD180 experiments, and the lowest in the PJD30 one.
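The two conditions above can be illustrated with a minimal sketch for a single joint. It assumes, purely for illustration, that the joint range is centred on the JAO and that the JROM grows linearly from its initial fraction to the full range. The full range of 120° and the optimum at 36° are hypothetical values chosen only to reproduce the qualitative pattern described in the text; they are not taken from the experiments, and the function names are ours.

```python
def joint_range(jao_deg, full_jrom_deg, fraction):
    """Return (low, high) limits of a joint range centred on the JAO (assumed form)."""
    half_width = 0.5 * full_jrom_deg * fraction
    return jao_deg - half_width, jao_deg + half_width


def jrom_fraction(generation, initial_fraction, end_generation):
    """Assumed linear schedule: fraction of the full JROM available at a generation."""
    if generation >= end_generation:
        return 1.0
    progress = generation / end_generation
    return initial_fraction + (1.0 - initial_fraction) * progress


def first_generation_with_optimum(optimum_deg, jao_deg, full_jrom_deg,
                                  initial_fraction, end_generation):
    """First generation at which the optimum lies inside the joint range, or None."""
    for generation in range(end_generation + 1):
        fraction = jrom_fraction(generation, initial_fraction, end_generation)
        low, high = joint_range(jao_deg, full_jrom_deg, fraction)
        if low <= optimum_deg <= high:
            return generation
    return None  # the optimum is outside even the full range


FULL_JROM = 120.0  # hypothetical full range, in degrees
OPTIMUM = 36.0     # hypothetical optimal joint angle, in degrees

for jao in (30.0, 60.0):
    for end_generation in (30, 150):
        generation = first_generation_with_optimum(
            OPTIMUM, jao, FULL_JROM, 1 / 8, end_generation)
        print(f"JAO={jao:.0f}, development ends at {end_generation}: "
              f"optimum reachable from generation {generation}")
```

In this toy setting, a JAO of 30° with an initial JROM of 1/8 contains the optimum from the start, whereas a JAO of 60° only reaches it after roughly a third of the development period. The sketch says nothing about how well a learner can track the changing range, which is the part that depends on the learning algorithm.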

Thus, it seems that the influence of JROM development as a developmental strategy is a combination of multiple factors, but all of them are related to the reduction of the $J C S$ and the time the learning algorithm needs to explore and exploit the Solution Space of possible controllers to find those optimal solutions and to preserve them as development progresses. This is what constitutes the “suitable synergy between the learning algorithm and the JROM developmental strategy” that favours learning the walking task.

Videos of the best individuals obtained in some of the experiments can be found in our repository³.

6. Biped morphology

This section is intended to complete and complement the results obtained with the quadruped in a more taxing problem from a walking perspective. Biped walking is much more unstable and exploring the motor space becomes much harder.

6.1. H1: A reduction of the motor space facilitates learning

In this case, we started by studying the results obtained by applying different JROM development strategies and comparing them with the ND results (Figure 9). Regarding the fitness values (Figure 9-left), most of the developmental strategies provided better results than the no-development case (with p values below $10^{-4}$ ), the best being the Proximodistal JROM development strategy starting with half of the final JROM (PJD-0.5), with a median value of 0.730. The exception is the developmental strategy based on an abrupt freeing of the DOF (DOFDA), which starts with the ankle and knee joints locked and completely releases them at generation 30. This developmental strategy led to worse results than the ND one (p-value of 0.00102).

Figure 9.

Learning results for the NAO under Proximodistal JROM development: fully limited JROM (PJD-0), half JROM (PJD-0.5), abrupt DOF development (DOFDA), gradual DOF development (DOFDG), and No Development (ND). Left: Fitness at the end of learning (PJD ends at generation 150; DOFDA fully frees DOF at generation 30; DOFDG gradually increases JROM from generation 30 to 150). Middle: Distance travelled at the end of learning. Right: Percentage of falls of the best individuals for each type of experiment at each generation. The vertical dotted black line at generation 30 indicates the abrupt release of the DOFs.

The two parameters that make up the fitness value, travelled distance and the number of falls, are plotted in Figure 9-middle and Figure 9-right. Figure 9-middle displays the distance travelled in each type of experiment at the end of learning. The PJD experiments clearly beat the results of the ND case with a high statistical significance (p values below $10^{- 4}$ ). The DOFDG experiment also outperforms the results of ND, but with a lower significance value (p-value of 0.016). Conversely, the DOFDA experiment has proven to be irrelevant when compared to the ND case (p-value of 0.775).
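The reference list includes Holm's sequential Bonferroni procedure. Assuming that the p-values of several pairwise comparisons against ND are corrected in this way (an assumption on our part, not a description of the exact analysis pipeline), the correction can be sketched as follows. The raw p-values in the example are hypothetical and are not results from the experiments.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# Hypothetical raw p-values for three comparisons against a reference case.
raw = {"strategy_a": 0.004, "strategy_b": 0.03, "strategy_c": 0.6}
for name, p_adj in zip(raw, holm_adjust(list(raw.values()))):
    print(f"{name}: adjusted p = {p_adj:.3f}")
```

A comparison is then declared significant when its adjusted p-value falls below the chosen significance level.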

Figure 9-right represents the percentage of falls that the best individuals of all independent executions suffer at each generation. At the end of learning, the ND experiment reached 48%, indicating that almost half of the best individuals are not able to maintain the upright position. The worst ratio is obtained by the DOFDA experiment, with 72%. On the other hand, better results were obtained by the DOFDG (22%) and PJD-0.5 (8%) experiments. In this figure, it can be observed how: 1) Although at the beginning of learning all developmental strategies quickly reduce the number of falls, for the DOFDG experiment, as the JROM increases, the number of falls increases until development ends; 2) For the DOFDA, once development ends and all DOF are released with the full JROM, the percentage of falls reaches the maximum (100% of falls) and then gradually decreases, but never below the ND value.
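The fall percentage described above can be computed as in the following minimal sketch. It assumes that, for every independent run and generation, a boolean records whether the best individual fell; the data layout and the function name are hypothetical, and the example values are made up.

```python
def fall_percentage(fell_by_run):
    """Percentage of runs whose best individual fell, for each generation.

    fell_by_run[r][g] is True if the best individual of run r fell at
    generation g (assumed data layout).
    """
    if not fell_by_run:
        return []
    n_runs = len(fell_by_run)
    n_generations = len(fell_by_run[0])
    return [
        100.0 * sum(run[g] for run in fell_by_run) / n_runs
        for g in range(n_generations)
    ]


# Four hypothetical runs observed over three generations.
example = [
    [True, False, False],
    [True, True, False],
    [True, False, False],
    [True, True, True],
]
print(fall_percentage(example))  # [100.0, 50.0, 25.0]
```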

These results are in line with those obtained in the case of the quadruped: the relevance of JROM development is conditioned by both the reduction of the motor space, how it is reduced and how it is increased until reaching the final motor space, and by the capacity of the NEAT algorithm to find optimal solutions in the $S S$ in each stage of the reduced motor space.

Considering that these results show how an initial reduction of the JROM and its subsequent gradual release has improved learning to a greater or lesser degree, a question arises regarding the relevance of the motor space reduction: What would happen if the whole learning process were carried out by a no-development type of experiment whose JROM is reduced to half or a quarter (strong reduction) of the one shown in Table 3? This question aims to support the hypothesis that it is not enough to reduce the motor space to improve learning, because the development of the morphology is also needed to maximize learning performance. In addition, the proposed experiments will also allow us to address hypotheses about the relevance of JROM development in learning that have been raised in the literature but not studied in depth by their authors, such as “it helps to find a high-yield area in the parameter space”,¹⁷ “Strong constraints lead to a faster and safer learning”,³¹ and “an initial reduction of the motor space, which simplifies exploration”.³⁰

A set of experiments with a reduced motor space and fixed morphology, RND-0.5 and RND-0.25, which have a maximum JROM of 1/2 and 1/4 of that of ND respectively, are compared with a series of developmental experiments. These developmental experiments start learning with the same reduction of the motor space, but the motor space increases as development progresses until reaching the final one, which is the same as that of the ND case. These are the PJD-0.5 and PJD-0.25 experiments. In addition, all of them are compared with the ND case. The results of this comparison are presented in Figure 10. Figure 10-left shows how, regarding the fitness value, all experiments outperform the ND one, with p-values lower than $10^{-4}$ . The results are quite similar for the distance travelled (Figure 10-middle), with p-values lower than $10^{-4}$ , except in the case of RND-0.25, whose results are similar to ND (p-value of 2.613) and which is surpassed by the other experiments both in the fitness achieved and in the distance travelled (p-values lower than $10^{-4}$ ). Regarding the number of falls in each experiment, except for ND (48% of falls) the other experiments show very stable gaits at the end of learning, with fall values below 15% (8% falls for the PJD-0.5, 14% for the PJD-0.25, 10% for the RND-0.5 and 2% for the RND-0.25). The lowest value of RND-0.25 indicates that, although a strong reduction of the motor space helps to achieve a higher fitness value due to an increment in the stability, in this case the reduction is too strong, and optimal solutions are out of the initial motor space.

Figure 10.

Comparison of No Development (ND), Proximodistal Joint Development (PJD-0.5, PJD-0.25), and fixed morphology experiments (RND-0.5, RND-0.25). Left: Fitness at the end of learning. Middle: Distance travelled at the end of learning. Right: Percentage of falls of the best individuals for each type of experiment at each generation.

6.2. H3: A suitable synergy between the learning algorithm and the JROM developmental strategy is needed

The results obtained by limiting the JROM in different ways (DOFD and RND) show how a reduction in the motor space leads to safer learning, by decreasing the number of falls. This observation has also been made by some authors in the literature on learning to walk with bipedal morphologies.³¹ However, it is important to remark that safer learning does not by itself guarantee better results. On the one hand, the results of the RND-0.5 experiment seem to indicate that if the reduced $J C S$ contains the set of optimal $J C s$ it is easier for the learning algorithm to find them in the $S S$ . On the other hand, if such reduction is too large (RND-0.25) the optimum $J C s$ for the problem are far outside the reduced $J C S$ and therefore unreachable, which prevents the learning algorithm from finding them and thus eliminates the advantage that JROM development provides. Furthermore, such a reduction of the motor space may lead the search to regions so far from the optimal zone that the learning algorithm may not be able to find the optimal solutions, as in the DOFDA case of Figure 9. These results are consistent with what has been displayed in Figure 7 for the quadruped. However, it must be said that in a general learning problem, researchers do not know beforehand which areas of the $S S$ are optimal. Moreover, performing a sweep such as the one displayed in Figure 7 to find the best combination of solutions, or searching randomly for the suitable DOF to lock in the hope of obtaining a satisfactory solution, is not feasible. Then, as mentioned for the quadruped case, the developmental sequence from an initially constrained morphology to the final one allows the exploration of the whole $J C S$ available, and thus provides the opportunity of finding optimal gaits.

Videos of the best individuals obtained in some experiments can be found in our repository⁴.

7. Discussion

In the previous experiments, JROM-based development has been favourable or irrelevant to learning (Figures 6 to 9), but not unfavourable, except for abrupt development in the case of NAO (DOFDA), where the fitness values were worse than in the case of ND (Figure 9-left), although the robots travel a similar distance (Figure 9-middle). Nevertheless, obtaining favourable results by implementing JROM development is not straightforward, and some specific conditions must be fulfilled.

7.1. Positive outcomes

In our scenario, motor development favours learning based on: 1) Having a specific morphological configuration, defined by the morphology, the JAO and JROM, which determine the motor space (or the $J C S$ ) in each particular case (Figure 6); 2) A developmental speed that aligns with the learning algorithm’s ability to adjust to changes in the controller-morphology relationship, based on the variations over the transfer function $v$ induced by development (Figure 8), and 3) A developmental strategy aligned with the morphology configuration, matching the reduced motor space or $J C S$ with the Solution Space (Figures 6 and 9-left) throughout the development process. In other words, JROM development influences learning based on its ability to reduce and modify the $J C S$ , as well as the ability of the learning algorithm to explore and exploit the Solution Space associated with that reduced motor space at each development stage.
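The way development changes the controller-morphology relationship can be illustrated with a minimal sketch. It assumes a linear stand-in for the transfer function $v$ that maps a normalized controller output in [-1, 1] to a joint angle inside the currently available range; this form, the function names, and the numeric values are hypothetical and are not the definition used in the experiments.

```python
def joint_command(output, jao_deg, full_jrom_deg, fraction):
    """Assumed linear stand-in for v: controller output in [-1, 1] to joint angle."""
    clipped = max(-1.0, min(1.0, output))
    return jao_deg + clipped * 0.5 * full_jrom_deg * fraction


def output_for_angle(angle_deg, jao_deg, full_jrom_deg, fraction):
    """Controller output that produces a target angle, or None if it is unreachable."""
    half_width = 0.5 * full_jrom_deg * fraction
    output = (angle_deg - jao_deg) / half_width
    return output if -1.0 <= output <= 1.0 else None


# Hypothetical joint: JAO of 30 degrees, full JROM of 120 degrees, target of 36 degrees.
early = output_for_angle(36.0, 30.0, 120.0, 1 / 8)  # reduced range at the start
late = output_for_angle(36.0, 30.0, 120.0, 1.0)     # full range at the end
print(early, late)
```

Under these assumptions, the controller output that yields the same joint command shrinks as the range grows, so the controller has to be adjusted throughout development in order to preserve a good command. If the range grows faster than the learner can make this adjustment, the benefit of the initial reduction is lost.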

The effectiveness behind the modification of the $J C S$ seems quite obvious: a reduction of the JROM, by acting over the $v$ function, implies a reduction of the size of the $J C S$ , both in the quadruped and NAO, limiting the movements and actions that can be executed. With fewer movements available, the learning algorithm’s ability to explore and exploit the Solution Space associated with such a reduced $J C S$ is increased. This facilitates finding optimal solutions when the optimal joint commands are included in such reduced $J C S$ . Otherwise, development can be irrelevant or detrimental for learning. For example, this is what happens in the case of the biped when considering RND-0.5. The learning algorithm only needs to adjust the parameters of the controller for a motor space that is much smaller than the motor space of ND (concretely, as the NAO has 14 DOF and each one is reduced by one-half, the motor space is $2^{14}$ times smaller). Another example is the optimum in Figure 7, in which, although no DOF has been locked and all joints are operational, the JROM is so narrow that the task of the learning algorithm is greatly facilitated. So much so that the use of JROM is irrelevant since the learning results are already excellent in the case of ND.
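The size reduction quoted above can be checked with a short calculation, assuming that the motor space is measured as the product of the per-joint ranges (a hyper-rectangle) and that every DOF is reduced by the same fraction. The function name is ours.

```python
def motor_space_ratio(n_dof, fraction):
    """Factor by which the motor space volume shrinks when every DOF
    keeps only `fraction` of its full range (hyper-rectangle assumption)."""
    return (1.0 / fraction) ** n_dof


# NAO with 14 DOF, each reduced to one half (RND-0.5) or one quarter (RND-0.25).
print(int(motor_space_ratio(14, 0.5)))   # 16384
print(int(motor_space_ratio(14, 0.25)))  # 268435456
```

The exponential dependence on the number of DOF explains why even a moderate per-joint reduction greatly shrinks the space the learner has to explore, and also why a strong reduction can easily exclude the region that contains the optimum.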

On the other hand, the continuous variation of the $J C S$ during development helps the learning algorithm to avoid stagnation in local optima due to continuous changes in the relationship between the controllers in the Solution Space and the commands found in the $J C S$ . In this scenario, the learner tries to maintain the relationship between optimal solutions and optimal commands when the $v$ function changes. For example, Figures 7 and 10 show how a reduction of the $J C S$ implies a learning improvement compared to the reference experiments, the ND ones defined in Table 2 (quadruped) and Table 3 (NAO).

These conclusions are in line with our previous work^13,52 studying growth development. In that work, we found that growth development (modifying the parameters of the links $L$ ) does not influence either the $J C S$ or the Solution Space, because $v$ remains constant, but it modifies the characteristics of the fitness landscape (see Figure 1). Such modifications are caused by the variations in the relationship between a given solution of the Solution Space and the fitness values obtained by just developing the length of the links (the robot morphology).

In the current article addressing joint development, the relationship between the fitness landscape and the Solution Space is also modified by means of variations in the $v$ function, altering the relationship between the Solution Space and the $J C S$ . Consistent with our previous findings, the impact of development speed is also relevant: there is an optimal development speed that yields significant improvements, while others seem to be irrelevant, especially the rapid ones. They do not necessarily enhance learning and often produce results comparable to those observed in the absence of development.

Another point to mention is the increase in the robot’s stability. Reducing JROM eliminates actions that involve large movements that can destabilize the robot’s balance. Thus, small JROMs facilitate controlling the robot’s movements and increase its stability, which is especially relevant in the case of the NAO. Fewer falls during learning, especially at the beginning, produce more informative individuals (the learning processes are not interrupted and restarted), facilitating finding optimal solutions (Figure 9-right) and reducing the so-called bootstrap problem.

Although in some cases a simple reduction of JROM without development has led to better results than ND with the full JROM, the JROM developmental process is needed because the optimal reduction of the $J C S$ is not known. In this context, as it is generally not possible to carry out a sweep of all possible solutions (as shown in Figure 7) to select the optimal one (Figure 8), limiting the $J C S$ and not enlarging it developmentally may cause the undesired effect of leaving the learning algorithm artificially trapped in suboptimal areas of the $J C S$ . Thus, a smaller and fixed $J C S$ may not contain the best solutions of the complete $J C S$ , as shown in the left columns of Figure 7, and in Figure 10. This does not happen if the JROM can develop up to the values of ND.

Finally, the implementation of JROM development in real robots presents a clear advantage: it does not modify the body of the robot. Thus, compared to other developmental strategies, such as the growth studied in our previous work,^13,41 JROM development can be applied to any robot that is already on the market without requiring any hardware modification.

7.2. Limitations

On the other hand, we have seen how it is not enough to simply reduce the $J C S$ to improve learning. The $J C S$ reduction must be aligned with the configuration of the morphology that defines it (JAO and JROM available) and with the ability of the learning algorithm to adjust the controllers of the Solution Space to such motor space. This is observed in Figures 6 and 9, where different developmental strategies that reduce the $J C S$ have produced clearly different results, ranging from advantageous to even detrimental, because of a mismatch between the reduction of the motor space caused by motor development and the optimal controllers of the Solution Space. In addition, Figure 8-left shows how, although all cases involved the same reduced Solution Space at the beginning of learning, the NEAT algorithm needs time to explore and exploit the $S S$ in a manner that allows it to take advantage of such initial reduction; otherwise, development is irrelevant. Detrimental results also arise when a complete mismatch between these three parameters occurs, as is the case of the DOFDA of Figure 9. In this particular case, it seems that the catastrophic forgetting in the NN that happens when the locked DOF are released harms and delays learning.

Another limitation we have observed, and which is also reflected in the literature, is that morphological development based on JROM is much more favourable, among other cases, when it is slow and has an effect on the stability of the morphology, as is the case of the biped. This influence on stability can reduce the efficiency of JROM in those tasks where stability has little or no influence, as is the case of reaching.

These limitations present a challenge for the direct and effective implementation of the JROM strategy in real robots. Specifically, it is not straightforward either to predetermine the morphological configuration that optimally facilitates motor development to enhance learning or the developmental speed for a given morphology. Consequently, experimental tests are required to identify the optimal combination of parameters to provide the best results.

8. Conclusion

The main conclusion of the work presented here is that the influence of JROM-based morphological development is related to three main experimental parameters or conditions: 1) The reduction of the motor space, defined by the morphology (quadruped and biped) and its configuration (different initial positions of the joints and JROM available); 2) The developmental speed, responsible for maintaining an optimal relationship between the changes in the morphology that JROM development causes and the learning algorithm’s capacity to respond to these changes: if the developmental speed is too fast, the effect of morphological development becomes irrelevant for learning; and, 3) a synergy between the learning algorithm and the motor space at each developmental stage, encapsulating the previous two parameters, coupled to the type of developmental strategy. This synergy represents the capacity of the learning algorithm to establish a relationship between the reduced motor space, defined by the $J C S$ , and the set of possible solutions in each development stage (depending on the developmental speed and strategy selected). In other words, it represents the balance between the capacity to explore and exploit the $S S$ associated with the fitness landscape, which is characterized by the $J C S$ at each developmental stage. This is observed for the biped morphology, where different results were obtained with DOFDG and DOFDA, two developmental strategies that start with the same morphological configuration and the same reduced motor space but release the joints differently, and for the quadruped, where the same developmental strategies over the same morphologies, but with different configurations, also offer different results.

The mechanisms discussed above, in essence, reduce the available Joint Command Space at the beginning of development, facilitating the search for and exploitation of optimal solutions when these are included in the reduced Joint Command Space or become included in the initial stages of development, and then provide an appropriate development path. This path involves a continuous modification of the solution space, and thus of the fitness landscape, throughout development, at a speed that allows the learning algorithms to adapt, and that guides these algorithms towards the optimal controller. When this evolution of the fitness landscape takes place in a manner that does not support the above-mentioned synergies, development using JROM may be irrelevant or even harmful.

In particular, the results presented in this study have shown how the Proximodistal Joint Development (PJD) strategy has offered better learning performance than learning without morphological changes, or using the other developmental strategies tested (freezing and freeing degrees of freedom with abrupt and gradual release). Nevertheless, most of the other developmental mechanisms where a gradual increase of JROM at the right speed was contemplated were also quite successful in achieving the desired result, attesting to the robustness of the general approach.

There is obviously still a lot of work to be carried out in this field, both in the study of JROM development strategies in different use cases, to confirm the results obtained here, and in the study of other developmental strategies such as sensor development. In our work we have started this path with growth-based and JROM-based development and will continue with sensor development and the study of the interactions that may occur when different modalities are used simultaneously.

Footnotes

Acknowledgments

This research was partially funded by the European Union’s Horizon 2020, Research and Innovation Programme, GA 101070381 (“PILLAR-Robots - Purposeful Intrinsically-motivated Lifelong Learning Autonomous Robots”), by Xunta de Galicia (EDC431C-2021/39 and M. Naya-Varela’s grant ED481B), by the Spanish Science and Education Ministry (PID2021-126220OB-I00), and the Ministry for Digital Transformation and Civil Service and Next-Generation EU/RRF (TSI-100925-2023-1), by “ERDF A way of making Europe”, Centro de Investigación de Galicia “CITIC” (ED431G 2019/01), and “Centro de Supercomputación de Galicia” (CESGA).

ORCID iDs

Martín Naya-Varela

Andrés Faiña

Alejandro Romero

Richard J. Duro

Conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Tommasino, Caligiore, Mirolli, et al. A reinforcement learning architecture that transfers knowledge between skills when solving multiple tasks. IEEE Trans Cognit Dev Syst 2019; 11: 292–317.

2. Becerra, Romero, Bellas, et al. Motivational engine and long-term memory coupling within a cognitive architecture for lifelong open-ended learning. Neurocomputing 2021; 452: 341–354.

3. Xie, Jin. An extended reinforcement learning framework to model cognitive development with enactive pattern representation. IEEE Trans Cognit Dev Syst 2018; 10: 738–750.

4. Romero, Meden, Bellas, et al. Using perceptual classes to dream policies in open-ended learning robotics. Integr Comput Aided Eng 2023; 30: 205–222.

5. Prieto, Romero, Bellas, et al. Introducing separable utility regions in a motivational engine for cognitive developmental robotics. Integr Comput Aided Eng 2019; 26: 3–20.

6. Romero, Bellas, Prieto, et al. Developmental learning of value functions in a motivational system for cognitive robotics. In: 2020 International joint conference on neural networks (IJCNN), 2020, pp.1–8. IEEE.

7. Nadizar, Medvet, Miras. On the schedule for morphological development of evolved modular soft robots. In: Medvet E, Pappa G and Xue B (eds) Genetic Programming. Cham: Springer International Publishing, 2022, pp.146–161.

8. Lungarella, Metta, Sandini. Developmental robotics: a survey. Connect Sci 2003; 15: 151–190.

9. Müller, Hoffmann. What is morphological computation? on how the body contributes to cognition and control. Artif Life 2017; 23: 1–24.

10. Freyberg, Hauser. The morphological paradigm in robotics. Stud Hist Philos Sci 2023; 100: 1–11.

11. Naya-Varela, Faíña, Duro. Morphological development in robotic learning: a survey. IEEE Trans Cognit Dev Syst 2021; 13: 750–768.

12. Zhu, Rong, Iida, et al. Bootstrapping virtual bipedal walkers with robotics scaffolded learning. Front Robot AI 2021; 8.

13. Naya-Varela, Faina, Mallo, et al. A study of growth based morphological development in neural network controlled walkers. Neurocomputing 2022; 500: 279–294.

14. Savastano, Nolfi. A robotic model of reaching and grasping development. IEEE Trans Auton Ment Dev 2013; 5: 326–336.

15. Gómez, Eggenberger Hotz. Investigations on the robustness of an evolved learning mechanism for a robot arm. In: Proc. of the 8th int. conf. on intelligent autonomous systems (IAS-8), 2004, pp.818–827.

16. Benureau FCY, Tani. Morphological development at the evolutionary timescale: robotic developmental evolution. Artif Life 2022; 28: 3–21.

17. Lungarella, Berthouze. On the interplay between morphological neural environmental dynamics: a robotic case study. Adapt Behav 2002; 10: 223–241.

18. Liu, Rong, Neri, et al. Deep deterministic policy gradient with constraints for gait optimisation of biped robots. Integr Comput Aided Eng 2024; 31: 139–156.

19. Bernstein. The Coordination and Regulation of Movements. London: Pergamon Press, 1967.

20. Gray. Changes in movement coordination associated with skill acquisition in baseball batting: freezing/freeing degrees of freedom and functional variability. Front Psychol 2020; 11.

21. Bongaardt, Meijer. Bernstein’s theory of movement behavior: historical development and contemporary relevance. J Mot Behav 2000; 32: 57–71.

22. Berthier, Clifton, McCall, et al. Proximodistal structure of early reaching in human infants. Exp Brain Res 1999; 127: 259–269.

23. Arutyunyan, Gurfinkel, Mirskii. Investigation of aiming at a target. Biophysics 1968; 13: 642–645.

24. McDonald, Van Emmerik REA, Newell. The effects of practice on limb kinematics in a throwing task. J Mot Behav 1989; 21: 245–264.

25. Vereijken, Emmerik REA, Whiting HTA, et al. Free(z)ing degrees of freedom in skill acquisition. J Mot Behav 1992; 24: 133–142.

26. Haehl, Vardaxis, Ulrich. Learning to cruise: Bernstein’s theory applied to skill acquisition during infancy. Hum Move Sci 2000; 19: 685–715.

27. Ramírez-Contla, Cangelosi, Marocco. Developing motor skills for reaching by progressively unlocking degrees of freedom on the iCub humanoid robot. In: Proceedings of the post-graduate conference on robotics and development of cognition, 2012.

28. Berthouze, Lungarella. Motor skill acquisition under environmental perturbations: on the necessity of alternate freezing and freeing of degrees of freedom. Adapt Behav 2004; 12: 47–64.

29. Bongard. Morphological change in machines accelerates the evolution of robust behavior. Proc Natl Acad Sci 2011; 108: 1234–1239.

30. Baranes, Oudeyer. The interaction of maturational constraints and intrinsic motivations in active motor development. In: IEEE international conference on development and learning (ICDL), 2011, pp.1–8.

31. Lapeyre, Oudeyer. Maturational constraints for motor learning in high-dimensions: The case of biped walking. In: 2011 11th IEEE-RAS international conference on humanoid robots, 2011, pp.707–714.

32. et al. A developmental evolutionary learning framework for robotic Chinese stroke writing. IEEE Trans Cognit Dev Syst 2022; 14: 1155–1169.

33. Braud, Giagkos, Shaw, et al. Robot multimodal object perception recognition: synthetic maturation of sensorimotor learning in embodied systems. IEEE Trans Cognit Dev Syst 2020; 13: 416–428.

34. Giagkos, Lewkowicz, Shaw, et al. Perception of localized features during robotic sensorimotor development. IEEE Trans Cognit Dev Syst 2017; 9: 127–140.

35. Lee, Meng, Chao. Staged competence learning in developmental robotics. Adapt Behav 2007; 15: 241–255.

36. Campos-Alfaro, Jara, Romero, et al. Learning adaptable utility models for morphological diversity. In: International work-conference on the interplay between natural and artificial computation, 2024, pp.105–115. Springer.

37. Romero, Bellas, Becerra, et al. Motivation as a tool for designing lifelong learning robots. Integr Comput Aided Eng 2020; 27: 353–372.

38. Ivanchenko, Jacobs. A developmental approach aids motor learning. Neural Comput 2003; 15: 2051–2065.

39. Gómez, Lungarella, Eggenberger Hotz, et al. Simulating development in a real robot: on the concurrent increase of sensory, motor, and neural complexity. 2004.

40. Choi, Choe, Park. Reinforcement learning may demystify the limited human motor learning efficacy due to visual-proprioceptive mismatch. Int J Neural Syst 2024; 34: 2450037.

41. Naya-Varela, Faina, Duro. Engineering morphological development in a robotic bipedal walking problem: an empirical study. Neurocomputing 2023; 527: 83–99.

42. Liu, Rong, Neri, et al. Entropy-weighted numerical gradient optimization spiking neural system for biped robot control. Int J Neural Syst 2024; 34: 2450030.

43. Liu, Zhang, Mastoi, et al. A human-simulated fuzzy membrane approach for the joint controller of walking biped robots. Integr Comput Aided Eng 2023; 30: 105–120.

44. Naya-Varela, Faina, Duro. Learning bipedal walking through morphological development. In: Hybrid artificial intelligent systems: 16th international conference, HAIS 2021, Bilbao, Spain, September 22–24, 2021, Proceedings 16, 2021. pp.184–195. Springer.

45. Stanley, Miikkulainen. Evolving neural networks through augmenting topologies. Evol Comput 2002; 10: 99–127.

46. Zhu, Huang. Stable locomotion of biped robot with gaits of sinusoidal harmonics. IEEE Trans Control Syst Technol 2024; 32: 805–817.

47. Koos, Mouret, Doncieux. The transferability approach: crossing the reality gap in evolutionary robotics. IEEE Trans Evolu Comput 2012; 17: 122–145.

48. Deshpande, Hurd, Minai, et al. Deepcpg policies for robot locomotion. IEEE Trans Cognit Dev Syst 2023; 15: 2108–2121.

49. Alam KMR, Siddique, Adeli. A dynamic ensemble learning algorithm for neural networks. Neural Comput Appl 2020; 32: 8675–8690.

50. McKnight, Najab. Mann-whitney U test. Cors Encyclop Psychol 2010; 1–1.

51. Abdi. Holm’s sequential Bonferroni procedure. Encyclop Res Design 2010; 1: 1–8.

52. Naya-Varela, Faina, Duro. Harnessing growth-based morphological development to facilitate learning ANN-controlled bipedal walking. In: 2022 International joint conference on neural networks (IJCNN), 2022, pp.1–8. IEEE.