Telepresence Mechatronic Robot (TEBoT): Towards the design and control of a socially interactive bio-inspired system

Abstract

Socially interactive systems are embodied agents that engage in social interactions with humans. From a design perspective, these systems are built following a biologically inspired (bio-inspired) design that mimics human-like communication cues and gestures. The design of a bio-inspired system usually consists of (i) studying the biological characteristics, (ii) designing a robot with similar characteristics, and (iii) planning motions that mimic the biological counterpart. In this article, we present the design, development, control strategy and verification of our socially interactive bio-inspired robot, the Telepresence Mechatronic Robot (TEBoT). The key contribution of our work is the embodiment of real human neck movements by (i) designing a mechatronic platform based on the dynamics of a real human neck and (ii) capturing real head movements through our novel single-camera vision algorithm. Our socially interactive bio-inspired system is based on an intuitive integrated design strategy that combines a computer-vision-based geometric head pose estimation algorithm, a model based design (MBD) approach and real-time motion planning techniques. We have conducted extensive testing to demonstrate the effectiveness and robustness of the proposed system.

Keywords

Socially interactive robot; biologically inspired robot; head pose estimation; vision based robot control; model based design; embodied telepresence system

1 Introduction

Robotic systems that employ human-like social cues and verbal and nonverbal communication modalities are called socially interactive interfaces or robots [1]. Socially interactive robots are important in domains where the primary function of a robot is to interact socially with humans. Such systems are used in a variety of applications, e.g. multimedia [2], video teleconferencing [3], distance learning [4] and health care [5]. In this work, we propose the design and control strategy of a novel socially interactive bio-inspired system, the Telepresence Mechatronic Robot (TEBoT), which is particularly suited to telepresence scenarios.

The technology used in socially interactive systems follows a standard layout consisting of two main components: 1) feedback motion control algorithms and 2) planning of the desired movement. Combining the two allows researchers to design systems that move in a desired way, with motions that are either pre-specified or planned online. Among the many applications, mimicking a biological system is a task that still poses many challenges to researchers [6].

Over the years, a number of socially interactive robots have been designed, ranging from one to thirty degrees of freedom [7]. Despite the success of many systems (e.g., ASIMO by Honda [8]), there is no commonly agreed design procedure that can be applied in full, especially for the dynamic modeling and analysis of the human body. The problem becomes even harder when computer-vision-based recognition, segmentation, modeling and analysis are needed. Considering these problems, we have revisited the question by building a socially interactive bio-inspired robot that considers a human-in-the-loop as a designer, as an observer and as an interaction partner [1].

We have built a test-bed: a novel interactive head/neck robot named Telepresence Mechatronic Robot (TEBoT). The TEBoT design is inspired by the real human neck; its unique mechanical design was built by studying real head/neck dynamics. Our intuitive design approach is based on model based design (MBD), which offers the benefits of model analysis, calibration, control and automatic code generation. TEBoT is controlled by bringing a real human head into the loop of a closed-loop control system. For a human-in-the-loop interactive robotic system, a motion capture system serves as the primary technology for digitally recording human body movements. For head/neck motion capture, we propose a novel computer vision algorithm based on a low-cost, low-resolution, low-bit-rate camera, which offers several advantages over wearable devices. The proposed method captures the real-time movements of a person's head and maps them to TEBoT's mechanical assembly.

2 Related work

2.1 Interactive robots

Over the last decades, a number of commercial and non-commercial interactive systems and robots have been developed. These robots can be broadly categorized into i) non-anthropomorphic robots (N-APRs) and ii) anthropomorphic robots (APRs). The characteristics and appearance of N-APRs are not similar to those of real humans, but they still possess some socially interactive skills. The most common examples of N-APRs are mobile robotic telepresence (MRP) systems. Almost all MRPs consist of a mobile robotic base, an LCD screen, a camera, a microphone and some non-verbal gestures, such as hand gestures. A comprehensive review of MRPs can be found in [9]. Other examples of N-APRs include Mebot [10], ESP [11], Jibo [12] and Keepon [13]. In contrast, the characteristics and appearance of APRs are similar to those of real humans; for example, HRP-4C [14], Actroid DER3 [15], Geminoid [16] and Furhat [17] look so similar to real humans that it can be hard to tell whether one is facing a robot or a human. There also exists a set of APRs, called humanoids, whose characteristics are similar to humans but whose faces are not, such as ASIMO [8], Telenoid [18] and iCub [19]. A detailed review of socially interactive N-APRs and APRs can be found in [20].

For video teleconferencing applications, almost all previously developed N-APRs and APRs have limitations. N-APRs do not present accurate nonverbal gestures, while APRs are complex, expensive and represent only one person. Furthermore, these systems are explicitly controlled by a mouse, keyboard or other hand-held device. Considering these limitations, we have designed and built a novel socially interactive system, TEBoT, which can present accurate head gestures along with audio-video communication. Research in psychology shows that, among all non-verbal modalities in human conversation, facial expressions and head movements are the most important for the flow of information [21]. TEBoT combines the capabilities and appearance of an APR with the simplicity of an N-APR. TEBoT is portable, inexpensive and easily controllable, and it presents exact non-verbal head gestures. Furthermore, we devise a novel single-camera, human-in-the-loop control strategy for TEBoT.

2.2 Human motion capturing approaches

To actuate a socially interactive bio-inspired robot, one of the best possible ways is to drive it with real human motion [22]. Approaches to capture human body motion fall into two categories:

2.2.1 Sensor based approaches

In sensor based approaches (SBA), motion capture sensors (such as accelerometers, gyroscopes, IMUs, GPS receivers, motion sensors, force sensors, electromagnetic trackers, ultrasound trackers and pressure sensors) are mounted on the body to capture its movements. For example, wearable sensors attached to the human body can capture head movement [23], hand movement [24, 25] and shoulder-arm movement [26]. Similarly, full-body motion capture suits [27] are employed to record human movements and later to control robots from head to toe [28]. Details of motion capture technologies can be found in [29].

2.2.2 Vision based approaches

In vision based approaches (VBA), human body movements are captured by one or more cameras and/or depth sensors (such as the MS Kinect). Recently, vision based techniques have gained popularity for estimating the pose angles of the human head [30, 31] and hand [32] using single, multiple or depth cameras. VBAs are also used to capture full-body movements with a single camera [33], two cameras [34], multiple cameras [35] and a depth camera [36]. These body movements are then used to actuate robotic systems [37].

In the context of motion capture, computer vision based approaches offer several advantages over wearable sensor based approaches. In SBAs, complex body-mounted sensors restrict the wearer's movement in the environment and are a burden to the person wearing them. In non-wearable vision based systems, the complex configuration and number of costly cameras require suitable laboratory settings for actuating robots. Hence, there is a need for a system that is easy to operate, portable and does not require costly, complex laboratory settings. In this work, we address this issue, i.e. how to precisely actuate the movement of a biological system using a non-contact, low-cost sensor with high performance in terms of speed.

2.3 Contribution

The key contribution is in the design and control of a socially interactive head/neck robot, where real head motion dynamics are considered in the design of mechanical assembly. For motion control, we have proposed a novel vision based geometric head pose estimation technique to capture the human head movements and actuate the robotic platform in real time.

3 Head motion analysis

To analyze the dynamics of the human head, we mount an inertial measurement unit (IMU), containing a 3-axis gyroscope and a 3-axis accelerometer, on the head, as shown in Fig. 1. The gyroscope measures the angular velocities around the z, y and x axes, denoted [ $\dot{ψ} (t)$ , $\dot{θ} (t)$ , $\dot{φ} (t)$ ]. The accelerometer outputs the linear accelerations [a_z (t), a_y (t), a_x (t)] along the z, y and x axes. These measurements are used to estimate the angular positions with a Kalman filter. These variables are denoted by the vector of generalized coordinates [ψ (t), θ (t), φ (t)].
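The angle estimation from the IMU can be illustrated with a minimal sketch, assuming a one-state Kalman filter per axis in which the gyroscope rate drives the prediction and the tilt computed from the gravity direction in the accelerometer signal provides the measurement. The axis convention (x forward, y left, z up), the noise variances and the helper names are hypothetical; the article does not specify its filter. Gravity carries no yaw information, so in this sketch yaw would rely on gyroscope integration alone.

```python
import math
import random


def accel_tilt(ax, ay, az):
    """Roll and pitch (rad) from the gravity vector measured by the accelerometer.

    Assumed axes: x forward (roll axis), y left (pitch axis), z up (yaw axis).
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch


class ScalarAngleKF:
    """One-state Kalman filter: gyro rate for prediction, accelerometer angle for correction."""

    def __init__(self, q=1e-4, r=1e-2, angle0=0.0, p0=1.0):
        self.q = q            # process noise variance per step (hypothetical tuning)
        self.r = r            # accelerometer-angle noise variance (hypothetical tuning)
        self.angle = angle0   # current angle estimate (rad)
        self.p = p0           # variance of the angle estimate

    def step(self, gyro_rate, accel_angle, dt):
        # Prediction: integrate the measured angular velocity over one sample.
        self.angle += gyro_rate * dt
        self.p += self.q
        # Correction: blend in the accelerometer angle with the Kalman gain.
        k = self.p / (self.p + self.r)
        self.angle += k * (accel_angle - self.angle)
        self.p *= 1.0 - k
        return self.angle


def demo(true_roll=0.3, steps=500, dt=0.01, g=9.81, seed=0):
    """Simulate a head held at a constant roll angle, measured by noisy sensors."""
    rng = random.Random(seed)
    kf = ScalarAngleKF()
    for _ in range(steps):
        gyro = rng.gauss(0.0, 0.01)                      # head at rest: gyro shows only noise
        ay = g * math.sin(true_roll) + rng.gauss(0.0, 0.3)
        az = g * math.cos(true_roll) + rng.gauss(0.0, 0.3)
        roll_meas, _ = accel_tilt(0.0, ay, az)
        kf.step(gyro, roll_meas, dt)
    return kf.angle


if __name__ == "__main__":
    print(f"estimated roll: {demo():.3f} rad (true 0.3 rad)")
```

The gain k settles where the assumed gyro drift and accelerometer noise balance; raising r trusts the gyroscope more and smooths the estimate at the cost of slower correction.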

3.1 Procedure

For this work, we recruited seven participants aged 22 to 45 years. The participants recorded the angular motion [ψ (t), θ (t), φ (t)] and angular velocities [ $\dot{ψ} (t)$ , $\dot{θ} (t)$ , $\dot{φ} (t)$ ] using a head-mounted IMU. Each participant performed five head-movement tasks, repeating each task more than ten times. The five tasks represent the gestures for Yes, No and May-be, and two motion patterns for drawing a circle and a triangle:

Head Nod (Yes): Move head up and down, exciting mainly the pitch coordinate.

Head Shake (No): Move head left and right, exciting mainly the yaw coordinate.

Head Roll (May-be): Tilt the head sideways, moving each ear towards the corresponding shoulder, exciting mainly the roll coordinate.

Head Circle: Make circles by head movement.

Head Triangle: Make triangles by head movement.

3.2 Frequency analysis

The frequency analysis measures the operating frequency band of human head movements for all five tasks. The frequency spectrum of each task is estimated from the measured angular velocities, using the fast Fourier transform (FFT) and/or the power spectral density (PSD). The results are presented in Table 1, which lists the minimum, maximum and mean frequencies of the head movements. Two of the five trajectories are visualized in Fig. 2, which shows the angular positions [ψ (t), θ (t), φ (t)], the angular velocities [ $\dot{ψ}$ (t), $\dot{θ}$ (t), $\dot{φ}$ (t)] and the frequency spectra for the head nod and head circle movements. These results are used in the design of the robot controller.
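A minimal sketch of the spectral analysis, assuming a uniformly sampled angular-velocity signal. The band definition (bins whose power is at least a fraction of the peak power) and the power-weighted mean frequency are hypothetical choices for illustration; the article does not state how the minimum, maximum and mean frequencies in Table 1 were defined, and the 100 Hz sampling rate is an assumption.

```python
import numpy as np


def frequency_summary(velocity, fs, fraction=0.1):
    """Return (f_min, f_max, f_mean, f_peak) in Hz from the one-sided FFT power spectrum.

    f_min / f_max: lowest / highest bins whose power is >= `fraction` of the peak power.
    f_mean: power-weighted mean frequency over those bins.
    """
    x = np.asarray(velocity, dtype=float)
    x = x - x.mean()                            # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2         # one-sided power spectrum (unnormalized)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power[0] = 0.0                              # ignore the DC bin
    band = power >= fraction * power.max()
    f_band, p_band = freqs[band], power[band]
    f_mean = float(np.sum(f_band * p_band) / np.sum(p_band))
    return float(f_band.min()), float(f_band.max()), f_mean, float(freqs[np.argmax(power)])


if __name__ == "__main__":
    fs = 100.0                                  # assumed IMU sampling rate (Hz)
    t = np.arange(1000) / fs                    # 10 s of data -> 0.1 Hz resolution
    # Synthetic nod velocity: 1.5 Hz main component plus a weaker 0.5 Hz component.
    pitch_rate = np.cos(2 * np.pi * 1.5 * t) + 0.5 * np.cos(2 * np.pi * 0.5 * t)
    print(frequency_summary(pitch_rate, fs))
```

A PSD estimate such as Welch's method would average out noise on real recordings; the raw FFT is used here only because the synthetic signal is noise-free.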

3.3 Velocity analysis and threshold calculation

The collected raw data are further post-processed to calculate other important attributes, such as kinematic constraints. These quantities describe the range of head movement and the limits on its velocities; the results are shown in Table 2. This information is used to design our robot architecture; in particular, the velocity constraints determine the required actuator specifications.

Table 2 shows that the head nod, head shake and head roll movements each involve components of all three angles (yaw, pitch and roll). Ideally, a head nod would involve only pitch, a head shake only yaw and a head roll only roll. These extraneous rotational components define our software threshold, which suppresses undesired small movements around zero velocity in both position estimation and control.
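The software threshold can be sketched as a deadband, assuming that the threshold for each axis is the largest cross-axis velocity observed while a participant performs a single-axis gesture (for example, the yaw and roll velocities recorded during head nods). The function names, the axis ordering and the continuous (shifted) deadband are illustrative choices, not taken from the article.

```python
import numpy as np

AXES = ("yaw", "pitch", "roll")


def crosstalk_thresholds(recordings):
    """Per-axis threshold = max |velocity| on that axis during tasks that should not excite it.

    `recordings` maps the excited axis name to an (n_samples, 3) array of
    [yaw_rate, pitch_rate, roll_rate] recorded for that single-axis gesture.
    """
    thresholds = np.zeros(3)
    for excited, rates in recordings.items():
        rates = np.abs(np.asarray(rates, dtype=float))
        for i, axis in enumerate(AXES):
            if axis != excited:
                thresholds[i] = max(thresholds[i], rates[:, i].max())
    return thresholds


def deadband(value, threshold):
    """Suppress motion below the threshold; shift larger values so the output stays continuous."""
    return np.sign(value) * np.maximum(np.abs(value) - threshold, 0.0)
```

Shifting the output by the threshold, rather than passing values above it unchanged, avoids a step in the commanded motion when a movement crosses the threshold.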

4 Biologically inspired design

4.1 Biological system: Human neck

The human neck has a complex anatomical structure formed by seven cervical vertebrae (see Fig. 3a) and around twenty main muscles (see Fig. 3b). The main role of the neck is to keep the head balanced while it performs different movements. The neck muscles perform the head movements, and the cervical vertebrae and muscles together hold the head upright. There are three essential head motions, i.e. yaw, pitch and roll (see Fig. 4); all other movements are combinations of these three in varying proportions. Relative to the cervical vertebrae, the neck muscles are divided into left and right groups, forming a parallel configuration that acts on both sides of the shoulders. Working in pairs, the left and right neck muscles control the pitch movement; working individually, they control the yaw and roll movements (see [38] for more details).

4.2 Biologically inspired design

To mimic the movements of a human neck, a system should reproduce not only the mobility of the human neck but also its static and dynamic characteristics. The CAD model of our mechatronic system is shown in Fig. 5(a) and a working prototype in Fig. 5(c). Our design uses two active limbs (label 7 in Fig. 5(a)) and one passive limb (label 8 in Fig. 5(a)). The passive limb is a central rod connected between the base (label 3 in Fig. 5(a)) and the mounting assembly (label 4 in Fig. 5(a)) via a universal joint (label 6 in Fig. 5(a)). The active limbs are connecting rods between the motors (label 2 in Fig. 5(a)) and the mounting assembly. The passive limb plays the role of the cervical vertebrae in the human neck, and the active limbs play the role of the neck muscles. Together, the passive limb and the two active limbs support the mounting assembly of a tablet PC (label 1 in Fig. 5(a)). The active limbs control the pitch and roll movements, while yaw is produced by a motor inside the base (label D in Fig. 5(b)). Similar to the human neck anatomy, our system is a 3-DOF parallel kinematic system actuated by three servo motors assembled on the base.

The results of the head motion analysis are used in the mechanical design of our robot. From Table 2, the maximum average peak angular values for yaw (ψ), pitch (θ) and roll (φ) are 1.43 rad, 0.642 rad and 0.65 rad, respectively. Accordingly, our mechanical design provides a range of ±1.45 rad for yaw and ±0.76 rad for pitch and roll.

4.3 Motor selection

In upright equilibrium, the weight of the mounting assembly (label 4 in Fig. 5(a)) and the tablet PC (label 1 in Fig. 5(a)) is supported by the passive limb (label 8 in Fig. 5(a)). The whole upper assembly is attached to the base through ball bearings (see Fig. 5(b)). Two parameters are considered important in motor selection: the torque, in kg-cm (or N·m), and the speed, in revolutions per minute, RPM (or rad/s).

4.3.1 Torque for left and right servo motor

The total weight of the tablet PC (approx. 1 kg), the mounting assembly (0.5 kg) and the two active limbs (0.25 × 2 = 0.5 kg) is 2 kg. The left and right servo arms are 4.5 cm (0.045 m) long. The combined torque required for the left and right servos is 2 kg × 4.5 cm = 9 kg-cm, so each servo motor must provide 9 kg-cm / 2 = 4.5 kg-cm.

4.3.2 Torque for base servo motor

The total weight of the upper assembly, including the tablet PC and the left and right servos, is 2.4 kg. The base servo arm is 3.3 cm (0.033 m) long. The required torque for the base servo motor is therefore 2.4 kg × 3.3 cm = 7.92 kg-cm.

4.3.3 Velocity requirements

The results of the velocity analysis (Section 3.3) define the required maximum motor speed. Table 2 shows that the maximum speed is 2.94 rad/s (28.074 RPM).

To comply with the above specifications, we selected a TowerPro SG-5010 standard servo, which provides a torque of 8 kg-cm and a speed of 58.8 RPM at 4.8 V, and a torque of 11 kg-cm and a speed of 71.4 RPM at 6 V.
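The sizing arithmetic of Sections 4.3.1 to 4.3.3 can be reproduced and checked against the quoted servo ratings with a short sketch. It uses the same static worst-case assumption as the text (the whole load acting at the end of the servo arm); g = 9.81 m/s² for the N·m conversion and the helper names are our own.

```python
import math

G = 9.81  # m/s^2, used only to convert kg-cm to N·m


def static_torque_kgcm(mass_kg, arm_cm, n_motors=1):
    """Worst-case holding torque per motor with the full load at the arm tip."""
    return mass_kg * arm_cm / n_motors


def kgcm_to_nm(torque_kgcm):
    return torque_kgcm * G / 100.0


def rad_s_to_rpm(omega):
    return omega * 60.0 / (2.0 * math.pi)


# Loads and arm lengths quoted in Sections 4.3.1 and 4.3.2.
pitch_roll_mass = 1.0 + 0.5 + 2 * 0.25                          # tablet + mount + two limbs = 2 kg
tau_side = static_torque_kgcm(pitch_roll_mass, 4.5, n_motors=2)  # 4.5 kg-cm per servo
tau_base = static_torque_kgcm(2.4, 3.3)                          # 7.92 kg-cm
speed_rpm = rad_s_to_rpm(2.94)                                   # about 28.07 RPM (Table 2)

# SG-5010 ratings as quoted in the text: voltage -> (torque in kg-cm, speed in RPM).
ratings = {4.8: (8.0, 58.8), 6.0: (11.0, 71.4)}


def servo_ok(voltage):
    """True if the rated torque and speed cover the worst-case requirements."""
    torque, rpm = ratings[voltage]
    return torque >= max(tau_side, tau_base) and rpm >= speed_rpm


if __name__ == "__main__":
    print(f"side servo: {tau_side:.2f} kg-cm ({kgcm_to_nm(tau_side):.3f} N·m)")
    print(f"base servo: {tau_base:.2f} kg-cm ({kgcm_to_nm(tau_base):.3f} N·m)")
    print(f"required speed: {speed_rpm:.2f} RPM")
    for v in ratings:
        print(v, "V:", "meets" if servo_ok(v) else "fails", "the requirements")
```

Both supply voltages pass, but at 4.8 V the base servo's margin is thin (8 versus 7.92 kg-cm), leaving little headroom for dynamic loads beyond the static estimate.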

4.4 Controller selection

The servo motors are driven by PWM signals. Generating them requires a controller that can output at least three PWM signals in parallel. Based on this requirement, we selected an Arduino Uno, which operates at 5 V with a 16 MHz clock. It has 14 digital input/output pins, 6 of which can be used as PWM outputs.
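How a joint angle becomes a servo command can be sketched as follows, assuming a typical hobby-servo convention: a 50 Hz frame (20 ms period), a 1500 µs pulse at the centre position, and 1000 µs of pulse width spanning 180°. These constants are assumptions, not SG-5010 datasheet values; the yaw limit is the mechanical range from Section 4.2. On the Arduino side, the standard Servo library's `writeMicroseconds()` accepts a pulse width of this kind.

```python
import math

FRAME_US = 20_000                 # assumed 50 Hz servo frame
CENTER_US = 1500.0                # assumed pulse width at 0 rad
US_PER_RAD = 1000.0 / math.pi     # assumed: 1000 us of pulse width spans 180 degrees
YAW_LIMIT = 1.45                  # mechanical yaw range from Section 4.2 (rad)


def angle_to_pulse_us(angle, limit):
    """Clamp a commanded servo angle (rad) to +/- limit and convert it to a pulse width (us)."""
    clamped = max(-limit, min(limit, angle))
    return CENTER_US + US_PER_RAD * clamped


def duty_cycle(pulse_us):
    """Fraction of the PWM frame during which the signal is high."""
    return pulse_us / FRAME_US


if __name__ == "__main__":
    for yaw in (0.0, math.pi / 4, 2.0):   # 2.0 rad exceeds the limit and is clamped
        us = angle_to_pulse_us(yaw, YAW_LIMIT)
        print(f"yaw {yaw:.2f} rad -> {us:.1f} us ({duty_cycle(us):.2%} duty)")
```

Clamping in software before the pulse is generated keeps a noisy pose estimate from driving the mechanism past its designed range.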

5 Model based design

Model based design (MBD) [39] is currently applied in a variety of industries. Just as computer aided design (CAD) provides a geometric description of a piece of equipment, MBD incorporates the dynamics and performance requirements to describe the overall system in a simulation environment.

5.1 From CAD to TEBoT simulation

To use a CAD model for MBD, the properties of the physical model have to be transformed into a corresponding set of differential equations describing the system dynamics and equations of motion. With state-of-the-art control engineering software, this can be done using add-ons for CAD tools such as SolidWorks. One example is SimMechanics [40], a MathWorks product that converts a CAD design model into a set of differential equations representing the dynamics of the robot's motion. These equations can be simulated and then used for model based control design with other MathWorks tools.

5.2 Robot motion control

The task is to construct a controller for TEBoT that achieves the desired closed-loop behavior. The yaw dynamics are simple and linear, so a simple proportional controller is used for yaw. In contrast, the pitch and roll dynamics of TEBoT are nonlinear and are given by:

$\begin{matrix} \dot{x} (t) & = & f (x (t), u (t)) \\ y (t) & = & h (x (t), u (t)), \end{matrix}$ (1)

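To simplify the controller design for pitch and roll, we linearize the model around an operating point using a first-order Taylor series expansion [41]. The matrices in the linear model are the Jacobians $A = \partial f / \partial x$, $B = \partial f / \partial u$, $C = \partial h / \partial x$ and $D = \partial h / \partial u$ evaluated at that point, and x, u and y now denote deviations from it: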

$\begin{matrix} \dot{x} (t) & = & A x (t) + B u (t) \\ y (t) & = & C x (t) + D u (t) . \end{matrix}$ (2)

The basic goal of the controller is to specify the torques of the two servo motors for the desired pitch and roll angles. The multiple-input multiple-output (MIMO) closed-loop diagram of the system is shown in Fig. 6. The inputs are the desired pitch and roll angles, denoted θ_d and φ_d, respectively. The outputs are the required torques for the left and right servos, denoted τ_l and τ_r, respectively. G is the plant, imported from the SolidWorks model. Our system uses two controllers: a main controller, denoted C_m, and a low-level controller, denoted C_s. The low-level controller C_s is a PID controller that makes each servo motor follow its reference trajectory:

$τ = - K_{p} (q - q^{ref}) - K_{d} (\dot{q} - \dot{q}^{ref}) - K_{I} \int (q - q^{ref}) \, dt,$ (3)

where K_p, K_d and K_I denote the proportional, derivative and integral gains, respectively. The gains of the PID controllers are tuned automatically in Simulink using the Tune function of the PID Controller block.
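A discrete-time reading of Eq. (3) can be sketched as below. The servo model J q̈ = τ − b q̇, its inertia and damping, and the gains are hypothetical values chosen only so that the example is stable; in the article the gains come from Simulink's automatic tuning. The derivative term uses the measured velocity, and the integral is accumulated with a forward-Euler sum.

```python
from dataclasses import dataclass


@dataclass
class PIDGains:
    kp: float   # proportional gain K_p
    kd: float   # derivative gain K_d
    ki: float   # integral gain K_I


class ServoPID:
    """Discrete form of Eq. (3): tau = -Kp*e - Kd*e_dot - KI*integral(e), with e = q - q_ref."""

    def __init__(self, gains, dt):
        self.g = gains
        self.dt = dt
        self.integral = 0.0   # forward-Euler approximation of the integral of e

    def torque(self, q, q_dot, q_ref, q_dot_ref=0.0):
        e = q - q_ref
        e_dot = q_dot - q_dot_ref
        self.integral += e * self.dt
        return -self.g.kp * e - self.g.kd * e_dot - self.g.ki * self.integral


def simulate_step(q_ref=0.5, t_end=10.0, dt=1e-3, inertia=0.01, damping=0.05):
    """Track a step reference with a hypothetical servo model J*q_ddot = tau - b*q_dot."""
    pid = ServoPID(PIDGains(kp=2.0, kd=0.2, ki=2.0), dt)
    q, q_dot = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        tau = pid.torque(q, q_dot, q_ref)
        q_dot += (tau - damping * q_dot) / inertia * dt   # semi-implicit Euler step
        q += q_dot * dt
    return q


if __name__ == "__main__":
    print(f"servo angle after 10 s: {simulate_step():.4f} rad (reference 0.5 rad)")
```

Setting kd and ki to zero leaves the purely proportional structure that the text uses for the linear yaw axis.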

The main controller C_m maps the desired trajectories of TEBoT to the corresponding servo trajectories using inverse kinematics: it starts from the desired rotation angles (θ_d and φ_d) of TEBoT and calculates the servo rotations (q_l and q_r) required to achieve them. The gains of C_m for crossover frequencies between 1 and 11 rad/s (taken from Table 1) are tuned in MATLAB with the optimization methods described in [42], targeting minimal loop interaction and adequate MIMO stability margins.
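A minimal inverse-kinematics sketch for C_m, under a deliberately simplified and hypothetical geometry: the mounting plate pivots on the universal joint at the origin, each active limb attaches at a lateral offset ±b and forward offset c on the plate, the limbs stay approximately vertical, and a servo arm of length r = 4.5 cm (Section 4.3.1) must lift its attachment point by Δz, so r sin q = Δz. The offsets b and c and the rotation order are placeholders, not TEBoT's dimensions; a mirrored servo mounting may also flip the sign of one output.

```python
import numpy as np

ARM_R = 0.045   # servo arm length (m), Section 4.3.1
HALF_W = 0.04   # hypothetical lateral offset b of each limb attachment (m)
FWD = 0.03      # hypothetical forward offset c of the attachments (m)


def rot_x(a):
    """Rotation about the lateral x-axis (pitch)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(a):
    """Rotation about the forward y-axis (roll)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def servo_angles(theta_d, phi_d):
    """Map desired pitch theta_d and roll phi_d (rad) to servo angles (q_l, q_r) in rad."""
    R = rot_y(phi_d) @ rot_x(theta_d)             # assumed rotation order: pitch, then roll
    p_left = np.array([-HALF_W, FWD, 0.0])        # attachment points in the plate frame
    p_right = np.array([HALF_W, FWD, 0.0])
    dz_left = (R @ p_left)[2]                     # vertical displacement of each attachment
    dz_right = (R @ p_right)[2]
    # Each servo arm must produce r*sin(q) = dz; clipping keeps arcsin defined at the limits.
    q_l = np.arcsin(np.clip(dz_left / ARM_R, -1.0, 1.0))
    q_r = np.arcsin(np.clip(dz_right / ARM_R, -1.0, 1.0))
    return float(q_l), float(q_r)


if __name__ == "__main__":
    print("pure pitch:", servo_angles(0.3, 0.0))  # both servos turn together
    print("pure roll: ", servo_angles(0.0, 0.3))  # servos turn in opposite directions
```

The sketch reproduces the qualitative behaviour described for the neck muscles: the two active limbs act together for pitch and differentially for roll.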

6 Head pose estimation algorithm

This section describes a single-camera geometric head pose estimation technique. Our method uses the locations of facial features such as the eyes, mouth and nose tip to determine the pose from their relative configuration. We assume that the human head is a rigid object with three degrees of freedom, with yaw, pitch and roll angles denoted by ψ, θ and φ, respectively, as shown in Fig. 4. The pose angles are estimated with the following procedure:

6.1 Face detection

The input to our geometric head pose estimation algorithm is a sequence of video frames containing i) the face of a person and ii) an undesired background. We first detect the human face in the cluttered video stream. A number of algorithms have been proposed for this task [43, 44]; we employ the Haar-feature-based cascade classifiers proposed by Paul Viola and Michael Jones [45]. This algorithm is a two-step process consisting of a training step and a testing step. In the training step, the algorithm learns to differentiate between face images and background. In the testing step, it uses the trained classifier to detect faces in a video stream (see Fig. 7(a)).
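In practice, a pre-trained Viola-Jones detector is available, for example, through OpenCV's `cv2.CascadeClassifier` and its `detectMultiScale` method. The sketch below instead illustrates the mechanism that makes the detector fast: an integral image, which yields any rectangle sum in four lookups, and a two-rectangle Haar-like feature built on it. The toy patch is hypothetical; a full detector additionally needs AdaBoost-selected features and a trained cascade.

```python
import numpy as np


def integral_image(img):
    """ii[y, x] = sum of img[:y, :x], with a leading zero row/column so lookups need no bounds checks."""
    img = np.asarray(img, dtype=float)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), using four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: upper half minus lower half of the window.

    A dark band above a brighter one (e.g. eyes above cheeks) gives a negative response.
    """
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


if __name__ == "__main__":
    window = np.vstack([np.full((2, 4), 20.0), np.full((2, 4), 200.0)])  # toy 4x4 patch
    print(two_rect_feature(integral_image(window), 0, 0, 4, 4))
```

Because each feature costs a constant number of lookups regardless of its size, the detector can evaluate thousands of features per window at video rate.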

6.2 Facial features detection

The second step is to find facial feature points in the detected face. Facial features include the eyes, nose, lips, mouth, eyebrows and facial boundary. Numerous methods exist for detecting facial features, as presented in [46]. In this work, we employ the well-known Constrained Local Model (CLM) approach [47]. CLM is also a two-step process, consisting of a training session and a testing session. In the training session, shape and texture models are built from a large training set of labeled face images: the shape model describes the face and its facial feature points, and the texture model describes the intensity values of the face and facial features. In the testing session, our algorithm iteratively estimates the facial feature locations in an unseen image using the combined information of the shape and texture models. The result of CLM is shown in Fig. 7(b).
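The shape model of a CLM is commonly a point distribution model: aligned landmark vectors are reduced by PCA to a mean shape plus a few modes, x ≈ x̄ + P b, and a fitted shape is kept plausible by limiting each coefficient to ±3 standard deviations. The sketch below shows only this shape component, assuming the landmarks are already aligned (e.g. by Procrustes analysis); the patch-based texture models and the iterative fitting of a full CLM are omitted, and the class name is hypothetical.

```python
import numpy as np


class PointDistributionModel:
    """Linear shape model x ~ mean + P @ b learned by PCA from aligned landmark vectors."""

    def __init__(self, shapes, n_modes):
        X = np.asarray(shapes, dtype=float)                  # (n_shapes, 2 * n_landmarks)
        self.mean = X.mean(axis=0)
        _, s, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = vt[:n_modes].T                              # principal modes as columns
        self.eigvals = s[:n_modes] ** 2 / (X.shape[0] - 1)   # variance along each mode

    def fit(self, shape, clamp=3.0):
        """Project a shape onto the modes and limit each coefficient to +/- clamp std devs."""
        b = self.P.T @ (np.asarray(shape, dtype=float) - self.mean)
        limit = clamp * np.sqrt(self.eigvals)
        return np.clip(b, -limit, limit)

    def reconstruct(self, b):
        """Shape vector generated by the coefficients b."""
        return self.mean + self.P @ b
```

In a CLM, the local texture detectors propose a landmark position for each point, and projecting those proposals through `fit` and `reconstruct` pulls them back onto a plausible face shape before the next iteration.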

6.3 Estimating human neck’s reference frame

The human head has three independent movements (yaw, pitch and roll) around the Y, X and Z axes, as shown in Fig. 8. A video is a sequence of consecutive images, and images from a standard camera contain only 2D information, i.e. X and Y coordinates. This information suffices to compute the roll angle of the head, but computing the yaw and pitch angles requires 3D information. Hence, this step estimates the Z coordinate needed to define a reference frame for the human neck. This reference frame is assumed to be at C2 of the spinal column, as shown in Fig. 9, and is found using the locations of the eyes and the facial boundaries [48]. These features allow us to compute the width w and height h of the detected face and the distance d between the centers of the eyes, see Fig. 9. Given these quantities, the neck reference frame is given by

$O_{n} = (X_{n}, Y_{n}, Z_{n}) = (\frac{w}{2}, \frac{h}{2}, - λ d),$ (4)

where λ is a proportionality constant relating the inter-eye distance d to the depth of the neck frame.

This 3D reference point is used to estimate the yaw and pitch angles of the human head.

6.4 Estimating the pose angles

For the roll angle φ, it is sufficient to know the center of each eye, i.e. E_l = (X_l, Y_l) for the left eye and E_r = (X_r, Y_r) for the right eye. φ is then computed from the right-angled triangle shown in Fig. 10(b):

$φ = {tan}^{- 1} (\frac{Y_{r} - Y_{l}}{X_{r} - X_{l}}) .$ (5)

To estimate the yaw ψ and pitch θ angles, we define the vector $v \in ℝ^{3}$ from O_n to O_e, where O_e = (X_e, Y_e, Z_e) is the midpoint between the centers of the eyes, see Fig. 10(a).

The projections of v onto the planes XZ and YZ are given by

$\begin{matrix} a & = & {Proj}_{XZ} (v), \\ b & = & {Proj}_{YZ} (v), \end{matrix}$ (6)

The angles between these projections and the Z-axis give the yaw ψ and pitch θ angles of the human head:

$ψ = {tan}^{- 1} (\frac{X_{e} - X_{n}}{Z_{e} - Z_{n}}),$ (7)

$θ = {tan}^{- 1} (\frac{Y_{e} - Y_{n}}{Z_{e} - Z_{n}}).$ (8)
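Equations (4), (5), (7) and (8) can be combined into a small sketch. It assumes that all coordinates are expressed in the detected face's bounding-box frame (consistent with O_n = (w/2, h/2, −λd)), with the image Y axis pointing down, that the eye midpoint lies in the image plane (Z_e = 0), and an illustrative λ = 1; the article does not state Z_e or λ. We use atan2, which equals the arctangents of Eqs. (5), (7) and (8) whenever their denominators are positive and keeps the correct quadrant otherwise.

```python
import math


def neck_origin(w, h, d, lam):
    """Eq. (4): neck reference point O_n = (w/2, h/2, -lam*d) in face-box coordinates."""
    return (w / 2.0, h / 2.0, -lam * d)


def head_pose(left_eye, right_eye, face_w, face_h, lam=1.0, z_eyes=0.0):
    """Return (yaw, pitch, roll) in rad from eye centres given in face-box pixel coordinates.

    lam and z_eyes are assumptions (see text); each eye is an (X, Y) tuple.
    """
    (xl, yl), (xr, yr) = left_eye, right_eye
    d = math.hypot(xr - xl, yr - yl)                 # distance between the eye centres
    xn, yn, zn = neck_origin(face_w, face_h, d, lam)
    xe, ye = (xl + xr) / 2.0, (yl + yr) / 2.0        # O_e: midpoint of the eye centres
    dz = z_eyes - zn
    yaw = math.atan2(xe - xn, dz)                    # Eq. (7)
    pitch = math.atan2(ye - yn, dz)                  # Eq. (8)
    roll = math.atan2(yr - yl, xr - xl)              # Eq. (5)
    return yaw, pitch, roll


if __name__ == "__main__":
    # Hypothetical 100x120 px face box with level eyes centred horizontally.
    print(head_pose((30, 50), (70, 50), 100, 120))
```

Scaling the depth by the inter-eye distance d makes the yaw and pitch estimates insensitive to how far the user sits from the webcam, since both the eye offsets and d shrink together with distance.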

7 Kalman filter based approach

The output of our vision based geometric head pose estimation algorithm cannot be used directly because of several limitations at the software and hardware ends. At the software end, there are limitations in video acquisition and camera parameters; at the hardware end, there are limitations in the mechanical structure parameters. To map the complex head dynamics onto a platform limited to three degrees of freedom, we use a Kalman filter [49]. The Kalman filter uses a second-order (double-integrator) motion model to predict the future state of the angular signals, thereby temporally shaping the geometric head pose angles into a suitable input for our mechatronic robot.

The Kalman filter uses a discrete-time state-space equation to describe how each of these signals (ψ, θ, φ) evolves between two successive time steps k - 1 and k:

$X_{k} = A X_{k - 1} + B u_{k},$ (9)

where, for each pose angle, X_k and X_{k-1} are the state vectors in frames k and k - 1, each containing the angle and its angular velocity, and u_k is the angular acceleration acting as the control input.

The matrix A in Eq. (9) is the state transition matrix and the vector B is the control input model; they are given by

$A = [\begin{matrix} 1 & T \\ 0 & 1 \end{matrix}],$ (10)

$B = [\begin{matrix} T^{2} / 2 \\ T \end{matrix}],$ (11)

where T is the time step between two frames, i.e. the inverse of the frame rate of our head pose algorithm. The Kalman filter estimates the future state of the angular signals (yaw, pitch and roll) with a recursive two-step process. The first step is the state prediction step, which predicts the state X_k and the covariance matrix P_k:

$\begin{matrix} X_{k} & = & A \hat{X}_{k - 1} + B u_{k} \\ P_{k} & = & A \hat{P}_{k - 1} A^{T} + Q_{k} , \end{matrix}$ (12)

where Q_k is the process noise covariance and $\hat{X}_{k - 1}$ and $\hat{P}_{k - 1}$ are the updated state and covariance from the previous frame.

The second step is the measurement update step, which corrects the predicted angular values using the yaw, pitch and roll angles measured by the geometric head pose estimation algorithm, denoted Z_k. The updated angular state $\hat{X}_{k}$ is given by

${\hat{X}}_{k} = X_{k} + K_{k} (Z_{k} - C_{k} X_{k}),$ (13)

where C_k = [1 0] is the 1 × 2 measurement vector (only the angle is measured) and K_k is the Kalman gain, given by

$K_{k} = P_{k} C_{k}^{T} (C_{k} P_{k} C_{k}^{T} + R_{k})^{- 1},$ (14)

with R_k the measurement noise covariance. At the end of the measurement update step, we update the covariance matrix for the next step:

$\hat{P}_{k} = (I - K_{k} C_{k}) P_{k}.$ (15)

Stacking the filter outputs for the three angles, the output of the Kalman filter in each frame k is ${\hat{X}}_{k}$, which consists of the three updated pose angles and the three velocity estimates of the head movement:

${\hat{X}}_{k} = [\hat{ψ_{k}}, \hat{θ_{k}}, \hat{φ_{k}}, \dot{ψ_{k}}, \dot{θ_{k}}, \dot{φ_{k}}].$ (16)

An added advantage of the Kalman filter is that it also estimates the angular velocities of the human head, [ $\dot{ψ_{k}}$ , $\dot{θ_{k}}$ , $\dot{φ_{k}}$ ], while [ $\hat{ψ_{k}}$ , $\hat{θ_{k}}$ , $\hat{φ_{k}}$ ] are the updated pose angles, which are now a suitable input for our mechatronic robot.
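Equations (9) to (15) for a single pose angle can be sketched as below; running three independent instances gives the six outputs of Eq. (16). The sketch assumes no control input (u_k = 0), takes T from the 25 Hz frame rate used in Section 9, and uses the process noise Q = σ_a² B Bᵀ (a white-acceleration model consistent with B in Eq. (11)). σ_a and the measurement variance R are hypothetical tuning values.

```python
import numpy as np


class PoseAngleKF:
    """Kalman filter of Eqs. (9)-(15) for one angle; state X = [angle, angular velocity]^T."""

    def __init__(self, T, sigma_a=0.5, r=4e-4):
        self.A = np.array([[1.0, T], [0.0, 1.0]])        # Eq. (10)
        self.B = np.array([[T ** 2 / 2.0], [T]])         # Eq. (11)
        self.C = np.array([[1.0, 0.0]])                  # only the angle is measured
        self.Q = sigma_a ** 2 * (self.B @ self.B.T)      # white-acceleration process noise
        self.R = np.array([[r]])                         # measurement noise variance (rad^2)
        self.x = np.zeros((2, 1))                        # updated state X_hat
        self.P = np.eye(2)                               # updated covariance P_hat

    def step(self, z, u=0.0):
        """Process one measured angle z (rad); return (angle, angular velocity)."""
        # Prediction, Eq. (12).
        x_pred = self.A @ self.x + self.B * u
        P_pred = self.A @ self.P @ self.A.T + self.Q
        # Kalman gain, Eq. (14).
        S = self.C @ P_pred @ self.C.T + self.R
        K = P_pred @ self.C.T @ np.linalg.inv(S)
        # Measurement update, Eqs. (13) and (15).
        self.x = x_pred + K @ (np.array([[z]]) - self.C @ x_pred)
        self.P = (np.eye(2) - K @ self.C) @ P_pred
        return float(self.x[0, 0]), float(self.x[1, 0])


def filter_pose(frames, T=1.0 / 25.0):
    """Filter a sequence of (yaw, pitch, roll) tuples; each output row is laid out as in Eq. (16)."""
    filters = [PoseAngleKF(T) for _ in range(3)]
    out = []
    for angles in frames:
        est = [kf.step(z) for kf, z in zip(filters, angles)]
        out.append([a for a, _ in est] + [v for _, v in est])
    return np.array(out)
```

Lowering sigma_a smooths the angles further but makes the filter lag behind quick head turns; the trade-off is the temporal shaping the text refers to.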

8 System implementation

Our system consists of four main blocks: i) a vision based algorithm (VBA) block, ii) a filtering block, iii) a model based design (MBD) block and iv) a real platform block, as shown in Fig. 11. The VBA for geometric head pose estimation is implemented in VC++. The MBD of TEBoT is simulated in the Simulink (MATLAB) environment. The VBA block and the MBD block communicate over a local TCP/IP connection. The input to the MBD control algorithm is the yaw, pitch and roll angles from the filtering block. The control algorithm block implements a PID controller based on the error between the new and previous pose angles. The servo controller takes the PID controller output and generates PWM signals for the yaw, pitch and roll movements. These PWM signals can drive the SimMechanics model to visualize the TEBoT response, or be used for real-time testing with the TEBoT hardware. During testing, code automatically generated from the MBD implements the control algorithm on the Arduino, and Simulink communicates with the hardware through a USB port of the computer. The SimMechanics model provides the feedback that closes the control loop.
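The link between the VBA block and the MBD block can be illustrated with a framing sketch. The wire format (three little-endian float64 values: yaw, pitch and roll in rad) is hypothetical, since the article does not describe the message layout, and `socket.socketpair()` stands in for the local TCP connection so that the example runs in one process. The key design point is that TCP delivers a byte stream, so the receiver must loop until a complete message has arrived.

```python
import socket
import struct

# Hypothetical wire format: yaw, pitch, roll (rad) as little-endian 64-bit floats.
POSE_MSG = struct.Struct("<3d")


def send_pose(sock, yaw, pitch, roll):
    """Send one filtered pose sample; sendall() keeps writing until every byte is sent."""
    sock.sendall(POSE_MSG.pack(yaw, pitch, roll))


def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer bytes than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed in the middle of a message")
        buf.extend(chunk)
    return bytes(buf)


def recv_pose(sock):
    """Receive one (yaw, pitch, roll) tuple."""
    return POSE_MSG.unpack(recv_exact(sock, POSE_MSG.size))


if __name__ == "__main__":
    vision_end, control_end = socket.socketpair()   # stand-in for the local TCP/IP link
    with vision_end, control_end:
        send_pose(vision_end, 0.10, -0.20, 0.05)
        print(recv_pose(control_end))
```

A fixed-size binary record keeps parsing trivial on the receiving side; a text format would need delimiters and conversion on every frame.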

9 Experimental results

The experiments are conducted to measure:

the accuracy of our vision based algorithm.

the tracking performance of the human-in-the-loop TEBoT system.

The accuracy of a head pose estimation algorithm can be measured in one of two ways:

by running the algorithm on an annotated data-set such as the BIWI Head Pose Database or the ICT 3D Database [50];

by comparing the results with electromagnetic trackers (e.g. Flock of Birds and 3Space FASTRAK), ultrasound trackers (e.g. IS-900 [51]) or inertial trackers (e.g. IMUs).

In the former, the head pose data-sets are usually annotated using the latter technique, i.e. expensive trackers. Furthermore, the accuracy of a less expensive IMU is comparable to that of expensive trackers, as shown in [52]. In this work, we measured the accuracy of our geometric head pose estimation algorithm with the latter technique, using an inertial measurement unit (IMU) as ground truth.

In this experiment, a user moves his head in different orientations while the data are logged at run time. Both data streams (IMU and our algorithm) are logged at 25 Hz, and 4000 frames were logged in total. The comparison between the IMU data and the geometric head pose algorithm data is presented as the mean error and standard deviation for each of the yaw, pitch and roll angles in Table 3. Some of the recorded frames for the yaw, pitch and roll angles are shown in Fig. 13(a, b, c).
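The statistics in Table 3 can be computed per angle as sketched below, assuming the IMU and vision streams are already time-aligned at 25 Hz and that "mean error" means the mean absolute difference. The article does not state whether the error is signed or absolute, so this definition and the function names are assumptions.

```python
import statistics


def error_stats(reference, estimate):
    """Mean absolute error and sample standard deviation of the absolute error."""
    if len(reference) != len(estimate):
        raise ValueError("streams must be time-aligned and of equal length")
    errors = [abs(float(e) - float(r)) for r, e in zip(reference, estimate)]
    return statistics.mean(errors), statistics.stdev(errors)


def table3(imu, vision):
    """imu / vision: dicts mapping 'yaw', 'pitch', 'roll' to equal-length angle sequences."""
    return {axis: error_stats(imu[axis], vision[axis]) for axis in ("yaw", "pitch", "roll")}
```

With 4000 logged frames per stream, each entry of the resulting dictionary corresponds to one row of a Table 3 style summary.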

To measure the performance of the overall system, we recorded the pose angles (yaw, pitch and roll) of ten different people with our geometric head pose algorithm and saved these raw angles for further processing. Following Fig. 11, the pose angles are first filtered by the Kalman filter. The recorded trajectories are shown in Fig. 14, where the columns show the response of the Kalman filter for the yaw ψ, pitch θ and roll φ angles. The gray signal is one of the recorded sequences and the striped black signal is the filtered response. The second row of Fig. 14 shows the velocities estimated by the Kalman filter.

Following Fig. 11, the filtered signals are the input to the MBD block, which implements the control algorithm for mimicking the head movement. The MBD block includes the hardware in the loop (hardware block). The controller parameters are left as designed, i.e. the tuning performed in the MBD block is not modified for the real experiment. The results are presented as tracking performance in Fig. 15, where the gray signal is the filtered input and the striped black signal is the signal tracked by the MBD.

10 Conclusion

In this work, we have presented the design and development process of a human-in-the-loop socially interactive bio-inspired head/neck robot in which single-camera motion capture is combined with a model based design (MBD) approach to mimic human head movements. We have developed a reliable and robust biologically inspired neck platform (TEBoT) based on intuitive human head motion analysis. TEBoT is compact, self-contained and fulfills the static and dynamic performance requirements of a human neck. For modeling, we have presented a model based design (MBD) technique, transforming the physical CAD model of TEBoT into a set of differential equations describing the system dynamics and equations of motion with the SimMechanics library.

To design the input control of TEBoT, we included a human head in the loop: the real human head provides the input to TEBoT. To capture the movements of the real human head, we considered the limitations of previously developed SBAs and VBAs and proposed a novel vision-based technique that captures the pose angles of the human head in real time without any wearable sensors or markers. Our proposed geometric head pose estimation algorithm calculates the pose angles from facial feature points through geometric manipulation of these points. This novel input control relies only on the computer's low-cost, low-resolution, low-bit-rate webcam, so the user does not need to wear anything.
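The conclusion describes the geometric algorithm only as operating on facial feature points. As one small illustration of the kind of geometric manipulation involved (not the algorithm of [48] itself), head roll can be approximated from the in-plane angle of the line joining the two eye centres. The sketch assumes a roughly frontal face and eye-centre coordinates supplied by some upstream feature-point detector; the function name `roll_from_eyes` and its sign convention are hypothetical.

```python
import math

def roll_from_eyes(left_eye, right_eye):
    """Approximate in-plane head roll, in degrees, from two eye centres.

    left_eye, right_eye: (x, y) pixel coordinates of the eye centres that
    appear on the left and right of the image (x to the right, y down).
    Returns 0 for level eyes; positive when the right-hand eye is lower in
    the image. The sign convention is an assumption.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    if dx == 0 and dy == 0:
        raise ValueError("eye centres must be distinct points")
    return math.degrees(math.atan2(dy, dx))
```

Yaw and pitch need additional feature points and are not covered by this sketch.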

Once all the sub-modules were available, we integrated them for real-time visualization and testing. The real-time visualizations were done in MATLAB/Simulink, which allows simulation studies, automatic tuning of control parameters, and code generation for hardware-in-the-loop testing. For the real tests, the automatic code generation capability of MATLAB was used.

The experiments were carried out to (i) measure the accuracy of our geometric head pose estimation algorithm and (ii) measure the performance of the overall system in mimicking real head movements. The experimental results show the effectiveness of our geometric head pose estimation technique and satisfactory tracking performance for the overall system.

This article presents the idea of including a human in the loop and mapping real human head movements onto our socially interactive bio-inspired head/neck robot using a monocular camera. Extending this idea to other body parts of biological systems is the aim of future work. Our proposed technique can also be used in the field of learning by demonstration or imitation learning. In the future, we plan to use TEBoT as an embodied agent for tele-operation, especially in video teleconferencing, to present the head gestures of a remote person. TEBoT can also be used to assist elderly people, in distance-learning scenarios and even in the entertainment industry.

Acknowledgments

The authors would like to thank Dr. Pedro La Hera and Dr. Daniel Ortiz Morales for their help in the controller design of the TEBoT.

References

1. Fong et al., A survey of socially interactive robots, Robotics and Autonomous Systems 42(3) (2003), 143–166.

2. M.S.L. Khan, Li et al., Expressive multimedia: Bringing action to physical world by dancing-tablet, in 2nd Workshop on Computational Models of Social Interactions: Human-Computer-Media Communication, ACM, 2015, pp. 9–14.

3. M.S.L. Khan, Li and S.U. Réhman, Embodied telepresence system (ETS): Designing tele-presence for video teleconferencing, in Design, User Experience, and Usability. User Experience Design for Diverse Interaction Platforms and Environments, Springer, 2014, pp. 574–585.

4. M.S.L. Khan and S.U. Réhman, Embodied head gesture and distance education, Procedia Manufacturing 3 (2015), 2034–2041.

5. Pineau, Montemerlo, Pollack, Roy and Thrun, Towards robotic assistants in nursing homes: Challenges and results, Robotics and Autonomous Systems 42(3) (2003), 271–281.

6. Delcomyn, Biologically inspired robots, in Bioinspiration and Robotics: Walking and Climbing Robots, Maki K. Habib (Ed.), 2007.

7. G.A. Bekey, Autonomous robots: From biological inspiration to implementation and control, MIT Press, 2005.

8. Sakagami, Watanabe, Aoyama, Matsunaga, Higaki and Fujimura, The intelligent ASIMO: System overview and integration, in Intelligent Robots and Systems, 2002 IEEE/RSJ International Conference on, vol. 3, IEEE, 2002, pp. 2478–2483.

9. Kristoffersson, Coradeschi and Loutfi, A review of mobile robotic telepresence, Advances in Human-Computer Interaction 2013 (2013), 3.

10. S.O. Adalgeirsson and Breazeal, MeBot: A robotic platform for socially embodied presence, in Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction, IEEE Press, 2010, pp. 15–22.

11. Venolia, Tang et al., Embodied social proxy: Mediating interpersonal connection in hub-and-satellite teams, in SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. –1058.

12. Jibo, https://www.jibo.com/, accessed Feb. 2016.

13. Kozima, M.P. Michalowski and Nakagawa, Keepon, International Journal of Social Robotics 1(1) (2009), 3–18.

14. Kajita et al., Cybernetic human HRP-4C: A humanoid robot with human-like proportions, in Robotics Research, Springer, Berlin, 2011, pp. 301–314.

15. DER3, Actroid-DER, webpage, 2016, http://www.kokorodreams.co.jp/english/rttokutyu/actroid.html.

16. Nishio, Ishiguro and Hagita, Geminoid: Teleoperated android of an existing person, INTECH Open Access Publisher, Vienna, 2007.

17. Moubayed et al., Furhat: A back-projected human-like robot head for multiparty human-machine interaction, in Cognitive Behavioural Systems, Springer, 2012, pp. 114–130.

18. Ogawa et al., Exploring the natural reaction of young and aged person with Telenoid in a real world, JACIII 15(5) (2011), 592–597.

19. Metta, Sandini, Vernon, Natale and Nori, The iCub humanoid robot: An open platform for research in embodied cognition, in Proceedings of the 8th Workshop on Performance Metrics for Intelligent Systems, ACM, 2008, pp. 50–56.

20. S.R.M.S.L. Khan, Distance communication: Trends and challenges and how to resolve them, Ch. 9 in Handbook: Strategies for a Creative Future with Computer Science, Quality Design and Communicability, Blue Herons (Eds.), 2014.

21. S.M. Boker, Cohn et al., Effects of damping head movement and facial expression in dyadic conversation using real-time facial expression tracking and synthesized avatars, Philosophical Transactions of the Royal Society of London B: Biological Sciences 364(1535) (2009), 3485–3495.

22. van der Smagt, M. Grebenstein, H. Urbanek, N. Fligge, M. Strohmayr, G. Stillfried, J. Parrish and A. Gustus, Robotics of human movements, Journal of Physiology-Paris 103(3) (2009), 119–132.

23. Sim, Gavriel, Abbott and Faisal, The head mouse: Head gaze estimation “in-the-wild” with low-cost inertial sensors for BMI use, IEEE, 2013, pp. 735–738.

24. N.P. Oess, Wanek and Curt, Design and evaluation of a low-cost instrumented glove for hand function assessment, Journal of NeuroEngineering and Rehabilitation 9(2) (2012).

25. …, Halawani, M.S.L. Khan, S.U. Réhman and H. Li, Finger in air: Touch-less interaction on smartphone, in Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia, ACM, 2013, p. 16.

26. Rezzoug et al., A method for estimating three-dimensional human arm movement with two electromagnetic sensors, Computer Methods in Biomechanics and Biomedical Engineering 13(6) (2010), 663–668.

27. J.A. Corrales, Candelas and Torres, Hybrid tracking of human operators using IMU/UWB data fusion by a Kalman filter, in Human-Robot Interaction (HRI), 2008 3rd ACM/IEEE International Conference on, IEEE, 2008, pp. 193–200.

28. Stanton, Bogdanovych and Ratanasena, Teleoperation of a humanoid robot using full-body motion capture, example movements, and machine learning, in Proc. Australasian Conference on Robotics and Automation, 2012.

29. G. Welch and E. Foxlin, Motion tracking survey, IEEE Computer Graphics and Applications (2002), 24–38.

30. Murphy-Chutorian and M.M. Trivedi, Head pose estimation in computer vision: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 31(4) (2009), 607–626.

31. Yan, Ricci, Subramanian, Liu, Lanz and Sebe, A multi-task learning framework for head pose estimation under target motion, 2015.

32. Erol, Bebis et al., Vision-based hand pose estimation: A review, Computer Vision and Image Understanding 108(1) (2007), 52–73.

33. …, Halawani, Feng, Li and S.U. Réhman, Multimodal hand and foot gesture interaction for handheld devices, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 11(1s) (2014), 10.

34. Muhlbauer, Kuhnlenz and Buss, A model-based algorithm to estimate body poses using stereo vision, in Robot and Human Interactive Communication, 2008 (RO-MAN 2008), The 17th IEEE International Symposium on, IEEE, 2008, pp. 285–290.

35. Sundaresan and Chellappa, Markerless motion capture using multiple cameras, in Computer Vision for Interactive and Intelligent Environment, 2005, IEEE, 2005, pp. 15–26.

36. L.A. Schwarz, Mkhitaryan, Mateus and Navab, Human skeleton tracking from depth data using geodesic distances and optical flow, Image and Vision Computing 30(3) (2012), 217–226.

37. You, Kim, Oh, Jeong and Oh, Network-based humanoid “MAHRU” as ubiquitous robotic companion, in 17th World Congress of the International Federation of Automatic Control, Seoul, Korea, 2008.

38. J.L. Hiatt, L.P. Gartner, Rohen and J.L. Gadd, Textbook of head and neck anatomy, Lippincott Williams & Wilkins, 2001.

39. Paterno, Model-based design and evaluation of interactive applications, Springer Science & Business Media, 2000.

40. G.D. Wood and D.C. Kennedy, Simulating mechanical systems in Simulink with SimMechanics, The MathWorks report, 2003.

41. C.-T. Chen, Linear system theory and design, Oxford University Press, Inc., 1995.

42. K.J. Åström and Hägglund, PID controllers: Theory, design, and tuning, Instrument Society of America, Research Triangle Park, NC, 1995.

43. Zhang and Zhang, A survey of recent advances in face detection, Tech. Rep., Microsoft Research, 2010.

44. Chen, Huang and Lv, Towards a face recognition method based on uncorrelated discriminant sparse preserving projection, Multimedia Tools and Applications (2015), 1–15.

45. Viola and M.J. Jones, Robust real-time face detection, International Journal of Computer Vision 57(2) (2004), 137–154.

46. Wang, Gao, Tao and Li, Facial feature point detection: A comprehensive survey, arXiv preprint arXiv:…1037 (2014).

47. Cristinacce and T.F. Cootes, Feature detection and tracking with constrained local models, in BMVC 2(5) (2006), 6.

48. Sikandar Lal Khan, Z. Lu, H. Li et al., Head orientation modeling: Geometric head pose estimation using monocular camera, in 1st IEEE/IIAE International Conference on Intelligent Systems and Image Processing 2013, 2013, pp. 149–153.

49. M.S. Grewal and A.P. Andrews, Kalman filtering: Theory and practice using MATLAB, John Wiley & Sons, 2011.

50. Baltrusaitis, Rob and Morency, 3D constrained local model for rigid and non-rigid facial tracking, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, IEEE, 2012, pp. 2610–2617.

51. Meyer, H.L. Applewhite and F.A. Biocca, A survey of position trackers, Presence: Teleoperators and Virtual Environments 1 (1992), 173–200.

52. D.O. Girardo, M.B. Popovic, Blumenau, Galiana, F.L.H. III, R.D. Howe, Jentoft, S.B. Kesner, E.L. Lin and Mandala, Physics applied to post-stroke rehabilitation: Shoulder soft robotics brace, report, December 31, 2011.