Abstract
This article presents mixed-initiative collaboration between two artificial agents of heterogeneous realities (a humanoid robot and a virtual human) for a real-world social task, communicated through a common platform. A detailed framework for the collaboration is developed. Based on the framework, a representative real-world task to be performed through collaboration between the artificial agents for the wellbeing of humans is selected: searching for a missing (hidden) object in a home environment. The agents are equipped with similar intelligence, autonomy, functionalities and interaction modalities, and are integrated through a common communication platform based on a collaboration scheme so that they can collaborate (assist each other) to perform the real-world task. As part of the collaboration scheme, a robot-virtual human bilateral trust model is derived and a real-time trust measurement method is developed so that the role of taking the initiative in the collaboration can be switched between the agents, triggered by the bilateral trust, which results in a mixed-initiative collaboration. An evaluation scheme is developed to evaluate the performance of the agents in the collaborative task. To benchmark their performance in the collaboration, humans’ collaborations with the artificial agents and with some other allied agents for the same task are studied. The evaluation and benchmarking results show that both the robot and the virtual human perform satisfactorily in their collaboration, which supports the effectiveness of the real-world collaboration between the artificial agents of heterogeneous realities as well as the effectiveness of the common platform and of the bilateral trust-based mixed initiatives between the artificial agents.
The results can be used to develop intelligent agents of heterogeneous realities to assist humans in various real-world tasks or help humans get real-world tasks performed in cooperation between artificial agents of heterogeneous realities.
Introduction
Present applications of humanoid robots include various activities such as therapy for abnormal social development and autism [47], rehabilitation [27], tele-health care [30], education and training [1], security and rescue operations [53], social services and business [29], entertainment [56], industrial operations [41,45], and so forth. However, the scope of applications of humanoid robots, especially in accomplishing social tasks in cooperation with (or for the welfare of) humans, is still limited, probably due to limitations in their intelligence and autonomy, anthropomorphism, and social, behavioral, perceptual and communication skills [33]. Humanoid robots can be developed as human-like intelligent agents so that they can take inspiration from humans and possess capabilities of communicating with human counterparts, perceiving humans’ affective states, expressions, intentions and actions, interpreting the perceptions based on contextual information, and acting based on prevailing situations [3,26].
On the other hand, virtual humans are software-generated human-like animated artificial characters. The present applications of virtual humans include various tasks such as serving as a virtual patient [46], tutor and student/trainee [2,14,50], acting as characters to visualize the effects of temperature increases in a room [9] and to simulate an engineering maintenance system [13], and so forth. There are increasing contributions of virtual humans towards anatomy education, psychotherapy and biological and biomedical research [24,46]. However, it appears that virtual humans have not moved beyond their virtual environments, i.e. their applications are limited to the virtual world only, and their interactions with their real-world human counterparts are still limited. It is believed that the scope of their contributions could be augmented if they could be used to perform real-world tasks for humans or cooperate with humans to perform real-world tasks. In order to do so, virtual humans should be enriched with various real-world social functions and attributes, e.g. they should be able to exhibit human-like intelligence, motions, actions, emotions, gestures and expressions, communicate and interact with humans, memorize facts and retrieve them with dynamic contexts, and demonstrate reasoning and decision-making abilities based on perceptions [51]. However, virtual humans with intelligent features are not observed for real-world tasks except for a few preliminary initiatives [19,20].
Based on the knowledge of the state-of-the-art humanoid robots [1,3,26,29,33,41,47,53,56] and virtual humans [2,14,24,46,50,51], it is believed that an autonomous humanoid robot (HR) and a virtual human (VH) have much in common in their objectives and performance, though there is a basic difference: HRs exist physically while VHs are software-created visual characters. It is believed that humanoid robots and virtual humans can separately cooperate with humans, and also with each other, to perform various real-world tasks. Properly networked and bilaterally communicating humanoid robots and virtual humans cooperating in a coordinated and goal-oriented way can perform better than an individual robot or virtual human. Dynamic collaboration between robots and virtual humans seems to be superior to augmented reality for robots, where a robot follows its virtual counterpart, but dynamic bidirectional collaboration between them is very limited [42]. Investigations on collaborations between humanoid robots and virtual humans have not received much attention yet except for a few preliminary initiatives [8,11,34–37]. In addition, these initiatives are still in concept design phases, and no real-scenario characters and cooperation methods have been proposed to justify and validate the effectiveness of the initiatives.
It is believed that a comprehensive framework is necessary to stage the real-world collaboration between a humanoid robot and a virtual human. A well-defined evaluation scheme (evaluation criteria, methods and metrics) and benchmarking standards are also necessary for evaluating and benchmarking the performance of the collaboration. Calibrating against the benchmarks can help improve the performance of the collaboration, which in turn can help increase its social acceptance and impact [4,21]. It is assumed that a common platform can be technically helpful to stage the real-world collaboration between a robot and a virtual human. For example, it can reduce the volume of software development and ease the animation [48]. However, a suitable framework for collaboration between a robot and a virtual human and an initiative to develop a common platform between them are not observed. Evaluation and benchmarking schemes for such collaboration have also not received priority in the state-of-the-art research [8,11,35–37], except for the preliminary initiatives in [4,21,34].
Trust of one agent in another is essential for any collaborative task between two agents because an agent may not want to collaborate with another agent that it does not trust [16]. Human trust in collaborating robots has been studied extensively [40], but human trust in collaborating virtual humans as well as trust between two artificial agents of heterogeneous realities (e.g., a robot’s trust in a virtual human and a virtual human’s trust in a robot) has received little attention [36,37]. The agents’ trust in each other can be used to plan their roles and autonomy in their collaboration. To do so, appropriate computational models of trust are necessary to measure real-time trust between a humanoid robot and a virtual human in their real-world collaboration. However, such trust modeling and real-time trust measurement methods have not been proposed yet, and trust-based collaboration between a robot and a virtual human to perform real-world tasks is yet to be observed except for the preliminary studies in [36,37].
On the other hand, mixed-initiative interaction is the case where turns in interaction/collaboration are negotiated between two or more participating agents rather than solely determined by a single agent [6]. It is assumed that mixed initiatives between a robot and a virtual human can make the collaboration more participatory, intuitive and natural, which can enhance their individual contributions to the collaboration. It is also believed that the bilateral trust status between the robot and the virtual human can trigger (decide) their turn-taking in the mixed-initiative collaboration. However, the possibility of such bilateral trust-triggered mixed initiatives in the collaboration between a robot and a virtual human has not been investigated extensively [36,37].
Motivated by the above limitations of robot-virtual human collaboration, and to exploit the benefits of integrating the novel concepts proposed in [34–37], the objective of this article is to investigate the trust-triggered mixed-initiative social collaboration between a virtual human and a humanoid robot for a real-world common social task (e.g., searching for a missing/hidden object in a social environment) through a common platform, and to evaluate and benchmark the collaborative performance. A detailed framework is developed to implement the collaboration. Bilateral trust models between a robot and a virtual human are derived, and real-time bilateral trust measurement methods are proposed. Trust-triggered mixed initiatives are incorporated in the collaboration between the robot and the virtual human. A comprehensive scheme is developed to evaluate and benchmark the collaboration. Finally, it is proposed that the results be used to develop intelligent agents of heterogeneous realities to assist humans in real-world tasks, or to help humans get real-world tasks performed in cooperation between artificial agents of heterogeneous realities. The collaboration model can provide an assistive scenario that contributes to enhancing the quality of life (QoL) and to implementing assisted living and smart home facilities.
The rest of the article is organized as follows: Section 2 introduces the overall framework. Section 3 introduces a representative real-world collaborative task. Section 4 states the requirements for integration between a humanoid robot and a virtual human for the collaborative task. Section 5 presents the development of the artificial agents, the technical and implementation aspects of the collaboration and the common platform. Section 6 presents the trust-based collaboration scheme. Section 7 presents the experimental setup and the evaluation and benchmarking schemes. Section 8 presents the experimental evaluation, and Section 9 presents the evaluation results. Section 10 summarizes the overall results, Section 11 presents the discussion and Section 12 presents conclusions and future work.
The framework
A detailed framework (guideline) is necessary to perform the activities required for real-world collaboration between a humanoid robot and a virtual human. The proposed framework is given in Fig. 1.

A 5-step framework (guideline) to stage the collaboration between a humanoid robot and a virtual human for a real-world task.

The room layout for the real-world collaborative task between a robot and a virtual human.
As per step 1 of the framework of Fig. 1, a representative collaborative task was identified. For the task, ten (10) rectangular paper boxes with identical appearance (black) and dimensions were kept in a room, as shown in Fig. 2. Five boxes were randomly placed on the left side, and the remaining five boxes were randomly placed on the right side of the room. An object (e.g., a small doll) was hidden in one of the 10 boxes by the experimenter (recognized as “
One may argue that human2 might find the object himself/herself instead of receiving services from the artificial agents. However, the cases were considered from the perspective of an assistive environment. For example, the following prospective assistive scenarios can be considered: (i) human2 did not know the correct location of the missing (hidden) object, and (ii) human2 could be a disabled or a busy person who did not have the ability or time to find the object himself/herself. Instead, he/she wanted to have the object found through the services/assistance of the artificial agents [34–37].
Again, one agent (either the robot or the virtual human) could find the object alone without collaborating with the other agent. However, collaboration was considered for cases where: (i) one agent did not know the correct location of the object, (ii) one agent knew the location of the object but was physically absent from the site, and thus needed to appear through telepresence and help another local agent find the object, or (iii) one agent was more intelligent but less physically skillful at finding the object, or vice versa. As both agents were artificial, they had limitations in skills, intelligence and capabilities. For this reason, collaborations between the agents were considered where the agents could benefit each other through their complementary attributes, intelligence and skills, which could make the task easier to perform [34–37].
Requirements for integration between a humanoid robot and a virtual human
Kapadia et al. identified several requirements and explained key limitations in representation, control, locomotion, multimodal perception and authoring of the state-of-the-art autonomous virtual humans [19]. These need to be considered when generating successful interactions between a virtual human and a robot. Other requirements for creating interactive virtual humans and humanoid robots were explained in [12]. The effective integration between a robot and a virtual human for the proposed real-world task would need to satisfy a set of requirements that were identified at step 2 of the framework given in Fig. 1. It was determined that, in ideal cases, the robot and the virtual human should be enriched with: (i) functions and skills, e.g. locomotion, gesture, actions, voice, facial expressions, gaze, attention, manipulation, communication, (ii) attributes, e.g. anthropomorphism (attribution of human traits, emotions or intentions observed in non-human entities [33,58]), embodiment (tangibility or visibility of an idea, quality or feeling in any particular form [52]), stability, (iii) intelligence such as perception, recognition, decision-making, and (iv) interaction abilities and modes such as visual, auditory (speech), demonstrative, and body language [3,18,20,26]. This means that, ideally, the agents need to see and recognize each other, the target object and the environment, speak and listen to the counterpart for verbal instructions about the search path for the target object, show gestures and understand/recognize the counterpart’s gestures that can be used to demonstrate/understand the search path for the target object, etc. They may also need mobility to search for the object and to point out the object when it is found.
The agents can be enriched with the required technologies, control methods and algorithms, interfaces, sensors and integration and communication platform to fulfill the requirements for their collaboration for the target task [34–37]. They should be as human-like in appearance and performance as possible, which may enhance their acceptance by their human beneficiaries [33].
Development of the intelligent agents and the common communication platform
As per steps 3 and 4 of the framework in Fig. 1, in this section, the hardware components, the software packages and the control and communication technologies were determined for the agents, the agents were developed using the hardware and the software, and those were integrated at the system level to implement the proposed collaboration [34–37].

The intelligent autonomous artificial agents, (a) the virtual human displayed in a large screen, (b) the humanoid robot.
A realistic autonomous intelligent virtual human was developed as shown in Fig. 3(a). The Smartbody system (
Development of the humanoid robot
A NAO humanoid robot (http://www.aldebaran-robotics.com/en/) as shown in Fig. 3(b) was employed as the collaborating robot. APIs for various functions and attributes for the robot were developed to make it skillful, intelligent and autonomous to perform the functions and interactions required for the selected collaborative task. The functions included stand up, sit down, walk, shake hand, wave hand, grab and release an object, speech (text to speech), look at a position, point at something, and so forth. Like the virtual human, the robot could perceive the environment through vision sensors, make decisions based on adaptive rules and stored information, and react by moving, talking or showing emotions [34–37].

System architecture including technical and implementation aspects for human-mediated robot-virtual human collaboration.
The technical and implementation aspects of the human-mediated collaboration system between the virtual human and the robot can be explained from an engineering point of view using Fig. 4 [38]. As the figure shows, the entire system can be technically divided into 4 modules: (i) sensing and interfaces, (ii) world model, (iii) system behavioral model, and (iv) artificial characters/agents. The modules are also appropriately connected. For the first module, a few useful sensors and interfaces are determined. The sensors can be used to perceive the environment/world including artificial characters, and the interfaces can be used by humans to provide input to the system depending on requirements. The world model can include the human, room environment, object and collaboration system/scenario. The world model can receive sensory inputs from the sensing and interfaces module. The system behavioral model includes scenario-based artificial intelligence (AI), decision rules, trust models and mixed-initiative algorithms. Note that such a model can be updated based on changes in the world model [38].
Appropriate control systems are developed for the robot and the virtual human. The software development kit (SDK) provided with the robot is used to control it. To control the virtual human, the SmartBody control environment is used. Other required software packages are determined and installed. The APIs for the functions for the robot and the virtual human are archived in the robot and the virtual human control servers respectively. The system behavioral model is connected to the characters through the control system and an appropriate network so that desired behaviors are created in the characters, and the behaviors of the characters also impact the system behavioral model. The characters can also be connected with each other. Figure 4 shows the details including the information flows, and it also shows how the information is inferred. Not all the elements in each module may be relevant to all application scenarios, but a subset of the elements can be used depending on specific requirements [38].
Some modules of the architecture in Fig. 4 reside inside the computers (e.g., the system behavioral model and the virtual human character), and others are outside the computers (e.g., the robot), but all are properly connected. Figure 4 first centers on a world model and emphasizes the collaborative task later because the world model introduces the systems and resources that are then used for the collaborative task between the characters/agents (robot, virtual human). Here, the two agents are two different entities and they look different. They could also look the same, provided that one is physical and the other is virtual.

Elements and architecture of the common communication platform for different artificial agents.
Two computers were used: (i) an Alienware computer running Windows 7 to run all the functions related to the control of the virtual human, which was connected to the large screen, the Kinect cameras and the wired network, and (ii) an HP Compaq computer with the Linux operating system (OS) to run all the functions related to the control of the robot, which was also connected to a router. Both computers were networked. Visual Studio was installed on the computer with Windows OS for programming with C++ (a GCC compiler was necessary on the Linux OS). To use Python for programming, 32-bit Python versions 2.6 and 2.7 were installed on the computers. Python 2.6 was necessary to run the code for the virtual human that ran on SmartBody. The Python libraries included thrift, numpy, scipy, matplotlib and Qt4. All the services and the function modules were implemented and tracked on Bitbucket.
For code management, CMake, Git and a web interface were used. CMake is a cross-platform, open-source build system. It was used to locate the required third-party libraries and to generate the C++ solutions (or makefiles). Git is a revision control and source code management tool that helps track changes in code and duplicate code. The web interface is a web-based service that was used to manage/host remote repositories. The required libraries, functions and tools were downloaded to the respective computers. Two separate servers were implemented to control the functions of the robot and the virtual human. Python functions were run directly as scripts. C++ functions were compiled using Visual Studio or the makefile, and the code was run from Visual Studio [38].
A remote procedure call (RPC) library was used to handle the communication between the functions [23] over a Thrift network [5]. RPC was used because it allows inter-process communication and relies on a server/client architecture; it is also modular and flexible. Thrift was preferred over ROS (robot operating system) because ROS runs only on Linux/CORBA, and it is complex [44]. In contrast, Thrift is reliable, supports multiple platforms and languages, and provides high performance [5]. The port numbers for the Thrift interfaces were given in the thrift files for the concerned services.
All the tools, libraries, functions, etc. were put in a folder that was connected to the world model to help the characters perform their activities [38].
A common communication platform for the virtual human and the robot was developed as shown in Fig. 5 [34–38]. The proposed common communication platform shown in Fig. 5 was accommodated within the overall architecture in Fig. 4. As Fig. 5 shows, using the RPC, the animation of each function for each character was commanded from a common command script (client), which was networked with the concerned control server through the Thrift interface [5].
The virtual human control server was connected to a display window (within the computer) or to a display screen (external device). The robot control server was connected to the physical robot through wireless communication network. This platform was named as “Common Interaction Platform for Heterogeneous Agents (CIP-HA)”, which could be used to animate different agents especially the agents of heterogeneous realities such as the robot and the virtual human using the RPC-based common command script (client) and the interface protocol through specifying the character during the function call. However, each character needed to have its individual control server containing the concerned APIs for the functions to be called in the client script [5,23,34–38].
The APIs for the functions for the virtual human and the robot were generated in such a way that the functions were very similar to each other, which might result in similar behaviors in the virtual human and the robot (within its mechanical limits) for a particular function. For example, if a function “point at something” was called for the virtual human, then the virtual human showed a posture pointing at something. Similarly, if the function was called for the robot, the robot showed a similar posture pointing at something. However, the difference was that the virtual human performed within the computer screen, but the robot performed in the physical environment.
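The idea of one command script addressing either character by name can be illustrated with a minimal Python sketch. This is not the CIP-HA implementation: the class names `ControlServer` and `CommonClient`, the function name `point_at` and the returned strings are hypothetical, and the calls are made in-process here, whereas in the described system each call would travel to the relevant control server over the RPC/Thrift interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ControlServer:
    """Stand-in for one character's control server holding its function APIs."""
    character: str
    functions: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def register(self, name: str, func: Callable[..., str]) -> None:
        self.functions[name] = func

    def call(self, name: str, *args) -> str:
        if name not in self.functions:
            raise KeyError(f"{self.character} has no function '{name}'")
        return self.functions[name](*args)


class CommonClient:
    """Stand-in for the common command script: the character is chosen per call."""

    def __init__(self) -> None:
        self.servers: Dict[str, ControlServer] = {}

    def add_server(self, server: ControlServer) -> None:
        self.servers[server.character] = server

    def command(self, character: str, function: str, *args) -> str:
        # Same function name for every character; only the target server differs.
        return self.servers[character].call(function, *args)


# Each character has its own server, but both expose a similarly named function.
robot = ControlServer("robot")
virtual_human = ControlServer("virtual_human")
robot.register("point_at", lambda target: f"robot points at {target} in the room")
virtual_human.register(
    "point_at", lambda target: f"virtual_human points at {target} on the screen"
)

client = CommonClient()
client.add_server(robot)
client.add_server(virtual_human)

results = [client.command(name, "point_at", "box 3")
           for name in ("virtual_human", "robot")]
```

The design point mirrored here is that adding a new character only requires a new control server exposing the same function names; the command script itself does not change.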
Note that operating and controlling different artificial agents of heterogeneous realities for different tasks and environments through a common platform is attractive to researchers because a single platform that can operate and control multiple characters can be reused for multiple tasks and environments. That is, diversity in both environments and tasks motivates the common platform.
Bilateral trust-triggered mixed-initiative collaboration scheme
As per step 5 of the framework in Fig. 1, in this section, a collaboration scheme is proposed between the humanoid robot and the virtual human. Based on Fig. 2, a room was built as shown in Fig. 6. As shown in Fig. 6, in the room, 10 boxes were kept in different places, the virtual human appeared in the screen, and the robot stood at a position opposite to the screen. For the real-world position of each box (e.g.,

Arrangement of the 10 boxes, positions of the virtual human and the humanoid robot in the room, and an example of mapping between the pointing (fingertip) position of the virtual human in the virtual environment and the corresponding position of a target box in the physical environment.
The experimenter (human1) hides an object inside any of the 10 boxes, and inputs the position information of the object/box (e.g.,
The virtual human is the master and the robot is the follower
As Fig. 6 shows, during the collaboration, the virtual human appears in the screen, shows some gestures (e.g., stands straight, looks at the robot standing opposite to her based on head/eye tracking [7], shows gaze/attention at the robot [55]), shows some emotional expressions (e.g., smiles at the robot), and also uses some verbal expressions (speech). For example, the virtual human tells, “hi robot! I will help you find out the hidden object, follow me”. Being the master, the virtual human inherits the position information of the target object/box in the real-world, e.g.
Then, the follower agent (robot) needs to identify the correct location of the box based on the instructions it receives from the master agent (virtual human). To do so, it seems to be natural or human-like that the robot recognizes the virtual human’s fingertip position using a Kinect camera or any suitable vision system embedded in its head. The robot then can use such information to determine the corresponding position of the real-world box (target position for the robot). Even though this procedure of recognition is natural, it is not technically reliable or robust as the virtual human is displayed in a 2D screen. Hence, as an alternative, the virtual human’s actual fingertip position in the virtual environment captured through the SmartBody system is shared to the robot through the computer network. This helps the robot determine the corresponding position of the target box in the real-world space (i.e., target position for the robot) based on the preplanned mapping (Fig. 6). This procedure is found technically reliable and robust [34–37].
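The lookup from a shared fingertip position to a real-world target can be sketched as follows. This is only an illustration under stated assumptions: the table `MAPPING`, its coordinates, the box identifiers, the function `locate_target` and the tolerance value are all hypothetical and do not reproduce the preplanned mapping of Fig. 6.

```python
import math

# Hypothetical calibration table (assumed values, not from the article):
# box id -> (fingertip position in the virtual environment,
#            corresponding box position in the physical room)
MAPPING = {
    "box_1": ((0.10, 1.20, 0.30), (0.5, 0.0, 1.0)),
    "box_2": ((0.35, 1.15, 0.30), (1.5, 0.0, 1.0)),
    "box_3": ((-0.20, 1.10, 0.30), (-1.0, 0.0, 2.0)),
}


def locate_target(fingertip, tolerance=0.05):
    """Return (box id, real-world position) for the calibrated fingertip
    position nearest to the shared one, or None if nothing is within
    the tolerance."""
    best_id, best_dist = None, float("inf")
    for box_id, (virtual_pos, _) in MAPPING.items():
        dist = math.dist(fingertip, virtual_pos)
        if dist < best_dist:
            best_id, best_dist = box_id, dist
    if best_dist > tolerance:
        return None
    return best_id, MAPPING[best_id][1]


# The fingertip position received over the network is slightly off the
# calibrated point for box_1 but still within tolerance.
target = locate_target((0.11, 1.20, 0.30))
no_target = locate_target((0.90, 0.90, 0.90))
```

Rejecting positions outside the tolerance gives the follower a way to detect a pointing gesture that matches no calibrated box instead of walking to the wrong one.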
Once the target position is determined, the robot responds verbally, e.g., it says, “hi virtual human! thank you for instructing the location of the object, now I can try to find it”. Then, the robot shows some gestures: it turns its face towards the target position, walks close to the target position, stops walking, looks at the target position (box), points at the box and says, “I have found the box where the object may exist, thank you virtual human for your help”. The robot cannot open the box owing to the limitations of its skills at the current stage. Hence, the experimenter (human1) opens the box on behalf of the robot, and checks whether the object is inside the box, i.e. whether the robot pointed at the correct box based on the virtual human’s instructions.
The robot is the master, the virtual human is the follower
A similar procedure to that in Section 6.1.1 applies if the robot is the master and the virtual human is the follower. In this case, the robot and the virtual human stand face to face as in Fig. 6. At the beginning, the robot uses gestural and verbal expressions similar to those the virtual human uses in its role as the master agent. Being the master, the robot inherits the position information of the target object/box in the real-world, e.g.
Once the hidden object is found through the collaboration between the artificial agents as above, the service beneficiary (human2) can obtain it. Here, the robot’s and the virtual human’s gestural, emotional and verbal expressions are designed to mimic human-human interactions to make the collaboration natural [33]. The dialogue (verbal expressions) between the robot and the virtual human is actually implemented through a “text to speech” command, which is pre-taught and not “natural”. It can be made natural using high-level natural language or natural speech processing algorithms, which can make the dialogue as well as the interactions more intuitive and natural [31].
Strategy of determining master and follower agent
Whether the virtual human or the robot should act as the master agent depends on the bilateral trust between them. The collaboration scheme including the switch of master (leader)-follower role based on the bilateral trust is given in Fig. 7 [28], where

Note that, here only two agents are collaborating, and hence after the first run, an ‘if expression’ based on the trust-based condition in Fig. 7 can be sufficient to switch the roles between the two agents as the master or the follower as well as to determine their turn of taking the initiatives. However, a finite state machine (FSM) model may be helpful for complex collaboration among multiple agents performing multiple tasks [32,36,37].
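A minimal sketch of such an ‘if expression’ is given below. The exact trust-based condition is the one defined in Fig. 7 and is not reproduced here, so the rule used in this sketch is an assumption for illustration only: the agent in which its partner places more trust takes the master role, and a tie keeps the current master. The function and argument names are likewise hypothetical.

```python
def choose_master(trust_vh_in_robot, trust_robot_in_vh, current_master):
    """Pick the master for the next run from the two trust values.

    Assumed rule (not taken from Fig. 7): the more trusted agent leads,
    and equal trust leaves the roles unchanged.
    """
    if trust_vh_in_robot > trust_robot_in_vh:
        return "robot"            # the virtual human trusts the robot more
    if trust_robot_in_vh > trust_vh_in_robot:
        return "virtual_human"    # the robot trusts the virtual human more
    return current_master         # tie: no role switch


def follower_of(master):
    """With only two agents, the follower is simply the other one."""
    return "robot" if master == "virtual_human" else "virtual_human"


# Example: the virtual human led the first run; the trust values measured
# after that run then decide who takes the initiative in the next run.
master = choose_master(trust_vh_in_robot=0.82, trust_robot_in_vh=0.74,
                       current_master="virtual_human")
follower = follower_of(master)
```

With more than two agents or several concurrent tasks, a single comparison like this no longer covers all role assignments, which is where a finite state machine becomes the more natural structure.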
Scenario-based AI or knowledge-based decision rules are incorporated with the robot and the virtual human on top of Thrift (Fig. 5) [5]. The agents can autonomously perceive the environment through the recognition algorithms using sensor information. The AI techniques and decision rules are reflected through the autonomous decisions and executions of the functions triggered by sensor information or prior knowledge such as the head tracking, gaze/attention, facing towards correct location of box, path planning based on target position, determining fingertip posture to point target box, trust-based decision making for turn taking, etc. The scenario-based AI or decision rule is controlled through an ‘if expression’ as in Fig. 7.
Based on the collaboration scheme of Fig. 7, it is clear that modeling and real-time measurement of bilateral trust is necessary to implement the collaboration, which is described below [36,37].
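To make the notion of a real-time, run-by-run trust estimate concrete, the following is a rough sketch of a first-order time-series update driven by normalized performance and fault scores. It does not reproduce the article’s equations (1)–(2): the functional form, the weights `a`, `b`, `c` and the function names are assumptions chosen only so that trust stays between 0 and 1.

```python
def clamp01(value):
    """Restrict a score to the normalized range [0, 1]."""
    return max(0.0, min(1.0, value))


def update_trust(prev_trust, performance, fault, a=0.5, b=0.3, c=0.2):
    """One-step trust update (assumed form, for illustration only).

    trust_t = a * trust_(t-1) + b * performance_t + c * (1 - fault_t)

    With a + b + c = 1 and all inputs in [0, 1], the result is also
    in [0, 1]: good performance raises trust and faults lower it.
    """
    performance = clamp01(performance)
    fault = clamp01(fault)
    return clamp01(a * prev_trust + b * performance + c * (1.0 - fault))


def as_percent(trust, max_trust=1.0):
    """Express trust as a percentage of the highest possible trust."""
    return 100.0 * trust / max_trust


# Example: one agent's trust in its partner over two runs, starting
# from a neutral value of 0.5.
trust = 0.5
trust_after_good_run = update_trust(trust, performance=1.0, fault=0.0)
trust_after_poor_run = update_trust(trust_after_good_run,
                                    performance=0.2, fault=0.8)
```

In this example the estimate rises after the fault-free run and falls after the slow, inaccurate one; each agent would keep its own estimate of its partner, giving the two values that the role-switching condition compares.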
Trust modeling
Though trust of one agent in another agent can depend on many factors, in Lee and Moray’s study [25], a time-series model based only on performance and faults of automation (artificial agent) was used to compute human’s (biological agent) trust in automation. Trust is actually a perceptual issue and the human possesses actual feeling of his/her trust in an artificial agent (automation/robot) or in another human. However, it is impossible to generate similar feelings of trust of an artificial agent in another artificial agent. Nonetheless, the idea in Lee and Moray’s study can be extended to derive the computational model of an artificial agent’s trust (e.g., virtual human’s trust) in another artificial agent (e.g., humanoid robot) as in (1) and (2) [36,37]. In (1)–(2),
Trust measurement
For real-time measurements of
An IMU attached onto the robot hand fingers can be used to measure its actual fingertip position when it points at the target box. Let,
Similarly, the movement speed of the virtual human and the deviation of its actual pointed position from the targeted pointing position are considered as its performance and fault criteria respectively. As with the robot, the speed and the targeted pointing position of the virtual human are ideally fixed, but the communication delay between the client server and the control server, software errors and system instabilities can reduce the speed of the virtual human. All these, together with other uncertainties such as calibration (mapping) errors, unnoticed displacements of the screen or the box, etc., can cause a slight deviation in the virtual human’s pointed position from the targeted position. If
Note that δ and σ need to be adjusted because the chance of the humanoid robot being affected by external disturbances seems to be higher than that of the virtual human. One approach of such adjustment is to consider

Complete experimental setup.
The generality and validity of the trust models in (1)–(2) were discussed in [25]. The models use the performance and fault status of one agent to estimate the trust of another agent in it. The terms ‘performance’ and ‘fault’ are general irrespective of the nature of the tasks and the collaborating agents, although the methods of measuring these parameters can vary depending on the tasks and agents. These parameters can be assessed using appropriate scoring methods and normalized to obtain a common range of score values. Once the scores are normalized to a range such as 0 to 1, they become general. The computed trust can also be compared with the maximum possible trust (say, 1.00), and the individual trust can be expressed on a common scale, e.g. as a % of the highest possible trust. Thus, these models can be used for any number of agents and for all types of tasks to measure and compare trust, provided that the performance and faults of the individual agents can be measured, normalized and expressed on a common scale [25]. Similar generality principles apply to the weighting factors,
The proposed models can be verified using some model verification tools [15]. The models can also be examined for their validity through estimating the trust of an animate agent in another collaborating agent using a subjective rating scale, and comparing this with the computed trust value of the animate agent [25,39]. However, the models are evaluated here based on the experimental results because the experimental approach is reliable [40,41], and the artificial agents cannot assess their trust subjectively.
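To make the normalization and common-scale argument concrete, the following is a minimal sketch of a performance- and fault-based trust update. It is not a reconstruction of (1)–(2): the linear form, the coefficients `a`, `b`, `c`, and all function names are assumptions chosen only to illustrate how normalized scores can feed a time-series trust value that is then expressed as a % of the maximum possible trust.

```python
def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max scale a raw score into [0, 1], clamping out-of-range values."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def update_trust(prev_trust: float, performance: float, fault: float,
                 a: float = 0.5, b: float = 0.5, c: float = 0.5) -> float:
    """One step of an illustrative time-series trust update.

    prev_trust, performance and fault are assumed to be normalized to [0, 1].
    The coefficients a, b, c are hypothetical weighting factors.
    """
    raw = a * prev_trust + b * performance - c * fault
    # Keep trust on the same [0, 1] scale as its inputs.
    return min(1.0, max(0.0, raw))


def as_percent(trust: float, max_trust: float = 1.0) -> float:
    """Express trust as a percentage of the maximum possible trust."""
    return 100.0 * trust / max_trust


if __name__ == "__main__":
    speed_score = normalize(0.9, 0.0, 1.0)       # hypothetical performance score
    deviation_score = normalize(2.0, 0.0, 20.0)  # hypothetical fault score
    trust = update_trust(0.8, speed_score, deviation_score)
    print(f"trust = {as_percent(trust):.1f}%")
```

Because every input is scaled to the same range before the update, trust values computed for different agents or tasks remain directly comparable, which is the point of the generality argument above.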
The experimental setup
The experimental setup is shown in Fig. 8. As Fig. 8 shows, three rooms were built temporarily. Room 1 housed the computers with all the software packages required for controlling the robot and the virtual human through the common platform, and it was where the experimenter (human1) was located. Room 2 was the room shown in Figs 2, 6 and 9. In addition, a sound system near point P3 in room 2 helped transmit the voice of the virtual human. Room 3 contained a laptop (laptop1) with a Skype connection that a human (human3) could use to appear on the large screen or on another laptop screen (laptop2) in room 2 through Skype. Kinects were also placed in room 2.
The evaluation scheme
Collaboration between the robot and the virtual human is evaluated using 4 categories of evaluation criteria: (a) attributes of the master agent, (b) quality of interactions between agents, (c) task performance, and (d) the service beneficiary’s likeability of the service. Attributes of the master agent are evaluated based on following criteria: (i) level of anthropomorphism, (ii) level of embodiment, (iii) quality of verbal and facial expressions, gestures and action generation, and (iv) level of stability of the agent. The quality of interactions between the agents is evaluated based on the following criteria: (i) cooperation level, (ii) clarity of instructions of the master agent and the level of understanding of the instructions by the follower agent, (iii) level of engagement between the agents, (iv) naturalness in interactions, (v) potential of long term companionship between the agents, (vi) perceived cognitive workload of the follower agent, (vii) situation awareness of the follower agent, and (viii) team fluency. Attributes and quality of interactions (except workload and team fluency) are assessed using a Likert scale [41] as in Fig. 10. The workload is assessed using the NASA TLX [40,41].
Team fluency is the coordinated meshing of joint efforts and synchronization between virtual human and robot during collaboration [17]. Four criteria are used to measure team fluency objectively as follows:
Robot’s idle time: it is the time that the robot waits for sensory inputs, information processing, computing, decision-making, etc.
Virtual human’s idle time: it is the time that the virtual human waits for sensory inputs, information processing, computing, decision-making, etc.
Non-concurrent activity time: it is the amount of time during a trial in which the activities of the agents are not concurrent although they need to be.
Functional delay: it is the time between the end of one agent’s action and the start of another agent’s action.
Each criterion is expressed as a % of the total trial time. The criteria can be interrelated.
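The four fluency criteria above reduce to simple ratios once the time data for a trial have been recorded. The sketch below assumes that each duration is available in seconds; the class `TrialTiming`, its field names and the function `fluency_percentages` are hypothetical and serve only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class TrialTiming:
    """Recorded durations for one trial, all in seconds (assumed unit)."""
    total_time: float
    robot_idle: float
    vh_idle: float
    non_concurrent: float
    functional_delay: float


def fluency_percentages(t: TrialTiming) -> dict:
    """Express each fluency criterion as a % of the total trial time."""
    if t.total_time <= 0:
        raise ValueError("total_time must be positive")

    def pct(duration: float) -> float:
        return 100.0 * duration / t.total_time

    return {
        "robot_idle_pct": pct(t.robot_idle),
        "vh_idle_pct": pct(t.vh_idle),
        "non_concurrent_pct": pct(t.non_concurrent),
        "functional_delay_pct": pct(t.functional_delay),
    }


if __name__ == "__main__":
    trial = TrialTiming(total_time=200.0, robot_idle=10.0, vh_idle=8.0,
                        non_concurrent=6.0, functional_delay=4.0)
    for name, value in fluency_percentages(trial).items():
        print(f"{name}: {value:.1f}%")
```

Lower percentages indicate higher fluency. Because the criteria can be interrelated (for example, a functional delay may also count as idle time for one agent), the four values should not be summed.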

Interior view of room 2 that shows the collaboration environment between the virtual human and the humanoid robot.

Likert scale to assess the attributes of the master agent and the quality of the interactions between the agents.
The task performance is evaluated objectively following two criteria: (i) efficiency, and (ii) success rate in finding the hidden object. The efficiency and success rate are measured following (7) and (8) respectively. In (7),
The service beneficiary (human2)’s likeability about the services he/she receives through the collaboration between the robot and the virtual human is expressed through the level of service beneficiary’s (i) satisfaction in the service, (ii) own trust in the collaboration, and (iii) dependability in the service. These are assessed using the Likert scale in Fig. 10.
The agents and their interactions
The main objective of benchmarking is to compare the evaluation results of the artificial agents against some standards in order to rate the level of the attributes, interaction quality and performance of the artificial agents [4]. To do so, four additional experiment protocols were adopted in which four different master agents collaborated with a real human (follower agent) to search for the hidden object, as in Table 1 (protocols#1–4). In protocols#1–4 (benchmark interactions), the real human received instructions for finding the hidden object from various master agents: another real human (protocol#1), a human appearing through Skype (protocol#2), the virtual human (protocol#3), and the robot (protocol#4). In protocol#5 (targeted interaction), the virtual human and the robot cooperated to find the hidden object. Note that the virtual human is a software-generated, human-like animated virtual artificial character [2,14,24,46,50]. On the other hand, the Skyped-human is an actual human who appears in front of another agent from a distant place through a Skype connection, i.e. a real human in a virtual situation (Skype).
As the master agent, the VH has human-like appearance and functionalities, but it is artificial, screen-based and cannot appear physically in front of the follower agent. Similarly, as the master agent, the tele-presented Skyped-human in room 3 cannot appear physically in front of the follower agent in room 2. It appears through a screen, but it is natural, and it is believed to be the physically non-appearing (screen-based) agent with the highest intelligence and autonomy (as its origin is the human) [10]. Thus, the Skyped-human is considered as the benchmark or standard for the VH. On the other hand, the HR is physically embodied and exists in the physical world like the real human; it has human-like appearance and functionalities, but it is artificial [33]. The human is believed to be the physically embodied natural social agent with extraordinary cognitive abilities, intelligence and autonomy [10]. Hence, the real human is considered as the benchmark for the HR. In summary, protocol#5 is used to evaluate the interaction/collaboration between the VH and the HR, which is the main goal of this article, and the interactions in protocols#1–4 are used as the benchmarks for the interactions between the VH and the HR in protocol#5.
As Table 1 shows, in the first four interaction protocols, the human is the follower agent and interacts with the overall system or the master agent. In these cases, the human’s interaction is used to benchmark the performance of the master agents. In the fifth interaction protocol, the humanoid robot and the virtual human interact and collaborate to perform the collaborative task. The human’s interaction is reflected here in several ways: (i) giving commands to the humanoid robot and the virtual human to accomplish the task, (ii) receiving the benefits of the performed task while being present in the environment where the task is performed, and (iii) evaluating the performance of the task performed through the collaboration between the humanoid robot and the virtual human, etc. Such interaction also justifies the collaboration between the robot and the virtual human from an assistive perspective, as the assistance receiver (human) interacts with the assistant agents (robot, virtual human).
Experimental evaluation
Subjects
Fifty-two (52) human subjects (engineering students, 46 males, 6 females, mean age 24.59 years with variance 2.76) were recruited to participate in the experiment protocols in Table 1. Of the 52 subjects, one served as the master agent for protocol#1 (subject1), and one served as the Skyped-human (master agent) for protocol#2 (subject2, human3). The remaining 50 subjects formed five independent groups (Groups I, II, III, IV, V), each consisting of 10 subjects. Subjects in Groups I, II, III and IV served separately as the follower agents in protocols#1, 2, 3 and 4 respectively. Subjects in Group V served as the service beneficiary (human2) in protocol#5. All the subjects reported being healthy with sound functioning of their eyes and ears. The study was approved by the concerned ethics committee.
Experiment objectives
The objectives of the experiment were to evaluate the effectiveness of the collaboration scheme (Section 6) between the humanoid robot and the virtual human through the common platform (Fig. 5) for the real-world collaborative task (protocol#5), and to benchmark the results with standards (protocols#1–4).
Experiment design
The independent variables were various types of interactions between the agents (Table 1), and the dependent variables were the evaluation criteria (Section 7.2).
Hypothesis
It was hypothesized that more human-like attributes in artificial agents might enhance the quality of their interactions with humans during the collaboration [33].
Experimental procedures
Protocol#1: The experimenter (human1) hid the object in any one of the 10 boxes (see Figs 8–9), for example, in Box 7. The master agent (subject1) stood at P3 (see Fig. 8) facing point P1. The follower agent (a human subject) stood at P1 facing P3. Then, the master agent instructed the follower agent (once only) how to find the hidden object through collaboration. The collaboration procedures included verbal, gestural and emotional expressions and functions similar to those that the robot and the virtual human used for their collaboration, as described in Section 6.1. Figure 11(a) illustrates the procedures for protocol#1.

Collaboration between a master and a follower agent for different benchmark protocols.
At the end of a collaboration trial, the follower agent subjectively evaluated the attributes of the master agent and the quality of the interactions between him/her and the master agent (except team fluency) following the evaluation scheme. The likeability of the service beneficiary (human2) was not evaluated as there was no service beneficiary for this benchmark trial. To measure team fluency and task performance, the experimenter recorded various time-related data for the complete trial and recorded whether or not the collaboration succeeded in finding the hidden object. The follower agent was then replaced by another subject (the master agent was unchanged) and the trial was repeated for each new follower agent (subject). In this way, the experiment was conducted for all 10 subjects of Group I separately.
Protocol#2: the procedures were the same as those employed for Protocol#1, but subject2 stood in room 3, appeared on the large screen in room 2 via Skype, and served as the master agent, i.e. subject1 (a real human) of Protocol#1 was replaced by subject2 (a Skyped-human) as the master agent. Figure 11(b) illustrates the procedures. Group II subjects participated in this protocol separately.
Protocol#3: the procedures were the same as those employed for Protocol#1 or 2, but the virtual human served as the master agent, as illustrated in Fig. 11(c). Group III subjects participated in this protocol.
Protocol#4: the procedures were the same as those employed for Protocol#1, 2 or 3, but the robot served as the master agent, as illustrated in Fig. 11(d). Group IV subjects participated in this protocol.
Protocol#5: The default protocol was that the virtual human served as the master agent and the humanoid robot served as the follower agent to find out the hidden object through collaboration between them following the collaboration scheme presented in Section 6 and in Fig. 7. However, depending on the trust values, the role of the master and the follower agents might switch. The collaboration procedures are illustrated in Fig. 9. At first, practice trials were performed. The information on the agent performance and faults obtained through the practices was used to compute the constants (
Constants and thresholds for trust computation
Then, the actual experiments were conducted, and the experiment procedures were the same as those employed for Protocols#1–4. The follower agent was an artificial agent and was unable to evaluate its collaboration with the master agent. Hence, each subject from Group V separately acted as the service beneficiary (human2) and evaluated the collaboration between the master and the follower agents on behalf of the follower agent in each trial for the criteria of the evaluation scheme (Section 7.2) except the cognitive workload, team fluency and situation awareness. The service beneficiary also evaluated his/her likeability of the services, and the experimenter recorded data to measure the team fluency and the task performance for each trial following the evaluation scheme.
Figure 12 shows the values of the robot’s trust in the virtual human and the virtual human’s trust in the robot for the trials of protocol#5. The results show that the trust of the agents in each other was high (88% and above), which indicates their willingness (here, rationality or practicality) to collaborate on the common task. The high trust also demonstrates the ability of the artificial agents to produce high performance and avoid faults during the collaborative task [16,25]. The results thus justify the effectiveness of generating similar and satisfactory skills and capabilities in the artificial agents of heterogeneous realities for a real-world common task. The results in Fig. 12 show that the virtual human served as the master agent and the humanoid robot served as the follower agent in 80% of the trials because the robot’s trust in the virtual human was greater than the virtual human’s trust in the robot in those trials. In the remaining 20% of the trials, the roles of the agents as master and follower (and hence their turns in taking initiatives) were switched based on the bilateral trust values according to the collaboration scheme in Fig. 7 [28]. The results thus demonstrate the mixed-initiatives in the collaboration [6].
Figure 13 shows that, in the master-follower interactions, the attributes of the master agents such as the levels of anthropomorphism and embodiment, quality of gestures and actions, and stability are the highest for the human, the second highest for the Skyped-human, the third highest for the humanoid robot, and the fourth highest for the virtual human. The highest attributes of the human as the master agent are logical, as the human is the natural agent with the highest levels of attributes [10]. The Skyped-human originated from the real human, and thus its attributes as perceived by the subjects were close to those of the real human. However, the Skyped-human’s attributes were lower than those of the real human because the Skyped-human was confined to the screen with reduced (two) dimensionality.

Values of robot’s trust in virtual human and virtual human’s trust in robot for the trials.

Evaluation of attributes of the master agents in various interaction protocols (#1–5).
The results show that the robot’s attributes were much lower than those of its natural counterpart (the real human) because the robot was an artificial agent, even though it had three dimensions like the human. Again, the attributes of the virtual human were lower than those of its natural counterpart, the Skyped-human. Both the virtual human and the Skyped-human were two-dimensional, but the Skyped-human’s attributes were perceived as better by the subjects because it originated from the human, which is natural.

Evaluation of interaction quality between the agents in various interaction protocols (#1–5).
The subjects perceived the attributes of the humanoid robot as higher than those of the virtual human, probably because, although both agents were artificial, the robot had three dimensions in the physical environment whereas the virtual human had only two dimensions within the screen. However, the results for stability were slightly different. The stability of the robot was lower than that of the virtual human, probably because the robot was affected by external disturbances such as floor roughness, motor temperature, obstacles, air resistance, etc. [22]. Nevertheless, the attributes of both artificial master agents (robot and virtual human) were satisfactory in comparison with those of their natural counterparts (human and Skyped-human), which justifies the effectiveness of generating satisfactory human-like attributes in the artificial agents of heterogeneous realities for the real-world task [33].
Figure 14 shows the quality of interactions between the follower agent (human) and different master agents as well as between the artificial agents (robot and virtual human). The results show that an agent’s interaction quality seems to be commensurate with the agent’s attributes in Fig. 13, which supports the hypothesis that more human-like attributes in artificial agents enhance the quality of their interactions with humans [33]. The quality of interactions between the artificial agents was satisfactory in comparison with that of their natural counterparts (benchmarks), which justifies the effectiveness of generating similar and satisfactory interaction abilities in the artificial agents of heterogeneous realities for the real-world common task.
Figures 13 and 14 show that the attributes and interaction quality of the humanoid robot as the master agent were better than those of the virtual human. However, Fig. 12 shows that the robot’s trust in the virtual human was higher than the virtual human’s trust in the robot, which means that the performance and fault avoidance abilities of the virtual human were higher than those of the humanoid robot. This is why, in most cases, the virtual human served as the master agent and took the collaboration initiative (Fig. 12), and the humanoid robot served as the follower agent. This might have happened because the robot was slower than the virtual human, which reduced the performance of the robot. Again, the robot was easily affected by external disturbances, which reduced its fault avoidance ability. It is believed that the comparatively lower performance (speed) and higher faults (caused by disturbances) of the robot might have resulted in the comparatively lower trust of the virtual human in the robot [16,25,36,37].
The NAO robot’s mobility skills are rather average with respect to precision. Nevertheless, NAO was used as it is one of the most successful robotic platforms for conducting research similar to that presented herein. Its limited precision seems to be the reason why the trust in the virtual human was higher than the trust in the robot (Fig. 12). The NAO wearing an IMU was also used for determining the 3D coordinates of its fingertip when it pointed at the object (box) [22]. The virtual human was custom-built and included most of the features of representative virtual humans [20,51]. Whether the robot has more trust in the virtual human or vice versa is not the issue. The trustworthiness of the robot and the virtual human depends on their performance and accuracy [25,39]. The objective is not to prove which agent is more trustworthy or superior. Instead, the objective is to prove that the common platform helps heterogeneous agents collaborate for a common goal, and that their bilateral trust is measured in real-time and used to determine their leadership. As Fig. 12 shows, the virtual human trusts the robot less. However, the results could be the opposite if another robot with better mobility and precision is used [36,37].
Figure 15 shows that, in benchmark protocols#1–4, the follower (human)’s cognitive workload was the lowest when a human was the master agent (protocol#1). The workload increased slightly when a Skyped-human served as the master agent (protocol#2). This was logical as the human was the natural agent and the Skyped-human originated from the natural agent. The results show that the follower’s workload increased when the artificial agents (virtual human or robot) served as the master agents (protocols#3–4). However, the increase in workload was not large, which shows that the artificial agents produced the expected level of workload in the follower agents in comparison with their natural benchmarks [4,41]. Between the virtual human and the robot as master agents, the follower’s workload for mental and physical demands was lower when the robot served as the master than when the virtual human served as the master. This happened because the robot could instruct the follower agent physically, which was not possible for the virtual human. However, the follower’s workload for temporal demand, performance, effort and frustration increased when the robot served as the master agent, probably because the robot was strongly affected by external disturbances and thus its performance and fault avoidance abilities were lower.
Figure 16 shows that the situation awareness of the follower agents (humans) for HR-H and VH-H interactions is satisfactory in comparison with the benchmark H-H and SkypedH-H interactions respectively, which demonstrates the advanced interaction capability of the artificial agents. Again, the robot produces slightly better situation awareness in the human due to its physical existence and interface, embodiment and greater human-likeness [33,43].

Mean cognitive workload of the follower agent (human) for different benchmark collaborations for the common collaborative task (protocols#1–4).

Mean situation awareness of the follower agent (human) for different collaboration protocols for the task (protocols#1–4).
Table 3 shows the evaluation results, with standard deviations in parentheses, for the team fluency criteria in different interactions [17]. The results show that the trends in team fluency among different interactions match those in the interaction quality. The fluency in VH-HR and HR-VH in comparison with that in the benchmark protocols shows that the collaboration between the robot and the virtual human was fluent even though they were artificial agents. Note that less idle time, non-concurrent activity time and delay indicate more fluency [17]. The results show that the team fluency is high for all interaction protocols including VH-HR and HR-VH.
Team fluency evaluation results for different interaction protocols (standard deviations)
Figure 17 shows that the success rate in finding the hidden object through the collaboration between the artificial agents of heterogeneous realities (robot and virtual human in protocol#5) was 100%. The efficiency in the collaborative search task was also high. It is believed that the high level of team fluency shown in Table 3 positively influenced the efficiency of the collaboration [17]. All these results indicate the effectiveness of the real-world collaboration between the humanoid robot and the virtual human for the common real-world task. Again, the humanoid robot and the virtual human were operated through a common platform (Fig. 5), and the results thus demonstrate the effectiveness of the common platform between the artificial agents as well [34–38].
Figure 18 shows the likeability of the service beneficiaries (human2 or the subjects) for the services they received from the collaboration between the humanoid robot and the virtual human in protocol#5. The results show that the service beneficiaries were satisfied with the services, and thus their own trust in the collaboration between the artificial agents was also high [16]. The high levels of satisfaction and trust of the service beneficiaries in the collaboration indicate their interest in receiving the services. All these, together with the high level of dependability of the service beneficiaries on the services, justify the potential applicability of the collaboration between the humanoid robot and the virtual human for the benefit and welfare of humans for various purposes such as social companionship for old and lonely people, assisted living [49], etc. The results also show that the likeability was slightly higher when the virtual human served as the master agent than when the robot served as the master agent in the collaboration.
The results in general show that, being a physical agent, the robot had better attributes and interaction quality than the virtual human. However, the robot’s lower stability and greater vulnerability to disturbances compared to its virtual counterpart might have reduced its overall performance, accuracy, trustworthiness and likeability, which also reduced its role in taking initiatives in the mixed-initiative collaboration [6]. Nevertheless, despite the slight differences in attributes, interaction quality, stability, performance and accuracy between the robot and the virtual human as master agents, the real-world collaboration between the artificial agents through the common platform was very successful.

Efficiency and success rate in the collaborative search task between the humanoid robot and the virtual human for various trials for protocol#5. In most trials, the virtual human served as the master agent and the humanoid robot served as the follower agent (Fig. 12).

Evaluation of service beneficiary’s likeability in the services provided through the collaboration between the humanoid robot and the virtual human for protocol#5.
Analyses of Variances (ANOVAs) showed that variations in evaluation scores of agent attributes (Fig. 13), interaction quality (Fig. 14, Table 3), follower agent’s workload (Fig. 15), follower human’s situation awareness (Fig. 16) and service beneficiary’s likeability (Fig. 18) due to subjects were statistically nonsignificant (
The key results can be summarized as follows:
The trust of the agents in each other was high (88% and above). The virtual human served as the master agent and the humanoid robot served as the follower agent in 80% of the trials. In the remaining 20% of the trials, the roles of the agents as master and follower (and hence their turns in taking initiatives) were switched based on the bilateral trust values. The results show that the trust-triggered mixed-initiatives in the collaboration were effective.
The human, Skyped-human, robot and virtual human were ranked first, second, third and fourth respectively in their attributes as perceived by human beneficiaries when they served as the master agents. Similar ranks were also observed for their interaction quality and team fluency.
The follower (human)’s cognitive workload gradually increased as the master agent became more artificial or less natural. For example, the workload was the least when a human was the master agent, and it increased when a Skyped-human served as the master. Between the robot and the virtual human as the master agents, the follower’s workload for mental and physical demands was lower when the robot served as the master than when the virtual human served as the master. However, the follower’s workload for temporal demand, performance, effort and frustration increased when the robot served as the master agent.
Situation awareness of humans for HR-H and VH-H interactions was satisfactory in comparison with the benchmark H-H and SkypedH-H interactions respectively. The robot provided better situation awareness in humans than the virtual human.
The success rate in finding the hidden object through collaboration between the artificial agents of heterogeneous realities was 100%. The efficiency in the collaborative search task was also high (above 90%).
The likeability of the service beneficiaries (humans) for the services they received from the collaboration between the robot and the virtual human was satisfactory, and the humans’ trust in the collaboration was high as a result.
Despite the slight differences in attributes, interaction quality, stability, performance and accuracy between the robot and the virtual human as the master agents, the real-world collaboration between them through the common platform was successful.
As a whole, the integration of benchmarking with trust-triggered mixed-initiative collaboration through the common platform seems to enhance the effectiveness and practicality of the overall key results in comparison with previous results [34–38].
Discussion
Ambient intelligence
The term “ambient intelligence” was used here with a limited scope and meaning [49], for example, the intelligence gained through the Kinect-based vision sensing. This justifies the use of the term ambient intelligence. However, to expand the scope of ambient intelligence, it may also be a good idea to attach sensors to the boxes or to use external sensors to track the boxes, etc.
Anthropomorphism and embodiment
The anthropomorphism and embodiment of the artificial agents were considered as attributes of the agents with the notion that the human user might expect them, and that they might affect the user’s trust. However, trust does not depend only on these factors. Instead, agent performance and precision (fault status) are the main factors determining trust [16,40]. Investigating the individual impact of anthropomorphism and embodiment of artificial agents on the user’s trust can be useful for analyzing the trust-related results. Again, a virtual human is confined to its own environment, whereas the robot has an extended bodily presence. However, this difference alone may not fully explain their overall performance. For example, an agent that is poor in embodiment or anthropomorphism may be more intelligent, interactive and autonomous than another agent with better anthropomorphism and embodiment. Anthropomorphism and embodiment may also be complementary. This is why anthropomorphism and embodiment, along with agent performance and precision, need to be considered when analyzing the mutual trust results.
Generality of the results
The presented robot-virtual human collaboration case is not a special case. Instead, it is an example of a real-world collaborative task between a real (humanoid robot) and a virtual (virtual human) character through a common platform based on mutual trust. This task can represent all types of collaborations between robots and virtual humans without significant loss of generality. The overall results can still be valid when the robot and the virtual human are replaced by other types of robots and virtual humans, provided that they are operated through the common platform and that their performance and accuracy levels are modeled and measured in real-time to estimate their bilateral trust. To prove the generality and robustness of the proposed approach, the approach may need to be verified using several other robotic platforms and different kinds of virtual humans for different types of collaborative tasks [1–3,9,13,14,24,26,27,29,30,33,41,45–47,50,53,56]. Note that it is not easy to implement all possible cases and to prove the generality of the proposed approach based on the results of all of them. For this reason, the concept was demonstrated here using a specific scenario, and the results obtained indicate the potential of the proposed approach to be effective in general.
The main idea behind generalization is that the proposed concept can be applied, directly or indirectly, to many similar scenarios with small and specific modifications. Note that generality here can refer to several aspects: (i) agent type, (ii) task type, (iii) user type, (iv) environment type, etc. This means that the proposed approach can be made effective for any agents, for any collaborative tasks, for any segments of users, and in any task environments if the collaboration design is enhanced and customized to fit these requirements. The proposed generality can be reached by adjusting the agent capabilities and the collaboration methods and materials to each aspect of generality. Generalization is necessary so that the scope of applications of the results increases and the results can be utilized, directly or indirectly, in many relevant cases.
Extrapolating the results to real applications
During the experiment, each trial or run was independent. One run meant implementing Protocol#5 in Section 8.5 once. Several runs/trials were made to collect several observations, which produced the data needed to analyze and understand the results and their general trends. The number of runs is not limited, and more runs can produce more reliable results. In actual practice, however, one run can mean one command from the user that initiates one collaboration trial to perform one task. Multiple runs can then correspond to multiple independent trials or commands to perform repeated or multiple tasks. In addition, the number of turn-takings within a run was limited in the presented scenario. Here, in one run, the initiative (the role of master or follower) was decided once, based on the agents’ mutual trust values in each other in the previous run. Multiple turns of initiative can be taken in a single run if the trust values can be measured and updated multiple times within that run.
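The once-per-run turn taking described above can be illustrated with a minimal sketch. The switching rule used here (the agent that its partner trusts more becomes the master, and a tie leaves the roles unchanged), the 0 to 1 trust scale, the numerical values and all names (`BilateralTrust`, `choose_master`, `assign_roles`) are assumptions made for illustration only; they are not the trust model or the switching criterion derived in this article.

```python
from dataclasses import dataclass


@dataclass
class BilateralTrust:
    """Trust values carried over from the previous run (assumed scale: 0 to 1)."""
    robot_in_virtual_human: float  # how much the robot trusts the virtual human
    virtual_human_in_robot: float  # how much the virtual human trusts the robot


def choose_master(trust: BilateralTrust, current_master: str) -> str:
    """Pick the master for the next run.

    Assumed rule (illustrative, not taken from the article): the agent that
    its partner trusts more takes the initiative; a tie keeps the roles.
    """
    if trust.virtual_human_in_robot > trust.robot_in_virtual_human:
        return "robot"
    if trust.robot_in_virtual_human > trust.virtual_human_in_robot:
        return "virtual_human"
    return current_master


def assign_roles(trust_per_run, initial_master="robot"):
    """Return the master of each run, decided once per run from the prior run's trust."""
    masters = [initial_master]  # the first run has no prior trust values
    for trust in trust_per_run:
        masters.append(choose_master(trust, masters[-1]))
    return masters


# Hypothetical trust values measured at the end of three consecutive runs.
history = [
    BilateralTrust(0.80, 0.60),
    BilateralTrust(0.70, 0.70),
    BilateralTrust(0.50, 0.90),
]
print(assign_roles(history))
```

Under the same assumptions, multiple turns of initiative within a single run would amount to calling `choose_master` each time the trust values are re-measured during that run, rather than only once between runs.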
In the presented collaborative system, there is no risk that the user loses interest in finding an object because its location is already known: during the experiment, each trial was independent, and in each trial the experimenter placed the object in a new location. In actual applications, each trial will represent a separate, new command from the user to the artificial agents for a new task, and one task may differ from another. However, the agents and the systems need to be made very robust so that they can handle such a variety of new commands. Here, only one example of the many commands that human users might issue is shown.
Conclusions and future works
Two artificial agents of heterogeneous realities (a physical humanoid robot and a virtual human) were developed with a set of similar skills, intelligence and capabilities, and were integrated through a common platform to perform a common real-world task. Bilateral trust models for the agents were derived, and a mixed-initiative collaboration scheme between the agents was developed in which the agents’ roles and initiatives were switched, triggered by the bilateral trust of the prior trial. The collaboration was evaluated following a comprehensive evaluation scheme, and the results justified the effectiveness of the collaboration involving the novel approaches. In addition, an appropriate benchmarking strategy was proposed to benchmark the real-world collaborative performance. Benchmarking was integrated with trust-triggered mixed-initiative collaboration through the common platform to enhance the effectiveness and practicality of the overall results.
The integrated and comparative approaches enhanced the intelligence, autonomy and capabilities of the virtual human and the robot. The virtual human was enabled to work with another real-world artificial agent to solve real-world problems. This widened the scope and application paradigm of the virtual human, which was thus empowered to perform beyond its traditional virtual environment. The integration of artificial agents using appropriate sensing methods and communication networks seemed to reduce application complexity and costs, and to enhance the individual contributions of the agents, devices and systems by enabling new services that no single agent, device or system could perform on its own. Thus, the results have the potential to help humans develop intelligent artificial social agents of heterogeneous realities that assist humans in various complex real-world social tasks, or that get such tasks done in cooperation between the agents. The results can be especially useful for developing adaptive social ecologies, cyber-physical systems using smart social agents of heterogeneous realities, and smart homes and social spaces with ambient intelligence for purposes such as assisted living and social companionship [36,37].
In the future, the intelligence of the agents and the scope and robustness of the integration between the agents will be enhanced so that the collaboration becomes feasible for an object hidden in any location in the space. Human interaction with the overall system and the assistive perspective of the application will be made clearer. More generally, agent capabilities will be further increased by enriching the agents with natural language processing, AI, machine learning and intelligent controls. The scope of ambient intelligence will be enhanced. The individual impact of anthropomorphism and embodiment of artificial agents on the user’s trust will be investigated. The validity and generality of the approaches and the results will be further investigated.
