Abstract
The need for flexible production has turned manufacturing’s attention to integrating fast and uncomplicated solutions. Collaborative robots (cobots) have been considered the most impactful technology due to their versatility and their capacity for human-robot interaction. Their implementation requires expertise in both the process and cobot programming. Consequently, demand for effective programming training has increased over the past years. This paper aims to design and explore a smart cobot programming system and to conduct an empirical study to understand human engagement and programming performance. A repertory grid based on input from cobot experts is employed to understand different cobot programming approaches. The resulting insights informed the design and implementation of a smart programming system configuration. Then, an empirical programming study was performed considering cobot expertise and human engagement. The results showed similarities and disparities in the collected data, which were interpreted as indicating differences in cobot programming behavior. Finally, the work identifies and discusses patterns that differentiate programmer expertise levels and behaviors.
Keywords
Introduction
The advent of industry 4.0 and manufacturing’s need for customization have demanded increased flexibility. However, to retain high-volume capabilities, industrial automation has been challenged to develop faster and less complicated methods for production system design and deployment. In response, manufacturers have investigated the adoption of collaborative robots (cobots) as a tool to integrate flexibility and human decision-making (Djuric et al., 2016) into traditionally hard-automated processes. Cobot usage ranges from material handling to assembly and advanced manufacturing (Cohen et al., 2021). Cobots started to gain attention from research and industry in 2014, and, since then, cobot implementations have increased significantly (Matheson et al., 2019). Cobot implementation differs from that of other robots because it allows humans and robots to interact with each other. Consequently, understanding, evaluating, and optimizing the interaction between cobots and human operators has become increasingly important. Critical to human-cobot system interaction is cobot programming, which, in this paper, is evaluated on two aspects: the time spent programming cobots to achieve several applications with acceptable accuracy and precision, and operator engagement when programming a cobot.
Although cobots significantly increase human interaction compared to traditional robots, they are still subject to operator capabilities, such as stamina, repeatability, and speed (Müller et al., 2016). These human characteristics usually limit the efficiency and quality of human-cobot manufacturing processes (El Zaatari et al., 2019; Rußmann et al., 2015). To overcome this limitation, entry-level cobot operators are enrolled in cobot programming training that prepares them to interact with new collaborative automation technologies. The goals of these programs are to reduce the time-to-learn and to lower the barriers to human-robot interaction. As a result of the training, operators can deploy cobots and adapt processes at a faster pace while matching production needs.
While the demand for effective training programs has increased at a 16.1% compound annual growth rate (Educational Robot Market Size Global Forecast to 2026 | MarketsandMarkets™, n.d.), manufacturing still lacks a system that accommodates individual user skills to achieve the best performance. Intelligently perceiving cobot operator behavior helps manage operator weaknesses and strengths during the learning phase. This support is only achievable in smart systems capable of monitoring their own components and adapting them for enhanced operation. Smart manufacturing is a promising approach for satisfying ever-growing customer needs and increasing supply costs by converging cutting-edge information and communications technologies and devices, such as IoT (Ezell, n.d.), data analysis (He & Wang, 2018; Moyne & Iskandar, 2017; Shukla et al., 2019), and machine learning (Bajic et al., n.d.; Nicora et al., 2021), to support more efficient and secure automation processes (Evjemo et al., 2020; Phuyal et al., 2020). Combining these technologies has introduced a new vision, human-centric manufacturing, in which manufacturing integrates the benefits of automation with human flexibility and decision-making (Liu and Wang, 2020).
Achieving this vision requires customized systems that optimize human activity and learning for next-generation manufacturing technologies, such as cobots and cobot programming. To address this need, this paper presents an empirical study that incorporates human engagement into the assessment of cobot programming performance. The paper is organized as follows. Section 2 conceptualizes and aggregates literature related to smart cobot programming systems. Section 3 gathers knowledge of the requirements and functionality specifications of a smart cobot programming system by comparing different types of programming systems. Section 4 presents the framework design for human-cobot programming performance assessment, which combines human data, extracted through facial feature recognition, with task execution data collected from cobot sensors. Section 5 demonstrates empirical case studies of the system with cobot programmers of different experience levels undertaking two different cobot programming tasks. This section also analyzes the results and discusses evidence of the distinctive programming behaviors identified at each expertise level. Finally, Section 6 concludes with principal findings, implications for research, future implementation, and broad relevance to manufacturing.
Literature review: Cobot programming in smart manufacturing
The demographic shift to a new generation in the manufacturing workforce carries with it a change in beliefs and values. In a deep exploratory study, Hurtienne observed that a collaborative environment in which employees can actively engage is one of the most impactful features of the work environment (Hurtienne et al., 2021). Based on interviews, Kaasinen identified a strong need for a work environment that adapts to the employee in the industry 4.0 era (Kaasinen et al., 2020). This future work environment design suggests that personalized training that supports competence and learning development benefits employee engagement. To achieve this, organizational support and work resources should be available, as these have been shown to have a positive impact on human engagement (Coetzer & Rothmann, n.d.). In addition, Molino (Molino et al., 2020) showed that opportunities for information and training related to technology acceptance have a strong positive impact on work engagement, an effect observed to be even stronger among blue-collar workers. Consequently, the work environment offers great potential for human training that emphasizes employee engagement.
From logistics to assembly, manufacturing environments have adapted cobots to a wide variety of businesses and applications (Bøgh et al., 2012). Human-robot collaboration research in the manufacturing domain dates back to 2008, when researchers attempted to introduce human-machine cooperation into assembly lines (De Santis et al., 2008). In small- and medium-sized enterprises (SMEs), the usually limited resources for personnel and investment dramatically constrain the implementation of automation. Cobots’ fast deployment and user-friendliness therefore effectively address the need for low-cost, rapidly adaptable technology in manufacturing (Pieska et al., 2018). However, humans face difficulties when programming cobots. Studying a diverse population of programmers, Hader (Hader, 2021) showed that participants were unable to solve assembly programming tasks involving problems such as modeling the program flow and task assignment. In a large enterprise, Schou (Schou et al., 2018) documented programming problems among both novices and robot experts, even when they used an improved user interface for designing robot tasks for the first time. In addition, Weintrop (Weintrop et al., 2018) evaluated three coding environments and showed that adults with minimal experience exhibited the same error patterns: missing code snippets and incorrect positioning.
Although multiple difficulties persist in cobot programming, overall cobot programming performance relies on operator engagement when interacting with cobots (Toichoa Eyam et al., 2021). Ejsmont (Ejsmont et al., 2020) states that promoting a comfortable workspace through human-cobot interaction technologies improves both human trust in machines and performance. Alimardani (Alimardani et al., 2020) showed, through EEG measures, that social robots encourage feelings such as co-presence and mind perception. The trust generated from cobot situational awareness can enhance human performance and motivation during human-robot interaction (Alimardani et al., 2020; Wiese et al., 2017). Human-cobot engagement studies typically rely on EEG sensors, which are very sensitive to human body conditions, making it difficult to account for all the noise produced by factors internal and external to the individual. Spezialetti (Spezialetti et al., 2020) emphasizes that understanding human interaction is challenging because a reliable solution must correctly measure and interpret human emotion from EEG signals.
Human-robot interaction research, therefore, lacks a robust methodology that accounts for both human body variability and interaction understanding. Research using social robots (Anzalone et al., 2015) has suggested that human engagement could be extracted from posture and gaze behaviors. Khamassi (Khamassi et al., 2018) demonstrated an engagement estimation methodology based on head position during interaction with a cobot. As noted in that work, the estimation accuracy of this type of approach drops when high disturbances are present in the body measurement. Alternatively, engagement can be measured through facial feature recognition, as indicated by Ben-Youssef (Ben-Youssef et al., 2017).
Prior work on user engagement applied machine learning techniques to hand-crafted features (energy filters, box filters) extracted from cropped face images (Whitehill et al., 2014). Bosch et al. (Bosch et al., n.d.) investigated 19 action units together with head pose and position to recognize student affect in a learning environment. Recent works used OpenFace (Baltrusaitis et al., 2018) to extract a comprehensive set of facial features, including landmarks, gaze, action units, and head movement, that characterize how the face changes over time (Thong Huynh et al., 2019; Toyoda et al., 2021). Statistical measures of these features were then fed into a deep recurrent network. Deep face representations from pretrained face recognition models have also been used (Zhu et al., 2020), but these models are much more time-consuming than the other methods.
In summary, there is a critical need to improve cobot programming skills in the next-generation manufacturing workforce. While human engagement has been identified as a key component for improving work-related efficiency, there is still no smart cobot programming system that can understand human-robot interaction and is robust and scalable for human engagement monitoring.
An analysis of requirements for smart cobot programming
To analyze the requirements of a smart cobot programming system, this research first employs a knowledge elicitation method, the Repertory Grid (RG), and a concept map to identify and visualize the list of requirements. Section 3.1 demonstrates the repertory grid process for analyzing cobot programming approaches, and Section 3.2 explains the reasoning behind the identified system requirements, using a concept map as an illustration.
Repertory grid
To fully understand a smart cobot programming system, we need to collect an appropriate body of knowledge from cobot experts while considering that human experts often have personal biases. The Repertory Grid (RG) (Repertory Grids, n.d.) is used to understand the smart cobot programming system and to collect knowledge for its requirements and specifications. RG is a cognitive interviewing technique devised by George Kelly, based on Personal Construct Psychology in the context of psychotherapy. RG is not restricted to clinical psychology and has been applied to education, market research, entertainment computing (Mol et al., 2021), and socio-technical systems (Dey & Lee, 2017). RG allows the interviewees (cobot experts) to interpret their cobot programming experience in a less biased way. Specifically, the RG analysis is used to identify programming approaches (as elements), explained in Section 3.1.1, and cobot programming characteristics (as constructs), explained in Section 3.1.2, and to cluster the constructs and elements, explained in Section 3.1.3, to obtain requirements for a smart cobot programming system.
The RG analysis requires four main components: topic selection; element selection; construct selection; and rating and analysis. Here, the topic of the RG process is “analyzing cobot programming approaches”, which is aligned with our purpose. Section 3.1.1 lists different programming approaches with a description as elements. Section 3.1.2 defines the constructs, which are used to differentiate the programming approaches (elements). The constructs are identified by conducting a bipolar comparison to distinguish the pairs of elements. Section 3.1.3 illustrates the RG results showing the similarity of constructs, related construct clusters, and insights obtained.
Element and construct selection process
To conduct a sound RG, different cobot programming technologies were first considered. After consulting with cobot programming experts and reviewing recent cobot programming technology, we selected nine approaches as the RG elements (Table 1): teach-pendant, teach by demonstration, motion command, voice command, block-based programming, CAD simulation software (CAD), AR programming, VR programming, and remote-control programming. Then, the essential characteristics of the nine elements, grouped as system influence on humans, cobot-related, programming-related, and miscellaneous, were discussed, identified, and agreed upon with the experts to build constructs. The miscellaneous group includes the user interface, the VR/AR/CAD environment, and voice/motion programming. The characteristics that can differentiate cobot programming approaches (elements) are used as possible keywords to build constructs. The construct keywords obtained for system influence on humans include safety, emotional/physical workload, situational awareness, and programming speed. Using these construct keywords, the constructs were then written in sentence form, as shown in Table 1 (in the Description column).
Cobot programming approaches
Based on the characteristics of the elements, RG constructs were developed as shown in Table 2. Several interviews with questionnaires were used to refine the constructs so that they better described their intended meaning and had less redundancy. Then, experts assigned RG ratings to score each element on each construct. The RG table contains constructs and elements: the two poles of each construct are placed at the left and right ends of a rating scale from 1 to 7. The final ratings are shown in Table 2.
RG table with rating
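A repertory grid of this kind can be held as a list of bipolar constructs, each carrying one 1-to-7 rating per element. The Python sketch below is illustrative only: the `Construct` class, the pole wording, and the ratings are hypothetical and are not the values in Table 2. It also shows pole reversal, a common repertory grid operation in which swapping a construct's poles maps a rating r to 8 − r on a 1-to-7 scale.

```python
from dataclasses import dataclass

SCALE_MIN, SCALE_MAX = 1, 7  # rating scale used in the RG table


@dataclass
class Construct:
    """Hypothetical bipolar construct with one rating per element."""
    left_pole: str       # description at the rating-1 end
    right_pole: str      # description at the rating-7 end
    ratings: list[int]   # one rating per element, in element order

    def __post_init__(self) -> None:
        for r in self.ratings:
            if not SCALE_MIN <= r <= SCALE_MAX:
                raise ValueError(f"rating {r} outside {SCALE_MIN}-{SCALE_MAX}")

    def reversed(self) -> "Construct":
        """Swap the poles; a rating r becomes (SCALE_MIN + SCALE_MAX) - r."""
        return Construct(
            left_pole=self.right_pole,
            right_pole=self.left_pole,
            ratings=[SCALE_MIN + SCALE_MAX - r for r in self.ratings],
        )


# Three of the nine elements with illustrative (not measured) ratings.
elements = ["teach-pendant", "block-based programming", "VR programming"]
speed = Construct("slow programming", "fast programming", [3, 5, 4])
print(speed.reversed().ratings)  # [5, 3, 4]
```

Keeping every construct's ratings in the same element order is what allows constructs to be compared row by row, which the clustering step relies on.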
After rating, the RG table was analyzed with R (R: The R Project for Statistical Computing, n.d.) and the OpenRepGrid package (Burr et al., 2020). The analysis results are presented as a Bertin matrix (Clustering of Constructs and Elements, n.d.). Two distance trees (dendrograms), one for the elements and one for the constructs, are shown at the bottom and on the right side of the matrix, respectively (Fig. 1). The default distance measure, Euclidean distance, is used for the cluster analysis. The Euclidean distance between two rating vectors is the square root of the sum of the squared differences between their corresponding ratings. The shorter the Euclidean distance between two constructs, the more similar they are (http://docu.openrepgrid.org/clustering.html). Four clusters show strong connections based on this similarity measure, and the constructs most meaningful for the study are marked with red boxes in Fig. 1. Some noticeable observations are summarized and interpreted below for further discussion.

RG clustered Bertin matrix table with related constructs cluster marked.
The precise positioning of the cobot and the time for program validation are interrelated.
The development cost of the system user interface and the presence of a situational awareness system are related, and they may also be related to the reliability of the programming code.
The cobot programming time is connected to the cobot idle time during manual program updating, and they may also be connected to the shape intricacy level.
The difficulty of programming adjustment and the programmer’s emotional workload are connected.
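The distance-based grouping behind these observations can be illustrated with a short sketch. In the study, this step was done in R with OpenRepGrid. The Python code below only recomputes the same Euclidean distance and then applies a naive average-linkage agglomerative clustering, chosen here for readability. It is not necessarily the linkage method OpenRepGrid applies by default, and the ratings are hypothetical, not the Table 2 values.

```python
import math
from itertools import combinations


def euclidean(a: list[float], b: list[float]) -> float:
    """Square root of the sum of squared differences between two rating rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows: dict[str, list[float]], k: int) -> list[list[str]]:
    """Naive agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[name] for name in rows]

    def cluster_distance(c1: list[str], c2: list[str]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(euclidean(rows[a], rows[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # combinations() yields i < j, so deleting index j keeps index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical ratings of four constructs on four elements (not Table 2 data).
constructs = {
    "positioning precision": [2, 6, 5, 3],
    "validation time":       [2, 5, 5, 3],
    "programming time":      [6, 2, 3, 5],
    "idle time":             [6, 1, 3, 6],
}
print(average_linkage(constructs, k=2))
# [['positioning precision', 'validation time'], ['programming time', 'idle time']]
```

Constructs whose rating rows differ by only a point or two end up in the same cluster, which is the sense in which clustered constructs are read as "connected".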
A review of the four clusters indicated in the Bertin matrix provided additional insights related to the smart cobot programming system. For example, as more human-focused cobot systems dominate the market, the human-related constructs should be investigated further to achieve a human-friendly workspace (Cluster #2 and Cluster #4). These constructs represent some essential requirements of a physical smart cobot programming system, such as a cobot programming user interface, monitoring cameras, a recognition algorithm, decision-making, and deliverable program code.
For Cluster #4, a programmer with a high emotional workload may find it more complex and difficult to perform program adjustments while interacting with the cobot. For Cluster #2, a low development cost for the system user interface is associated with the absence of a situational awareness system and with unreliable programming code. When these two cluster results are translated into a physical system, the user interface should make program adjustment easy, since adjustment difficulty relates to the operator’s emotional workload, and sensor technology should be included to collect the operator’s status and provide feedback. These observations are used to identify the requirements of the smart cobot programming system, discussed in more detail in the next section. Although the RG analysis shows similarity between some constructs, specific cause-and-effect relationships and conflicting relationships may not be well captured. These aspects will be studied further and reported in a separate article.
To conceptualize and visualize the smart cobot programming system requirements obtained from the RG analysis, we first build a concept map of the groups of related insights and the identified requirements (Fig. 2). The concepts are named based on observations from the RG analysis and the related literature, and their relationships are indicated in a simple map. For example, “cobot”, a first-layer subject in the concept map, has several sub-subjects: “programming method”, “programming user interface”, and “programming codes”, which are extracted from our RG analysis table. As mentioned in the previous section, the difficulty of cobot program adjustment is substantially related to the programmer’s emotional workload (indicated by Cluster #4); therefore, the concept map includes human operator sensing. Based on this information, we concluded that a smart cobot programming system requires several components, including human operators, machine hardware, and data collection tools, as shown in the concept map (Fig. 2).

A concept map of smart cobot programming system with potential components.
Here, we briefly explain how the concept map was constructed. In the concept map, a smart human-robot collaboration system is the central topic (i.e., the starting point). Naturally, the system should contain humans and cobots. Data on the humans and cobots, together with data analysis, are therefore included in the concept map. Advanced sensor technologies are necessary to gather these data. In a human-cobot system, the recognition target must be decided at the beginning of development. After these concepts are included in the map, more detailed information is added to enrich it, as shown in Fig. 2. Based on the constructed concept map, we can detail the requirements and functionality specifications of a smart cobot programming system. Some examples are shown below.
A smart cobot programming system should be able to perceive environmental events with respect to time or space and capture them in data form.
A smart cobot programming system should be able to receive inputs from the cobot or the human operator.
A smart cobot programming system should be able to provide user feedback, such as a focus-loss signal, to programmers.
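Read together, these three requirements suggest a small interface: timestamped inputs tagged with their source, and a rule that turns the human-side inputs into feedback. The Python sketch below is one hypothetical way to express this. `Event`, `FocusMonitor`, the event kinds, the 0.4 threshold, and the 10-second duration are all assumptions rather than values from this study, and engagement scores are assumed to lie in [0, 1].

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class Event:
    """One timestamped input from the cobot or the human operator (hypothetical)."""
    t: float                            # seconds since task start
    source: Literal["cobot", "human"]   # which side produced the input
    kind: str                           # hypothetical label, e.g. "engagement"
    value: float


class FocusMonitor:
    """Signal focus loss when engagement stays below a threshold for too long."""

    def __init__(self, threshold: float = 0.4, min_duration: float = 10.0):
        self.threshold = threshold          # assumed; scores taken to lie in [0, 1]
        self.min_duration = min_duration    # assumed, in seconds
        self._low_since: Optional[float] = None

    def update(self, event: Event) -> bool:
        """Return True while a focus-loss signal should be shown."""
        if event.source != "human" or event.kind != "engagement":
            return False                    # other inputs do not affect this rule
        if event.value >= self.threshold:
            self._low_since = None          # engagement recovered: reset the timer
            return False
        if self._low_since is None:
            self._low_since = event.t       # start of a low-engagement stretch
        return event.t - self._low_since >= self.min_duration


monitor = FocusMonitor()
readings = [(0, 0.8), (5, 0.3), (10, 0.2), (15, 0.35), (16, 0.9)]
print([monitor.update(Event(t, "human", "engagement", v)) for t, v in readings])
# [False, False, False, True, False]
```

Requiring a sustained low score, rather than reacting to a single low reading, is a design choice meant to avoid flagging brief glances away from the workspace.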
This section presents details of the smart cobot programming system architecture, its information flow structure, and the facial recognition algorithm used to evaluate human operator engagement. Section 4.1 presents the system architecture, the role of each component, and the processes of the system. Section 4.2 describes how the facial recognition algorithm captures operator engagement and how this information is leveraged.
System architecture
The smart cobot programming system integrates sensor data acquisition and data analysis processes. Essentially, the system collects and uses information from humans and cobots to assist programming tasks in a work cell, as indicated in the requirements. In Fig. 3, the system structure is divided into three layers: physical, transformation, and analytics. The physical layer corresponds to the physical assets in the programming work environment: the programmer is the system end-user, the cobots are the task actuators, and the task corresponds to sequence planning, workspace design, and process parameters. The transformation layer translates the raw measurements from the respective physical assets into meaningful metrics. The last layer, analytics, incorporates asset modeling and performance classification.

Human engagement integrated smart human-cobot training system.
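The three-layer structure of Fig. 3 can be pictured as a small data pipeline. The Python sketch below is a hypothetical illustration, not the system's implementation. `RawSample` stands in for synchronized physical-layer readings, `transform` derives task-level metrics, and `classify` is a placeholder rule (with an assumed 0.5 cutoff) standing in for the analytics layer's performance classification. Field names and units are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class RawSample:
    """Physical layer: one synchronized reading (hypothetical fields)."""
    t: float                              # seconds
    tcp_xyz: tuple[float, float, float]   # cobot tool position, metres
    engagement: float                     # programmer engagement score, assumed 0-1


def transform(samples: list[RawSample]) -> dict[str, float]:
    """Transformation layer: raw readings -> task-level metrics."""
    path = sum(math.dist(a.tcp_xyz, b.tcp_xyz) for a, b in zip(samples, samples[1:]))
    return {
        "duration_s": samples[-1].t - samples[0].t,
        "path_length_m": path,
        "mean_engagement": sum(s.engagement for s in samples) / len(samples),
    }


def classify(metrics: dict[str, float]) -> str:
    """Analytics layer: placeholder rule standing in for a trained classifier."""
    return "engaged" if metrics["mean_engagement"] >= 0.5 else "needs support"


samples = [
    RawSample(0.0, (0.0, 0.0, 0.0), 0.7),
    RawSample(1.0, (0.3, 0.4, 0.0), 0.6),
    RawSample(2.0, (0.3, 0.4, 0.5), 0.5),
]
metrics = transform(samples)
print(metrics["duration_s"], round(metrics["path_length_m"], 3), classify(metrics))
# 2.0 1.0 engaged
```

Separating the layers this way means a sensor can be swapped in the physical layer without changing how the analytics layer consumes metrics.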
The system is designed for users who program and assist cobots in complex human-robot tasks. To increase cobot involvement in the task, the system aims to improve user programming efficiency and task performance by raising users’ programming knowledge. A better understanding of the cobot-task interaction can enable cobots to achieve more complex tasks while decreasing human effort. Consequently, it is critical that the system tracks human engagement status to understand the human’s programming condition. Following the RG discussion in the previous sections, the system must detect whether a human is becoming distracted or disengaged and whether the task causes the human mental and emotional stress. The system lets programmers know whether the current procedural task is appropriate for their status and capacity at that time, or whether they should enhance their focus.
Cobots cooperate with human workers to complete repetitive tasks according to pre-set programs. However, traditional cobot systems do not consider the physical and mental condition of human workers. Without perceiving these conditions, human-cobot collaboration cannot become more efficient. The system in this paper, in contrast, is designed to close this gap through continuous feedback of human performance insights to programmers and a self-adjusting process based on human engagement. In addition to human information, cobot information is collected to complete the cobot programming insight feedback loop. Cobot-related information is obtained through built-in and external sensors that monitor cobot components and process status, such as trajectory, current position, and speed. This information is critical for analyzing programmer performance, including programming time, process cycle time, and script quality.
The proposed system also considers the workspace and a human model when evaluating cobot data in the process performance analysis. The workspace supplies task demands and environment details to the system, such as task descriptions and handled object positions. Certain tasks require more programmer attention due to high accuracy requirements (e.g., manipulating a cobot’s tool within a tight process tolerance) or complex sequence planning (e.g., assembling a large number of components). Workspace data, then, is critical for systematically defining subtask boundaries. The human model includes programming skill level, programming preference, and engagement behavior. The first two factors are evaluated qualitatively, for example through interviews, while the last one is obtained using human sensors. The proposed system uses web cameras as human sensors to capture real-time video of human-cobot interactions. During a task, facial recognition algorithms are used to measure and provide real-time human engagement metrics.
In this study, a customized OpenFace (Baltrusaitis et al., 2018) process and an algorithm based on the work of Huynh et al. are employed to estimate human engagement (Thong Huynh et al., 2019). The gaze direction, eye landmarks, and head pose are calculated with OpenFace for every frame in the video and sent to a Python module in real time. Over a pre-defined time interval (e.g., five seconds), the Python module estimates the engagement score for that interval by dividing the received sequence into multiple overlapping segments of equal length. Then, statistical features are computed for each segment and fed to the two streams of a deep learning model, as illustrated in Fig. 4. Each stream contains two long short-term memory (LSTM) layers, which learn the temporal relationship between segments, and two fully connected (FC) layers, which estimate a score for each segment; the segment scores are then averaged to calculate the engagement score. The goal of the first FC layer is to transform the feature set into a new, learnable representation that improves system performance.

An overview of deep learning model for engagement estimation (Thong Huynh et al., 2019).
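The per-interval procedure described above (overlapping equal-length segments, per-segment statistics, averaged segment scores) can be sketched without the neural network itself. The Python code below makes several assumptions. It assumes 30 frames per second, a 30-frame segment length, and a 15-frame stride (50% overlap). It uses mean and population standard deviation as the "statistical features". It also replaces the trained two-stream LSTM and FC model with a placeholder callable that returns one score per segment. It is not Huynh et al.'s implementation.

```python
import statistics
from typing import Callable, Sequence

Frame = Sequence[float]  # per-frame values, e.g. gaze and head-pose outputs


def overlapping_segments(frames: Sequence[Frame], seg_len: int, stride: int) -> list:
    """Equal-length segments; consecutive segments overlap when stride < seg_len."""
    if not 0 < stride <= seg_len <= len(frames):
        raise ValueError("need 0 < stride <= seg_len <= number of frames")
    return [frames[i:i + seg_len] for i in range(0, len(frames) - seg_len + 1, stride)]


def segment_features(segment: Sequence[Frame]) -> list[float]:
    """Mean and population standard deviation of every per-frame feature."""
    columns = zip(*segment)
    return [stat(col) for col in columns for stat in (statistics.mean, statistics.pstdev)]


def window_engagement(frames: Sequence[Frame],
                      model: Callable[[list[list[float]]], list[float]],
                      seg_len: int = 30, stride: int = 15) -> float:
    """Engagement score for one window: the average of the per-segment scores."""
    features = [segment_features(s) for s in overlapping_segments(frames, seg_len, stride)]
    per_segment = model(features)  # stand-in for the two-stream LSTM + FC network
    return statistics.mean(per_segment)


# Five-second window at an assumed 30 frames per second, three features per frame.
window = [(0.1 * (i % 3), 0.2, 0.0) for i in range(150)]
dummy_model = lambda feats: [0.5] * len(feats)  # hypothetical placeholder scorer
print(len(overlapping_segments(window, 30, 15)), window_engagement(window, dummy_model))
# 9 0.5
```

The model receives the whole sequence of segment feature vectors rather than one segment at a time, which keeps the interface compatible with a recurrent network that learns relationships between segments.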
Cobot programming empirical study scenarios and experimentation setup
An empirical study was developed and analyzed to demonstrate the smart cobot programming system. A flowchart of the programming evaluation process is shown in Fig. 5. Initially, two experiments were designed by cobot experts. Each required the 11 participants to program a cobot for a disassembly task, and each had its own difficulty level and goal. Once a disassembly task was selected, participants with different levels of cobot programming experience performed the disassembly programming and evaluation. Two cameras were used to record their engagement. In addition, cobot performance metrics were collected during and after programming. Then, participant engagement and cobot data were used to evaluate the smart cobot programming framework.

Human-cobot programming empirical study scenario.
Experimental data in this study was collected from ten participants, comprising beginners and advanced programmers. Each participant was given process goals to be completed using a Universal Robots (UR) UR5e cobot, but was not instructed on how to complete each process. As a minimum requirement, participants were taught cobot movement (controller and hand-guidance movements) and tool commands before the cobot programming evaluation. Before beginning, participants were allowed to ask the experiment supervisor questions. The supervisor was responsible for setting up the testbed, including connecting the cobot and cameras to the computer and ensuring that experimental data was being recorded.
After setup, the supervisor was instructed to leave the work area and let the participants complete the task without further assistance. During the experiment, participant engagement scores were obtained with OpenFace, while cobot-related data was gathered through the cobot supporting software, UR Log Viewer (Universal Robots - Log Viewer V1.2.1.0, n.d.), and the teach pendant. When the task was completed, participants notified the supervisor to stop data collection, regardless of whether they had been successful.
Two disassembly tasks were designed based on cobot command and sequence complexity. The first task is shown in Fig. 6, where Fig. 6a represents the initial state of the process and Fig. 6b represents the final object location. As the first level of complexity, this task required the programmer to use basic cobot commands to pick an object and place it in a different location. It tested the participant’s ability to understand cobot movement and gripping commands. To complete the task successfully, the user had to take a ball bearing from the shaft of a partially disassembled alternator and place it in a specific location, as displayed in Fig. 6b. The bearing was moved using the cobot gripper, which was not allowed to collide with the alternator or to perform a movement that would trigger a cobot stop during the process. The cobot started from a pre-determined home position, and the task was completed when the cobot returned to the home position.

First cobot disassembly programming task. (a) initial state; (b) final state.
The second, more complex disassembly task is shown in Fig. 7a-d. This task required more advanced cobot programming knowledge and operating experience. It drew on the same programming abilities as the first task, but with more constrained movements. The cobot had to grip four pins located at the corners of an object and place them into a box (Fig. 7b). The pinhole diameter was 1 millimeter (0.04 inches) larger than the pin body, which required the user to retrieve each pin precisely so that the gripper would not collide with the main frame and drop the pin. The second step required participants to remove the frame by hand and place it in the designated area (Fig. 7c). The third step was to remove four more pins from the base plate and place them into the box (Fig. 7d). As in the first task, the participants created a cobot program script that started from and returned to the home position.

Second cobot disassembly programming task. (a) description of the components of the partially disassembled object; (b) initial assembly state; (c) object state after disassembling the four external pins; (d) object state after disassembling the frame component.
Figure 8 shows the workspace designed for the cobot programming empirical study. It consists of a UR5e cobot, a work object, two cameras, and a laptop. The laptop had OpenFace and UR Log Viewer installed for human and cobot monitoring. The two cameras were connected to the laptop through USB cables, and the video from each camera was processed independently by the facial recognition algorithm. The cobot controller was connected to the laptop through an Ethernet cable to establish a connection with the UR Log Viewer. The UR5e end-effector was a Robotiq Hand-E gripper. Participants were required to use only the teach pendant or hand-guidance movements for programming. Position, speed, acceleration, joint torque, and other data were captured through the UR Log Viewer during the programming phase.

Empirical study workspace setup.
OpenFace software measured the participant’s engagement during cobot programming based on facial features. As displayed in Fig. 8, one camera was placed on the teach pendant and the other was mounted on a stand beside the cobot. This camera layout aimed to capture the programmer’s facial features when they focused on specific process aspects, either the programming interface or the cobot-object interaction.
Plastic strips and blue lines were used to delimit the workspace and keep participants within the cameras' fields of view. The plastic strips made a sound when a participant stepped on them, reminding the participant to return within the boundary. This arrangement was intended to avoid losing track of users' faces without dramatically diverting participants' attention from the cobot programming task. The laptop was placed outside the camera view so that the experiment supervisor could observe the user's behavior.
Based on the cluster results and the system requirements described in Section 3, the system needed human inputs to capture emotional workload via task engagement, and cobot program development and execution measures to evaluate program accuracy and complexity. Therefore, four parameters were selected in this study to reflect the programmer's state while programming a cobot (Table 3). Programming time was measured using the video recording and the movement of the cobot from and back to the home position. This parameter served as a measure of how difficult the cobot programming task was for each participant (i.e., the more challenging the task, the longer the programming time). Two engagement scores, one from each camera, were also collected to reflect human-task engagement. Decision time was the time participants spent understanding task requirements, thinking about solutions, and planning the cobot trajectory. It was measured by summing the periods during which the cobot remained static within the programming time.
Human data evaluation
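The two time-based parameters can be illustrated with a minimal Python sketch. It assumes a hypothetical, time-ordered robot log in which each record carries a timestamp, a tool speed, and an at-home flag; this is not the UR Log Viewer export format, and it ignores the video recording used in the study. The `LogSample` type, the `programming_and_decision_time` function, and the 1 mm/s static threshold are illustrative choices. Programming time is taken as the span from the cobot leaving home to its return, and decision time as the sum of intervals in that span during which the cobot is static.

```python
from dataclasses import dataclass


@dataclass
class LogSample:
    """Hypothetical log record (not the UR Log Viewer format)."""
    t: float          # timestamp in seconds
    tcp_speed: float  # tool speed in m/s
    at_home: bool     # True while the cobot is at the home position


def programming_and_decision_time(samples, static_speed=1e-3):
    """Return (programming_time, decision_time) in seconds.

    Assumes time-ordered samples that start at home, leave it at least once,
    and return. The programming window runs from the last home sample before
    the first departure to the first home sample after the last departure;
    decision time sums the intervals in that window whose starting sample
    is static.
    """
    away = [i for i, s in enumerate(samples) if not s.at_home]
    start = max(away[0] - 1, 0)
    end = min(away[-1] + 1, len(samples) - 1)
    programming_time = samples[end].t - samples[start].t

    decision_time = 0.0
    for current, following in zip(samples[start:end], samples[start + 1:end + 1]):
        if current.tcp_speed < static_speed:  # cobot static: participant deciding
            decision_time += following.t - current.t
    return programming_time, decision_time


# Illustrative log: wait at home, move, pause for 2 s, move again, return home.
example_log = [
    LogSample(0.0, 0.0, True),
    LogSample(1.0, 0.05, False),
    LogSample(2.0, 0.0, False),
    LogSample(3.0, 0.0, False),
    LogSample(4.0, 0.05, False),
    LogSample(5.0, 0.0, True),
    LogSample(6.0, 0.0, True),
]
print(programming_and_decision_time(example_log))  # (5.0, 3.0)
```

The static threshold is a design knob: a real log would need a value above sensor noise so that a stationary arm is not counted as moving.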
Three cobot parameters were used to evaluate cobot operation during the experiments: process cycle time, task success, and script score (Table 4). Process cycle time was defined as the time the cobot took to complete the task and was extracted from the execution of the final cobot program script. Cobot programs should avoid unnecessarily long paths, since shorter paths yield a shorter process cycle time. Task success indicated whether the cobot could complete the task by following the program script written by the participant. It was a binary parameter, where 0 indicated task failure and 1 indicated success. Failure conditions included collisions, parts falling, and parts being placed at incorrect locations. The script (cobot program) score was graded based on the number of lines and commands used and ranged between 0 and 1. Since a shorter script is preferred, a higher score represents a more concise program.
Cobot data evaluation
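Task success and script score can be sketched as follows. The text states only that task success is binary with the listed failure conditions and that the script score lies in [0, 1], with higher values for more concise programs. The event labels, the `script_score` formula (an inverted min-max scaling of program length against the shortest and longest scripts), and all names here are hypothetical.

```python
FAILURE_EVENTS = {"collision", "part_dropped", "part_misplaced"}  # hypothetical labels


def task_success(observed_events):
    """Binary task success: 0 if any failure condition was observed, otherwise 1."""
    return 0 if FAILURE_EVENTS.intersection(observed_events) else 1


def script_score(length, shortest, longest):
    """Hypothetical conciseness score in [0, 1].

    `length` counts the lines/commands of one participant's script; `shortest`
    and `longest` are the extreme lengths observed across all scripts. The
    shortest script scores 1 and the longest scores 0.
    """
    if longest == shortest:
        return 1.0
    return (longest - length) / (longest - shortest)


print(task_success(["grip", "place"]), task_success(["grip", "collision"]))  # 1 0
print(script_score(20, shortest=10, longest=30))  # 0.5
```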
Different behaviors among programmers with distinct levels of experience were observed in the engagement data. Typical engagement scores of an advanced and a beginner programmer are displayed in Fig. 9 (object camera) and Fig. 10 (controller camera). These figures exemplify a behavior in which the average controller engagement was higher than the average object engagement for advanced programmers, while the opposite was observed for beginner programmers. This behavior is hypothesized to occur because the user's focus is directed toward the areas of the workspace in which they are less confident, which increases the respective camera metric. In the experiments, beginners dedicated more time to looking at the commands and deciding which action to take, while experts dedicated more time to looking at the object and the cobot position during the tasks. The second observable difference was that beginner programmers showed more frequent oscillations in the controller engagement score. These oscillations were related to the redirection of the face from the controller to the cobot position, and vice versa, while moving the cobot using the teach pendant. This pattern arose from their inexperience in positioning the cobot at a given location, which caused them to reroute the cobot trajectory.

Engagement scores of object camera.

Engagement scores of controller camera.
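The two observations above (which camera dominates on average, and how often controller engagement oscillates) could be quantified in several ways; one simple, hypothetical option is sketched below in Python. It compares the mean scores of the two cameras and counts how many times a series crosses its own mean, as a rough proxy for attention switching between the controller and the cobot. The function names and the mean-crossing measure are illustrative assumptions, not the analysis performed in the study.

```python
from statistics import mean


def dominant_camera(object_scores, controller_scores):
    """Name the camera with the higher average engagement score."""
    return "controller" if mean(controller_scores) > mean(object_scores) else "object"


def mean_crossings(scores):
    """Count how often a score series crosses its own mean.

    More crossings indicate more frequent oscillation, e.g. a face being
    redirected between the teach pendant and the cobot.
    """
    center = mean(scores)
    above = [s > center for s in scores]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)


steady = [0.50, 0.52, 0.55, 0.60, 0.62]
oscillating = [0.20, 0.80, 0.25, 0.75, 0.30]
print(mean_crossings(steady), mean_crossings(oscillating))  # 1 4
print(dominant_camera([0.3, 0.4], [0.6, 0.7]))  # controller
```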
Similar engagement patterns were also observed across users with different experience levels. During the programming task, participant behavior could be classified into three actions: programming, teach pendant movement, and hand-guidance movement. Programming consisted of looking for and selecting commands on the teach pendant; it combined the time for selecting a command with the decision-making time for choosing the next action. Teach pendant movement and hand-guidance movement differed in how the cobot was moved: by selecting directions on the teach pendant or by moving the cobot with the programmer's hands, respectively. Depending on the nature of the action, the user's attention was focused on the teach pendant, the object, or both. Consequently, the object engagement score was observed to be low, medium, and high for programming, teach pendant movement, and hand-guidance movement, respectively. Furthermore, this pattern was observed regardless of the programmer's level of expertise.
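The observed mapping between action type and object engagement level can be written as a small lookup, sketched here in Python. It assumes engagement scores on a 0 to 1 scale split into three equal bands; the scale, the `LOW_MAX`/`MEDIUM_MAX` thresholds, and the action labels are hypothetical choices for illustration, since no numeric bands are reported.

```python
LOW_MAX = 1 / 3     # hypothetical upper bound of the "low" band
MEDIUM_MAX = 2 / 3  # hypothetical upper bound of the "medium" band

# Expected object-engagement level for each action, as observed in the study.
EXPECTED_LEVEL = {
    "programming": "low",
    "teach_pendant_movement": "medium",
    "hand_guidance_movement": "high",
}


def engagement_level(score):
    """Bin an object engagement score (assumed to lie in [0, 1]) into three levels."""
    if score <= LOW_MAX:
        return "low"
    if score <= MEDIUM_MAX:
        return "medium"
    return "high"


def matches_expected_pattern(action, object_score):
    """Check whether a score agrees with the level expected for the action."""
    return engagement_level(object_score) == EXPECTED_LEVEL[action]


print(matches_expected_pattern("hand_guidance_movement", 0.9))  # True
print(matches_expected_pattern("programming", 0.9))  # False
```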
At some points during engagement data collection, the cameras could not produce a score. This happened because a score required the camera to track the face throughout the entire collection interval, which was set to three seconds in this experiment. If the camera lost face tracking for longer than the collection interval, the engagement score was recorded as zero. This occurred in two situations: (1) face tracking obstruction and (2) loss of attention. The first occurred when users positioned their face close to the object, so that the cobot arm blocked the camera view. The second, observed mostly on the controller camera, happened when the user was distracted by an external disturbance, such as a conversation or environmental noise. Both situations were considered task disengagement, and the resulting zero scores were retained in the programming assessment.
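The zero-score rule can be sketched as a windowing step over per-frame tracking output. The Python below assumes hypothetical time-ordered frames of the form `(timestamp_s, tracked, engagement)`, where `engagement` is a per-frame value from some upstream model. It groups frames into three-second windows and emits the mean engagement for a window only when the face was tracked in every frame, otherwise zero. The frame format, the per-frame engagement value, and the use of a mean are assumptions; how OpenFace output was converted into engagement scores is not described here.

```python
from statistics import mean


def windowed_engagement(frames, window_s=3.0):
    """Return one engagement score per window.

    frames: time-ordered (timestamp_s, tracked, engagement) tuples with
    non-negative timestamps. A window scores zero if it has no frames or if
    face tracking failed in any of its frames; otherwise it scores the mean
    per-frame engagement.
    """
    if not frames:
        return []
    n_windows = int(frames[-1][0] // window_s) + 1
    buckets = [[] for _ in range(n_windows)]
    for t, tracked, value in frames:
        buckets[int(t // window_s)].append((tracked, value))

    scores = []
    for bucket in buckets:
        if bucket and all(tracked for tracked, _ in bucket):
            scores.append(mean(value for _, value in bucket))
        else:
            scores.append(0.0)  # tracking lost: counted as disengagement, not dropped
    return scores


frames = [
    (0.0, True, 0.4), (1.0, True, 0.6), (2.0, True, 0.5),
    (3.0, True, 0.7), (4.0, False, 0.0), (5.0, True, 0.7),  # face lost at 4 s
    (6.0, True, 0.2),
]
print(windowed_engagement(frames))  # [0.5, 0.0, 0.2]
```

Keeping the zero rather than discarding the window mirrors the choice above to treat tracking loss as disengagement.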
Results collected from both tasks are displayed in Fig. 11. Because engagement behavior was similar within each experience level, the experimental data were aggregated by programming experience. The expert group consisted of two individuals. The beginner group had seven participants with little or no programming experience with any type of robot. Programming time, decision time, and process cycle time were transformed using min-max normalization over all collected data to allow comparison between the two classes. Since some experiments took sixteen minutes while others were completed in less than two minutes, normalization improved the visualization of the differences between the programmer classes. The average, minimum, and maximum of each engagement collection were also determined and plotted. Given the specific nature of the engagement behavior in this scenario, time-series analysis was not covered in this work. Instead, a basic statistical analysis was performed, which provided an initial understanding of the relationship between programmer class and engagement behavior.

Human and cobot data collected during experiments.
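The normalization and summary statistics described above can be sketched in Python. Min-max normalization rescales each value as x' = (x - min)/(max - min), computed over all collected data pooled across both classes, so the longest and shortest experiments map to 1 and 0. The class summary then reports the mean, minimum, and maximum per class. The function names and sample values are illustrative assumptions, not the study's data.

```python
from statistics import mean


def min_max_normalize(values):
    """Rescale values to [0, 1] using the minimum and maximum of the pooled data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def class_summary(values, labels):
    """Normalize over all participants, then report (mean, min, max) per class."""
    normalized = min_max_normalize(values)
    summary = {}
    for label in sorted(set(labels)):
        group = [x for x, lab in zip(normalized, labels) if lab == label]
        summary[label] = (mean(group), min(group), max(group))
    return summary


# Illustrative programming times in minutes (not the study's measurements).
times = [2.0, 4.0, 16.0, 10.0]
classes = ["expert", "expert", "beginner", "beginner"]
print(class_summary(times, classes))
```

Normalizing over the pooled data, rather than within each class, keeps the two classes on a shared scale so their regions can be compared directly.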
The advanced programmer class was also used as a benchmark for optimal human and cobot performance, so that human programming insight could be incorporated into the smart cobot programming system. Ideally, with proper training, any user would reach the same values as the advanced programmers. Consequently, a beginner programmer should work on, or be assisted with, the program aspects that show the greatest disparity between the user and the benchmark class. Some parameters showed a wider gap between beginner and benchmark users, such as process cycle time, average controller engagement, and minimum controller engagement. The narrower a parameter's deviation, the less variance a beginner is allowed and the greater the effort required to reach the advanced mean value.
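The benchmarking idea can be sketched as a ranking of parameters by how far a user's value lies from the expert mean, measured in units of the expert standard deviation, so that parameters with a narrower expert deviation weigh more heavily, in line with the observation above. The Python below is a hypothetical illustration: the function name, the standardized-gap measure, the `min_spread` floor, and the example values are assumptions, not the prioritization used in the study. It assumes at least two expert values per parameter, as in the study's two-person expert group.

```python
from statistics import mean, stdev


def rank_training_priorities(user, benchmark, min_spread=0.01):
    """Rank parameters from largest to smallest standardized gap to the benchmark.

    user: parameter name -> the user's (normalized) value.
    benchmark: parameter name -> list of at least two expert values.
    min_spread keeps the gap finite when the experts agree exactly.
    """
    gaps = {}
    for name, expert_values in benchmark.items():
        spread = max(stdev(expert_values), min_spread)
        gaps[name] = abs(user[name] - mean(expert_values)) / spread
    return sorted(gaps, key=gaps.get, reverse=True)


# Illustrative normalized values (not the study's data).
experts = {
    "process_cycle_time": [0.10, 0.12],
    "script_score": [0.80, 0.90],
    "decision_time": [0.20, 0.30],
}
beginner = {"process_cycle_time": 0.60, "script_score": 0.70, "decision_time": 0.55}
print(rank_training_priorities(beginner, experts))
# ['process_cycle_time', 'decision_time', 'script_score']
```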
Of the non-engagement parameters, script score and decision time require knowledge of and confidence in the process. They may be related to the maximum controller engagement score if that peak is associated with the feeling of success when a programmer draws on trained memory to solve a programming problem. Extending this association, these parameters showed lower deviation than the others, which could make them key indicators for identifying a programmer's expertise level. Programming time and process cycle time, on the other hand, are the quantitative performance classifiers. Although they varied widely across experiments, they occupied distinct data regions for beginner and expert programmers. The experts' better performance is hypothesized to result from a better understanding of how to position the cobot efficiently and where to move it throughout the task. This object focus is reflected in the average object engagement, which also occupied distinct regions for the two classes.
With respect to emerging technologies, this paper presents a smart cobot programming system that considers both human and process performance. The system proposes an innovative human-robot interaction analysis that combines human facial sensing with cobot process data. To ensure an insightful smart cobot programming system, a repertory grid (RG) analysis is conducted to gather knowledge from expert gradings. Meaningful construct clusters identified from the RG analysis results are then applied to develop the smart cobot programming system. The study suggests that engagement data collected through facial feature recognition can provide insights into programming behavior. The system used in the experiments benefits from a multi-point camera setup, which allowed it to distinguish process and programming difficulties among cobot programmers by comparing engagement data. Combined with process data, the findings provide a potential mechanism for advancing cobot programming training systems by pointing to the specific aspects in which programmers had more difficulty. In addition, beginner and expert programmers showed consistent differences in the metrics evaluated. By benchmarking against the metric behavior of expert programmers, programmers can identify aspects to improve in order to advance their learning, and new programming tasks can be suggested to target those improvements. However, while this paper reports the outcomes of an empirical study, further research is critical to establish generalized relationships between human operator behavior and cobot programming performance.
Future research should address three directions. The first targets the limited number of participants and tasks tested. More participants from different experience levels would likely yield additional, more distinct programmer classes, leading to a better understanding of user behavior and of how to prescribe training modules that enhance cobot programming expertise. The second is a deeper evaluation of the engagement pattern in each element of a programming task. A process fault can be caused by a particular command line, and identifying human behavior close to the point of failure can lead to new insights into human-cobot behavior and cobot system design. The third concerns the RG analysis: because it captures personal preferences, the selection of participants at the beginning of the study is highly important. Although the RG analysis shows similarity between some constructs, specific cause-and-effect relationships and conflicting relationships may not be well captured. These aspects will be studied and reported in a separate article. Regardless of these limitations, this work presents promising findings in the human-cobot interaction field. Furthermore, manufacturing can benefit from different work setups aimed at decreasing cost and time while achieving large-scale cobot automation systems.
Acknowledgments
The cobot used in this research was funded by the Critical Materials Institute (CMI) (FA-3.3.11) and Oak Ridge National Laboratory.
