Abstract
BACKGROUND:
Unmanned aerial vehicles (UAVs) have created safety problems for the public. Assessing the mental workload of UAV operations is essential to understanding the causes of UAV accidents.
OBJECTIVE:
To test the following hypotheses: i. mission difficulty in UAV operation affects both subjective and objective measures of mental workload; ii. mission difficulty affects the number of failures in UAV operation.
METHODS:
Fourteen male adults participated as UAV operators after attending a UAV training course. They performed four flight missions of different difficulty levels. During their flight missions, their heart rate and inter-beat interval (IBI) were collected. Upon completing each flight mission, the participants gave subjective ratings of mental workload using three commonly adopted assessment tools. The time of flight and number of failures in flight operations were also recorded.
RESULTS:
The results showed that mission difficulty significantly affected the scores of all three assessment tools. Mission difficulty also significantly affected the number of failures and the IBI. The scores of the three assessment tools were highly correlated (ρ = 0.7 to 0.83, p < 0.001) with one another. The results of the three subjective ratings were also consistent with those of the IBI data.
CONCLUSIONS:
High mental workload in UAV operation could lead to poor flight performance.
Introduction
Unmanned aerial vehicles (UAVs) have become popular for both commercial and personal use in recent years. With onboard cameras and high mobility in open space, UAVs have been applied in various fields including aerial photography and filming [1], cargo delivery [2], ground target researching [3], ground structure inspections [4–6], environment and wildlife monitoring [7], transportation engineering [8], and so on. The rapid growth of UAV usage has brought benefits for both businesses and individuals. It has also raised public safety concerns because UAVs may crash onto people or property on the ground [9, 10].
Many UAV crashes have been reported [11, 12]. However, systematic research on the causes of those accidents has not been reported. This is probably because the operator of the UAV was normally far away from the crash site, which made it difficult to find the operator and to investigate the causes of the crash. There could be many reasons for a UAV crash, and human error is apparently one of them [13]. Human error in operating a UAV may occur due to overload of the human information processing system and many other human-machine interface issues [14, 15]. Typical errors in UAV operations include not initiating the appropriate maneuvers, failure to note the auditory and visual alarms, failure to maintain proper situation awareness, and so on [5]. These failures may be attributed to overloading of the human information processing system. Assessing the mental workload of UAV operators is, therefore, a very important step in reducing human error in UAV operation.
Mental workload has been an important issue in studying the behaviors of human pilots [16–20], drivers [21], control room operators [22–24], maritime operators [25, 26], janitors [27], and surgeons [28]. It is associated with the mental resources of humans and with task demand. It is widely accepted that mental resources are required to perform mental tasks. A fundamental concept of mental resources is that when task demand exceeds the mental resources available, performance will break down [29]. Measuring mental workload is therefore important in studying operator performance, and it provides information for improving human-system designs and operator training.
Mental workload may be measured using either subjective or objective measures [30]. Subjective measures are widely adopted because of their validity, sensitivity, and especially ease of use. The NASA Task Load Index (TLX) [30, 31] has been one of the most commonly used rating scales. In addition, the Subjective Workload Assessment Technique (SWAT) [32, 33], Cooper-Harper Scale (CH) [20], Workload Profile [34], and Multiple Resource Questionnaire [35] have also been used. Objective measures provide valid and reliable assessments of mental workload and are mainly obtained via physiological measures. Physiological measures involve measurements of activity changes in the cardiac, brain, eye, and metabolic systems [36–38]. They may be quantified via electrocardiogram (ECG), eye movement, electroencephalogram (EEG), respiration, electromyogram (EMG) [37], and skin response [39] measurements. Heart rate variability (HRV) has been shown to be one of the reliable physiological parameters for measuring mental workload [20, 37]. HRV and average heart rate (HR) provide information about the feedback between the cardiovascular system and central nervous system structures [37, 38]. The inter-beat interval (IBI) has been one of the most commonly used parameters to represent HRV [30]. It is the time period between two consecutive heartbeats at any instant in time. A decrease in IBI generally indicates an increase in mental workload.
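As a minimal illustration of the IBI definition above, the following Python sketch derives IBIs from a series of beat timestamps and converts an IBI into the equivalent heart rate. The function names and the sample timestamps are hypothetical and are not taken from the study, which obtained IBI directly from the heart rate monitor.

```python
def inter_beat_intervals(beat_times_ms):
    """Return the time (ms) between each pair of consecutive heartbeats."""
    return [later - earlier
            for earlier, later in zip(beat_times_ms, beat_times_ms[1:])]


def heart_rate_bpm(ibi_ms):
    """Convert an inter-beat interval in ms to beats per minute."""
    return 60_000 / ibi_ms


# Hypothetical beat timestamps (ms); illustrative only, not study data.
beats = [0, 700, 1400, 2050]
ibis = inter_beat_intervals(beats)
mean_ibi = sum(ibis) / len(ibis)
print(ibis, round(heart_rate_bpm(mean_ibi), 1))
```

Because HR in bpm equals 60,000 divided by the IBI in ms, a shorter IBI corresponds to a faster heart rate.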
Mission difficulty has been reported to affect mental workload. Orlandi and Brooks [26] had their participants perform berthing operations of different difficulty levels in a ship simulator. Their levels of mission difficulty were determined based on the size of the swing basin, environmental condition, vessel characteristics, tug usage, interaction with traffic, and communication. Their results showed that mission difficulty significantly (p = 0.001) affected the HR, overall TLX score, and the Likert scale of mental workload assessment.
Mental workload assessments have been performed for human pilots in different flight scenarios [16–20]. Operations of a UAV are quite different from those of a manned aircraft. The workload of UAV operators should therefore differ from that of human pilots. Even though there have been some studies on UAV simulators [40, 41], mental workload measurements in real UAV flight missions, under different mission difficulty conditions, have not been reported. It was hypothesized that mission difficulty in UAV operation affects both subjective and objective measures of the operator’s mental workload. In addition, mission difficulty was hypothesized to affect the number of failures in UAV operation. It was also believed that mental workload is linearly correlated with the number of failures in UAV flight operations. In other words, high mental workload is associated with poor UAV flights in terms of operational failure. This study was performed to test these hypotheses.
Materials and methods
An experiment was performed on the campus of a university.
Human participants
Fourteen healthy male adults were recruited. Their age, stature, and body mass were 23.4 (±0.7) yrs, 174.2 (±3.9) cm, and 68.4 (±9.2) kg, respectively. All of them had normal (uncorrected or corrected) visual and hearing functions and had no prior experience of drone operation. All the participants read and signed an informed consent form before joining the experiment. This study was approved by a local ethics committee (CMTU-SM-2019001).
Unmanned aerial vehicle
A Mavic Air drone (DJI®, Shenzhen, China) was adopted (see Fig. 1). This quadcopter has a weight of 430 g and a height of 6.4 cm. Its diagonal size (protective frame included) is 38.7 cm. A HUAWEI® V10 smartphone was adopted as the monitor of the remote controller to show the flight information. The flight control software was the DJI® GO app [42]. The participants operated the UAV primarily by maneuvering the two joysticks on the remote controller. The location of the UAV during the flight was marked as a triangle on an e-map on the monitor. In addition to this information, the primary flight information included horizontal distance to the take-off spot, altitude, remaining power (%), and intensities of remote and satellite signals.

Mavic Air drone.
The Mavic Air can be operated using the novice (N), normal (P), or sport (S) mode. When using the N mode, the GPS positioning system is on and neither the horizontal distance nor the altitude of the vehicle can exceed 30 m. For the P and S modes, the GPS system is on and off, respectively, and the maximum horizontal distance and altitude of the vehicle may be designated by the operator. The UAV can hover at a specific location without joystick control when the P mode is activated. In the S mode, the vehicle cannot hover at a specific location without joystick control. Under such circumstances, the vehicle will drift because of the wind. All the novices were requested to use the N mode at the beginning of the practice. They could fly using the P and then the S mode when they were familiar with the flight controls.
Take-off and return and landing of this UAV may be operated using either automatic or manual modes. The manual mode requires the operator to maneuver the joysticks to activate the commands. The automatic mode, on the other hand, may be accomplished by clicking on the monitor once and no further joystick control is required. In the flight control app, the operator may designate the remaining power (%) in the power alarming setup. When the remaining power reaches the level designated, a power alarm notice will appear on the monitor. When this happens, the operator should fly carefully and needs to have the vehicle return and land before the power runs out.
HR data were measured using a Polar® V800 heart rate tester (Kempele, Finland). This tester includes a chest strap and a watch. The sensor on the chest strap monitors the heartbeat data and then transmits the data to the watch. The watch stores the data, which may be uploaded to a computer via the FlowSync software (Polar, Finland) for further processing. Both HR and IBI during the flight were recorded.
Subjective rating
Subjective ratings of mental workload were obtained using the TLX, CH, and SWAT. The TLX has six dimensions: mental demand, physical demand, temporal demand, overall performance, effort, and frustration level. In this study, the authors adopted the unweighted average score of these dimensions as the overall score [20, 30]. Both the raw score for each dimension and the overall score ranged from 0 to 10. The CH is a decision tree. The participant answered questions concerning the controllability of the UAV, performance attainability, and satisfaction of the operator during the flight. A score from 1 (excellent) to 10 (major deficiencies) was recorded [30]. When using the SWAT, the participant rated his flight experience on three dimensions (time load, mental effort, and psychological stress) using a 3-point scale. The ratings on these three dimensions were ranked into a final score from 1 to 27 [32, 33]. For all three subjective scales, a high score indicates a high mental workload.
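The unweighted overall TLX score described above can be sketched in a few lines of Python. The dimension labels follow the standard NASA-TLX names, and the function name and example ratings are hypothetical illustrations, not the authors' scoring code or data. The SWAT ranking is not reproduced because the mapping from rating combinations to ranks is not given in the text; the sketch only notes that three dimensions on a 3-point scale yield 27 combinations.

```python
TLX_DIMENSIONS = (
    "mental demand", "physical demand", "temporal demand",
    "overall performance", "effort", "frustration level",
)


def raw_tlx_score(ratings):
    """Unweighted mean of the six TLX dimension ratings (each 0 to 10)."""
    missing = [d for d in TLX_DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing TLX dimensions: {missing}")
    for dim in TLX_DIMENSIONS:
        if not 0 <= ratings[dim] <= 10:
            raise ValueError(f"{dim} rating must be between 0 and 10")
    return sum(ratings[d] for d in TLX_DIMENSIONS) / len(TLX_DIMENSIONS)


# Three SWAT dimensions, each rated on a 3-point scale.
SWAT_COMBINATIONS = 3 ** 3

# Hypothetical ratings for one flight; illustrative only.
example = {
    "mental demand": 6, "physical demand": 3, "temporal demand": 5,
    "overall performance": 4, "effort": 6, "frustration level": 3,
}
print(raw_tlx_score(example), SWAT_COMBINATIONS)
```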
Test site and training
Each participant attended a three-hour flight training. The training included introduction of the hardware and software components of the Mavic Air, safety precautions of UAV operation, flight operations, and filming and video-taking. The training was provided at the test site where the participant could practice immediately after the lecture. The practice and the actual experiment were conducted at a university campus (see Fig. 2).

Test site and flight space: H and the triangle are the taking-off spot and revolving target, respectively; the diamond and the arc are the starting point and path of target revolving, respectively; the radius of the arc is 160 m.
The participants practiced flying repeatedly until they felt they were familiar with the flight operations and were confident to operate the UAV without the guidance of others. Specifically, the participants needed to be familiar with operations in target revolving (see Fig. 2), climbing/descending (see Fig. 3), yawing left/right (see Fig. 4), S flight (see Fig. 5), Z flight (see Fig. 6), and rolling left/right (see Fig. 7). The flight path for each trial could be traced in the flight records of the DJI® GO app. For a climb or descent, a successful operation was recorded if the flight path followed an oblique line. For yaw, Z, and S flights, a successful flight was marked if the flight path approximately followed the designated path. To perform the target revolving operation, the vehicle needed to move to the starting point, the target revolving option in the DJI® GO app had to be activated, and the target of interest needed to be specified. The radius of the revolving was then maintained by the app. The operator needed only to manoeuvre the joysticks to specify the clockwise and then counterclockwise vehicle movement after the vehicle had reached the farthest spot of the testing area. For all the operations except target revolving, the vehicle, after taking off and before landing, needed to be operated within the designated area (60 m×120 m) in Fig. 2.

Climbing and descending.

Yawing left and right.

S flight.

Z flight.

Rolling left and right.
The participants were also instructed on the meaning and usage of the TLX, CH, and SWAT for subjective rating measurements.
Flight missions were planned to cover four levels of difficulty in flight control (see Table 1). These four levels (1 to 4) correspond to easy, moderate, difficult, and very difficult.
Levels of difficulty and flight operations
DL: difficulty level; §: power alarm appeared after the remaining power had reached this level; A: automatic; M: manual; a: altitude (m); b: horizontal distance (m); *: left and right each repeated 3 times; †: target revolving with a radius of 160 m; v: performed; -: not performed.
Before the first trial on each day, the weather condition was checked using the UAV Forecast app [43] to ensure the safety of the flight. The weather conditions for a “good to fly” rating included a precipitation probability of 0%, a wind speed of 32 km/hr or less, and a visibility of 5 km.
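The go/no-go criteria above amount to a simple three-part check, sketched below in Python. The class and function names are hypothetical, the code does not use or reproduce the UAV Forecast app, and reading "a visibility of 5 km" as "at least 5 km" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weather:
    precipitation_probability_pct: float
    wind_speed_kmh: float
    visibility_km: float


def good_to_fly(weather):
    """Apply the three criteria stated in the text.

    ASSUMPTION: the visibility criterion is treated as a minimum of 5 km.
    """
    return (weather.precipitation_probability_pct == 0
            and weather.wind_speed_kmh <= 32
            and weather.visibility_km >= 5)


# Hypothetical readings; illustrative only.
print(good_to_fly(Weather(0, 18, 10)))
print(good_to_fly(Weather(0, 40, 10)))
```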
In addition to the training, each participant performed four flight missions. In each flight, a mission at one of the four levels in Table 1 was assigned. The order of the four flights was randomly determined. In each trial, an experimenter set up the UAV and then briefed the mission request verbally. The participant put on the chest strap and watch of the HR monitor. He then followed the instructions of the experimenter and performed the flight mission. The experimenter monitored the operation of the participant and gave a hint on the next operation. After landing, the participant completed the three rating scales of mental workload. The time of the flight was recorded in the DJI® GO app. The participant rested for ten minutes while waiting for the next flight. Each participant completed all four flights on the same day.
Data analysis
The dependent variables included the flight time, number of failures, HR, IBI, and the scores of the TLX, CH, and SWAT. A mission failure was recorded if the flight path did not match the flight route assigned in each of the flight operations in Figs. 2–7. The numbers of failures were recorded for each operation in Table 1 and for each mission. The percentage of failure was calculated by dividing the number of failures by the total number of failures of all the participants. Descriptive statistics and Kruskal-Wallis tests were performed on the dependent variables. Spearman’s correlation coefficients (ρ) were calculated to quantify the correlations between the dependent variables. Regression analyses were performed to establish predictive equations for the number of failures and flight time from the mental workload measures. A significance level of α = 0.05 was adopted.
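For readers unfamiliar with the two rank-based statistics named above, the following plain-Python sketch computes the Kruskal-Wallis H statistic (with tie correction) and Spearman’s ρ (as the Pearson correlation of average ranks). The text does not state which statistical software was used, so this is an illustration of the techniques rather than the authors' procedure; the function names and sample data are hypothetical.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for three difficulty levels; illustrative only.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
print(spearman_rho([1, 2, 3, 4], [2, 4, 6, 8]))
```

With k groups, H is approximately chi-square distributed with k − 1 degrees of freedom, which is why Kruskal-Wallis results are commonly reported as χ² values.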
Results
Subjective ratings
Table 2 shows the means and standard deviations of the scores of the three mental workload rating scales.
Mean and standard deviation of the subjective ratings of mental workload
Kruskal-Wallis tests were performed to test the significance of mission difficulty on the subjective ratings of mental workload. The results indicated that mission difficulty had a significant effect on the scores of the TLX (χ² = 26.21, p < 0.001), CH (χ² = 32.60, p < 0.001), and SWAT (χ² = 32.51, p < 0.001). Pair-wise comparison results between the levels of difficulty are shown in Table 3.
Pair-wise comparisons of mission difficulty results
*: p < 0.05, **: p < 0.001, –: p > 0.05.
There are six dimensions in the TLX. Table 4 shows the means and standard deviations of the scores of the six dimensions. Pair-wise comparison results of the score of each dimension between levels of difficulty are shown in Table 5.
Means and standard deviation of the dimensions of TLX
Pair-wise comparisons of the dimensions of the TLX
*: p < 0.05, **: p < 0.001, –: p > 0.05.
There are three dimensions in the SWAT. The means and standard deviations of the scores of each dimension are shown in Table 6. Table 7 shows the results of the pair-wise comparison of these scores between combinations of difficulty levels.
Means and standard deviations of score of the dimensions in SWAT
Pair-wise comparisons of the dimensions of the SWAT
*: p < 0.05, **: p < 0.001, –: p > 0.05.
The HR of the participants during the experiment was 89.4 (±13.4) bpm. It was not affected significantly by mission difficulty. The means and standard deviations of the IBI for difficulty levels 1 to 4 were 689.5 (±30.6) ms, 687.1 (±32.8) ms, 686.6 (±32.2) ms, and 674.5 (±36.0) ms, respectively. The Kruskal-Wallis test results indicate that the difficulty level affected the IBI significantly (χ² = 20.15, p < 0.01). Table 8 shows the results of pair-wise comparison of the IBI on mission difficulty.
Pair-wise comparison of the IBI between mission difficulties
*: p < 0.05, **: p < 0.001, –: p > 0.05.
Table 9 shows the means and standard deviations of flight time and number of failures. The percentages of failure for easy, moderate, difficult, and very difficult missions were 8.4%, 19.6%, 28%, and 43.9%, respectively. Kruskal-Wallis test results show that the effects of mission difficulty were statistically significant on the percentage of failure (χ² = 27.17, p < 0.001). Pair-wise comparison results between levels of mission difficulty indicated that the percentages of failure of both very difficult (p < 0.001) and difficult (p < 0.05) missions were significantly higher than that of the easy mission. The percentage of failure of the very difficult mission was also significantly higher than that of the moderate mission (p < 0.05). The percentages of failure in climbing/descending, S flight, yaw, Z flight, target revolving, roll, and return and landing were 51.4%, 19.6%, 17.8%, 3.7%, 3.7%, 2.8%, and 0.9%, respectively. There was no failure in taking-off.
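The percentage of failure, defined in the data analysis as each category's share of all failures, can be illustrated with a short calculation. The failure counts below are not reported in the text; they are back-calculated on the assumption of 107 failures in total, which is consistent with the 4 Z-flight failures reported as 3.7% in the discussion. Under that assumption, the rounded shares reproduce the percentages given above.

```python
# ASSUMED counts per mission level, back-calculated from the reported
# percentages with an assumed total of 107 failures (not stated in the text).
failures_by_mission = {
    "easy": 9,
    "moderate": 21,
    "difficult": 30,
    "very difficult": 47,
}


def failure_percentages(counts):
    """Share of all failures per category, in percent, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


percentages = failure_percentages(failures_by_mission)
print(percentages)
```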
Mean and standard deviation of number of failure and flight time
The Spearman’s correlation coefficients between the dependent variables are shown in Table 10. Regression analysis results indicate that all three subjective ratings were significant (p < 0.001) predictors of both the number of failures and the flight time. In addition, the IBI was a significant (p < 0.001) predictor of the flight time. Table 11 shows the results of the regression analyses of the number of failures and flight time over the mental workload measures.
Spearman’s correlation coefficients
*p < 0.01; **p < 0.001.
Regression analysis results
Note: all the regression coefficients were significant at p < 0.001.
Discussion
The TLX, CH, and SWAT have all been applied in measuring mental workload in aviation operations. Mission difficulty had a statistically significant effect on all three subjective rating scales. Increasing mission difficulty could therefore lead to an elevation of mental workload. It should be noted that all three subjective scales were used after the missions were completed. The scores of these scales could not be used to quantify the mental workload for each individual operation in Table 1. They represented the overall mental workload in each mission.
Both the HR and IBI were analyzed. Even though HR has also been adopted in dozens of studies [37], variations of HR among different mission difficulties were small. This implies HR was not sensitive enough to be used as an index in measuring the mental workload in our UAV operations. The IBI has been one of the most commonly used physiological indexes to differentiate mental workload [37, 38]. The results indicate that IBI was sensitive enough to differentiate the levels of the mental workload in our flight missions.
The percentages of failure increased almost linearly with the level of mission difficulty. This was consistent with our anticipation. Most (51.4%) of the operation failures occurred in climbing and descending. Climbing required manoeuvring the two joysticks forward simultaneously with different increments so as to meet the distance and elevation requirements of the vehicle. A typical failure occurred when the flight path was composed of polygonal lines. Such a flight path indicated poor coordination of the two hands in elevating the vehicle and flying it forward simultaneously. Such a failure also occurred in descending. The percentages of failure in climbing and descending were 29.9% and 21.5%, respectively. Typical failed flight paths of the S flight and yaw were also polygonal lines (or nearly a straight line for some of the S flights) instead of a curve. In some of the yaw operations, the participants mistakenly input the roll control and made corrections afterward. Z flight was relatively easy compared to the S flight, probably because both the eye tracking and joystick control of the former were easier than those of the latter. There were only 4 (3.7%) failures in Z flight. Target revolving was successful if the flight path formed a curve with respect to the revolving target. With a percentage of failure of only 3.7%, target revolving was easy once this function had been activated in the DJI® GO app. The participants controlled the moving direction by manoeuvring the joysticks, and the curved path was maintained automatically by the app.
The literature has shown that high mental workloads are required in taking off and landing in simulated flights both with [41, 44] and without [45, 46] human pilots. This differs from our results. Taking off with the Mavic Air may be completed by tapping a button on the screen or pulling the right stick backward when using the automatic or manual mode, respectively. This involved hardly any mental workload and was very easy even for novices. There was no failure in the taking-off operations. The return and landing of the Mavic Air using the automatic mode may also be completed by tapping one button and was also very easy. Return and landing using the manual mode, on the other hand, required manoeuvring of the joysticks. When the S mode was activated, the GPS did not function. Depending on the gust level, joystick maneuvering to counterbalance the wind might be required in navigating the vehicle. This made flight operation under the S mode somewhat more difficult than under the P mode. Even so, there was only one failure in return and landing, which occurred in the very difficult mission.
Mission 1 was performed using automatic mode in both taking-off and return and landing. In addition, this mission was done without operations of yawing, S flight, and target revolving. The IBI and scores of all the three subjective ratings of this mission were significantly different from the other missions, indicating low mental workload of this one as compared with the others. Activating manual mode both in taking-off and return and landing together with performing the yawing, S flight, and target revolving seemed to have increased mental workload of UAV operation significantly.
There were three differences between mission 2 (moderate) and mission 3 (difficult). The first was that the power alarm of the former was set at 30% while that of the latter was set at 50%. The second was that target revolving was performed in mission 3 but not in mission 2. The third was that automatic return and landing was adopted in mission 2 but the manual mode was adopted in mission 3. As mentioned previously, target revolving was relatively easy and involved a relatively low number of failures. The contribution of target revolving to mental workload was probably negligible. When the power alarm appeared, the participant needed to judge whether the remaining power was enough to complete the mission. The participant needed to return safely at once if he thought the power was not enough to complete the mission. Otherwise, a crash could occur. The power alarm appeared earlier in mission 3 (50% left) than in mission 2 (30% left). It was, therefore, suspected that the power alarm would increase the temporal demand (TLX) and time load (SWAT) and could therefore make the mental workload higher in mission 3 than in mission 2. However, none of the dimensions in either the TLX or the SWAT differed significantly between these two missions. This implies that the 20% difference in the power alarm setting and manual return and landing did not increase the mental workload of the participants.
The operations of missions 3 (difficult) and 4 (very difficult) were mostly the same except that the P mode was activated in the former and the S mode was used in the latter. The differences in all three subjective rating scores between the difficult and very difficult missions were not statistically significant. This was, however, inconsistent with the results of the IBI (see Table 8), where the difference between the difficult and very difficult missions was significant. This implies that some of the dimensions in both the TLX and SWAT might be significant while others might not. The mental demand in the TLX and the mental effort in the SWAT were both significant (see Tables 5 and 7) while other dimensions (except the overall performance in the TLX) were not significant. The implication was that using the S mode increased the mental demand in the TLX and mental effort in the SWAT but did not increase other dimensions in either the TLX or the SWAT.
The correlation coefficients between the scores of any two of the three subjective rating scales were positive and high (ρ = 0.7 to 0.83, p < 0.001), indicating the consistency of the TLX, CH, and SWAT in measuring the mental workload of operators in our UAV operations. The correlations between the IBI and the scores of the CH (ρ = –0.3, p < 0.01) and SWAT (ρ = –0.37, p < 0.01) were, however, moderate, and the correlation between the IBI and the TLX score was insignificant. This was partially consistent with the findings of Mansikka et al. [20], who found insignificant correlations between the IBI and both the TLX and CH in a simulated piloting study. The correlation coefficient between flight time and number of failures was high (ρ = 0.66, p < 0.01), indicating that more failures were associated with longer flight time. The correlation coefficients between the number of failures and the scores of the three subjective scales (ρ = 0.65 to 0.7, p < 0.001) were strong, indicating that high mental workload in our UAV flight missions could lead to poor flight performance in terms of operational failure. The implication of the regression analysis results was that the TLX, CH, and SWAT could all be adopted in predicting both the number of failures and the flight time. The IBI, however, may only be used to predict the flight time, not the number of failures.
There are limitations to this study. The first is that the sample size was relatively small. This was due to the constraints of our budget and the availability of the test site. The second is that the Mavic Air could fly for only approximately 21 minutes when the battery was fully charged. Such a short time period might not be enough to impose a mental workload as significant as that experienced by human pilots. The mental workloads in our experiment were relative and may not be comparable to those of human pilots in performing real flying missions. Thirdly, performing a real flight experiment is difficult. Even though the experimenter had checked the weather conditions immediately before the trials, the atmospheric and weather conditions, especially the gust speed, in one trial could be somewhat different from those in another. It is not clear how this would affect the operations of the participants and thus the mental workload and flight performance data. Future research may be performed to determine the effects of atmospheric conditions on the mental workload of small drone operation. Finally, mission difficulty was composed of operations involving joystick manoeuvring and eye tracking of flight information. The effects of joystick manoeuvring and eye tracking of flight information on those operations could not be separated because they were confounded in the missions. The contribution of each of these effects to the mental workload of UAV operation was therefore unknown. This also provides an interesting topic for future research.
Conclusions
Mission difficulty significantly affected all the subjective measures of mental workload and the IBI in performing the UAV flight missions. The results of the three subjective ratings of mental workload were consistent with those of the IBI data. Subjective scores of mental workload were linearly correlated with the number of failures in UAV operations. High mental workload was associated with long flight time and more operation failures in UAV operations. The findings of this study provide insights into valid measures of mental workload in UAV operations and identify critical flight operations which may give rise to elevated mental workload and a higher probability of operation failure.
Acknowledgments
The authors are grateful for the support from all participants.
Conflict of interest
The authors declare no conflict of interest.
