Abstract
BACKGROUND:
Today’s work environments have high cognitive demands, and mental workload is one of the main causes of work stress, human errors, and accidents. While several mental workload studies have compared the mental workload perceived by groups of experienced participants to that perceived by novice groups, no comparisons have been made between the same individuals performing the same tasks at different times.
OBJECTIVE:
This work aims to compare NASA Task Load Index (NASA-TLX) to Workload Profile (WP) in terms of their sensitivity. The comparison considers the impact of experience and task differentiation in the same individual once a degree of experience has been developed in the execution of the same tasks. It also considers the acceptability and intrusivity of the techniques.
METHODS:
The sample consisted of 30 participants who performed four tasks in two sessions. The first session was performed when participants had no experience; the second session was performed after a time of practice. Mental workload was assessed after each session. Statistical methods were used to compare the results.
RESULTS:
The NASA-TLX proved to be more sensitive to experience, while the WP showed greater sensitivity to task differentiation. In addition, while both techniques featured a similar degree of intrusivity, the NASA-TLX received greater acceptability.
CONCLUSION:
The acceptability of WP is low due to the high complexity of its dimensions and clarifying explanations of these may be necessary to increase acceptability. Future research proposals should be expanded to consider mental workload when designing work environments in current manufacturing environments.
Introduction
The tasks involved in modern manufacturing environments have been calling for a more educated, more highly qualified workforce; that is, for workers with a more complex set of cognitive skills such as decision-making, improvisation, information management, and failure diagnosis, among others [1]. Such skills are frequently required during interaction with automated and/or complex systems; in turn, these systems demand the execution of tasks that involve an increasingly high mental workload [2, 3]. Additionally, mental workload leads to mental fatigue, or the temporary deterioration of mental functioning, which results in a decrease in the ability to perform tasks [4]. Because mental workload takes a significant toll on workers, publications on this topic have increased in recent years [5]. To assess mental workload, it is necessary to consider a variety of methods and select the best for each problem and condition. The diversity of techniques and methods entails different dimensions, scales, and procedures [6]. Currently, however, the lack of a standardized method or measurement unit makes mental workload evaluation difficult [5, 7]. Another problem to consider is the scarcity of mental workload assessment in the manufacturing industry, which offers ample opportunities for future research [6].
Among the different methods and techniques available, the NASA Task Load Index (NASA TLX) stands out as the most widely studied and used [8–10]. On the other hand, the Workload Profile (WP) can analyze attentional resources and cognitive processes during the execution of several tasks [11, 12]. Although both methods have addressed the relationship between mental workload and the role of prior experience in human performance, such studies are scarce. Examples of them are Rubio et al. [13], which compared results for the NASA TLX, the WP, and the Subjective Workload Assessment Technique (SWAT); and Bommer and Fendley [14], which compared the use of the NASA TLX and the WP, along with other techniques, to determine operators’ mental workload in the manufacturing sector. Nonetheless, the literature on the use of a combination of mental workload evaluation techniques in terms of sensitivity, acceptability, and intrusivity is scarce. Furthermore, the studies mentioned analyzed mental workload as perceived by groups of participants with different degrees of prior experience as opposed to the load perceived by novice groups lacking background [15, 16]. The novelty of this paper lies in the fact that it evaluates the same individual’s perception of his/her mental workload once a degree of experience has been gained in the execution of the same tasks. This approach has not been found in the available literature so far, and it could help understand how experience contributes to reducing an individual’s perception of mental workload.
The objective of this work is to compare two subjective mental workload evaluation techniques, the NASA TLX and the WP, using a novel individual approach. Additionally, the study aims to determine the sensitivity of these methods to the impact that recently gained experience has on mental workload perception and differentiation of tasks. Finally, it also seeks to determine the participant’s acceptance of the technique and the degree of intrusion perceived in both techniques (intrusivity).
Background
Mental workload is defined as the cognitive resources needed to perform a task. For example, according to Young and Stanton [17], mental workload is “the level of attention resources needed to meet both objective and subjective performance criteria, which may be influenced by task demand, external support, and experience.” Hart and Wickens [18], on the other hand, define mental workload as the mental cost of meeting a task requirement. While efforts have been made to clarify the mental workload construct [3], such efforts have been insufficient.
According to Xie and Salvendy [19], there are two types of mental workload evaluation models: those that are empirical, which include performance-measurement techniques, both subjective and physiological; and those that are analytical, which include mathematical techniques and models as well as their simulations. The most widely used methods are those of the empirical type, among which the NASA TLX and the WP stand out [20, 21].
There is also a set of quality criteria that must be met by any mental workload assessment technique [13, 22]. Such criteria include:
Sensitivity: The ability to detect changes in the levels of task difficulty.
Diagnosticity: The capacity to detect changes and their causes.
Selectivity/validity: The technique’s sensitivity only to differences in cognitive demands, while overlooking changes in factors such as physical load or emotional stress.
Intrusivity: The degree to which the technique interferes with the performance of the task for which mental workload is being assessed.
Reliability: The degree to which the technique yields consistent results at any time.
Implementation requirements: The resources needed to implement the evaluated technique.
Subject acceptability: The participants’ perception of the procedure’s validity and usefulness.
Ideally, all the above features should be present in any mental workload-assessment technique. In this case, the criteria of sensitivity, intrusivity, and acceptability were evaluated to obtain more information about the participant’s opinion regarding the instruments used.
This research compares two subjective methods which are widely accepted in literature, and which will be described in the following paragraphs.
NASA TLX
The NASA TLX is a multidimensional, subjective evaluation technique proposed by Hart and Staveland [23] and is based on the mental processing model where mental workload is a hypothetical construct referring to the toll that reaching a certain level of performance takes on the operator; the focus of the model is on the person rather than on the task [23]. Mental workload is, thus, the result of the interaction between the task requirements, the circumstances in which it is performed and the operator’s skills, behaviors, and perceptions of the task. In general, this technique is implemented immediately after the job has been performed.
The NASA TLX identifies six dimensions of mental workload: mental demand, physical demand, temporal demand, effort, performance, and level of frustration [10, 23].
Mental demand: Refers to the amount of mental exertion and perceptual activity required during task execution.
Physical demand: Relates to the level of physical activity required to carry out a task.
Temporal demand: Involves the perceived time pressure resulting from the speed or pace at which the task or its elements are to be executed.
Effort: Refers to the level of perceived effort (both physical and mental) required to achieve an adequate level of performance.
Performance: Relates to the perceived degree of success reached in task execution with respect to the goals established by the analyst (or by oneself).
Frustration level: Refers to how insecure, discouraged, irritated, stressed, or upset, as opposed to confident, gratified, happy, relaxed, and comfortable, someone felt during the task.
Workload Profile (WP)
A second subjective technique is the WP, developed by Tsang and Velazquez [12] and based on the Multiple Resources Model (MRM) proposed by Wickens [24], which suggests that sensory information from the environment is received, cognitively processed, stored, and used in decision-making before a response is executed. Sensory information is collected by several types of cells which receive light, sound, taste, smell, touch, and internal sensations. Once the receptor cells have been activated, these signals are stored in a sensory record. Sensory memory stores a large volume of detailed information but only for a brief period. During the perception stage, all information is processed, and perception provides meaning to it by comparing it with relatively permanent information coming from long-term memory. This causes several stimuli to be assigned to a single perceptual category. The WP uses the following dimensions: perceptual/central processing, response selection and execution, spatial processing, verbal processing, visual processing, auditory processing, manual output, and vocal output.
Thus, the WP encompasses the same dimensions as those of the MRM [10, 12], namely the following:
Processing stage:
Perceptual/cognitive: The attention resources needed for activities such as perceiving (detecting, recognizing, and identifying objects), remembering, problem-solving, and decision-making.
Response: The attention resources needed for response selection and execution.
Code processing:
Spatial: The attention resources needed to spatially locate objects and situations.
Verbal: The resources involved in the processing of verbal and linguistic material.
Input:
Visual: The resources used strictly to process the visual information received.
Auditory: The processing resources used for the collection of auditory information.
Output:
Manual: Some tasks require considerable attention to the production of a manual response, e.g., typing or playing the piano.
Vocal: Other tasks require speech responses. For example, engaging in a conversation requires attention to the production of spoken responses.
Hierarchical Task Analysis (HTA)
Prior to the application of any mental workload assessment method, it is recommended to conduct a Hierarchical Task Analysis (HTA). The HTA describes a task in terms of the hierarchy of its operations, seeking to offer as much detail as possible. Thus, the HTA must begin with the description of the main task, followed by the rest of the operations in the sequence in which they are performed. If any of the operations entail sub-operations, those ought to be listed as well. Next, a plan must be put in place indicating the order in which both operations and sub-operations will take place [10, 32].
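The hierarchy described above can be represented as a small tree. The following is only a minimal sketch: the `Operation` class, its field names, the numbering, and the plan text are hypothetical, and the example operations are loosely based on the components named for the mechanical assembly task rather than on the authors' HTA tables.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operation:
    """One node of a hierarchical task analysis (hypothetical structure)."""
    number: str
    description: str
    plan: str = ""  # order in which the sub-operations are carried out
    sub_operations: List["Operation"] = field(default_factory=list)

    def outline(self, depth=0):
        """Return the hierarchy as indented text lines, parents first."""
        indent = "  " * depth
        lines = [f"{indent}{self.number} {self.description}"]
        if self.plan:
            lines.append(f"{indent}  Plan: {self.plan}")
        for child in self.sub_operations:
            lines.extend(child.outline(depth + 1))
        return lines


# Hypothetical decomposition; operation order and wording are assumptions.
hta = Operation(
    "0",
    "Assemble the computer cabinet",
    plan="do 1 to 3 in order",
    sub_operations=[
        Operation("1", "Install the motherboard"),
        Operation("2", "Install the memory modules"),
        Operation("3", "Connect the power supply"),
    ],
)
lines = hta.outline()
print("\n".join(lines))
```

Keeping the plan on the parent node mirrors the idea that a plan states the order of the operations directly beneath it.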
Materials and method
This section will first describe the characteristics of the sample. One of the objectives of this work was to determine the sensitivity of the studied techniques by analyzing the effect of experience on the mental workload caused by tasks in the manufacturing industry; therefore, it was important to show that participants had no prior experience in such tasks. In this case, all participants affirmed having had no experience in manufacturing tasks. This section will also describe the stages in this study, including the task mental workload assessment, and the data transformation and analysis.
Participants
Participants were selected through convenience sampling among college-level students and volunteers from the private sector in Ciudad Juarez, Chihuahua, Mexico, who declared having no experience at all in the task chosen for this study. All participants were informed of the study’s objectives and methods and were asked for their written consent to the use and publication of the collected data. The ethics committee of the university had previously approved such consent. The thirty participants were equally divided into two groups: one to be analyzed using the NASA TLX and the other one, using the WP evaluations. The participants’ descriptions are shown in Table 1.
Description of participants
This work consisted of a one-session, quantitative study with a cross-sectional design. The methodology was divided into two stages. Stage one focused on the task description, its execution, and the corresponding assessment of the mental workload. Then during stage two, data transformation and analyses were conducted to compare the techniques’ sensitivity, acceptability, and intrusivity. These stages are described below.
Stage 1. Task description and execution for mental workload assessment
Four tasks were chosen for this stage: two with characteristics similar to those performed in the manufacturing industry and two used in academic settings. All four tasks were similar in that they required the use of cognitive resources, physical skills, and spatial responses. Task selection was guided by the variables measured by the NASA TLX and the WP. The HTA was performed to describe all the tasks, as is done in any cognitive analysis [10, 32]. The tasks chosen were:
Mechanical assembly (MA): It consisted of assembling a computer’s cabinet or Central Processing Unit (CPU). This involved assembling the motherboard, memory modules, fan, Digital Versatile Disk (DVD) drive, and power supply, as well as their respective harnesses. Similar assembly tasks have been used before in the evaluation of mental workload [26]. Table 2 shows the HTA for the mechanical assembly task.
Visual inspection (VI): During this task, participants were asked to identify defects or discrepancies in the elements that make up a computer motherboard; namely, the memory modules, the connectors, and the heatsink. This task was chosen because it had been used in Ntuen’s [27] work and because it is a common activity in industry. The task involved the inspection of 30 items in 5 minutes. The HTA for this task is shown in Table 3.
Puzzle (PZ): The task consisted of putting together a 30-piece puzzle. An example of the use of puzzles in mental workload evaluation can be seen in the work of Miyake [28], where wooden puzzles were also used. The corresponding HTA for this task is shown in Table 4.
Memory test (MT): Developed by Saul Sternberg, this test was designed to evaluate how individuals store and retrieve random information from their short-term memory [29]. Tasks that engage the working memory have been widely used in various mental workload studies [30, 31]. This task consisted of showing a certain number of characters on a screen, all of which were removed after 3 seconds. Next, a single character was shown, and the participant was to answer whether that character had been among the previously shown characters; then a new series of characters was displayed. Table 5 shows the corresponding HTA for this memory test.
HTA for the MA task
HTA for the VI task
HTA for the PZ task
HTA for the MT task
Because the participants had no prior experience with the type of tasks to be performed, and to prevent them from becoming familiar with the tasks by observing other participants’ execution, only one participant was scheduled in the laboratory per day. The program for each session involved 5 steps:
Step 1: The participant was informed of the purpose of the study and was required to fill out their respective consent form.
Step 2: The order of execution of the four tasks was established randomly and explained to the participant.
Step 3: The participant performed all four tasks in sequence. After completion, the participant was required to complete the first mental workload assessment, using either the NASA TLX or the WP.
Step 4: To develop a certain degree of experience in the execution of the task, the participant was required to repeat it several times over the course of one hour; a minimum of two repetitions was required. Once the repetitions were performed, a second mental workload assessment was conducted.
Step 5: After the second mental workload evaluation, the participant was asked to fill out a survey to evaluate the acceptability and intrusivity of the techniques used, as perceived by them.
Mental workload assessment with NASA TLX
As was previously explained, six mental workload dimensions were identified in the NASA TLX, and the technique was applied in two steps: weighting and evaluation.
Step 1: The weighting stage took place before task execution and consisted of making 15 binary comparisons of the 6 NASA TLX dimensions and choosing, from each pair, the one that the subject perceived as the largest source of workload.
For example, participants were required to select between physical demand and temporal demand, then they were asked to select between physical demand and frustration, and so on. The number of times each dimension was selected was counted to obtain that dimension’s weight, which could range from 0 to 5.
Step 2: Evaluation: Immediately after performing the task, the participant was required to estimate the task’s mental workload, using a scale of 0 to 100. The same was done for each of the 6 dimensions.
After collecting each participant’s data, the global mental load index for the task was calculated by applying Equation 1.
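Equation 1 is not reproduced in this text. As a minimal sketch, assuming it follows the standard weighted NASA TLX (each 0 to 100 rating multiplied by its paired-comparison weight, summed, and divided by the 15 comparisons), the two steps could be computed as follows. The function names, the `choices` structure, and the ratings are illustrative assumptions, not the authors' instrument or data.

```python
from itertools import combinations

DIMENSIONS = (
    "mental demand",
    "physical demand",
    "temporal demand",
    "effort",
    "performance",
    "frustration",
)


def tally_weights(choices):
    """Count how often each dimension was chosen across the 15 pairs.

    `choices` maps each pair (a frozenset of two dimension names) to the
    dimension the participant picked as the larger source of workload.
    """
    pairs = {frozenset(p) for p in combinations(DIMENSIONS, 2)}
    if set(choices) != pairs:
        raise ValueError("exactly one choice per pair of dimensions is required")
    weights = {d: 0 for d in DIMENSIONS}
    for pair, chosen in choices.items():
        if chosen not in pair:
            raise ValueError(f"{chosen!r} is not a member of its pair")
        weights[chosen] += 1
    return weights  # each weight is 0..5 and the weights sum to 15


def weighted_tlx(weights, ratings):
    """Assumed form of the global index: sum(weight * rating) / 15."""
    return sum(weights[d] * ratings[d] for d in DIMENSIONS) / 15


# Hypothetical participant who always picks the dimension listed first in
# DIMENSIONS, which yields the weights 5, 4, 3, 2, 1, 0.
choices = {frozenset(p): p[0] for p in combinations(DIMENSIONS, 2)}
weights = tally_weights(choices)
ratings = dict(zip(DIMENSIONS, (80, 40, 60, 70, 30, 20)))  # made-up ratings
score = weighted_tlx(weights, ratings)
print(weights, round(score, 2))
```

Because the six weights always add up to 15, the resulting index stays on the same 0 to 100 scale as the individual ratings.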
Mental workload assessment with WP
According to Stanton [10] and Tsang and Velazquez [12], before applying the WP technique, participants had to be instructed in the MRM principles and dimensions. Once these principles were understood, participants were required to evaluate the proportion of attention resources used after task execution.
In this evaluation, the participant rated the level of resources used in task execution on a 0 to 10 scale, where 0 meant that the task did not require any resources and 10 meant that the task required all available resources. The evaluations of the individual dimensions for each task were later added to obtain a general classification of the mental workload; this was done using Equation 2:
$$C_t = \sum_{i=1}^{8} d_i \qquad (2)$$
where $C_t$ represents the subtask mental load and $d_i$ indicates the assessment that each participant assigned to mental load dimension $i$ in the evaluation format.
Stage 2. Data transformation and analysis
Data analysis began with the transformation of the results obtained with the NASA TLX and the WP. A normality test was then needed to select the statistical tests used to evaluate the sensitivity of each technique, both to the effect of experience gained through task repetition and to task differentiation.
Because each technique analyzed in this research used different scales and dimensions, data transformation was done to obtain an adequate statistical comparison of the results. Thus, the results obtained were transformed using min-max normalization, as can be seen in Equation 3.
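Equation 3 is not reproduced in this text. A minimal sketch of min-max normalization is given below; it rescales a score to the interval from 0 to 1 as (x − min) / (max − min). Whether the minimum and maximum were the theoretical scale bounds or the observed extremes is not stated, so the bounds used here (0 to 100 for the NASA TLX, and 0 to 80 for a WP total obtained by adding eight 0 to 10 ratings) are assumptions, as are the function name and the example scores.

```python
def min_max(value, lower, upper):
    """Rescale `value` from the range [lower, upper] to [0, 1]."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    return (value - lower) / (upper - lower)


# Assumed bounds: 0-100 for the NASA TLX index, 0-80 for the WP total.
tlx_scaled = min_max(65.0, 0.0, 100.0)  # made-up NASA TLX score
wp_scaled = min_max(52.0, 0.0, 80.0)    # made-up WP total
print(tlx_scaled, wp_scaled)
```

After this transformation, scores from both techniques lie on a common scale and can be compared directly.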
Once the data were transformed, their normality was verified using the Shapiro-Wilk test, which is recommended when the number of samples is lower than 40.
The sensitivity analysis, in turn, was divided into two parts: sensitivity to experience and sensitivity to task differentiation. The sensitivity to experience analysis looked at the impact of experience on the participants’ perceived mental workload. This analysis statistically compared the total mental workload scores obtained with the NASA TLX and the WP techniques during two separate evaluations: one when participants had no prior experience performing the tasks, and another after participants had developed a degree of experience in the task. Sensitivity to task differentiation was assessed by testing for significant differences among the total mental workload scores obtained with the same technique across the four studied tasks. In both cases, the comparisons were done using parametric or non-parametric statistical techniques, according to the normality test results of the data and the purpose of the comparison. Table 6 shows the appropriate technique for each case.
As for the evaluation of the acceptability and intrusivity of the techniques, participants reported their perception through two Likert-scale questions with values ranging from 1 to 5, where the higher the value, the greater the degree of acceptability or intrusivity. The data obtained were analyzed using Kendall’s coefficient of concordance to determine the level of agreement among participants. A summary of the statistical techniques employed is shown in Table 6. All statistical analyses were performed using the SPSS© v24 statistical package, with α = 0.05.
Statistical technique according to the normality results
Stage 1. Task execution and mental workload evaluation
Once the tasks were described in detail using the HTA and executed by participants, mental workload evaluation results for the NASA TLX and the WP were obtained. Table 7 shows the mean scores for both techniques. The results show that the total perceived mental workload decreases from the first evaluation to the second one, a change that is observed with both methods. Additionally, the first evaluation shows the MA task as featuring the highest mental workload with both methods. However, in the second evaluation with the NASA TLX, it is the MT task that shows the highest score. On the other hand, when using the WP, the ranking of tasks by their mental workload scores does not change from the first to the second evaluation. It is important to mention that in this part of the study, performance measurements, such as the number of correct responses, the number of errors, or the duration of the tasks, could not be properly recorded and were therefore not compared.
Original NASA TLX and WP scores
Once the NASA TLX and the WP scores were transformed, separate normality tests were applied for the NASA TLX and the WP. The proper statistical analysis was used for data comparison.
NASA TLX data analysis and sensitivity results
Figure 1 shows the mean mental workload values obtained with the NASA TLX after data transformation. As can be seen, mental workload decreased from the first evaluation to the second, which can be attributed to the learning curve.

Average results for the NASA TLX evaluation.
Regarding the normality test, Table 8 shows the results obtained by the NASA TLX in the first and second evaluations after using Shapiro-Wilk. For the first evaluation, the data from the four tasks appear statistically normal. On the other hand, the data from the second evaluation results were not distributed normally, which made the use of non-parametric statistics necessary. The Wilcoxon Test analyzed whether there was a significant difference between the first and the second evaluation results during the four tasks. These results are shown in the last column in Table 8, where the NASA TLX proved to be sensitive enough to detect the impact of experience on the participants’ mental workload.
Normality and differentiation statistical tests for the NASA TLX
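As an illustration of the paired comparison named above, the following is a minimal sketch of the Wilcoxon signed-rank statistic for one task evaluated twice by the same participants. The study itself used SPSS; the function names and the scores below are hypothetical, and the sketch computes only the rank sums, not the p-value.

```python
def midranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    return [
        1 + sum(v < x for v in values) + (sum(v == x for v in values) - 1) / 2
        for x in values
    ]


def wilcoxon_signed_rank(first, second):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Made-up scores (0-100) for five participants, first vs. second evaluation.
first_eval = [70, 65, 80, 55, 60]
second_eval = [50, 60, 55, 60, 60]
w_plus, w_minus = wilcoxon_signed_rank(first_eval, second_eval)
print(w_plus, w_minus)
```

A large imbalance between the two rank sums suggests a systematic change between evaluations; the smaller of the two is the value usually compared against critical tables.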
The sensitivity of the NASA TLX to task differentiation was analyzed with the Friedman Test. No significant difference among tasks was observed in either evaluation (p-value=0.128 for the first evaluation, and p-value=0.758 for the second one); therefore, in this case, the NASA TLX lacked the necessary sensitivity to distinguish one task from another.
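For reference, the Friedman statistic used for this kind of comparison can be sketched as follows. This is a minimal illustration with made-up scores for three participants and four tasks; the function names are hypothetical, no correction for ties is applied, and the p-value (obtained in the study with SPSS) is not computed here.

```python
def midranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    return [
        1 + sum(v < x for v in values) + (sum(v == x for v in values) - 1) / 2
        for x in values
    ]


def friedman_statistic(rows):
    """Friedman chi-square; each row holds one participant's score per task."""
    n = len(rows)      # number of participants
    k = len(rows[0])   # number of tasks (related samples)
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(midranks(row)):
            rank_sums[j] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Made-up scores; columns stand for the MA, VI, PZ, and MT tasks.
scores = [
    [70, 50, 40, 60],
    [65, 45, 55, 60],
    [80, 30, 50, 70],
]
chi_square = friedman_statistic(scores)
print(round(chi_square, 2))
```

The statistic is usually referred to a chi-square distribution with k − 1 degrees of freedom, an approximation that is rough for samples as small as this example.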
Figure 2 shows the transformed results from the evaluation conducted using the WP. A decrease in the mental workload during task execution is observed between the two evaluations.

Average results for the WP evaluation.
As for the data normality results, shown in Table 9, some of the distributions of the WP results were not normal. Therefore, the Wilcoxon test was used to make the comparison. The WP results for the first and second evaluations showed no significant differences except in the task with the highest mental workload, mechanical assembly, which featured a p-value lower than α = 0.05. Thus, in this study, the WP did not show enough sensitivity to detect the variation in mental workload as a result of experience.
Normality and differentiation statistical tests for WP
On the other hand, the sensitivity of the WP to task differentiation was evaluated through the Friedman Test because four related samples (the four tasks) were compared. A significant difference was observed in both evaluations (p-value=0.029 in the first one, and p-value<0.001 in the second one), which means that the WP did show enough sensitivity to distinguish among tasks.
To compare the acceptability and intrusivity of the two mental workload evaluation techniques, three aspects were considered: the agreement level among participants, the data normality, and the statistical comparisons. First, the level of agreement among participants was determined by using Kendall’s coefficient of concordance. It analyzed the results of a survey that included two questions assessing both characteristics through a five-point Likert scale. In both techniques, the participants showed a high level of agreement, with a value of 1 for the NASA TLX and 0.867 for the WP. This revealed that participants used a significantly similar criterion at the time of the evaluation.
Secondly, the normality of the acceptability and intrusivity data was assessed using the Shapiro-Wilk test at a significance level of 0.05. The results are shown in Table 10, where it can be seen that the data are not normally distributed, since the significance values (p-values) are below 0.05.
Acceptability and intrusivity normality test
Because the data were not normally distributed, the acceptability of both techniques was compared using the Mann-Whitney U test, which yielded a p-value of 0.004, revealing a significant difference between the two samples. In addition, from the mean ranks and the sums of ranks for both techniques in this test, shown in Table 11, it can be concluded that the NASA TLX features higher acceptability.
Mann-Whitney U acceptability test for NASA TLX and WP
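To make the quantities behind this comparison concrete, the following is a minimal sketch of how mean ranks, rank sums, and the U statistic are obtained for two independent groups of Likert answers. The answers are made up for illustration, the function names are hypothetical, and the p-value (obtained in the study with SPSS) is not computed.

```python
def midranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    return [
        1 + sum(v < x for v in values) + (sum(v == x for v in values) - 1) / 2
        for x in values
    ]


def mann_whitney_u(sample_a, sample_b):
    """Rank the pooled answers and derive rank sums, mean ranks, and U."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = midranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    rank_sum_b = sum(ranks[n_a:])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return {
        "rank_sum_a": rank_sum_a,
        "rank_sum_b": rank_sum_b,
        "mean_rank_a": rank_sum_a / n_a,
        "mean_rank_b": rank_sum_b / n_b,
        "u": min(u_a, u_b),
    }


# Made-up 1-5 Likert answers for two groups of five participants.
group_a = [5, 4, 5, 4, 5]
group_b = [3, 4, 2, 3, 4]
result = mann_whitney_u(group_a, group_b)
print(result)
```

The group with the higher mean rank is the one whose answers tend to be larger, which is how a table of mean ranks and rank sums supports a directional conclusion.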
The same analysis was performed to compare the degree of intrusivity perceived by participants. The Mann-Whitney U test yielded a p-value of 0.202, indicating no significant difference between the two samples; that is, participants perceived a similar degree of intrusivity in both techniques. The individual answers of the participants regarding acceptability and intrusivity are shown in Table 12.
Individual answers for acceptability and intrusivity
Currently, due to technology’s exponential growth and the increasing cognitive demands in the mass manufacturing context, the study of mental workload has become crucial [3]. Thus, knowledge of the characteristics and advantages that the various mental workload evaluation techniques offer has become essential. However, very few studies compare these techniques in terms of both their results and their quality criteria [13, 14]. Furthermore, among the studies available, application in manufacturing environments is quite limited. In addition, their approaches analyze mental workload as perceived by groups of participants with higher degrees of experience as opposed to novice groups [15, 16]. The results of this research offer new perspectives, as two widely accepted subjective mental load assessment techniques were used to evaluate mental workload perception in the same individuals once they had gained a certain degree of experience in the execution of similar manufacturing tasks. Therefore, a discussion of these aspects can be offered.
The WP technique showed greater sensitivity to task differentiation. This outcome is consistent with the results obtained by Rubio et al. [13]. Regarding the acceptability of the NASA TLX and the WP techniques, the participants expressed difficulty understanding the WP dimensions; this disadvantage was also reported by Rubio et al. [13]. In this particular study, quantifying and assigning the scores for the WP technique’s dimensions required further explanation as well as examples from practical situations. On the other hand, the NASA TLX’s stage of weighting by paired comparisons was the most complicated process for participants, who found it challenging and time-consuming, consistent with the reports by Stanton et al. [10]. However, a newer version of the NASA TLX, called RAW TLX, removes the weighting by paired comparisons and has yielded acceptable results. Thus, this version has been displacing the original one in recent studies [33, 34].
The intrusivity of both techniques turned out to be low, which confirms what has been reported in the literature [10, 13]. Because these techniques are applied after the participant has finished executing the tasks, their level of interference with the task is low. However, Stanton et al. [10] have reported that, across the various subjective techniques, when the evaluation is carried out at the end, participants often forget some important aspects of the task by the time they evaluate the mental load.
During any mental workload analysis, the recommendation is to use a variety of methods and techniques to obtain better results and to make better decisions regarding the design of the cognitive work, the tasks, the workstation, and the interaction with products or artifacts so that both the workers and the companies can benefit [10]. The way each of these techniques is used will depend on the objectives and the scope that each researcher or analyst is pursuing. For example, if there is interest in distinguishing the load perception between tasks or task elements, a greater sensitivity to the differentiation between them along with low sensitivity to the workers’ experience will be required; therefore, the WP might be a more suitable technique to use despite its lower acceptability on the part of the participants.
On the other hand, if the objective is to identify task mastery progress, evaluate the modification of a work method in cognitive tasks, or assess the success of a training program, a technique with greater sensitivity to the effect of experience on mental workload will be needed. Thus, the NASA TLX, which features these characteristics and has greater acceptability among participants, could be used.
Conclusions
As seen in the results of this study, the evaluated techniques show certain advantages and disadvantages relative to each other; thus, neither can be considered better than the other in terms of the characteristics included in the quality criteria. According to this study, the NASA TLX technique proved to be more sensitive to participants’ experience, whereas the WP technique proved to be better for task differentiation.
Additionally, it should be taken into consideration that the participants were taken from an academic environment; thus, their level of education is higher than the one traditionally found in employees from the manufacturing industry. Therefore, the fact that the participants expressed having difficulty understanding the WP dimensions implies that should the technique be applied to workers from manufacturing environments, it might seem even more difficult. Consequently, there is an opportunity to increase the WP’s acceptability and understanding by incorporating examples related to the tasks to be evaluated, when explaining their dimensions.
Finally, further research proposals should expand to consider mental load as well as the effects of experience on mental load assessment when designing work and when measuring mental workload more effectively and objectively in manufacturing environments. Considering mental workload will allow companies and employers to have more adequate time measurements, better standard production times, more effective cognitive task training programs, cognitive tasks that are more adequate to human capacities, and more adequate boundaries to enjoy the benefits of correct and more complete work measurement.
Conflict of interest
The authors declare that they have no conflict of interest.
