Abstract
Introduction
Interactive voice response (IVR) technology includes automated telephone answering systems, voice response units, and automated attendants. IVR systems are currently used to provide several health-related programs to patients, including self-help, lifestyle change support, and information regarding management of chronic diseases. 1–3
The effect of aging on the adoption and use of new technologies appears to be related to several factors, including age-related decline in cognitive abilities, which makes interaction more difficult, anxiety about approaching technology, and negative attitudes toward new technology. 4–6 Normal aging is associated with decline in several fluid intelligence abilities, such as working memory, prospective memory, and processing speed, as well as with slower acquisition of new skills, and these changes have been shown to have important implications for performance when using unfamiliar technology. 7 In the case of IVR, the abilities that have been shown to affect performance in older adults are working memory and auditory memory. 8
Using participants' feedback to develop IVR applications for older populations can be challenging, because older adults have been found to give usability ratings comparable to those of younger adults despite having more difficulty completing the IVR tasks. 4 Similar results were reported in another study evaluating ways to improve IVR design. 9 Collectively, these results indicate that although failure to accomplish one's goal when using technology is generally associated with a negative perception of the technology, there is a subgroup of people for whom this prediction does not hold. These findings may have important implications for IVR design that relies on user feedback to improve interfaces. Most importantly, it would be helpful to identify the characteristics that set these users apart from the rest. There is also evidence of gender differences among seniors in the adoption of new communication technologies; however, two surveys on IVR use and attitudes failed to find gender differences. 5,6
The goal of the present study was to evaluate participants' attitudes toward the four IVR systems included in our study. We examined their overall attitudes toward the four systems as well as how those attitudes related to their success or failure in interacting with each system.
Subjects and Methods
Participants
The present study and all corresponding documentation were reviewed and approved by the Research Ethics Board of the University of Ottawa. One hundred eighty-five community-dwelling adults (120 women [65%]) between 65 and 92 years of age (mean, 73.32 years; standard deviation [SD], 6.44 years) were recruited from diverse socioeconomic backgrounds through advertisements in two free magazines for seniors and through flyers in community centers and subsidized housing buildings. Participants' education ranged from 7 to 21 years (mean, 13.85 years; SD, 2.79 years): 2% of participants had completed grade 8 or less, 14% had completed grades 9 to 11, 34% had a high school diploma, 18% had some college or university, and 32% had a bachelor's, graduate, or professional degree. Full Scale IQ ranged from 68 to 134. The only exclusion criteria were age younger than 65 years and lack of proficiency in English. Ten percent of participants reported being diabetic, 1.1% reported having had a hemorrhagic stroke, 1.1% had been treated for a brain tumor, 1.6% reported another unspecified brain disease, and 0.5% had chronic hepatitis. Two percent reported currently seeing a psychiatrist, and 9.2% were currently being treated for depression. Thirty-five percent of the sample reported experiencing memory problems. Participants were invited to use their hearing aids as they normally would and to adjust the telephone's volume setting as needed.
Materials and Methods
The first two systems chosen were governmental IVR systems: Service Canada and Statistics Canada. Service Canada is the central agency that provides information on all government programs, including old age pensions. Statistics Canada is the source of information on various aspects of Canada, much of it derived from census data. The last two were the IVR systems of United Airlines and Air Canada, which provide flight and ticket information to customers. The United Airlines system is a voice-recognition system, in which callers speak their responses, whereas the other three systems require callers to enter their responses using the telephone keypad.
The usability questionnaire, adapted from the study of Dulude, 4 is a qualitative measure of participants' experiences with the four IVR systems. It consists of 10 short questions that provide information on participants' perception of the usability of each IVR system (see Supplementary Appendix A; Supplementary Data are available online at
The Wechsler Adult Intelligence Scale, 4th edition (WAIS-IV) is composed of 10 core subtests and 5 optional subtests, measuring several cognitive functions. The test provides a global index of intellectual functioning—the Full Scale IQ. The Wechsler Memory Scale, 4th edition (WMS-IV) older adult battery (for people 65–90 years old) produces four index scores: Auditory Memory Index, Visual Memory Index, Immediate Memory Index, and Delayed Memory Index.
Participants' interactions with the four automated systems over the telephone were recorded. A touchtone phone (Mitel® 5212; Mitel Networks Corp., Plymouth Meeting, PA) was used to call the IVR systems; its keypad was located on the base of the telephone rather than on the handset. A Sony® (Tokyo, Japan) MP3 IC recorder (ICD-UX71F/UX81F) was attached to the phone line, and Sony stereo headphones (MDR-XD200) were connected to the recorder. This allowed the examiners to listen to each participant's interaction with the IVR system as it unfolded.
Procedure
When participants arrived at the memory laboratory at the University of Ottawa, self-reported information on their health and memory status was obtained using a health questionnaire. Next, participants completed four IVR tasks over the phone, each of which required calling an IVR system and obtaining specific information. Participants were instructed to use only the automated menus and not to use the option that allowed them to speak to an operator. Instructions for each task were presented verbally, and a written list of the instructions was placed in front of participants while they completed the task. Participants were allowed to take notes while completing the tasks, but most did not take advantage of this option.
For Task 1, participants called Service Canada to obtain information on what one needs in order to qualify for an old age security pension. For Task 2, participants called Statistics Canada to obtain the latest unemployment rate for the Ottawa–Gatineau region. For Task 3, participants called United Airlines to find out whether there was a flight from Toronto to New York (Kennedy Airport) around 6:00 p.m. the next day and to write down the flight number and departure time. Similarly, for Task 4, participants called Air Canada to find flight availability from Toronto to Vancouver around 12:00 p.m. the next day, and again noted the flight number and departure time. For the Air Canada task, participants were told in advance that at some point during the call they would be asked to provide the three-letter codes of the departure and arrival cities; these codes were provided to them in the written instructions.
No time limits were imposed for the four tasks, and redialing was permitted only if the line was busy or a wrong number was reached. Thus, recovery from errors was expected to happen within the same call. A task ended when participants indicated that they had obtained the information, when they hung up, or when a live operator came on the line. The latter usually happened either because a large number of errors triggered an automatic referral to an operator or because participants pressed the option to connect to an operator despite having been told not to do so at the beginning of each task. On average, participants took about 20 min to complete all four tasks.
The IVR tasks were administered in a fixed order, starting with Service Canada, which required going through four levels to complete the task. Statistics Canada required going through 5 levels, United Airlines 12 levels, and Air Canada 13 levels (see Supplementary Appendix B). After each IVR task, participants filled out the usability questionnaire for that particular system; the questionnaire was identical for all four systems. Participants' performance was scored as the number of successfully completed IVR tasks (maximum of 4). After the four IVR tasks, participants completed the WAIS-IV followed by the WMS-IV, in fixed order.
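The scoring rule above can be illustrated with a minimal sketch. It assumes each participant's outcome is stored as one pass/fail flag per system; the record layout and the function names `success_score` and `completion_rates` are hypothetical and are not taken from the study's materials.

```python
from typing import Dict, List

# Systems in the fixed administration order described above.
SYSTEMS = ["Service Canada", "Statistics Canada", "United Airlines", "Air Canada"]


def success_score(completed: Dict[str, bool]) -> int:
    """Number of IVR tasks completed (0 to 4) for one participant."""
    return sum(1 for system in SYSTEMS if completed.get(system, False))


def completion_rates(participants: List[Dict[str, bool]]) -> Dict[str, float]:
    """Proportion of participants who completed each system's task."""
    n = len(participants)
    return {
        system: sum(1 for p in participants if p.get(system, False)) / n
        for system in SYSTEMS
    }


# Hypothetical records for three participants (illustrative, not study data).
sample = [
    {"Service Canada": True, "Statistics Canada": True,
     "United Airlines": False, "Air Canada": False},
    {"Service Canada": False, "Statistics Canada": True,
     "United Airlines": False, "Air Canada": True},
    {"Service Canada": False, "Statistics Canada": False,
     "United Airlines": False, "Air Canada": False},
]

scores = [success_score(p) for p in sample]
rates = completion_rates(sample)
print(scores)  # [2, 2, 0]
print(rates)
```

The per-participant score is what the later between-group comparisons use, while the per-system rates correspond to the completion percentages reported in the Results.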
Statistical Analysis
We used McNemar's tests to examine differences in participants' performance across the four IVR tasks. This nonparametric test was chosen because success was a dichotomous variable (the task was either completed or not) measured repeatedly in the same participants. The alpha level was adjusted for the six pairwise comparisons (p<0.01). Independent-samples t tests were used to examine differences between men and women in IVR success and in attitudes toward the four systems.
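McNemar's test compares two tasks completed by the same people using only the discordant pairs: those who succeeded on one task but not the other. The following is a minimal sketch, assuming the uncorrected chi-square form (the text does not state which variant was used, so the optional continuity correction is included for comparison). The function name and the discordant counts in the example are hypothetical. For 1 degree of freedom, the chi-square tail probability equals `erfc(sqrt(chi2 / 2))`, so no statistics library is needed; with this relation, the reported χ2=1.066 corresponds to p≈0.302.

```python
import math


def mcnemar(b: int, c: int, continuity: bool = False) -> tuple:
    """McNemar's chi-square test for paired binary outcomes.

    b: participants who succeeded on task A but failed task B.
    c: participants who failed task A but succeeded on task B.
    Concordant pairs (both succeeded or both failed) do not enter the statistic.
    Returns (chi2, p) with 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    chi2 = diff ** 2 / (b + c)
    # For 1 df, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Bonferroni-style threshold for the six pairwise comparisons among four tasks.
N_COMPARISONS = math.comb(4, 2)  # 6
ALPHA = 0.05 / N_COMPARISONS     # about 0.0083

# Hypothetical discordant counts (not study data).
chi2, p = mcnemar(b=10, c=20)
print(round(chi2, 3), round(p, 4), p < ALPHA)
```

The strict 0.05/6 threshold is about 0.008, slightly below the 0.01 used in the text.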
Results
Success in Completion of the Four IVR Tasks
Participants took an average of 20 min (range, 15–40 min) to complete the calls. All testing, including the neuropsychological tests, lasted between 2.5 and 3 h. Participants were told that they could take breaks whenever they needed them, but no break was planned. Twenty-one percent of participants were not able to complete any of the assigned tasks, and only 3.2% completed all four IVR tasks. Seventeen percent of participants completed only one task, 32% completed two tasks, and 17.8% completed three tasks. Statistics Canada was the IVR task with the highest completion rate (57.8%), followed by Service Canada (52.4%). The United Airlines system (the only voice-recognition system) had the lowest success rate (21.6%), and Air Canada had a success rate of 24.9%. Table 1 shows that participants who could not complete any of the IVR tasks were older and had lower Full Scale IQ than those who completed all four tasks. There were no age differences among participants who completed one or more tasks, and those who completed one, two, or three tasks had similar IQ. In a previous analysis of the psychometric data, working memory and auditory memory measures were the best psychometric predictors of task completion after controlling for age. 8
Table 1. Characteristics of Participants Who Successfully Completed the Four Tasks
Data are mean ± standard deviation (range).
Performance did not differ significantly between the Service Canada and Statistics Canada tasks or between the United Airlines and Air Canada tasks (McNemar's test: χ2(1, n=185)=1.066, p=0.302 and χ2(1, n=185)=0.446, p=0.504, respectively). Participants' performance on the Service Canada task was better than on the Air Canada task [χ2(1, n=185)=27.473, p<0.001] and the United Airlines task [χ2(1, n=185)=36.894, p<0.001]. Participants' performance on the Statistics Canada task was better than on the United Airlines task [χ2(1, n=185)=50.069, p<0.001] and the Air Canada task [χ2(1, n=185)=41.379, p<0.001]. The number of tasks completed did not differ significantly between men (mean, 1.49; SD, 1.13) and women (mean, 1.60; SD, 1.09) (t(183)=0.683, p=0.50).
Participants' Perceptions of the IVR Systems
Usability ratings for all four IVR systems are presented in Table 2. The scoring of the negatively worded statements was reversed so that they could be compared with the positive statements. Overall ratings did not differ significantly between men (mean, 130.51; SD, 19.82) and women (mean, 127.13; SD, 20.33) (t(183)=1.09, p=0.28). For the Service Canada system, people who completed the task (mean, 37.76; SD, 6.84) rated the system more favorably than those who were unable to complete it (mean, 34.75; SD, 8.21) (t(183)=2.72, p<0.01). For the Statistics Canada system, ratings did not differ significantly between people who completed the task (mean, 35.67; SD, 7.71) and those who did not (mean, 33.58; SD, 8.68) (t(183)=1.73, p=0.09); that is, success with this system was not significantly associated with how participants rated it. For the United Airlines task, people who completed the task (mean, 33.30; SD, 8.34) rated the system more favorably than those who were unable to complete it (mean, 28.58; SD, 7.55) (t(183)=3.42, p<0.01). The same was true for the Air Canada task: people who completed the task (mean, 30.09; SD, 7.07) rated the system more favorably than those who were unable to complete it (mean, 26.78; SD, 7.07) (t(183)=2.75, p<0.01).
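The comparisons above are independent-samples t tests, and when only group means, SDs, and sizes are available, the pooled-variance t statistic can be recomputed as a consistency check. The following is a minimal standard-library sketch. The group sizes are our assumption: 97 completers and 88 non-completers for Service Canada (52.4% of 185, rounded), and 65 men and 120 women from the Participants section. SciPy's `scipy.stats.ttest_ind_from_stats` performs the same calculation and also returns a p-value; the helper name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance (equal variances assumed).

    Returns (t, df). A p-value would come from Student's t distribution with df
    degrees of freedom; it is omitted here to keep the sketch stdlib-only.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Service Canada ratings, completers vs. non-completers (group sizes assumed).
t, df = pooled_t(37.76, 6.84, 97, 34.75, 8.21, 88)
print(round(t, 2), df)  # 2.72 183

# Overall ratings, men vs. women (65 men, 120 women).
t_gender, _ = pooled_t(130.51, 19.82, 65, 127.13, 20.33, 120)
print(round(t_gender, 2))
```

With these assumed group sizes, both recomputed statistics agree with the reported values to two decimals, which supports the assumption that pooled (equal-variance) t tests were used.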
Table 2. Usability Ratings for All Four Interactive Voice Response Systems
A rating scale from 1 to 5 was used: 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, and 5=strongly agree.
Scores for questions 2, 4, 7, 8, and 10 were reversed so that higher scores always indicate a more positive judgment (for these negatively worded items, less agreement with the statement). This was done so that the scores on all questions could be compared.
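The reverse scoring can be illustrated with a minimal sketch. It assumes responses are stored as integers from 1 to 5 in item order; the function name is hypothetical. On a 1 to 5 scale, reversing an item maps x to 6 − x. Summing the 10 recoded items gives a per-system score between 10 and 50, and the reported per-system means (about 27 to 37) and overall means (about 130, that is, four systems summed, range 40 to 200) appear consistent with this scheme.

```python
from typing import Sequence

# Negatively worded items (1-based numbering) whose scores are reversed.
REVERSED_ITEMS = {2, 4, 7, 8, 10}
SCALE_MIN, SCALE_MAX = 1, 5


def score_questionnaire(responses: Sequence[int]) -> int:
    """Total usability score for one system (range 10-50).

    Reversed items are recoded as (SCALE_MIN + SCALE_MAX) - x, i.e. 6 - x on a
    1-5 scale, so that higher values always mean a more favorable judgment.
    """
    if len(responses) != 10:
        raise ValueError("expected 10 item responses")
    total = 0
    for item, x in enumerate(responses, start=1):
        if not SCALE_MIN <= x <= SCALE_MAX:
            raise ValueError(f"item {item}: response {x} outside 1-5")
        total += (SCALE_MIN + SCALE_MAX - x) if item in REVERSED_ITEMS else x
    return total


# Hypothetical respondent who strongly agrees with every statement:
# 5 positive items score 5 each, 5 reversed items score 1 each.
print(score_questionnaire([5] * 10))  # 30
```

Agreeing strongly with everything yields a middling score. This is the intended effect of mixing positively and negatively worded items.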
System Usability Analyses
The Service Canada system received the highest (most favorable) ratings on the Usability Questionnaire (mean, 36.33; SD, 7.65), followed by the Statistics Canada IVR system (mean, 34.79; SD, 8.17), United Airlines (mean, 29.60; SD, 7.95), and Air Canada (mean, 27.60; SD, 7.19). There was a significant between-systems effect (F(1, 184)=68.870, p<0.01). Pairwise comparisons adjusted for multiple comparisons revealed that the ratings of all four systems differed significantly from each other (p<0.01), except those of the Service Canada and Statistics Canada IVR systems (p=0.02) (see Table 1 for mean and SD values).
Next, we examined differences in the ratings of the four systems on each of the 10 questions of the Usability Questionnaire using the Wilcoxon signed-rank test, a nonparametric alternative to the paired-samples t test. The alpha level was adjusted to 0.01 to account for multiple comparisons (0.05/6 comparisons). For most questions, ratings did not differ significantly between the Service Canada and Statistics Canada systems or between the United Airlines and Air Canada systems. The significant differences were between each of the two government systems and the two airline systems. On Item 5, “The operator's voice was very clear,” Service Canada was rated higher than each of the other three systems (p<0.01). On Item 10, “I thought the operator spoke too quickly,” there were no significant differences among the four systems.
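The Wilcoxon signed-rank test ranks the absolute within-person differences between two systems' ratings on an item and compares the rank sums of positive and negative differences. Because Likert ratings produce many zero and tied differences, the handling of these cases matters. The following is a minimal sketch, assuming zero differences are dropped, tied differences receive average ranks, and the normal approximation with a tie-corrected variance is used. The text does not report which options its software applied, so results may differ slightly from, for example, SciPy's `scipy.stats.wilcoxon` under other settings. The function name and example ratings are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Wilcoxon signed-rank test for paired ratings (normal approximation).

    Zero differences are discarded, tied absolute differences receive average
    ranks, and the variance includes the usual tie correction. No continuity
    correction is applied. Returns (W, z, p), where W is the smaller rank sum.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Extend j over the run of equal absolute differences starting at i.
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w, z, p


# Hypothetical ratings of one item by the same seven people for two systems.
system_a = [4, 5, 3, 4, 5, 2, 4]
system_b = [3, 3, 3, 2, 4, 2, 1]
w, z, p = wilcoxon_signed_rank(system_a, system_b)
print(w, round(z, 3), round(p, 4))
```

Each of the six system pairs per item would be run separately and judged against the adjusted alpha of 0.01.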
Although participants' IVR performance was significantly correlated with how they rated the systems (people who completed more tasks tended to give the systems higher scores), the correlation between IVR success and IVR ratings was only 0.189 (p<0.01). We therefore dichotomized the two variables to evaluate whether our sample included people who did not succeed on the IVR tasks (completed none, one, or two tasks) but rated the systems highly (overall score on the Usability Questionnaire higher than 120). About half of the sample (50.8%) gave high ratings to the systems despite having difficulty completing the tasks. The rest of the sample comprised 4.9% of participants who were successful with the systems but gave low ratings, 28.1% who had trouble with the IVR and also gave low ratings, and 16.2% who were successful and gave high ratings.
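A correlation of 0.189 means that success accounts for only about 3.6% of the variance in ratings (0.189² ≈ 0.036), which motivates the 2 × 2 split. The following sketch applies the cut-offs given in the text: 0 to 2 completed tasks counts as unsuccessful, and an overall score above 120 counts as high. It assumes a Pearson correlation, although the text does not name the coefficient. The function names and the six example participants are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify(tasks_completed: int, overall_rating: float) -> Tuple[str, str]:
    """Apply the cut-offs from the text: 0-2 tasks = unsuccessful, score > 120 = high."""
    success = "successful" if tasks_completed >= 3 else "unsuccessful"
    rating = "high" if overall_rating > 120 else "low"
    return success, rating


def group_percentages(tasks: Sequence[int], ratings: Sequence[float]) -> Dict[Tuple[str, str], float]:
    """Percentage of the sample in each success-by-rating cell of the 2 x 2 split."""
    counts = Counter(classify(t, r) for t, r in zip(tasks, ratings))
    n = len(tasks)
    return {cell: 100 * c / n for cell, c in counts.items()}


# Hypothetical data for six participants (not study data).
tasks = [0, 1, 2, 3, 4, 2]
ratings = [130, 125, 110, 140, 115, 121]
print(round(pearson_r(tasks, ratings), 3))
print(group_percentages(tasks, ratings))
```

In the study sample, the ("unsuccessful", "high") cell is the 50.8% group whose characteristics are compared with the rest of the sample below.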
We ran a series of t tests to compare the people who gave the IVR systems high ratings despite their lack of success with the rest of our sample. The dependent variables were age, level of education, and self-reported memory status; none of these differed significantly between the two groups. The only difference emerged on intelligence testing: participants who gave high ratings despite their lack of success had lower Full Scale IQ (composite score mean, 243.23 versus 260.37; SD, 46.51 versus 39.66) (t(183)=2.693, p<0.01).
Discussion
Despite the increasing popularity of IVR systems, research on their acceptability and usability, especially among older adults, is limited. This study examined how several participant characteristics related to attitudes toward IVR systems in a sample of community-dwelling adults 65 years of age and older.
Contrary to previous research, men and women in our sample did not differ in the number of IVR tasks they were able to complete; most participants of both genders had significant difficulty interacting with the systems. Consistent with previous surveys, there were no gender differences in participants' ratings of the IVR systems. 5,6 We did not measure anxiety related to the use of IVR or familiarity with the technology, two factors that have previously been reported to affect people's ability to interact with new technology; 10 thus, we are unable to comment on their significance in the case of IVR.
The ratings of the four IVR systems followed the same pattern as participants' success rates with the systems. It is not surprising that the two systems that were easiest to navigate (Service Canada and Statistics Canada) were also rated higher than the two airline systems (United Airlines and Air Canada). These findings are also in line with previous research showing a significant relationship between positive experience with IVR and perceptions of the technology. 4,5 Nevertheless, the correlation between success and overall ratings of the systems was weak. Further evaluation of the data revealed that half of our sample rated the systems much higher than would have been expected on the basis of their success with them; similar findings have been reported in previous studies. 4,6,11 In our sample, participants who rated the systems highly despite their general lack of success differed from the rest of the sample only in Full Scale IQ. This finding has an important implication for IVR design that relies on customers' ratings to evaluate usability: designers should be mindful that a subpopulation of people with lower cognitive abilities may rate a system highly despite experiencing significant difficulty using it. Our study design does not allow us to comment on other factors that may contribute to these findings. However, some of the comments that participants made after their interactions with the IVR systems led us to believe that many of them attributed their difficulties to their own abilities rather than to faulty IVR design. Because the questionnaire was not designed with this finding in mind, we did not further probe the factors that may explain these observations.
Closer examination of the 10 questions of the Usability Questionnaire revealed that most items followed the same pattern as the success rates with the four systems. No significant differences emerged between the Service Canada and Statistics Canada systems or between the United Airlines and Air Canada systems, but ratings of the two easier systems differed significantly from those of the two airline systems, with participants rating the airline systems much lower. Notably, the one question concerned with the level of concentration required to complete the tasks received low ratings (below the scale midpoint of 3) for all systems. This suggests that participants experienced the IVR interaction (even for the two easier tasks) as taxing on their attention and concentration. This finding is in line with our previous investigation showing that working memory, a cognitive ability strongly related to attention and concentration, is a significant predictor of older people's ability to cope with IVR. 8
Footnotes
Acknowledgments
This study was funded by the Natural Sciences and Engineering Research Council of Canada and by TelASK Technologies.
Disclosure Statement
V.T. is an employee of TelASK Technologies. D.I.M., F.A., M.G., and C.M. declare that no competing financial interests exist.
References
Supplementary Material
