Abstract
Internet Protocol Caption Telephone Services provide access to telecommunication for persons who are d/Deaf or Hard of Hearing as users can listen to and read captions of what the other speaker says. This paper presents the results of Applied Cognitive Task Analyses with d/Deaf or Hard of Hearing users to uncover the strategies experienced caption phone users deploy to address communication challenges. Results indicate that users learn strategies that involve adapting their communication behaviors, obtaining the collaboration of the other speaker, selecting the best technology for specific settings, and disclosing hearing loss. Some situations are particularly challenging, e.g., higher caption error when the other speaker has an “accent” or the conversation includes technical language; having to disclose hearing loss and thus risking negative consequences due to biases against disability. Findings can help accelerate novice users’ learning of strategies and inform the development of sociotechnical solutions for caption phones’ current shortcomings.
Keywords
Introduction
In the U.S. in 2021, 11.6 million people reported being deaf or having serious difficulty hearing, representing 3.6% of the U.S. population. Prevalence increases with age, e.g., from the age brackets of 35-64 years to 65 and over, the prevalence of deafness or hearing difficulty increases from 2.7% to 13.3% (U.S. Census Bureau, 2021). As the U.S. population ages, the prevalence of hearing loss will increase.
Being d/Deaf (including individuals who are deaf in the audiological sense, and those who identify as culturally Deaf) or Hard of Hearing (HoH) is associated with significant barriers to daily life. These barriers include difficulty communicating effectively over the phone, which is critical for a range of communication needs at work, in healthcare, and at home. Many people who are d/Deaf or HoH and cannot effectively communicate using regular phones rely on Internet Protocol Caption Telephone Services (IP CTS). IP CTS is a telecommunications relay service for users who can speak, but who have difficulty hearing. With this technology, the user can listen to the other speaker (often with added sound amplification) and simultaneously read captions of the other speaker’s words (Payton et al., 2017).
With IP CTS, captions can be provided in different ways: (i) a Communication Agent (CA) hears and manually transcribes what the other speaker says into text and transmits the caption text directly to the IP CTS user’s telephone display; (ii) the CA repeats, or “revoices”, what the other party says and an Automatic Speech Recognition (ASR) engine transcribes the revoiced words into text; (iii) ASR directly captions the other speaker, with no human involvement. IP CTS can be provided for a variety of devices including landline phones, mobile devices, and personal computers.
IP CTS is provided by the Federal Communications Commission (FCC) at no cost to consumers who require it. On behalf of the FCC, MITRE independently conducts research to assess the service quality and usability of IP CTS and explore sociotechnical solutions for overcoming IP CTS shortcomings. The research conducted by MITRE and its academic partners indicates that despite technological improvements, IP CTS still presents usability issues that challenge effective communication, namely caption delay (the time elapsed between hearing a voice on the caption phone and the display of captions) and caption error (the proportion of incorrectly captioned words) (Federal Communications Commission, 2018).
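The two measures defined above can be made concrete with a short sketch. This is an illustration only, not the scoring procedure used by MITRE or the FCC, which the text does not specify. It assumes that caption error is computed as a word-level edit distance between the caption and a reference transcript, divided by the number of reference words, and that caption delay is the difference between two timestamps in seconds. The function names `caption_error_rate` and `caption_delay` are hypothetical.

```python
def caption_error_rate(reference: str, caption: str) -> float:
    """Proportion of incorrectly captioned words (assumed definition):
    word-level edit distance divided by the number of reference words.
    Note: this sketch lowercases text but does not strip punctuation."""
    ref = reference.lower().split()
    hyp = caption.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j caption words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the caption
                curr[j - 1] + 1,     # extra word inserted in the caption
                prev[j - 1] + cost,  # matching word or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def caption_delay(heard_at_s: float, displayed_at_s: float) -> float:
    """Seconds elapsed between hearing the speech and seeing its caption."""
    return displayed_at_s - heard_at_s
```

Under these assumptions, a caption that drops the single word "not" from a six-word sentence has an error rate of 1/6, even though the omission reverses the meaning of the sentence. A low error proportion therefore does not guarantee that a caption is harmless.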
Caption delay and error can be relatively harmless at low levels but can often reach levels that have a negative effect for users of the technology. Those levels remain up for debate in large part because of inconsistent knowledge of when and for whom negative effects occur. Therefore, research is needed to uncover the real-life burdens, including excessive communication effort and cognitive demand, that IP CTS users experience. MITRE’s review of the literature found multiple experiments involving caption phone users, but no formal interview-based knowledge elicitation with people regarding their lived experience with caption phones.
To fill this research gap, this study used Cognitive Task Analysis (CTA), a family of methods whose purpose is to directly engage with people performing tasks of interest to systematically elicit, structure, and document the cognitive work involved in performing those tasks successfully (Crandall et al., 2006). During CTA, participants reflect on their current and past usage of a technology, tricks of the trade they discovered or learned from others, challenges that have appeared or disappeared due to changes to the technology over time, and challenges that have persisted regardless. From the diverse body of CTA methods available (Hoffman & Militello, 2009), Applied Cognitive Task Analysis (ACTA; Militello & Hutton, 1998) was chosen based on its structured approach for eliciting knowledge from expert technology users in a way that helps make it transferable to novice users as well as translatable into design and development recommendations for improving IP CTS. To our knowledge, this study was the first time ACTA was applied with IP CTS users.
The goal of this study was to uncover (i) the real-world impact of IP CTS shortcomings on users’ cognitive demands and ability to effectively communicate, and (ii) the strategies that expert IP CTS users have developed to overcome communication challenges. This knowledge can help novice users accelerate their learning of strategies to improve caption phone communication and inform the development of sociotechnical solutions for caption phones’ current limitations.
Methods
ACTA interviews were conducted at the MITRE booth at the 2022 Hearing Loss Association of America (HLAA) conference. The demographics of the HLAA membership make this conference an ideal place to recruit captioned phone users.
The ACTA interview consists of three stages. The first creates a task diagram to decompose a selected activity into tasks and subtasks, highlighting those that are difficult or require the most expertise. The second is a knowledge audit, which elicits examples of different types of expertise required for the task, why it is difficult, and what cues or strategies the expert uses to accomplish the task. The last stage is a simulation interview, which uses a simulated version of the task (such as a table-top exercise) to lead the expert through a series of events that highlight the cognitive demands of the task. For each event, the interviewee is asked to assess the situation, explain any actions that might be taken, describe the critical cues leading to those actions, and identify potential errors a novice might make in the same situation.
The method was first piloted in 2020, at a time of restricted face-to-face interactions due to the COVID-19 pandemic. Because of this, MITRE modified the original ACTA protocol for a hybrid survey and follow-up interview approach. Based on pilot data, MITRE modified the first stage of the ACTA protocol, as the “task” of using an IP CTS phone is already well-defined. Therefore, the first stage of the ACTA interview consisted of five questions about the participant’s past and present use of IP CTS.
The scenarios used in the last stage of ACTA described situations in which a user must navigate the communication challenges resulting from caption delay and caption errors. For instance, in one scenario the participant was asked to imagine being on a phone call with a nurse who is giving him lab results. The participant was shown captions which, unbeknownst to him, contained errors, and was asked what he would do next in the phone call. Next, the participant was shown, side by side, the captions with errors and a transcript of what the nurse had really said, and was asked what he would do during the call once he realized the errors in the captions.
Following the interview, participants answered a demographic survey which included the Hearing Handicap Inventory (HHI), a 10-item questionnaire that assesses how an individual perceives the social and emotional effects of hearing loss (Ventry & Weinstein, 1982; Weinstein & Ventry, 1983).
Participants
Eleven participants were interviewed, seven males and four females. One participant identified as belonging to the 50-59 age range; four were in the 60-69 age range, three in the 70-79, and three in the 80 or older range.
Seven participants self-identified as “Hard of Hearing”, three as “Person with a hearing loss” and one as “Deaf”, with all reporting having received a hearing test by an audiologist or health care professional.
Four participants reported profound hearing loss (90 dB or higher) in both ears, three severe (71-90 dB) in one ear and profound in the other, two severe in both ears, one severe in one ear and moderate hearing loss (51-70 dB) in the other, and one indicated “80%” loss in both ears. Based on the Hearing Handicap Inventory (HHI) results, four participants had a 50% probability of mild-moderate hearing impairment, and seven had an 84% probability of severe hearing impairment.
Pilot interviews and prior informal engagement with IP CTS users indicated that the learning curve to master a captioned phone is typically three to six months. Individuals with six or more months of experience as users of caption phones were considered sufficiently “expert”, i.e., they are familiar with IP CTS, have encountered a variety of typical challenges, and have developed strategies for communicating effectively via IP CTS. Two participants reported one year of caption phone use, five between 4 and 8 years, and four had used caption phones between 10 and 50 years. The types of phones most used by participants were cell phones and caption phones. Participants received a $25 Visa gift card for participating in this study.
Results
The following section presents overall themes, key findings within them, and illustrative quotes from participants. We use the term “user” to refer to a caption phone user who is d/Deaf or HoH, and “the other speaker” to refer to the individual with whom the user is communicating.
Adaptation
All participants reported trying to overcome phone communication challenges by adapting their communication behaviors and/or asking the other speaker to modify theirs. Examples include asking the other speaker clarifying questions or to repeat what they said, explaining present problems with the captions, and asking the other speaker to be patient and wait while the user receives the captions.
Many of these behavioral strategies involve users disclosing their hearing loss. However, some participants do not want to disclose this in any situation or when it can have detrimental consequences, e.g., a phone job interview. One user prefers not to disclose her hearing loss “because not hearing has been such a problem that if I can keep quiet then maybe I'm not gonna have problems. And people don't understand… they wanna talk loudly or do something like that.”
Most of these strategies require the cooperation of the other speaker, which does not always happen. One participant developed a strategy to end calls from telemarketers by asking for their cooperation: “When I get an unwanted call from a seller, I tell them ‘Just a second, I'm Hard of Hearing and I'm reading my captions in order to follow you’ and they hang up… it really works wonderful.” Dependency on the other speaker’s cooperation decreases the self-sufficiency of caption phone users.
Selective Use
Participants perceive differences in usability and effectiveness amongst IP CTS providers and believe that no single instantiation of current technologies addresses all their communication needs. Users select the type of IP CTS technology that works best overall or in specific situations, such as finding one provider that works best for work, and another that works best for calls to friends and family, or learning which platforms have better technical performance in different situations. For instance, when sound quality is too poor on a landline, one participant switches to his cellphone because “it streams to my implant …and I got a matching brand hearing aid in the other ear so they co-ordinate without extra equipment and the streaming is really good…it’s almost like I’m a normal person.”
Participants take a variety of actions to try to make the IP CTS technology work effectively, such as unplugging and plugging in the phone to reset it when it acts unpredictably. A participant recounted that when captions stop “I have to like reset the caption button on the phone. Sometimes I can reset it and it'll come right back…sometimes it don't and I have to unplug it, plug it back and in the end it will come back but by that time I’m done with that call.” Participants acknowledge that some of these strategies do not always work, yet they continue to deploy them, indicating a lack of more reliable means of communication.
Emotional and Interpersonal Impacts
Faced with technological challenges and pressure to disclose their hearing loss, users feel an array of negative emotions and worry about being misperceived as less capable. Examples include embarrassment at having to disclose hearing loss and feeling vulnerable about their dependence on IP CTS. A participant related, “when you can’t hear, you rely on captioning so much; when it goes away you are like in a panic.”
Users are concerned that communication challenges can negatively impact how they are perceived in multiple domains (not just hearing ability, but also cognition and social skills), and impair their ability to form and maintain social and business relationships. A participant explained, “sometimes people would think you're not really listening…because sometimes if I'm like on a business call like they ask you ‘what's your address?’ And I don't hear them say it right away, so I pause them. And then I'm thinking that people probably think ‘what's wrong with her, she doesn’t know her address?’”
Avoidance
Some IP CTS usage strategies consist of avoiding the technology entirely. This reduces the use of caption phones, whether in the number of calls placed or accepted or in the topics discussed during phone calls. Examples include avoiding caption phones for “high risk conversations” in which misunderstandings can have detrimental consequences and asking to communicate instead in written form, via email or text. One participant sometimes prefers email communication because “…if it’s important, I want to know every word accurately. I mean, it’s one of those situations where you don’t hear ‘not’, it completely reverses, I frequently say to people, ‘I hear much better on email.’”
The avoidance of technology suggests that users view some telephone communication with elevated risk, e.g., misunderstanding a medical diagnosis. A participant explained, “I don't trust…when something that delicate that they…didn't screw up a word that will mess me up, particularly because they're medical terms…that aren’t necessarily are very clear to anybody, let alone through the captioning system.”
Users avoid certain types of communication situations when applying coping strategies is taxing. For example, the effort needed to repair misunderstandings is particularly high when talking to unfamiliar speakers who are not aware of their hearing disability.
“Off the phone” strategies
Other IP CTS usage strategies involve tasks outside of phone calls that increase the users’ workload, or relying on the help of a trusted person, which reduces users’ privacy and independence during phone communication. For instance, one participant reads about the topic and context of an upcoming conversation, particularly when it comes to business calls. Gaining this background knowledge helps her understand what captions to expect and compensate for caption errors. This strategy, however, comes at the cost of increasing the user’s workload for communicating on the phone: “I would always do more research in … the business I'm in, I would do a lot of ancillary reading to build up my intelligent subjects because that would help provide context to me…you know where the conversation might go…I could fill in pieces…So like I spend probably sometimes twice the time you would spend on a meeting…because you're going to the meeting and then you're moving on. And then I'm reading about what would just happen in the meeting.”
A couple of participants have a trusted person help them during phone calls. One user has his spouse participate in the call and let him know of caption errors. The other user has his spouse talk to the speaker if the speaker does not agree to repeat information and speak slowly and clearly.
Learning Curve
Users require time to learn strategies for dealing with technological challenges of using IP CTS. These learned strategies reflect the user’s conceptualizations or mental models of how the technology works and the conditions that decrease its performance.
For some participants, this learning process included gaining self-advocacy skills and experience acting as a hearing loss advocate which in turn helped them be less self-conscious about, and more willing to disclose, their hearing loss as part of their strategies to deal with caption phone communication challenges.
A participant recounted how talking to strangers, compared to family who knew and adapted to his hearing loss, made him more aware of communication challenges and motivated him to find strategies to overcome them: “probably took me close to six months, I think, to start, you know…saying, hmm, let me try to figure this out and see what's going on…it was mostly after I talked to somebody other than my mom and my brother…that I finally realized that maybe there's, you know, some work I got to do to understand this captioning stuff.”
Participants attributed certain strategies to novice caption phone users that are less conducive to successful communication. These novice strategies include simply hanging up, giving up the use of caption phones, or changing the subject or doing the majority of the talking as a means to control the conversation and reduce the need to hear the other speaker. A participant explained that novice caption users resort to “doing the majority of the talking, because that's one way people that can't hear operate. If you dominate the conversation you don't have to hear what other people are saying.”
Participants’ accounts of the technological challenges they face reflect their understanding of when and why caption phone performance decreases. A common perception amongst participants is that caption accuracy is worse when the conversation involves technical language like medical or profession-specific terms. This issue was attributed to CAs or ASR engines being unprepared for the technical terms. A participant expressed, “when they [CAs] don’t know what you’re going to be talking about, they may have no idea about the subject matter, they may have no knowledge of the subject matter and the captions you get will be pretty useless…there are times where I have said, this is simply just not working and quit.” Another widely shared perception is that caption error increases when the speaker has an accent, an issue that is known to reduce caption accuracy for ASR engines (Hinsvark et al., 2021).
Discussion
Participants’ accounts indicate that they are highly motivated to use IP CTS and are resourceful in learning and deploying strategies to counteract the challenges of using this technology. Nonetheless, even this group of motivated, resourceful, and experienced caption phone users stressed how emotionally taxing it is to face these challenges including feelings of frustration, stress, and discouragement.
Participants also highlighted their concern about being misperceived as less cognitively and socially skillful because of how technological issues impact their ability to communicate over a captioned phone. That users interviewed at a consumer group conference report these negative impacts suggests that users who are less knowledgeable in caption phone use and self-advocacy skills likely experience higher levels of emotional and interpersonal distress when faced with the same technological challenges. These observations highlight the need to improve caption latency and accuracy to reduce the emotional and interpersonal impacts they exacerbate, and to provide users with early learning experiences about strategies to deal with common caption phone issues.
Participants’ accounts indicate that experienced users of caption phones learn a variety of strategies to deal with the challenges associated with communicating using caption phones. However, the mere observation that a strategy works for a certain user in a particular context cannot be taken as evidence that all users can successfully deploy the strategy, nor that such a strategy will work for all types of phone calls. Numerous factors influence the strategy a user deploys and its efficacy in a particular IP CTS call. These factors include:
The user’s beliefs of what is causing the communication challenge and what actions can repair the issue
For example, if a user believes that caption delays are the result of the current connection, they will try hanging up and calling again hoping that a new connection will improve caption speed. Another user who believes that caption delay is a “bug” that happens every so often will continue the call but use behavioral strategies such as explaining the delay to the other speaker and asking him to wait while the captions appear.
How consequential it is if the user misunderstands information
The more consequential it is to correctly understand the information, the more the user will persist in attempting to resolve technological challenges. For some users the negative consequences of misunderstanding information may be so high that, in the presence of caption errors or other communication dysfunctions, they will stop using the caption phone and communicate instead in writing (e.g., text, email), despite it being slower and asynchronous.
The type of social relationship the user has with the other speaker
Participants differentiated between the challenges of using caption phones with familiar versus unfamiliar speakers (e.g., a relative versus a customer service representative). Overall, dealing with technological challenges during calls with unfamiliar speakers was reported as more challenging. Unfamiliar speakers share less common ground with the user, and thus the user needs to exert more effort to achieve mutual understanding, compared to a call with a familiar speaker who is aware of the user’s hearing loss and who has experience adapting their communicative behaviors during captioned calls.
Lastly, the user’s preferences and capabilities for deploying specific strategies
For example, most behavioral strategies require self-disclosure of hearing loss, an action that some users avoid. Even a user who is comfortable revealing hearing loss may want to avoid doing so because of the potential negative impact of the disclosure (e.g., during a job interview, for fear of reducing the likelihood of being hired). In the U.S., persons with disabilities have the legal right not to disclose disability status when applying for a job (U.S. Equal Employment Opportunity Commission, n.d.), and non-disclosure is a reasonable strategy given employers’ conscious or unconscious biases against disabilities. However, caption phone users can feel pressured to disclose hearing loss, and ultimately do so against their preferences, if caption errors or delays degrade the communication enough that they risk being perceived as less competent candidates.
Most communication strategies described by participants require the cooperation of the other speaker, for instance asking the other speaker to be patient and wait for the captions to appear, or to rephrase information that was not correctly captioned. These requests increase the demands on the other speaker, compared to calls using regular phones, which can lead to some speakers abandoning the call. Ultimately, if the other speaker does not agree to adjust his communicative behaviors to accommodate the caption phone user, the strategy cannot be deployed.
This dependence on the other speaker’s cooperation decreases the independence and self-sufficiency of caption phone users. These observations indicate a need to develop methods to overcome technological issues that are independent of the other speaker, e.g., providing IP CTS users with sound filters that improve audio quality and reduce reliance on captions. At the same time, it would also help to educate hearing speakers about the impact of hearing loss on telecommunication, the capabilities and limitations of current technology, and the actions they can take to facilitate effective communication with telephone users of all abilities.
Participants’ widely shared perception that caption error increases when the speaker has an accent is in line with prior research, which indicates that ASR systems perform poorly on accented speech and dialects (e.g., African American Vernacular English, Deaf accent) that do not fall within the “standard” accent ASR engines are mostly trained on (Glasser et al., 2017; Hinsvark et al., 2021; Koenecke et al., 2020). “Standard” accent refers to the accepted accent of the majority population, which in the U.S. has been identified as the Midwestern Broadcasting Accent. These results indicate the need to train and assess ASR engines using diverse speech datasets so that usability and accuracy metrics are truly representative of the diversity of spoken English.
Participants stressed the poor accuracy of captions for conversations that involve technical language such as medical terms. This is especially impactful for users because technical language is typically involved in highly consequential conversations in which misunderstanding can lead to significant negative consequences (e.g., medical or work calls). This finding indicates users’ need for captioning services, whether via CAs or ASR engines, that are trained on technical language. The concept of “skills-based routing” is routinely used in customer service call routing to quickly direct calls to the best person or group to handle a customer’s specific needs. An analogous approach also exists in the field of language interpretation for hearing individuals, with interpreters specialized in, for example, medical or legal interpreting (Flores, 2005). Further research is needed on methods to apply skills-based routing concepts for more adaptive and user-centric caption telephone services.
In summary, the impact of poor IP CTS service quality and usability can be severe enough that users consider the captions useless and decide to stop communicating over the phone and instead do it in writing. This effectively deprives people with hearing disabilities of the opportunity to communicate in a way that is functionally equivalent to that of people without such disabilities, as required by Title IV of the Americans with Disabilities Act (1990). The communication strategies used by experienced IP CTS users described here reveal many of the limitations of current offerings, which affect novice IP CTS users even more significantly. However, these strategies also point to opportunities to improve IP CTS quality and usability, and thereby communication effectiveness, for all users.
