Abstract
Electronic and optical magnifiers are helpful for people with low vision to access print, although magnifiers are difficult for people with very low acuity to use and are not helpful for people who are blind. Recently, artificial intelligence (AI) vision aids have emerged as useful print-to-speech tools for visually impaired persons to access hardcopy text in books, on product packaging, on signs, and in other environmental contexts (Moisseiev & Mannis, 2016). The types of AI vision aids range widely in size, operating platform, and price. In this study, we aim to compare the print-reading functionality and usability of two devices in this category with the broader goal of illustrating the factors that would likely be important to visually impaired consumers or rehabilitation specialists and relevant to designers of such AI technology. We focused on key physical parameters that were relevant to text reading, and the out-of-the-box usability of the devices for typical activities of daily living. We recognize that modern technology is continually changing, but we expect that the factors described in this report will be relevant to the evaluation of AI-based print-to-speech systems for people with visual impairments.
We compared Orcam MyEye 1 (Orcam; Jerusalem, Israel) and Seeing AI (Microsoft; Redmond, WA, USA) to determine whether the two differ in usability, given their differences in design and price point. Although both devices offer features in addition to text reading, such as face recognition, we elected to focus on text reading in this study because we expect reading to be the primary function of these devices for users with visual impairments. A better understanding of these two technologies, and of similar devices, would help low vision specialists and potential users with severe vision impairment select the most suitable print-to-speech device.
Orcam MyEye 1, which retails for $3500 (all prices are listed in U.S. dollars) in the United States, is a portable device that includes a camera mounted on a pair of spectacles and a handheld control apparatus that operates the camera. The device can begin to convert text to speech when the camera senses a finger pointing toward a block of text or when the user presses a button on the handheld control apparatus. Seeing AI is an application that can be downloaded for free onto any device that utilizes the iOS mobile operating system created and developed by Apple. It utilizes the iOS device’s camera and can be operated through VoiceOver (Apple; Cupertino, CA, USA), screen-reading software that is a standard component of the iOS operating system. Seeing AI can convert print to speech after the user captures text with the iOS device camera.
We carried out within-participant evaluation of the functional performance of these two vision aids with seven individuals with visual impairments.
Methods
Participants and recruitment
From January to March 2018, seven participants were recruited to test the usability of the Orcam MyEye 1 and Seeing AI (see Table 1). Recruitment criteria included:
1. Self-reported best-corrected visual acuity ≤ 20/200 Snellen in the better eye. Two of the seven participants had measurable acuity, and the others had light perception or no light perception. Five out of seven participants have had severe vision impairment since age 2 years or earlier.
2. Unfamiliarity with both Orcam and Seeing AI (neither current nor previous users).
3. Familiarity with iOS and VoiceOver gestures (current user).
Although requiring familiarity with VoiceOver may have conferred a bias in favor of performance with Seeing AI, requiring participants to learn VoiceOver gestures as part of the evaluation would likely have conferred an opposite bias.
Table 1. Participant characteristics. Note. BCVA = best-corrected visual acuity; LP = light perception; NLP = no light perception.
The University of Minnesota Institutional Review Board (IRB) ruled that IRB approval was not required for this protocol because it dealt with the usability of technology. The IRB determined the protocol to be a product testing activity, and, therefore, it did not qualify as research involving human participants as defined by US Department of Health and Human Services and US Food and Drug Administration regulations. Nevertheless, informed consent was obtained through email, in which participants were sent a digital copy of the study consent form. They were instructed to reply to the email with a statement of consent and a summary of the goals of the study and their roles in it. The study adhered to the tenets of the Declaration of Helsinki. Participants were provided a gift card of $30 as a token of appreciation.
Device settings
Orcam MyEye 1 had an 8-megapixel (MP) camera and its text-reading capability was initiated by the “trigger” button, which is located on the handheld control apparatus (base unit). The device translated any text that was captured by the device’s camera into speech output. An iPhone 6, which also has an 8-MP camera, was used in conjunction with Seeing AI version 2.0.1. Seeing AI has two functions for reading text, and both were used in this study. The “document” mode provided print-to-speech translation after the user captured an image of the entire page of text; this setting was used to complete tasks involving reading text printed on smooth and flat surfaces. The “short text” mode began to convert print to speech whenever any text was in view of the smartphone camera; this setting was used to complete tasks involving reading text printed on curved and irregular surfaces. “Document” and “short text” modes were each used for half of the Seeing AI tasks.
Assessment of device performance
Testing was conducted in a 16’ × 20’ laboratory room with overhead lighting that provided tabletop surface illumination of more than 400 Lux, in the range recommended for office tasks (Illuminating Engineering Society of North America, 2000). Objects for inspection were placed on a table. There were no glare sources or obvious shadows. We reasoned that typical ambient office lighting was appropriate for device evaluation. Acuity and contrast-sensitivity charts were illuminated to 71 candela per square meter. Device measurements were conducted by laboratory staff members who were familiar with acuity, MNREAD Acuity Chart testing, and other forms of laboratory measurement. Print-to-speech accuracy of Orcam MyEye 1 and Seeing AI for text printed on various everyday objects and surfaces was examined. Furthermore, physical features likely important in reading, including device acuity (the smallest print that could be read), effect of ambient light level, and effect of text orientation, were assessed for each device. Each physical feature was assessed twice with each device, and the best score was recorded.
Participant training
Participants with low vision who were able to read visually with magnifiers and those who had light perception were blindfolded during testing. All participants received a 5-minute tutorial with each device and were given 15 additional minutes to practice reading printed text in varying formats (books, instructions on household objects, and labels on food items) with both the Orcam MyEye 1 and Seeing AI. Presentation of the first device was alternated between participants. During this practice period, the researcher, who was experienced in the nonvisual use of both devices, gave instructions on how the participant could better use the devices for reading.
Specifically, training for Orcam included instruction on how to use the trigger button and the pointing gesture to take a photo. Participants were also instructed that the camera was on the outside edge of the right eyeglass frame, so they might need to tilt their heads slightly left to get an object in the field of view of the camera. Training for Seeing AI included instruction on how to use its two text-reading modes. For “document” mode, participants were told that they would have to get the whole object in view of the camera using the guidance hints provided by the app. For “short text” mode, participants were told to use an especially steady hand and to employ a scan-and-pause technique. More specifically, they were told to scan with the camera until they heard something, pause to listen to the announcement, and then scan again and pause to listen.
Usability evaluation
Study tasks.
Survey questions and participant responses.
Note: All prices are listed in U.S. dollars.
a. Participants were asked to answer the question on a scale of 1–5, with 1 as strongly disagree, 3 as neutral, and 5 as strongly agree.
Statistics
Given the small sample size, most data were presented descriptively. Two-tailed paired t-test, Mann–Whitney U, and repeated measures ANOVA were used to compare the difference in scores for Orcam MyEye 1 and Seeing AI using Stata 16 statistical software (StataCorp, College Station, TX, USA).
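The descriptive and rank-based comparisons named above can be sketched in a few lines. The following is a minimal illustration using only the Python standard library, not the Stata procedure used in the study. The function names and the ratings in the usage lines are hypothetical, and the quartile convention ("inclusive" interpolation) is an assumption that may differ from the one Stata applies.

```python
from statistics import median, quantiles


def midranks(values):
    """Return 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic (the smaller of U1 and U2)."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u1 = rank_sum_x - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1
    return min(u1, u2)


def median_iqr(values):
    """Return (median, (Q1, Q3)) using inclusive quartile interpolation."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# Hypothetical 1-5 ratings from seven participants (NOT the study data).
ratings_device_a = [4, 3, 4, 5, 3, 4, 2]
ratings_device_b = [3, 2, 5, 5, 1, 3, 2]
print(median_iqr(ratings_device_a), median_iqr(ratings_device_b))
print(mann_whitney_u(ratings_device_a, ratings_device_b))
```

The sketch stops at the U statistic. Turning U into a p value requires the exact null distribution or a normal approximation with a tie correction, which a statistics package would supply.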
Results
Device performance
Acuity
Tested at a standard reading distance of 40 cm with overhead room lighting, both Orcam MyEye 1 and Seeing AI successfully read down to the 0.8 M line (0.3 logMAR, 20/40 equivalent) on the MNREAD acuity chart, but they could not read smaller print. In both cases, the transition from a readable to an unreadable print size was sudden, so it was straightforward to establish an estimate of device reading acuity.
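The equivalence between 0.8 M print at 40 cm, 20/40 Snellen, and 0.3 logMAR follows from the definition of the M unit: 1 M print subtends 5 minutes of arc at 1 meter, so the M size divided by the viewing distance in meters gives the minimum angle of resolution in arcminutes. A short sketch of that arithmetic is shown below; the function names are illustrative and do not come from the study.

```python
import math


def snellen_denominator(m_size: float, distance_m: float) -> float:
    """Snellen denominator x in 20/x for print of a given M size viewed at distance_m meters."""
    return 20.0 * (m_size / distance_m)


def logmar(m_size: float, distance_m: float) -> float:
    """logMAR equivalent: log10 of the minimum angle of resolution in arcminutes."""
    return math.log10(m_size / distance_m)


# Values reported for both devices: 0.8 M print read at 40 cm.
print(f"20/{snellen_denominator(0.8, 0.40):.0f}")  # 20/40
print(f"logMAR {logmar(0.8, 0.40):.2f}")           # logMAR 0.30
```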
Contrast sensitivity
Both devices were tested at 1-meter viewing distance on the recognition of reduced contrast letters on the Pelli-Robson chart (letters subtending 2.9°). Orcam MyEye 1 scored 0.15 and Seeing AI scored 0.45, meaning that Seeing AI could decipher lower contrast letters.
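To make the two Pelli-Robson scores easier to interpret, they can be converted to threshold contrast. The score is the base-10 logarithm of contrast sensitivity, and contrast sensitivity is the reciprocal of the threshold contrast, so the threshold is 10 raised to the negative of the score. The sketch below applies this conversion and also recovers the physical letter height implied by a 2.9° letter at 1 meter; the function names are illustrative assumptions.

```python
import math


def threshold_contrast_percent(log_cs: float) -> float:
    """Threshold contrast (percent) for a given log contrast sensitivity score."""
    return 100.0 * 10.0 ** (-log_cs)


def letter_height_mm(angle_deg: float, distance_mm: float) -> float:
    """Height of an object subtending angle_deg at a viewing distance of distance_mm."""
    return 2.0 * distance_mm * math.tan(math.radians(angle_deg) / 2.0)


orcam = threshold_contrast_percent(0.15)      # about 71% contrast
seeing_ai = threshold_contrast_percent(0.45)  # about 35% contrast
print(f"Orcam MyEye 1: {orcam:.1f}%  Seeing AI: {seeing_ai:.1f}%")
print(f"Letter height at 1 m: {letter_height_mm(2.9, 1000.0):.1f} mm")
```

On this reading, the 0.3 log unit difference between the scores means Seeing AI recognized letters at roughly half the contrast that Orcam MyEye 1 required.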
Light level
In low light (with overhead lights turned off and the door open in the windowless testing room; text still legible to a sighted reader) and no light conditions (lights off and door closed in a windowless room; text not legible to a sighted reader), Orcam MyEye 1 was not able to function, while Seeing AI activated the smartphone’s camera flash and, thus, could still convert print to speech.
Text orientation
When text was presented upside-down or sideways, Orcam MyEye 1 could not read the text and inconsistently alerted the user about misorientation; Seeing AI was able to read text regardless of orientation.
Text recognition for vertically oriented text
Text-to-speech accuracy of Orcam MyEye 1 and Seeing AI.
Human performance
Participant task completion.
Note. O = Orcam, S = Seeing AI, NC = trial not completed correctly within the 3-minute time limit.
Brief task descriptions: Task 1 (cooking instructions), Task 2 (book title), Task 3 (canned soup identification), Task 4 (medication bottle name), Task 5 (shampoo versus conditioner), Task 6 (restaurant menu).
Questionnaire ratings were on a scale of 1 (“worst”) to 5 (“best”). Individual ratings are shown in Table 3. With regard to how participants viewed ease of text reading, the median (interquartile range) was 4 (3–4) for Orcam MyEye 1 and 3 (2–5) for Seeing AI (p > .792, Mann–Whitney U test). Additionally, the two devices were not significantly different in user friendliness, with median 5 (4–5) for Orcam MyEye 1 and median 4 (2–5) for Seeing AI (p = .084, Mann–Whitney U test), or in helpfulness in daily life, with median 4 (4–5) for Orcam MyEye 1 and median 4 (2–5) for Seeing AI (p = .159, Mann–Whitney U test). Participants were willing to spend a median (interquartile range) of $100 (100–500) on Orcam MyEye 1 and $50 (0–500) on Seeing AI (p = .262, Mann–Whitney U test).
Discussion
We found that the text-reading function of two artificial intelligence vision aids, Orcam MyEye 1 and Seeing AI, can be useful tools for persons with severe vision impairment to perform important tasks of daily living that involve reading. We compared Orcam MyEye 1 and Seeing AI, which differ in cost ($3500 versus $300 for a well-conditioned previous generation secondhand iOS device and $0 for the application) and found that there were relatively small differences in physical function and usability between the two products. The accuracy of text recognition was similar for the two devices, and both performed well when reading blocks of plain text. Seeing AI had greater flexibility in dealing with light level, text orientation, and low contrast text.
In user testing, Orcam MyEye 1 seemed to perform better in the baking instructions reading task, and Seeing AI performed better in the medication bottle reading task. These findings may indicate the different affinities that these devices have for text of various appearances, as opposed to superiority of one device over another. As a group, the participants in this study did not decisively prefer one device over the other.
The physical print-to-speech properties of the two devices suggest that both are of high accuracy in reading blocks of text in common fonts with upright orientation in well-lit conditions. However, neither device could reliably translate creatively formatted text in less common fonts, such as that on medicine bottles and processed food items. The two devices achieved the same score on reading small print (0.8 M, Snellen equivalent of 20/40 at the 40 centimeters testing distance), and Seeing AI scored slightly better on the Pelli-Robson contrast sensitivity test (0.45 versus 0.15). Yet, neither device could perform as well as the typically sighted human eye on visual acuity and contrast sensitivity. Thus, further technology development should focus on improving device performance for reading texts with less controlled appearances and in less ideal conditions.
Both Orcam MyEye 1 and Seeing AI have weaknesses in usability and practicality. Seeing AI is less intuitive and requires manual positioning and alignment of the camera, whereas Orcam MyEye 1 could be used hands-free after training. Orcam MyEye 1 has limited ability to read in dimmer lighting, and this limitation could prove to be very challenging to visually impaired users who have trouble determining light levels in a room. Orcam MyEye 1 was unable to read any text that was not in the upright position. In our study, all target text was presented in the upright orientation. Several of the congenitally blind participants expressed unfamiliarity with the typical orientation of text on products (i.e., they did not know if the text on a soup can would appear vertically or horizontally).
Because our participants had only minimal training with the two devices, some of them encountered difficulty in understanding how to align the smartphone camera to the text. This issue is widely recognized for blind people, and studies have examined assistive techniques to potentially overcome the problem (Cutter & Manduchi, 2017). The spectacle-mounted camera used by Orcam MyEye 1 seemed to be more intuitive for some participants. Stands are available for orienting and aligning smartphones for reading text. Although a stand can be helpful in reading lengthy narrative passages, we expect that most blind users of Seeing AI would hold the device by hand for activities such as reading a food label or a restaurant menu.
Study limitations
Our study was limited by a small sample of participants, since our goal was to determine if obvious differences in human performance would be observed, and the answer was negative. We tested only nonvisual use of the devices, rather than considering how performance might differ for users with a wide range of visual impairment. For example, people with moderate low vision would likely benefit from the finger-gesture functionality of Orcam and they may have a greater ability to align the iPhone with desired text samples. Both Seeing AI and Orcam have been updated substantially since our study, and their developers may have addressed some of the shortcomings we found.
Suggestions for a larger study design include a wider range of participants, more training with both devices, participants who have sufficient vision to align text but not to read it, and participants without prior experience using an adaptive smartphone. But we believe our modest evaluation established that the two devices have roughly comparable print-to-speech capabilities, apart from the few differences we have described.
Implications
A user’s preference for Orcam MyEye 1 or Seeing AI would likely be influenced by the individual’s background with similar types of technology. Users of iOS devices may prefer Seeing AI, since they may already be comfortable with VoiceOver, and the Seeing AI application would not require them to use another device. On the other hand, those who are less familiar with new technology may prefer Orcam MyEye 1, since it operates with a simple button push or finger gesture and does not require learning the gestural interface needed to operate VoiceOver on a smartphone. However, there is already wide adoption of iOS devices among persons with vision impairment (Crossland, Silva, & Macedo, 2014). For a user who already owns an iOS device and who is familiar with VoiceOver, the use of Seeing AI imposes no additional cost. By comparison, a direct, non-subsidized purchase of Orcam MyEye 1 costs $3500. This price differential is likely to weigh heavily in the decision-making process of many potential users. For those interested in print-to-speech technology, there are other smartphone print-to-speech applications available at a lower cost than the Orcam MyEye 1, such as the K-NFB Reader (www.knfbreader.com), which retails for $99.99, and the Voice Dream Scanner (www.voicedream.com/scanner), which costs $5.99.
The success of assistive technology relies on simultaneously meeting several constraints, including robustness of functions for the intended purpose, ease of use, the extent of required practice or training, and cost. For purposes of print-to-speech conversion, obvious constraints include the ability of the technology to read text in a wide range of print sizes and fonts, text layouts, orientations and distances, and lighting conditions. The technology also needs to address the challenging problem of nonvisual alignment of the device’s camera on the text. The two devices compared in this report have made substantial progress in making hardcopy text accessible to people with visual impairments.
Footnotes
Authors' note
An initial report was presented at the American Academy of Ophthalmology Annual Meeting, 2018.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: National Institutes of Health Grant EY002934 and the Helen Keller Foundation. The sponsor or funding organization had no role in the design or conduct of this research.
