Abstract
BACKGROUND:
Current guidelines for accessing graphs non-visually are based on giving access to underlying tabular data and reading the values one by one. When data sets are large, it becomes impossible to get an overview.
OBJECTIVE:
This work presents a proof-of-concept of automated audio description of data sets of up to 100 data points. The descriptions can be used by persons with visual impairments or by persons who, for other reasons, are unable to use their visual attention for data access.
METHODS:
A pilot study was conducted to elicit guidelines for oral chart descriptions, after which lo-fi and hi-fi prototypes were designed. Visually impaired and sighted users were involved throughout the process.
RESULTS:
The pilot study pinpointed important issues in oral chart descriptions and provided input for a lo-fi prototype with three variants. The lo-fi prototype test singled out the most successful way of describing the charts, and on this basis a hi-fi prototype for large data sets was created and tested.
CONCLUSIONS:
The results of the hi-fi tests are promising. Participants listened to the descriptions once or twice and were then able to discuss details in the data. Thus, the initial guidelines and the subsequent design process provided the necessary information to create a successful proof-of-concept.
Introduction
Accessing graphics is a major hurdle for persons who are blind. Almost everywhere, and in most situations, people are expected to access information in the form of graphics. In some cases, like the constant bombardment of (mostly) unwanted advertisements in the physical environment, in newspapers and on the web, this might be no great loss. But understanding the content of graphs and charts, and the relations between their values, is a skill that is needed not only in school but throughout life. When data sets are small enough to fit into short-term memory (the limit, according to Miller [1], being between 5 and 9 items), the data can be accessed by tabbing or scrolling through the values one at a time. But when graphs become continuous, or charts are based on large data sets, understanding content and trends by accessing one value at a time becomes very difficult.
Applications that visualize large data sets are increasingly common, and well-designed applications facilitate work with marketing, trends and comparisons. However, the focus on visual design excludes persons with visual impairments from certain kinds of jobs and information that require access to large data sets.
As described below, there are different approaches to “visualize” (in a metaphorical sense), or rather “perceptualize”, graphs and charts for persons who have severe visual impairments or blindness, or who are otherwise prevented from understanding visual data or from using their attention for data access. In the work described in this paper, the goal was to generate automated audio descriptions from an underlying database with many data points.
Related work
Several fields of research and practice could inform how to design a successful modality transformation that conveys digital graphs, or data sets that can be rendered as graphs or charts, in a non-visual domain. Previous research has investigated how this can be done with sound, haptic and tactile information using different kinds of hardware. In practice, for example in school education, sound and tactile paper prints are used, as are text and audio descriptions.
Tactile paper prints and texts prepared in advance to describe graphs have the drawback of being static, which means that it is impossible to make changes to a graph (such as in a mathematics class) and experience how the changes affect the data. Similarly, applications for exploring data and understanding trends and relationships between larger data sets that can be filtered in different ways are still inaccessible.
Related research areas that focus on enabling access to graphs and charts, with examples from the digital domain, are described below.
Sonification of graphs and charts
Sonification is the use of non-speech audio to display data. Sonified mathematical graphs can be used as an aid in school, for example in the Audio Graphing Calculator (AGC) from ViewPlus [2], which is an off-the-shelf Windows application, the vOICe Accessible Graphing Calculator [3] and the Sonification Sandbox [4]. The latter is a tool created at Georgia Tech that does not appear to have been updated since 2009. In the context of this work (Sweden), such tools are not yet in widespread use.
The principle is relatively simple when it comes to continuous graphs, and was first described in 1985 by Mansur et al. [5]. The x axis is represented by time (or a panning sound from left to right) and the y axis height is represented by the pitch of a tone or musical note. The crossings of x and y axes are also sometimes sonified with for example an extra “clicking” sound, which, according to Smith and Walker [6], enhances the perceivability of the auditory graph.
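The mapping described above can be illustrated with a minimal sketch. It turns a series of y values, sampled at evenly spaced x positions, into a list of tone parameters rather than actual audio. The frequency range (200 to 1000 Hz), the step length, the linear pitch scale, the axis-crossing flag and the names `sonify` and `Tone` are assumptions for illustration only; many auditory-graph tools use a logarithmic or musical pitch scale instead.

```python
from dataclasses import dataclass


@dataclass
class Tone:
    onset_s: float       # x mapped to time: when the tone starts, in seconds
    frequency_hz: float  # y mapped to pitch
    pan: float           # optional x cue: -1.0 = far left, +1.0 = far right
    click: bool          # True where the graph crosses the x axis (y changes sign)


def sonify(ys, f_min=200.0, f_max=1000.0, step_s=0.25):
    """Map y values, sampled at evenly spaced x positions, to a tone sequence.

    The frequency range and step length are illustrative defaults, not values
    taken from Mansur et al. or from any particular tool.
    """
    if not ys:
        return []
    lo, hi = min(ys), max(ys)
    span = hi - lo
    n = len(ys)
    tones = []
    for i, y in enumerate(ys):
        # Normalise y to [0, 1]; a constant series maps to mid-range pitch.
        level = (y - lo) / span if span else 0.5
        pan = -1.0 + 2.0 * i / (n - 1) if n > 1 else 0.0
        # A sign change between neighbouring samples means the curve crossed y = 0.
        click = i > 0 and (ys[i - 1] < 0) != (y < 0)
        tones.append(Tone(i * step_s, f_min + level * (f_max - f_min), pan, click))
    return tones
```

Rendering these parameters to sound would require a synthesis library and is left out of the sketch.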
Interpreting sonified graphs must be learned, according to Walker and Nees [7], who also write that meta information helps the user to understand the graph. Multiple graphs sonified in the same coordinate system can be understood: Brown and Brewster [8] showed that two simultaneous graphs played in the left and right ear were easier to compare than the same graphs played consecutively. For an example of how a sonification could be designed, the Diagram Center has a page with sample sounds and graphs [9].
Haptic and audio-haptic graphs and charts
As haptic force-feedback devices were introduced in the field of human-computer interaction, there was a wave of creative ideas for using them to display different kinds of digital information to computer users with visual impairments. One of these was to create graphs that could be felt with the haptic pen. Sjöström et al. created and evaluated a line graphing tool with haptic feedback in which it was possible to alter variables, resulting in different renderings of a graph depicting a mammal population [10]. The task was to solve an optimization problem by trial and error, and 72% of the test persons managed to do so. Yu et al. made initial experiments with haptics only [11], but later added 3D sound [12] to overcome some of the problems they had discovered.
Tablet-based audio-tactile graphs and charts
The advent and rapid spread of tablets, for example in school education, has drawn attention to their accessibility. For persons with visual impairments, the use of tablets is not common, and Andersson and Thorman [13] found that they generally seem to prefer smaller screens like smartphones. However, the ability of tablets to reproduce sound and generate tactile feedback in the form of vibration has made it possible to attempt to perceptualize graphs and charts on them as well. AudioFunctions uses touch for input and sound for output [14], making it possible to, e.g., scan the x axis and listen to the corresponding y value (whose pitch varies with the height of the y value on the axis). The application can be compared with ViewPlus AGC, and does not seem to enable perceptualization of big data sets. Andersson and Thorman [13] designed charts with fewer than 10 data points that could either be scanned with a finger on a tablet or smartphone or be tabbed through by swiping. They found that users preferred to tab through the data, as this was the usual way they interacted with their smartphones.
Automated descriptions of graphs and charts
The purpose of audio descriptions or oral interpretation is to give access to visual components of life and society. For example, a movie with audio description has an extra sound track that narrates the context and environments to make the experience of the movie more similar to that of a sighted person [15]. Descriptions of printed graphs and charts are usually provided in, for example, school books, and there are guidelines for how these should be written (cf. [9]). The WCAG 2.0 guidelines [16] explain how alternative texts should be used for graphics on web pages; to fulfill a certain accessibility standard, all images that are not purely decorative, including graphs and charts, need such texts.
Adding a description to an image is still manual work, and therefore it cannot be applied to dynamically rendered images or illustrations. This leaves out a large body of applications that visualize large or complex data sets, where the data is fetched on demand from a database. The single values could be queried, but with large data sets that would not be meaningful, as it would be very difficult to get an overview of trends and relations between values or sets of values. This problem was described, for example, by Snow-Weaver and Cragun [17] at the CSUN conference in 2011, who also described their company's efforts to research these issues. In 2013, a discussion thread was started in the WebAIM forum [18] in which someone asked for help and advice on making a system with 1000 data points comply with accessibility standards. Earlier, Ferres et al. described their research system, iGraphLite, which uses a combination of automatic data analysis and natural language processing to create descriptions of data sets. The data sets in their evaluations were relatively small, between 3 and 7 data points, and in addition to being given the whole descriptions, the test participants could navigate the data points by keyboard commands [19]. Elzer et al. [20] created a plugin for a web browser that converted bar charts into texts. The examples in their paper have fewer than 10 data points.
Scope
As the related work section shows, there are diverse approaches to perceptualizing graphs and charts non-visually. In this work, the focus has been on investigating the feasibility of creating automatic oral interpretations of large data sets. Therefore, the navigation and manipulation of data have been outside the scope of the prototyping, even though these issues were investigated and discussed with the target group to a certain extent. Comparisons with other perceptualization techniques, such as haptics or sonification, are also left out of this work.
Method and material
The work was undertaken as a master's thesis study conducted in spring 2015, with a dual emphasis: a user-centered design [21] perspective, and practical design work conducted iteratively over several prototype phases, leading to a final prototype with real data from the visualization application Qlik Sense [22].
Users were involved during the different stages of the process. One of the motivating factors behind the study was to make graphs and charts accessible to persons with visual impairments. Potentially, other users could also be helped by a text description of a chart, such as persons who have problems understanding visual information in the form of charts, or persons who need to use their visual attention for other tasks. Furthermore, it is difficult to recruit many persons with visual impairments. For these reasons, both sighted persons and persons with visual impairments were involved in the evaluation phases.
Pilot study – collecting oral interpretations of graphs and charts
To understand how someone would construct an oral interpretation of a graph or chart, a pilot study involving sighted persons was set up. The seven participants were given graphs on paper that they were to describe to a person who had not seen the graph before. After the description was finished, the listener was asked to draw the graph as they had understood it. The interaction between the participants was filmed, transcribed, and later analyzed based on eight questions:
1. What is the first information that is explained?
2. Is the focus on type or content?
3. How are the x and y axes described?
4. Are key figures interpreted?
5. Are trends interpreted? How?
6. Are details interpreted? How? At what level?
7. How similar is the drawn graph to the original one?
8. Other observations
Creating an aid that fits into the lives of persons with visual impairments requires knowledge about the user group and their context. Methods for gathering user requirements are numerous and diverse, but their main goal is to facilitate conversations with stakeholders [23]. In this case, interviews were chosen, and they were carried out with five persons: three with low vision and two who were blind. The interviews were semi-structured [24] but based on a detailed list of issues. The interviewees were recruited through the Swedish Association of the Visually Impaired, the Association of Swedish Visually Impaired Youth, and the local audio newspaper “Skånes taltidning”.
Aside from demographic and context information, the interviews aimed to capture:
- what computer accessibility tools they used and preferred
- what kinds of problems they experienced with information access
- what experience they had with various data visualizations (like mathematics, weather maps, poll results etc.)
- how they learned about graphs and charts in school
- how they would prefer to receive complex data information
A Low Fidelity (Lo-Fi) prototype is usually created with pen and paper [25]. It is a visualization of the concept, often merely a sketch, providing a simple and quick way to get a testable prototype of a computer program. A traditional Lo-Fi prototype would be pointless in this case, since the descriptions are meant to be listened to rather than looked at. Instead, an audio prototype was created by recording the Swedish speech synthesis voice Alva reading sample descriptions of chart visualizations. In total, audio description manuscripts for three different bar charts with between 10 and 30 data points were pre-recorded. One of these bar charts is shown in Fig. 1, along with its audio description in text. Note that the text in this phase was written without any automation.
Fig. 1. Bar chart showing baby animals born. The visual interpretation text (translated from Swedish) was: “This is a bar graph which describes the baby animals born in 2013. The x-axis discloses animal species; the y-axis discloses number. There are 10 species, and the number varies between 2000 and 12000. The maximum is held by chicken, the minimum by joey. The mean is 5000. The chart is unsorted. Four bars stand out as higher: chicken, duckling, foal and piglet. Joey stands out as the lowest. The other 5 bars are roughly the same height.”
Bar chart showing number of films per year. The visual interpretation text was (underlined texts are based on the data): “This is a bar chart describing Number of films produced per year. The x-axis shows Year, and the y-axis shows Number of Films. There are 43 x-values, starting with 1931. The final one is 1973. You find the highest bar in the first part of the chart, and it is held by 1940. The minimum is found in the last part, and is held by 1961. The Number of Films varies between 13, and 60, with an average of 32.26. The bars ascend until 1939 and then descend. After that it rises until 1952 and drops. Finally, it ascends until 1967, and then descends.”
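The data-based parts of a text like the one above can be filled into a fixed template. The sketch below is hypothetical: the function names, the three-way split of the x axis into "first", "middle" and "last" parts, and the exact phrasing are modelled on the example text, not taken from the prototype's implementation.

```python
from statistics import mean


def part_of_chart(index, n):
    """Name the third of the chart that contains bar `index` (assumed three-way split)."""
    return ("first", "middle", "last")[index * 3 // n]


def describe_bar_chart(title, x_label, y_label, xs, ys):
    """Fill a key-value description template from raw chart data (hypothetical)."""
    n = len(xs)
    i_max = max(range(n), key=lambda i: ys[i])  # first bar with the highest value
    i_min = min(range(n), key=lambda i: ys[i])  # first bar with the lowest value
    return (
        f"This is a bar chart describing {title}. "
        f"The x-axis shows {x_label}, and the y-axis shows {y_label}. "
        f"There are {n} x-values, starting with {xs[0]}. The final one is {xs[-1]}. "
        f"You find the highest bar in the {part_of_chart(i_max, n)} part of the chart, "
        f"and it is held by {xs[i_max]}. "
        f"The minimum is found in the {part_of_chart(i_min, n)} part, "
        f"and is held by {xs[i_min]}. "
        f"The {y_label} varies between {ys[i_min]} and {ys[i_max]}, "
        f"with an average of {mean(ys):.2f}."
    )
```

Ties are resolved by taking the first bar with the extreme value; a real system would need a policy for naming several bars that share the maximum or minimum.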
Users involved in requirements and evaluation
Scoring of chart description methods. Users gave 2 points to their preferred chart and one to the second best in their opinion
The descriptions of the graphs were written according to the results of the analysis of the pilot tests and interviews, but also based on assumptions about what could be extracted and automatically analyzed from Qlik Sense or a similar tool. The three charts were described in somewhat different ways. In the smallest (Fig. 1, abbreviated BA), key values were grouped and categorized; in the middle one (12 data points, abbr. UFO), all single values were read; and in the largest (30 data points, abbr. CA), the shape was explained in addition to the axes and key values.
The test procedure was similar to that of the pilot session: the participants were asked to listen to the description and then to recall the chart. Of the five participants, three were sighted and were therefore asked to draw a sketch of how they envisioned the chart, and two had visual impairments and were asked to comment on the chart in words. Finally, all participants were asked to assess qualitatively which of the ways of explaining they preferred. They were also asked to rank the description variants with a point system, giving two points to their preferred description, one point to the second best and no points to the third. This scoring method was introduced ad hoc to avoid ambiguous answers and to force the participants to make a choice.
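The grouping in the BA description (bars that stand out as higher or lower, with the rest described together) was written by hand, but a similar grouping could be computed from the data. In this sketch, the rule that a bar stands out if it lies more than `k` population standard deviations from the mean, and the names `group_bars` and `describe_groups`, are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def group_bars(labels, values, k=1.0):
    """Split bars into 'high', 'low' and 'typical' groups around the mean."""
    m = mean(values)
    sd = pstdev(values)
    groups = {"high": [], "low": [], "typical": []}
    for label, value in zip(labels, values):
        if value > m + k * sd:
            groups["high"].append(label)
        elif value < m - k * sd:
            groups["low"].append(label)
        else:
            groups["typical"].append(label)
    return groups


def describe_groups(groups):
    """Phrase the groups in the style of the BA description."""
    sentences = []
    for key, word in (("high", "higher"), ("low", "lower")):
        names = groups[key]
        if len(names) == 1:
            sentences.append(f"{names[0]} stands out as {word}.")
        elif names:
            sentences.append(f"{len(names)} bars stand out as {word}: {', '.join(names)}.")
    if groups["typical"]:
        sentences.append(f"The other {len(groups['typical'])} bars are roughly the same height.")
    return " ".join(sentences)
```

Lowering `k` singles out more bars; which threshold suits listeners best is an open question.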
The Hi-Fi prototype was designed to extract data from a real visualization tool (Qlik Sense [22]). The data was then processed to find key values, and an approximate shape was calculated with the help of regression analysis so that it could be explained in words. Three different description schemes were created to fit different types of bar charts, depending on how the bars are ordered: chronologically, alphabetically or by y value. These description schemes combined elements of the description variants BA and CA from the lo-fi design. We also took care to limit the number of values presented, to avoid straining the listener's memory.
The example graphs used in the Hi-Fi prototype test had larger data sets than those in the Lo-Fi test. The prototype was technically tested with 100 data points and more, but during user testing the sets had between 42 and 45 data points. The sets were deliberately made larger than would be practical to explore serially, one value at a time.
The final prototype was evaluated by letting three persons listen to the audio description texts from the system and then discuss the content; they were also asked how they felt the descriptions could be improved. The number of times each test person felt they needed to listen to a description to understand it properly was also noted. As the automatic system (and the Qlik tool) is in English, the texts were first translated into Swedish. Of the three test participants, two had low vision and one was blind.
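The paper does not specify which regression model was used to approximate the shape. As a simpler stand-in (smoothing rather than regression), the sketch below averages each bar with its neighbours and reports where the smoothed series changes direction, producing sentences in the style of the film example. The window size and all function names are assumptions.

```python
def moving_average(ys, window=3):
    """Centred moving average; the window shrinks at the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(ys)):
        lo, hi = max(0, i - half), min(len(ys), i + half + 1)
        smoothed.append(sum(ys[lo:hi]) / (hi - lo))
    return smoothed


def trend_segments(xs, ys, window=3):
    """Return (direction, last_x) pairs for runs of rising or falling smoothed values."""
    smoothed = moving_average(ys, window)
    segments = []
    for i in range(1, len(smoothed)):
        diff = smoothed[i] - smoothed[i - 1]
        if diff == 0:
            continue  # a flat step neither starts nor ends a run
        direction = "ascend" if diff > 0 else "descend"
        if segments and segments[-1][0] == direction:
            segments[-1] = (direction, xs[i])  # extend the current run
        else:
            segments.append((direction, xs[i]))  # direction changed: new run
    return segments


def describe_shape(xs, ys, window=3):
    """Turn the runs into a sentence such as 'The bars ascend until 4, and then descend.'"""
    segments = trend_segments(xs, ys, window)
    if not segments:
        return "The bars are all roughly the same height."
    if len(segments) == 1:
        return f"The bars {segments[0][0]} from {xs[0]} to {xs[-1]}."
    steps = ", then ".join(f"{d} until {x}" for d, x in segments[:-1])
    return f"The bars {steps}, and then {segments[-1][0]}."
```

Because turning points are taken from the smoothed series, they can be shifted slightly relative to the raw bars, and noisy data may still yield many short segments; a minimum-change threshold would be one way to suppress them.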
Results
Results are divided into four categories corresponding to the method sections. Note that the designs of the prototypes (both Lo-Fi and Hi-Fi) are in a sense also results, but they have already been described in the Method and material section.
Pilot study results
Both qualitative and quantitative results were obtained through the pilot studies. Some of the participants' descriptions were successful, while others were less so. Ten out of 22 depictions (sketches of the graph made by the person listening to the description) resembled the original diagrams and gave a proper overview.
When analyzing the test results, some similarities could be found in the descriptions that resulted in well-depicted charts:
- Description of the shape of the graph
- Repetition of key values and information about the axes
- A low number of values, four to twelve, mentioned in detail
- Highlighting both maximum and minimum
- Merging of values in the same category, e.g. days of the week, into chunks
In the explanations that resulted in depictions that differed a lot from the original chart, the following similarities were found:
- Highlighting all values, one by one
- Trying to analyze the meaning of values within the diagram
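The observation that merging values of the same category into chunks led to well-depicted charts can be sketched as a simple aggregation. The weekday/weekend mapping, the use of the mean for each chunk and the names `DAY_TO_CHUNK` and `chunk_values` are assumptions for illustration; summing could be more appropriate for counts.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical category mapping, used only for illustration.
DAY_TO_CHUNK = {
    "Mon": "weekdays", "Tue": "weekdays", "Wed": "weekdays",
    "Thu": "weekdays", "Fri": "weekdays",
    "Sat": "weekend", "Sun": "weekend",
}


def chunk_values(labels, values, chunk_of):
    """Merge values whose labels share a category into one averaged chunk."""
    buckets = defaultdict(list)
    for label, value in zip(labels, values):
        buckets[chunk_of[label]].append(value)
    return {chunk: mean(vals) for chunk, vals in buckets.items()}
```

Two weeks of daily values thus collapse into two numbers, which keeps the description within the small number of values that listeners could recall.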
It was clear that the participants paid attention to different key values depending on the type of graphical representation. Focusing on maximum and minimum values was common for bar charts, and the participants were also tempted to read all values one by one. For scatter plots, the maximum and clusters were presented distinctly, and for line charts, the participants concentrated the information around the maximum, start and end values. There was a noticeable difference in the participants' approaches to the exercise, but similarities could be seen between participants with similar educational backgrounds. Those with more analytical experience tried to understand the meaning of the graphs and therefore searched for specific values. They also used more technical expressions, while the others used everyday language.
In total, five persons were interviewed, with different experiences of magnifying systems and screen readers. Three had low vision and used a combination of magnifying applications and screen readers. They described problems with images, graphs and charts, and how the magnifying software, by zooming in on a small portion of the information, made them lose the overview of images and graphs. One of the interviewees also worked with calculations and used Excel for this, even though the overview was lost. Preferences for information display were clearly very individual. One of the interviewees was intrigued by 3D printing technology and thought that it would help in the future. Two preferred sound (text-to-speech, TTS) but also spoke of the possibilities of tactile feedback on touchscreens.
The two interviewees who were blind had somewhat different backgrounds, but both described their mathematics education as inadequate to some extent. One of them, a college student, had later in their studies encountered a pedagogically accomplished teacher who used alternative means and aids to clarify some of the abstract and visual parts, e.g. graphs and diagrams. The interviewees agreed that the teacher's attitude to problem solving is a significant part of learning. Both used tactile digital information in addition to TTS systems. One of them discussed the possibility of using non-speech sound to a greater extent.
Lo-Fi prototype test results
Overall, the drawings of the charts created by the sighted participants were very similar to the original charts from which the audio description texts were created. However, the 30-data-point chart CA resulted in better resemblance to the original chart in all tests. When CA was depicted, both the values and the categories on the axes were correct, and the general shape of the bars agreed with the original graph. The middle-sized chart (12 data points, UFO) was depicted almost perfectly by one participant, while the others struggled with it. In the description of BA (Fig. 1), only five x-values were mentioned. The participants recollected three and five of these values, respectively. All remembered the maximum and minimum.
The participants with visual impairment did not attempt to draw the chart, but instead discussed the descriptions. Their comments were relevant for different design considerations for a future system which would allow manipulation of data. For example, the description which read all individual values (UFO) was considered advanced, but at the same time they asked for possibilities to drill down into interesting areas and make individual choices. The audio description of BA (illustrated by the chart and text in Fig. 1) was considered to give the appropriate amount of information and was presented in an objective but informal way. CA (the 30 data point chart, with some automatic trend information) seemed a bit overwhelming, but still got a good score in the point system.
Hi-Fi prototype test results
Three participants, two with low vision (
“It’s not surprising that the movie production decreased during the 1940s because of World War II.”
Even though the observations indicated a slightly more positive result for participant
Discussion
In his book The Elements of Graphing Data, William S. Cleveland writes:
“The important criterion for a graph is not simply how fast we can see a result; rather it is whether through the use of the graph we can see something that would have been harder to see otherwise or that could not have been seen at all.”
In other words, a graph is useful only if it shows something that could not have been presented as clearly in any other way. Today, the common way of explaining graphs to persons with visual impairments is by recounting all values one by one. The first bullet in the STEM Guidelines for graphs [26] is: Bar graphs should be converted into accessible tables. When this is done, the graphical representation adds no information for a person with a visual impairment. There is no help in gaining an overview, and the examples given make it apparent that large data sets are not considered at all in the guidelines.
Large data sets from visualization tools are not considered in previous work on automatic audio descriptions [19, 20]. Research on other means of perceptualization, like haptics and sonification, has explored continuous graphs but also bar charts [27]. In the case of haptics, the bar charts all seem to contain fewer than 10 data points; see for example Paneels and Roberts [28]. However, in 2014, Moraes et al. [29] studied natural language representations of online graphs. Their work has similarities to this proof-of-concept, but is based on images rather than data sets. This means, for example, that in their case it is more difficult (perhaps impossible) to query the data for more detailed information, or to choose to access the data in alternative ways.
The final summative testing showed that the prototype worked in the desired manner for the involved participants. It was clear that previous training with synthetic speech programs was a major advantage. Most people with normal vision, for example, lack this experience and would need training to benefit from synthetic speech. However, as an addition to the visual parts of the program, it could work to the user's advantage, since speech can free visual attention for other tasks. It could also benefit those with reading disabilities.
All participants claimed to have understood the overall shape and behavior of the graphs. They did not wish for more or less information but thought it was well balanced. In addition, the participant with experience of synthetic speech remembered all the important key values, which we believe further confirms that the amount of information was appropriate.
While sound alone can provide a means to perceptualize graphs and charts, auditory display is serial and requires some training. The user also needs good hearing and the ability to discriminate well between notes of different pitch. Tactile displays in the form of Braille are used by a smaller portion of the population with severe visual impairments, as Braille reading is harder to learn in (late) adulthood. Similarly, fluent reading and understanding of tactile drawings and charts could be hard to learn. Haptic devices, despite having been around for more than two decades, are still rare in education, at work and in the home. Thus, they cannot be expected to solve the information problem with graphs and charts. As also apparent in the interviews, people have different preferences for information display and access.
Limitations of the work
The aim of this work has been to develop a proof-of-concept prototype showing how large data sets can be made accessible through audio descriptions. It is design-oriented work, in which the investigations aim not necessarily to single out the best method for describing charts from a fixed set of approaches, but to widen the design space and find solutions not previously envisioned. Despite this, our approach has at least two methodological weaknesses.
Firstly, the group of participants representing the envisioned user group is small. It would have been beneficial to involve 4–5 users with low vision and 4–5 users who are blind in the interviews, lo-fi prototype tests and hi-fi prototype tests. This would have given more diverse input to the design, as well as clearer arguments for which description strategies were less successful. A larger group would be beyond the scope of our goal, as this is early conceptual work. The relatively small number of participants is due to a combination of time constraints and limited availability of participants with an appropriate profile.
Secondly, the lo-fi prototypes have different numbers of data points as well as different description strategies. This means that the support for the best audio description strategy (according to the lo-fi prototype test) is weaker than it could have been. The strategies BA and CA also received similar scores in the rating table. For the hi-fi prototype, these two strategies were combined, and this, together with the technical constraints (what could be extracted and automatically generated from the system), led to three different strategies, based on the type of chart fed into the prototype. The strategy with the lowest rating, UFO, was not used in the hi-fi prototype. However, the possibility to drill down into single values was requested for further work, which somewhat resembles the UFO strategy (see also below).
Concept ideas for future work
The intention of this work is not to put one solution above another, but rather to explore an under-explored area. A full system should take different modalities for information display into account and, of course, provide some means of manipulating data.
In the context of the visualization application Qlik Sense, some simple entry points for examining the data more closely were sketched. The suggested options for accessing and retrieving information were:
- Read all x-values (by stepping through with arrow keys or similar)
- Read all y-values combined with their x-value (by stepping through)
- Read all y-values exceeding the average
- Read all y-values that are close to, below, exceeding or exactly the same as a certain value
- Search for a specific x- or y-value
- Read key values like outliers, maximum, minimum, start- and end-values
- Repetition of the audio description
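Several of the entry points above amount to simple filters over the underlying (x, y) pairs. The sketch below is hypothetical: the function names, the `mode` values and the use of the 1.5 × IQR rule to flag outliers are assumptions, not part of the sketched Qlik Sense design. Stepping through values with arrow keys and repeating the description are interaction features and are not shown.

```python
from statistics import mean, quantiles


def above_average(xs, ys):
    """Entry point: read all y-values exceeding the average."""
    m = mean(ys)
    return [(x, y) for x, y in zip(xs, ys) if y > m]


def compare_to(xs, ys, target, mode, tolerance=0):
    """Entry point: y-values close to, below, above or equal to `target`."""
    tests = {
        "close": lambda y: abs(y - target) <= tolerance,
        "below": lambda y: y < target,
        "above": lambda y: y > target,
        "equal": lambda y: y == target,
    }
    keep = tests[mode]  # raises KeyError for an unknown mode
    return [(x, y) for x, y in zip(xs, ys) if keep(y)]


def search(xs, ys, x=None, y=None):
    """Entry point: find bars with a specific x- or y-value."""
    return [(xi, yi) for xi, yi in zip(xs, ys)
            if (x is None or xi == x) and (y is None or yi == y)]


def key_values(xs, ys):
    """Entry point: start, end, maximum, minimum and outliers (1.5 x IQR rule)."""
    q1, _, q3 = quantiles(ys, n=4)
    fence = 1.5 * (q3 - q1)
    i_max = max(range(len(ys)), key=lambda i: ys[i])
    i_min = min(range(len(ys)), key=lambda i: ys[i])
    return {
        "start": (xs[0], ys[0]),
        "end": (xs[-1], ys[-1]),
        "max": (xs[i_max], ys[i_max]),
        "min": (xs[i_min], ys[i_min]),
        "outliers": [(x, y) for x, y in zip(xs, ys)
                     if y < q1 - fence or y > q3 + fence],
    }
```

Each function returns (x, y) pairs, so its result could be read aloud with the same phrasing as the stepping mode.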
Audio descriptions and tabbing or stepping through the data need not be the only means of accessing it. Ideally, several modes could be chosen from, for example sonification by tones or continuous audio. Using such interfaces in real applications intended for work could also increase the motivation for using similar applications, like [2], in school.
This article describes the design process and the design decisions that led to a Hi-Fi proof-of-concept prototype of automatic audio description software for large data sets. During the process, a set of guidelines for creating the automatic audio descriptions was established:
- Describe the overall shape of the graph
- Repeat key values and information about the axes
- Mention a low number of values, four to twelve, in detail
- Highlight maximum and minimum
- Merge values in the same category, e.g. days of the week, into chunks
The final prototype, based on these guidelines, was evaluated by three persons with severe visual impairments. The information that was extracted and read to the test participants did indeed give them an overview of the bar graphs described. One participant, who was more skilled in using screen reading software and thus better trained in understanding verbal descriptions and TTS, understood the finer details of the chart after listening to it only once.
Although automatic audio descriptions seem to be a viable way of conveying an overview of bar chart data, they could be combined with entry points for detailed information and perhaps also with overviews in other modalities, like sonification and haptic feedback.
Conflict of interest
None to report.
