Abstract
Visuals are often used to enhance learning of scientific information. The recent emergence and popularity of comic-style instruction books for adults, such as the ‘manga guide to …’, shows the possibility of comic-style visualisations for the communication of science with adults. This study investigates whether the addition and style of visual accompaniment of scientific information, as used in comic books, influence immediate and short-term fact recall in an adult audience. Participants (n = 310, aged 18–79, 52% identified as female) were presented with 20 general science facts in one of five styles: (1) text alone, (2) photo with text caption, (3) cartoon with text caption, (4) photo with explanatory agent and a speech bubble, (5) cartoon with explanatory agent and a speech bubble. Immediate recall, and confidence in that recall, were tested following a brief distractor. Participants indicated their preferred presentation style, and short-term recall was tested by a final quiz of all 20 facts. Overall, the most preferred presentation style was cartoon with explanatory agent and text in a speech bubble (26% preferred). There was no single most effective presentation style; there was no significant difference in immediate recall, short-term recall or confidence in answers depending on whether the fact was presented as text, photo or cartoon, or the presence or absence of an explanatory agent. However, immediate recall was significantly better when preference was met (p < 0.02). We found that the style of visual accompaniment of scientific information in accordance with the ‘manga guide to…’ format influenced immediate, but not short-term, fact recall in an adult audience when written English literacy, scientific literacy and non-verbal intelligence were taken into account. Short-term recall of scientific facts may best be served by presenting facts in multiple styles, or enquiring about and then meeting participant preference for visual accompaniment.
Introduction
There is a long history of using graphs, photographs and illustrations to communicate scientific information, and there is similarly a long history of assessing their impact.1,2 In scientific communication, these visual formats can be used to either visually represent accompanying textual data, or visualise information in a new way that reinforces the text.3,4 Extensive research attention has been paid to the ideal characteristics and application of visuals to enhance scientific education for children.1,5–8 However, the recent emergence and popularity of comic-style instruction books such as ‘The manga guide to …’ shows the possibility of comic-style visualisations for the communication of science with adults. It remains to be established whether the illustrated visual style favoured in the ‘manga guide to…’ format is more, less or equally successful in promoting learning in adults. The degree to which visual elements common to this format, such as information being presented by speech bubbles and character interaction (rather than as figure captions), affect learning has not so far been explored. This study sought to fill this gap by clarifying whether the use of comic-style visuals alongside and integrated with textual scientific facts had an effect on immediate and short-term recall in adults, and whether the nature of the relationship between visuals and text influenced recall. This study has implications for how visuals are used in educational materials to support scientific learning in adults.
Literature review
Highly visual educational books for adults have been produced in the past decade, such as the ‘manga guide to …[statistics/calculus/physics etc]’, which consists of textbook information presented in comic book format (often with a Japanese ‘manga’ aesthetic), intended as a supplement to education at a tertiary level (Google ngram results for ‘manga guide’, accessed 10/08/2020 https://books.google.com/ngrams/graph?content=manga+guide&year_end=2019&year_start=1800&smoothing=3&corpus=26&direct_url=t1%3B%2Cmanga%20guide%3B%2Cc0#t1%3B%2Cmanga%20guide%3B%2Cc0). Research on the utility of this format has focussed on how humour, narrative and visual representation together affect learning. 6 A regularly used characteristic of the comic book format is the inclusion of an ‘explanatory agent’, defined here as a human or anthropomorphic animal or object that communicates to other characters or the reader via speech or thought bubbles. It is ambiguous whether the inclusion of an explanatory agent in visuals, accompanied by shifting explanatory text from caption to speech bubbles, impacts information recall in adults. Related evidence suggests an explanatory agent can enhance engagement (hence encoding) of information. For example, students have a preference for the lecturer being visible in online or recorded lectures. 9 Furthermore, anthropomorphising to introduce topic-specific explanatory agents has been a key tool in engagement with non-human topics such as robotics. 10 Another characteristic of the comic book format is the use of illustrations to depict both the substantive content and explanatory agent. This is at odds with the prevalence of photographs, rather than cartoons, in formal scientific textbooks. 11
Both photographs and illustrations can equally be used to decorate, motivate or visually represent salient elements of a concept.10,12 It is typically assumed that photographs can provide useful reference for concepts that are physically observable (e.g. the pattern of wing motions in a bird’s flight can be directly viewed), providing ecological validity by depicting the physical reality of an object (albeit a depiction curated by the photographer, as discussed by Berger). 13 Conversely, illustration is particularly useful in depicting concepts that cannot be directly experienced (e.g. atomic relationships too small to see), 5 and can present only the elements of the visual scene that are relevant. 14 Photos and illustrations can be thought of on a spectrum from realism to more abstract iconography. 15 If ‘style’ is defined as in Moere et al. 16 to refer to an abstract concept that relates to how a visualisation is recognised and may be categorised, photographs and illustrations of varying complexity can be considered different ‘styles’ of visual.
Learning is not a unitary concept. It consists of acquiring new knowledge, understanding, skills, behaviours, attitudes, preferences or values, 17 and in an educational setting involves focus on attention, engagement, knowledge retention and recall. Recall is defined here as ‘the mental process of retrieval of information from the past’. 18 This is distinct from comprehension, which is the process of meaning extraction. 19 Though recall and comprehension share several mutually reinforcing processes, 20 this manuscript focusses primarily on recall to reflect the necessity of recall to demonstrate the occurrence of learning in education and experimental settings. In its simplest form, a test of recall involves presentation of novel information of some kind (encoding), some passage of time (storage) and then retrieval. Classic real-world examples are the exams conducted in schools, designed to test the recall of content taught in class, an area that has received much attention in the education literature (e.g. see 12 ).
Recall is related to, but distinguishable from, working memory. 21 Working memory serves as a highly short-term ‘buffer’ of information encoding on the order of seconds, 22 primarily allowing the manipulation of information. Working memory and recall share similar cognitive processes (e.g. central executive function) 23 and neural substrates (e.g. prefrontal activity).24,25 However, working memory consists of multiple non-overlapping components specific to different information modalities (phonological loop, episodic buffer and visuo-spatial sketchpad), 26 and relies more heavily on the dorsolateral prefrontal cortex than short-term recall. 27 Compared with recall, working memory is therefore more amenable to disruption from attending to other information immediately following presentation of a stimulus, 28 and the alignment between the form of presented information and the distracting stimulus is more pertinent to performance. 29
There is substantial evidence that the nature, reliability and cognitive process of memory changes with the amount of time between encoding and recall. 21 Similarly, although all recall relies somewhat on attention and engagement and can be facilitated by comprehension, 19 there is differential performance as the time between encoding and recall is extended. Here, we use ‘immediate recall’ to refer to a minimal passage of time (<1 min) between encoding and recall, and ‘short-term recall’ to refer to recall between 1 and 30 min from encoding. ‘Long-term’ recall on the order of months and years is widely considered, particularly in the neurodevelopmental 30 and educational 31 literature, but is beyond the scope of this manuscript.
Immediate and short-term recall can be distinguished on the basis of differential underlying cognitive processes, manifest in differential performance relating to how information is presented, and in particular contextual cues. 32 Immediate recall is more closely tied with cues specific to the information presented, such as the exact wording of text accompanying an image. 33 Conversely, short-term recall relies more on an individual’s experience relative to the information presented, as it allows the use of ‘convergent’ retrieval where cues from one memory are used to assist recall from another, 33 similar to connectionist architecture, spreading activation and mapping processes in comprehension as described in McNamara et al. 19 This can improve recall, or damage it if information is paired with an incorrect or unrelated memory. 34 Anglin and Stevens 35 experimentally manipulated the use of images accompanying text in teaching university students scientific facts and procedures, and concluded that the use of images improved immediate recall (test immediately after materials were studied), but not delayed recall (test at 28 days).
This highlights the importance of considering individual characteristics when exploring learning and recall.1,36 Broadly, four key individual characteristics salient to recall pertinent to topics such as scientific facts are reading proficiency (general and scientific), non-verbal intelligence, prior knowledge and preference. Where information is presented in text, reading proficiency in that language – here, English – is a prerequisite to access the information. In particular, learning scientific information and reasoning requires proficiency beyond general use and interpersonal communication. Conceptualised by Cummins 37 , ‘cognitive-academic language proficiency’ focusses on the ability to read technical materials such as textbooks, and, particularly important for the testing of recall, the ability to produce appropriate responses to written test questions. At least in children, cognitive-academic language proficiency in particular is associated with scientific learning and consequent fact recall. 7 This should not be conflated with intellect or learning capacity, as one can score poorly on written tests but nonetheless be highly intelligent. 38 Nonverbal intelligence, measured by tools such as Raven’s progressive matrices, 39 is therefore a useful complementary measure to English language proficiency to account for variability across participants in learning capacity. Prior knowledge is likely to be linked with cognitive-academic language proficiency and nonverbal intelligence, both of which can pose limitations to the attainment of a knowledge base. For example, the presentation of information in the context of a comic book provided recall benefits specifically in poorly performing students in grade 10 (median age of 15). 6
Cognitive fit theory proposes that individual performance in engagement, recall and comprehension may differ according to the ‘fit’ between how information is conveyed (e.g. text, visuals, graphs or combinations thereof) and their individual cognitive characteristics, as well as the nature of the information conveyed or task undertaken. 40 If the nature of the information and/or task is held constant, the likelihood of ‘fit’ can somewhat be inferred from both performance and an individual’s stated preferences. 36 This is similar to the learning styles hypothesis, which posits that different individuals learn in different ways, and that learning is facilitated when information delivery matches that individual’s learning style (e.g. a visual learner learns most effectively when information is presented visually). 41
Building from the design maxim ‘attractive things work better’, 42 the benefit of accompanying text with visualisation is particularly salient in scientific textbooks.43,44 Relevant imagery is generally beneficial for recall.41,45 The chief explanation for this is dual coding theory, which postulates that cognition (a subset of which is learning) is characterised by a word-focussed symbolic/verbal system, and an image-focussed sensorimotor/visual system. This theory holds that information presented via both pathways (e.g. text accompanied with a visual) will be better encoded and recalled than information presented by one pathway. 12 Other theories suggest different mechanisms for improved recall performance when information is presented via text (and tested via written means). The first is learning-test congruence, which suggests test performance is highest when the modality in which information is learned matches the modality in which it is tested. 46 The second is that people tend to preferentially attend to text: eye tracking studies of text in the context of comics for communication of scientific facts have indicated significantly longer dwell time on text than visual elements. 47 While the potential utility of relevant imagery is clear, it remains unclear what constitutes ‘relevant’ imagery, because recall is also linked with contextual cues not necessarily linked with substantive informational content, such as narrative 48 and humour. 49
It is unclear why ‘relevant’ imagery is apparently beneficial to recall of scientific information, 41 yet comic book formats that include an explanatory agent and cartoon ‘style’ are particularly popular. It might be that the perceived benefits of irrelevant imagery (visual content that is not semantically related to informational content) are actually due to style. Visuals that include an explanatory agent are typically simple illustrations or cartoons, which have the benefit of simplified visual scenes containing only relevant information, while it is rare to encounter such an explanatory agent in science textbooks. This could be disambiguated by comparing preference and recall performance when an explanatory agent is included in both illustrations and photographs, or removed altogether. Another possibility is that the explanatory agent’s role is purely to increase the reader’s enjoyment, which in turn may facilitate encoding and recall independently of informational content. 50 If preference maps to cognitive fit,36,40 adults who generally prefer visuals including an explanatory agent should exhibit superior recall compared with when the agent is not present. Simpler styles of illustrations, such as cartoons, bootstrap recall by enhancing engagement, motivation and ‘fun’ even in the absence of humour. 51 Indeed, gaze tracking studies have found that simpler illustrations promote better learning and recall. 52
Purpose of current study
This study investigated whether the addition and style of visual accompaniment to textual information influenced immediate and short-term recall of scientific facts in adults, when controlling for cognitive-academic language proficiency, nonverbal intelligence and general scientific literacy. In particular, we conducted an experiment on the influence of visual elements common to the ‘manga guide to…’ format by testing (1) whether there are any differences in recall accuracy if a visual is presented alongside text, (2) whether a photographic or cartoon style visual influences recall and (3) whether the presence or absence of an explanatory agent (person) has any effect on recall accuracy. The scope of this study was to replicate the ‘manga guide to…’ format, rather than test the informational or semantic content of visuals accompanying information. We focussed on recall because it is a fundamental, readily testable aspect of learning. The time between encoding and recall can impact the degree to which cues adjacent to information may impact performance. 21 The role of visuals as a cue has been well established when contrasting short and long-term recall, 33 but it is unclear whether immediate and short-term recall performance may differ on the basis of how visuals are designed.
Method
This section describes how the study was designed and executed. We begin by describing how the pilot was used to develop, evaluate and refine the experimental protocol. After outlining the results from repeated pilot tests, we outline the methods of the main study, including development of the materials, the participant cohort, stimulus set, measures used and study procedure.
Pilot study: Choice of questions about scientific facts
This research started with a study to pilot and refine the questions that would form the basis of testing scientific fact recall in the main study. A series of scientific facts was first compiled and a series of questions about them developed. We then conducted two rounds of pilot testing to calibrate the difficulty of these questions, aiming to identify questions about scientific facts that were not general knowledge. Other aspects of the main study protocol were not pilot tested.
The number of questions required for the main study was chosen to balance the needs of statistical testing against the burden on participants. A total of 20 scientific facts gives each participant four exposures to each of the five alternative presentation modes. In order to serve their purpose as core stimulus for immediate and short-term recall of scientific information, the majority of these questions needed to be novel to the majority of participants. To achieve this, we developed new questions analogous to the PEW research centre Science and Technology quiz (henceforth PEWST), presented in a modified CLOZE procedure. As in Porter, 53 this is a response format where an incomplete sentence is accompanied by five possible completion options. An example is: The _________ is the formal name for a flower’s stalk (with the possible options of peduncle, filament, xylem, style and propposite). Topic area was also considered, aiming for questions from a variety of physical and natural sciences. A result of 20% correct, or lower, indicates that a scientific fact is not general knowledge, that is, performance no better than the one in five chance of randomly guessing the correct answer. Questions were suitable for the main study if general knowledge of the fact was low, allowing greater scope for measuring scientific fact recall in the main study. However, we aimed to retain some variability in question difficulty, including a mix of ‘easier’ and ‘harder’ questions to avoid poor data quality that can eventuate when participants become frustrated (e.g. see 54 ).
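The arithmetic behind the design and the difficulty criterion can be sketched as follows. This is a minimal illustration only: the function name `is_not_general_knowledge` is hypothetical, and treating accuracy at or below the guessing rate as the cut-off simply restates the 20% criterion described above.

```python
from fractions import Fraction

N_FACTS = 20     # scientific facts in the main study
N_STYLES = 5     # alternative presentation modes
N_OPTIONS = 5    # completion options per CLOZE item

# Each participant sees every presentation mode equally often.
exposures_per_style = N_FACTS // N_STYLES

# Probability of guessing a five-option item correctly.
chance_level = Fraction(1, N_OPTIONS)


def is_not_general_knowledge(n_correct, n_respondents):
    """Return True when pilot accuracy is at or below the guessing rate."""
    return Fraction(n_correct, n_respondents) <= chance_level
```

For example, a pilot item answered correctly by 5 of 27 respondents (about 19%) would fall at or below the guessing rate, whereas one answered correctly by 17 of 27 would not.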
These pilot tests were undertaken via online self-report questionnaire on the Qualtrics platform by adult participants from Australia, the United States of America, the United Kingdom, New Zealand and Canada who were recruited via the online crowdsourcing website Crowdflower. Participants disclosed age, gender, first language, most used language in the past year, country of current residence and highest level of education. While data was recorded for all participants, only those who endorsed the item ‘I took this survey seriously, and answered every question to the best of my ability’ and self-reported either first or most used language as English were included in the analyses.
Participants were presented with all facts in a single block, in randomised order. The order of possible CLOZE responses was also randomised. Different individuals completed each round of pilot testing and the main study, to ensure that the first presentation of the CLOZE procedure was the first time participants had encountered the presented facts.
Pilot 01 results
Thirty-eight respondents completed the first questionnaire and data from 11 was disregarded on the basis of poor survey engagement or their reporting of not using English as a first language. The included 27 participants were aged 18–57 years (mean 32 years), 70% were female, and 48% had a university education. Candidate facts and their performance in pilot testing are exhaustively listed in Supplemental Table S2. Briefly, the average percentage correct was 64%, indicating most were too easy or familiar.
Pilot 02 results
As a result of the first test of the scientific facts, all statements were reviewed and either entirely replaced, or updated to difficult variants (e.g. The part of the flower that makes pollen is the
The overall average percentage correct remained high (34%). A selection of 20 scientific facts was made for the final study (Supplemental Table S2), for which the average percentage correct was 25%.
Main study
Development of materials
Materials to present each of the 20 piloted scientific facts in five style variants were prepared (e.g. Figure 1): text only, text caption with cartoon, text caption with photograph, cartoon with explanatory agent, and photograph with explanatory agent. The photographs were selected from open source repositories such as Wikimedia Commons, on the basis of key words included in the scientific facts (e.g. ‘clouds’). These images were chosen to be relevant to the scientific fact, but not to visually communicate the information independently from the text. Photographs of the person used as the explanatory agent were taken by author EW for the purposes of this study. Four variants reflecting common poses in ‘manga guide to…’ comics (eye contact with the viewer, cropped at the waist, one finger pointing at the background stimulus) were variably used for the purposes of visual variety. The explanatory agent was superimposed onto the photographs and a text bubble including the scientific fact was added. Cartoon versions of each were created by author EW. To perceptually match the photograph and cartoon as much as possible, EW directly traced the photographs for lines and used the colour picker tool for colour fills. For the full list of facts and their variants, see Supplemental Table S3.

Figure 1. Example of the five style variants used to present each scientific fact: (a) text alone, (b) text accompanying cartoon, (c) cartoon with explanatory agent, (d) text accompanying photograph, and (e) photograph with explanatory agent.
Participants
Adult participants from Australia, the United States of America, the United Kingdom, New Zealand and Canada were recruited via the online crowdsourcing website Crowdflower to complete an online self-report questionnaire via the platform Qualtrics, in return for a small monetary incentive. The ethical aspects of this research were approved by the Australian National University human research ethics committee (protocol number 2014/553). While data was recorded for all respondents, only data that met the following inclusion criteria were retained for analysis: (a) genuine engagement, satisfied by completion time of 5 min or more, and endorsing the item ‘I took this survey seriously, and answered every question to the best of my ability’; and (b) English language proficiency, identified by self-reported residence in Australia, the United States of America, the United Kingdom, New Zealand or Canada and self-reporting English as either a first language, or most spoken language. Of an original 385 respondents, data from 65 were excluded on the basis of criterion (a), and none on the basis of criterion (b), resulting in a final sample size of n = 310.
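The two inclusion criteria can be expressed as a simple filter. The sketch below is illustrative rather than the screening code used in the study; the `Respondent` record and its field names are hypothetical, and the country labels are assumed to match the survey's response options.

```python
from dataclasses import dataclass

ELIGIBLE_COUNTRIES = {
    "Australia",
    "United States of America",
    "United Kingdom",
    "New Zealand",
    "Canada",
}
MIN_MINUTES = 5.0  # criterion (a): completion time of 5 min or more


@dataclass
class Respondent:
    # Hypothetical field names, chosen for illustration only.
    minutes_to_complete: float
    took_seriously: bool
    country: str
    first_language: str
    most_used_language: str


def meets_engagement(r):
    """Criterion (a): genuine engagement."""
    return r.minutes_to_complete >= MIN_MINUTES and r.took_seriously


def meets_language(r):
    """Criterion (b): eligible residence and English as first or most used language."""
    speaks_english = "English" in (r.first_language, r.most_used_language)
    return r.country in ELIGIBLE_COUNTRIES and speaks_english


def is_included(r):
    """A respondent is retained only when both criteria are satisfied."""
    return meets_engagement(r) and meets_language(r)
```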
Measures
Demographic measures
Participants reported their age, gender, first language, the language they had used most in the previous year, their country of current residence and the highest level of education they had attained.
Covariates
A brief battery of items was included to allow insight into participants’ individual capacity to understand written text (English language proficiency), prior scientific knowledge (PEWST), and nonverbal capacity for induction and learning (Raven’s progressive matrices).
English language proficiency. CLOZE procedures correlate strongly with norm-referenced English language tests and language proficiency interviews. 55 Accordingly, English language proficiency was measured via a modified CLOZE procedure where participants were invited to complete a simple paragraph by selecting a correct word from a drop-down menu (Supplemental Table S1). This was scored between zero (no words correct) and five (all words correct). Science literacy was quantified with the PEW Research Centre Science and Technology quiz, which consists of 13 multiple choice questions about a wide range of science topics under the groupings of ‘science in the news/daily life’ (scored 0–5) and ‘Textbook science’ (scored 0–8), where higher scores indicate better scientific literacy. 56 Nonverbal intelligence was quantified by a subset of six images from Raven’s progressive matrices. 39 This is a problem-solving, abstract reasoning task where participants are presented a short sequence of images that change in a systematic way, and are then asked to select one of six possible images that would logically complete the sequence. This has a possible score of zero (implying lower nonverbal intelligence) to six (implying higher nonverbal intelligence).
Participant confidence
Confidence in responses was measured via a visual analogue scale slider, where 0 was ‘Not at all sure (just a guess)’ and 100 was ‘Completely sure (know it is correct)’.
Participant preference
All participants responded to this question: ‘You’ve probably noticed that we have been telling you about the scientific facts in different ways. Please select the option you preferred’.
Participant engagement
All participants responded to this question: ‘To help us analyse your results, we need to know how seriously you took this survey. Please be honest – your answer here will not affect any incentive you will receive for completing this survey’.
Procedure
Participants completed all measures via Qualtrics, an online survey platform, following the procedure outlined in Table 1. Note that, rather than the more traditional approach of asking demographic and covariate questions in a single block, they were mixed and split into 20 segments of approximately equal length (e.g. age + two PEWST questions; years of education and one of the Raven’s progressive matrix tasks, etc.). These were interleaved between fact presentation and immediate recall. This was done to prevent participants engaging in verbatim mental rehearsal (the temporary storage and manipulation of information), as doing so would result in data more indicative of working memory capacity than immediate recall. 28
Table 1. Overview of main study procedure.
Note: reported time to complete is the mean, with standard deviation in brackets.
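One way to give each participant four exposures to each of the five presentation styles is a balanced random assignment of styles to facts. The sketch below assumes the assignment was randomised independently for each participant, which is not stated explicitly here; the style labels and function names are hypothetical.

```python
import random
from collections import Counter

STYLES = [
    "text_only",
    "caption_cartoon",
    "caption_photo",
    "agent_cartoon",
    "agent_photo",
]


def assign_styles(fact_ids, rng):
    """Map each fact to a style so that every style is used equally often."""
    if len(fact_ids) % len(STYLES) != 0:
        raise ValueError("number of facts must be a multiple of the number of styles")
    repeats = len(fact_ids) // len(STYLES)
    pool = STYLES * repeats   # e.g. four copies of each style for 20 facts
    rng.shuffle(pool)         # random pairing of styles with facts
    return dict(zip(fact_ids, pool))


def style_counts(assignment):
    """Count how many facts were assigned to each style."""
    return Counter(assignment.values())


# Illustrative assignment for one hypothetical participant.
example_assignment = assign_styles(list(range(1, 21)), random.Random(0))
```

Calling `style_counts(example_assignment)` confirms that every style appears four times, whatever the seed.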
Data inclusion
Questions that individuals correctly answered pre-test were excluded from subsequent analysis for that fact, for that participant. Chi square (χ2) tests were used to check whether these exclusions resulted in an unbalanced use of the five presentation styles across participants.
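The exclusion rule operates on participant-by-fact rows rather than on whole participants or whole facts. A minimal sketch, assuming a long-format dataset with one row per participant per fact (the key names and example rows are hypothetical):

```python
from collections import Counter


def drop_known_facts(responses):
    """Keep only the participant-by-fact rows answered incorrectly at pre-test."""
    return [row for row in responses if not row["pretest_correct"]]


def rows_per_style(responses):
    """Count the retained rows under each presentation style."""
    return Counter(row["style"] for row in responses)


# Hypothetical rows: participant 1 already knew fact 2, so only that row is dropped.
example_rows = [
    {"participant": 1, "fact": 1, "style": "text_only", "pretest_correct": False},
    {"participant": 1, "fact": 2, "style": "agent_cartoon", "pretest_correct": True},
    {"participant": 2, "fact": 2, "style": "agent_cartoon", "pretest_correct": False},
]
retained_rows = drop_known_facts(example_rows)
```

The per-style counts returned by `rows_per_style` are the quantities whose balance the chi square test is intended to check.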
Statistical analysis
A series of linear models was fit to investigate whether the method of scientific fact presentation was associated with immediate recall, short-term recall, and confidence in immediate recall. Mixed effects (also known as hierarchical) linear regression was applied to properly account for between-fact variability, such that all predictors and responses (level 1) were nested by fact (level 2). All models controlled for PEWST, cognitive-academic English proficiency scores, and Raven’s progressive matrix tasks. Short-term recall additionally controlled for whether immediate recall was correct or incorrect. Alpha was set at 0.05, with Bonferroni correction for multiple comparisons reducing this to α = 0.02, hence 98% confidence intervals.
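The corrected threshold can be reproduced with a short calculation. The number of comparisons is an assumption here: three is used because three outcomes were modelled (immediate recall, short-term recall and confidence), and 0.05 / 3 rounds to the reported 0.02.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Divide the family-wise alpha equally across comparisons."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


FAMILY_ALPHA = 0.05
N_COMPARISONS = 3  # assumption: one comparison per modelled outcome

adjusted_alpha = bonferroni_alpha(FAMILY_ALPHA, N_COMPARISONS)

# Rounded to two decimals as reported, then converted to a confidence level.
reported_alpha = round(adjusted_alpha, 2)
confidence_level = 1 - reported_alpha
```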
Mixed effects models were fit using the lme4 package, 57 with the glmer function and binomial family for fact recall accuracy (correct/incorrect) and the lmer function for confidence (continuous variable). Model assumption testing is described and reported in Supplemental Table S4, and Figure S1.
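To make the structure of the recall model concrete, the sketch below computes the predicted probability of a correct answer from fixed effects plus a fact-level random intercept. It assumes a logit link, which is the default for a binomial family in glmer but is not stated here, and it illustrates the model form only: it does not fit a model, and all coefficient names and values are hypothetical.

```python
import math


def inv_logit(x):
    """Map a log-odds value onto a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def p_correct(fixed_effects, covariates, fact_intercept):
    """Predicted probability of correct recall for one participant-fact pair.

    fixed_effects: coefficients on the log-odds scale, including "intercept".
    covariates: predictor values keyed by the same names as the coefficients.
    fact_intercept: deviation of this fact from the overall intercept (level 2).
    """
    log_odds = fixed_effects["intercept"] + fact_intercept
    for name, value in covariates.items():
        log_odds += fixed_effects[name] * value
    return inv_logit(log_odds)


# Hypothetical coefficients, chosen only to show how the terms combine.
example_fixed = {"intercept": 0.5, "has_image": 0.1, "pewst": 0.2}
example_p = p_correct(example_fixed, {"has_image": 1, "pewst": 10}, fact_intercept=-0.3)
```

Because the coefficients sit on the log-odds scale, the same β corresponds to different changes in probability depending on the baseline.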
Sensitivity analysis
Three sets of sensitivity analyses were conducted: basic models with the key predictor only (to allow the greatest possible predictive power), models with Raven’s progressive matrices and English proficiency scores treated as ordinal variables (due to their restricted range, see 58 ), and models where individuals with cognitive-academic English language proficiency scores <5 were excluded (to further remove bias from poor comprehension).
Results
Participant characteristics
Participants were aged between 18 and 79, and 52% (n = 161) identified as female. The largest share (43%, n = 133) currently resided in the United States of America, followed by the United Kingdom (30%, n = 93), Canada (22%, n = 68) and the remainder residing in Australia or New Zealand. All participants had completed at least a high school education, and over half (63%, n = 195) had completed one or more years at university. They demonstrated mixed scientific literacy as indicated by the PEWST (M = 10, SD = 3, range 0–13). Similarly, Raven’s progressive matrices indicated a range of nonverbal intelligence (M = 3, SD = 2, range 0–6). The majority (79%, n = 244) of participants scored perfectly (5/5) in the basic English language proficiency measure. All participants were included in the main analysis, but sensitivity analyses were conducted where only those with a perfect 5/5 score were included.
Data inclusion
Pre-test accuracy indicated participants were generally unfamiliar with the 20 facts (within participants M = 5, SD = 3, range 0 to 18 correct; for further detail see Supplemental Table S2). Removal of data for facts correctly answered at pre-test from the dataset resulted in a total of 4547 answers across the 310 participants for subsequent analysis, balanced well across presentation variants (text only n = 908, text caption with cartoon n = 899, text caption with photograph n = 916, cartoon with explanatory agent n = 908, and photograph with explanatory agent n = 916). There was no significant association between style and whether or not a given fact was excluded from analysis (χ2(4) = 2.51, p = 0.54).
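The form of this balance check can be sketched as a chi square test of independence on a 2 × 5 table of retained versus excluded answers by style. The retained counts are those reported above; the excluded counts are an assumption, derived by supposing every style was presented exactly 310 × 4 = 1240 times. The actual allocation is not given here, so this sketch is not expected to reproduce the reported statistic.

```python
import math


def chi_square_independence(table):
    """Return the Pearson chi square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def chi_square_sf_even_df(statistic, df):
    """Upper-tail p-value using the closed form that holds for even df."""
    if df < 2 or df % 2 != 0:
        raise ValueError("closed form requires even, positive degrees of freedom")
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


retained = [908, 899, 916, 908, 916]       # reported counts per style
PRESENTED_PER_STYLE = 310 * 4              # assumption: perfectly balanced design
excluded = [PRESENTED_PER_STYLE - n for n in retained]

statistic, df = chi_square_independence([retained, excluded])
p_value = chi_square_sf_even_df(statistic, df)
```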
Hierarchical linear model results
A summary of hierarchical linear models can be seen in Figure 2, with full model coefficients reported in Supplemental Table S5. PEWST, cognitive-academic English language proficiency and Raven’s progressive matrix scores were significantly positively associated with correct recall and confidence across all conditions.

Figure 2. Summary of hierarchical linear model results.
There was no significant difference in immediate recall, short-term recall or confidence in answers related to whether the fact was presented as an image (rather than text), a cartoon (rather than a photo) or a text caption (rather than an explanatory agent). Immediate recall was substantially higher than pre-test accuracy (M = 18, SD = 2, range 3–20 correct), falling slightly for short-term recall (M = 17, SD = 3, range 4–19 correct), demonstrating sufficient variability in learning and forgetting for subsequent analysis.
Participant preference for a particular presentation variant was, from most to least preferred: cartoon with explanatory agent (26%), ‘I did not really prefer any option’ (21%), ‘Anything with a picture’ (15%), text only (12%), text caption with cartoon (10%), text caption with a photograph (8%) and photograph with explanatory agent (8%). Model fit tests indicated assumptions were met for immediate and short-term recall (Supplemental Table S4 and Figure S1), and that models had conditional R2 ≈ 0.6 (able to detect medium to large effects, see Bosco, Aguinis). 59 Heavy skewing in confidence ratings effect (M = 92, SD = 17, range 0–100) led to a ceiling effect which destabilised aICC (0.01), model dispersion, and power (conditional R2 ≈ 0.05), indicating conclusions relating to confidence should be interpreted with caution.
Where preference for a particular presentation variant was not met (e.g. those facts presented in photos to a participant with a stated preference for text), immediate recall was statistically significantly worse (β ranging −3.89 to −0.54, corresponding to up to four more facts recalled incorrectly), and confidence was significantly lower (β ranging −1.5 to −1.8, corresponding to an approximately 2% drop in confidence) than where preferences were met. Unmet preference was not significantly associated with short-term recall, and there were no significant differences in recall or confidence between those for whom preferences were met, and those who expressed no preference. All results were generally consistent across sensitivity analyses (Supplemental Tables S6–S8).
Discussion
This study investigated whether visual accompaniment of scientific information influences immediate and short-term fact recall in an adult audience and if that is influenced by different visual styles. We found no evidence of a particular benefit to scientific fact recall arising from any tested aspect of the ‘manga guide to…’ format; there was no systematic difference in fact recall depending on whether information was presented alongside a supporting image, whether that image was photograph or cartoon, and whether that image included an explanatory agent. This is distinct from several prior studies, which found a particular visual accompaniment improved information recall in comparison with text alone.35,60 We also did not find a clear preference for any particular presentation variant. However, alignment of visual presentation with participants’ stated preference for presentation variant did significantly improve immediate fact recall. Our findings are supported by prior studies indicating that individuals both prefer and are more confident in technical information conveyed by visuals accompanied by text, 61 and that confidence in information recall is positively associated with attitude to the style of information delivery. 62
Given prior evidence that immediate recall is more sensitive to contextual cues, 33 it was expected the different styles of visual accompaniment would influence immediate recall to a greater degree than short-term recall. The high accuracy and confidence at immediate recall, and subsequent lower accuracy at short-term recall suggests participants used working memory at immediate test. Following Baddeley’s model of working memory, 26 engaging in such temporary mental rehearsal relies on cognitive processes that are more sensitive to the nature of distraction tasks than the procedure of information encoding, storage and retrieval inherent in recall. For example, a verbal distractor is less effective at disrupting a visual working memory task, but more effective at disrupting a verbal working memory task. 29 The phonological loop is implicated in sentence comprehension, 63 so is the most likely component of working memory participants used in the current study.
It is likely that the distraction tasks did not prevent participants using working memory because they were either insufficiently difficult (demographic, English language proficiency, scientific knowledge) or of a sufficiently different modality (the Raven’s progressive matrices used here rely on the visuo-spatial sketchpad 64 ). If this is the case, then results could be taken to suggest that the style of information presentation impacted working memory (rather than immediate recall), possibly due to the irrelevance of visuals to the capacity of the phonological loop. This account would align with previous findings that in the context of comics, people spend longer looking at text than illustrations. 47 Further, these results indicate that future work seeking to understand the impact of visual accompaniment on scientific information may find it fruitful to focus explicitly on either working memory or short-term recall, rather than immediate recall.
Given the predominance of scholarly studies suggesting that visuals of varying kinds in combination with text should promote superior recall to text alone,6,16,35,65 it was surprising that we found no such effect. This discrepancy may be due to the predominance of research examining hypotheses generated by the dual coding theory, which comment on information presented across the modalities of symbolic/verbal and sensorimotor/visual. This paradigm implicitly focusses research attention on comparing two or three conditions only – text alone, visuals of some style and perhaps a pairing of text and visuals.6,16,35 In this study, we contrasted recall performance and preference across multiple combinations of text and different visual styles. This greater number of conditions allowed different spatial relationships between text and visuals across the styles (e.g. as a caption, or within a speech bubble). Generally, attention and recall are most pronounced when the text is presented as close as possible to the corresponding visual. 66 Eye-tracking studies from the advertising literature take this further, suggesting that text within a visual (e.g. a speech bubble) enhances attention to both visuals and text, possibly by physically pairing them in the visual field. 67 There is considerable scope for future examination of whether the physical proximity between text and visuals may alter how they are perceptually processed in relation to the dual coding theory.
In this study, the most preferred style was ‘cartoon with an explanatory agent’, which mirrors the real-world popularity of the ‘manga guide to…’ format. Yet, results were unhelpful in disambiguating whether visual simplicity (cartoon versus photograph) or presence of an explanatory agent was more impactful for promoting recall because neither factor was unambiguously associated with recall performance. It is, however, useful to consider that the most popular visual style was not associated with better recall. This comports with a recent review of eleven studies, where Farinella (2018) noted no difference in scientific knowledge acquisition from text or comic book formats, but an almost universal preference for the comic format. 45 This supports the notion that irrelevant imagery, that is imagery not semantically related to the content, such as the inclusion of an explanatory agent, is preferable for reasons that do not directly benefit recall. It is unclear what these reasons may be, as non-substantive visual content such as narrative 48 and humour 49 has been linked with enhanced recall.
Prior research indicates marked individual differences in preferences and styles of learning, 68 and there is evidence this manifests in clear preferences for particular styles of information presentation. 1 Yet, preferences in the current study were diffuse. The most popular – cartoon with an explanatory agent – was endorsed by just over a quarter of the participants, and a large number of participants indicated no preference for any visual style. If this reflects a genuine phenomenon, the reasons for this are unclear, beyond the extremely unlikely possibility that our sample by chance included a group with atypically heterogeneous preferences. It is instead more likely due to methodological differences between prior work and the current study. Unlike our findings, several studies have clearly indicated majority preference within their sample, ascertained by an ipsative paradigm where a preference must be stated. 1 Forcing a choice when there is genuinely no preference can lead to nonsensical responses that fit well into statistical models, but do not map onto the participant’s reality. 69 A useful model that future research could adopt is the preference rating in Moere et al., 16 which deconstructs preference into perception of varied aspects of a visualisation style (e.g. a series of questions rating factors such as ugly/beautiful on a scale of 1–5), or as in Luo, 36 which rates preference for each visual style separately on a scale of 1–7.
Aligning with both cognitive fit theory and the learning styles hypothesis, a variety of information presentation methods may be useful for recall of scientific facts in adults, the key factor being a match between information presentation and a particular individual’s predisposition for learning.40,41 While stated preference is not synonymous with learning predisposition, style, or preference, there is evidence that an individual’s preference for a particular style of visualisation is linked with cognitive style. 36 Causality remains unclear – it is possible people prefer a particular information presentation style because it facilitates memory, or conversely that it facilitates memory because they prefer it.
This study had a number of notable strengths, as well as limitations. To our knowledge this is the first examination of visual accompaniment of scientific information on recall in adults where participants were presented with multiple methods of combining text and images. This approach allows robust insight into within- and between-participant variability, resulting in findings that reflect individual characteristics that would be lost in simpler between-person analyses. Repeated pilot testing and exclusion of correctly answered facts at pre-test minimised bias from prior domain-specific knowledge of facts. A further strength was the design of the visual stimuli for comparison between photographs and illustration; the illustrations were developed directly from photographs, removing bias from mismatches in salient characteristics such as content, colour, and complexity. However, due to the focus on exploring elements of the ‘manga guide to…’ format, we did not explore whether specific visuals semantically reinforced or contextualised the presented facts. This lack of semantic connection may explain the lack of significant improvement in any recall when images were paired with text. Further, we did not ascertain whether the visual content of the styles was consistently perceived (e.g. abstracted cartoon clouds may not be as clearly cloud-like as a photograph, hence may prove a less salient cue for a fact relating to clouds). We also did not explore intermediate forms of abstraction in representation between photograph and illustration, or inherent information richness of visuals (as discussed in McCloud and Manning 15 ). Both of these shortcomings could be addressed in future work through the generation and pilot testing of stimuli according to some measure of semantic distance, considered here as the closeness between a visual and the meaning it holds. This could be achieved through the qualitative techniques outlined in Silvennoinen et al. 70
Unfortunately, variability within and between participant recall was somewhat limited by high accuracy at immediate and short-term test, which may have led to our missing trends within the data (type II error). Our focus on immediate and short-term memory has limited ecological validity for application in adult education settings, where long-term recall is of more interest. Similarly, due to how recall was tested, our results are only generalisable to rote memorisation via CLOZE procedures. Further, though comprehension relates to similar processes as recall (e.g. prior knowledge, mapping and memory constraints per McNamara et al. 19 ), the current study provides no insight into whether the presented facts were understood, rather than rote memorised.
Conclusions and next steps
In conclusion, we found that the style of visual accompaniment of scientific information in accordance with the ‘manga guide to…’ format influenced immediate, but not short-term, fact recall in an adult audience when written English literacy, scientific literacy and non-verbal intelligence were taken into account. Recall was enhanced when the participant’s preference for presentation variant was met, but there was no clear majority preference across participants. This indicates that short-term recall of scientific facts in adults may be best served by enquiring about, and then meeting participant preference for visual accompaniment. Future research should investigate how to proactively map out respondent preference, in order to most effectively use visuals and text for conveying scientific information.
Supplemental Material
Supplemental material, sj-pdf-1-ivi-10.1177_14738716211027587 for Not just a pretty picture: Scientific fact visualisation styles, preferences, confidence and recall by Erin I Walsh, Ginny M Sargent and Will J Grant in Information Visualization
Footnotes
Acknowledgements
The authors would like to thank Dr. Elizabeth Huxley for granting us permission to use her likeness as our ‘explanatory agent’, and patience in the photography session.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
