Abstract
The goal was to determine the effects of bilingual cigarette warning labels on the recall performance and attention of young bilingual Lebanese college students. Forty-eight students were shown English-only, Arabic-only, or bilingual cigarette warning labels in 2020. Participants recalled as many of the labels as they could after the experiment and then two weeks later. Eye tracking was used to determine attention to the label and subjective data were collected. Results showed that bilingual labels did not lead to better recall; participants needed more time to extract data from bilingual labels and first looked at them later in time, although bilingual labels were revisited more. However, participants believed that bilingual labels were better. It appeared that bilingual labels led to clutter rather than helped recall.
Introduction
Cigarette use remains one of the leading causes of death in the world, killing an estimated 6 million people per year (World Health Organization (WHO), 2020). The WHO Framework Convention on Tobacco Control (FCTC) has established policies and regulations that countries should follow in order to protect their citizens from the harmful effects of cigarettes (Nakkash et al., 2018). Among these policies is the development and enforcement of effective cigarette warning labels that attempt to reduce the number of people who take up smoking or continue to smoke (Cummings et al., 2004; Hammond et al., 2006). In this context, a cigarette warning label includes any image and/or text that is used to warn of the dangers of smoking.
In Lebanon, however, progress has been slow. Around 21% of all deaths in the country are linked to smoking, with an estimated 57 male deaths per week (Drope, 2018). Lebanon signed the WHO FCTC agreement in 2006, but not until August 2011 was a law related to smoking enacted (Nakkash et al., 2018). Despite this positive step, Lebanon remains one of the weakest countries in the Middle East in applying cigarette control policies (Nakkash & Lee, 2009; WHO & Control, 2008). Currently, warning labels in Lebanon are text-only and occupy 40% of the size of the package, as opposed to the 50% recommended by the FCTC (WHO, 2019b).
Moreover, the cigarette warning label situation in Lebanon is further complicated by the issue of language. Lebanon is known to be a country of many languages; while the official language is Arabic, French (recognized status) and English are widely spoken and written (Bacha & Bahous, 2011; Esseili, 2011, 2017; Shaaban, 2017). However, cigarette products in Lebanon contain warning labels in Arabic only, in conformance with FCTC Article 11. This is in contrast with several bilingual or trilingual countries, such as Belgium, Ireland, and Cyprus, that have a requirement that cigarette warning labels should display all the official languages of the country (Houghton et al., 2019).
To date, there has not been any study that directly explored the benefits and limitations of bilingual cigarette warning labels, although there have been studies about bilingual labels for other consumer products (e.g., Bialkova et al., 2013; Lim & Wogalter, 2003). For example, Lim and Wogalter (2006) investigated whether the location of bilingual warnings on pesticide products has any effect on native English and Spanish language users in terms of acceptability and purchasing decision. Results showed that the design of packages with English text on the left half and Spanish text on the right half of the package is the most preferred by both English and Spanish language users. However, it is not clear how bilingual cigarette warning labels affect the short and long-term recall of a bilingual population such as the Lebanese one. There is also a need to more carefully examine how labels attract attention, something that can be accomplished by means of an eye tracker.
The overall goal of this study was to carry out a controlled, eye-tracking-based experiment to investigate the effects of monolingual vs. bilingual text-only cigarette warning labels on young bilingual Lebanese people. To this end, we analyzed the effects of monolingual (English-only or Arabic-only) and bilingual (English and Arabic) cigarette warning labels on young people's 1) ability to recall warning information in the short and long term and 2) attention to the warning labels. English rather than French was chosen given that it is the language of instruction at the institution where the research took place. We expected that bilingual warnings would lead to better short- and long-term recall performance than Arabic-only or English-only text warnings, with the presence of more text not negatively impacting performance. In addition, we expected that attention to the warning labels would be greater for bilingual labels than for Arabic-only or English-only labels.
The scope of this study was limited to young, college-age students, who were assumed to be more inclined to read non-Arabic material and who are also possibly still debating whether to smoke or not. Knowing how best to present warning labels to this population will help reduce the number of young people who smoke or take up smoking in Lebanon (Hammond et al., 2018; Skurka et al., 2018). It will also help other bilingual countries who are applying text-only cigarette warning labels, such as Syria and Tunisia (Belazi, 1993; Drope, 2018). Moreover, having bilingual labels might help attract attention to the labels in the first place among young people, given that message avoidance is one of the major issues with cigarette warning labels (Hall et al., 2018). Even though many studies have shown that graphical warnings are better than text-only warnings (e.g., Klein et al., 2017), the fact remains that many countries do use text warning labels (WHO, 2019a). And even when graphics are used, a choice will have to be made in bilingual environments about the language used for the accompanying text.
Methods
Participants
The participants were 48 undergraduate and graduate students from the authors’ institution (see Table 1). They all had to be at least 18 years old and be able to speak, read, and write both English and Arabic. The participants were recruited using flyers posted around campus. Four participants had poor eye tracking data quality and were excluded, leaving a total of N = 44 participants for analysis. Participants were not compensated for their time.
Table 1. Summary of participant demographic information.
Experiment Stimuli
Stimuli consisted of a set of color cigarette packages that were created specifically for this study in Adobe Photoshop (see Figure 1). Each stimulus (height: 17 and width: 12 degrees visual angle) consisted of a background, brand name, and a warning label that could be in Arabic, English, or both. The warning labels were written in Adobe Arabic Bold 14-point font. The warning was located at the bottom of the package and covered 40% of the package height, in conformance with packages in Lebanon. The backgrounds of the stimuli were selected from the website Cigarette Labels (Tobacco Labelling Resource Centre, 2013) as well as from packages obtained from the Lebanese markets. The warning text was taken verbatim in Arabic from the warnings available in Lebanon and translated into English by one member of the team, before being verified for accuracy by another member (see Table 2). Both team members are fluent in English and Arabic. Only one warning label available in Lebanon (“smoking kills”) was not used, as it was much shorter than the others; all other existing warning labels in Lebanon were used.

Figure 1. Sample of experiment stimuli and AOIs on the warning texts; images were in color in the experiment.
Table 2. Warning labels used in the experiment.
A total of 12 stimuli were created, each with a unique background and warning label. Each of these 12 stimuli had 3 versions: one in Arabic, one in English, and one with both (bilingual). All bilingual labels also had one version with English first, followed by Arabic, and another version with Arabic first, for a total of 48 stimuli (12 labels in each of four versions). In addition to those experiment stimuli, 14 “dummy” stimuli were used in order to prevent participants from realizing that this study was about cigarette warnings. The dummy stimuli consisted of pictures of non-cigarette-related products (e.g., canned food, detergents) that also contained warning labels. Half of the dummy stimuli were in English and the other half were in Arabic. No data were collected for these stimuli.
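The composition of the stimulus set can be checked with a short enumeration. The following Python sketch is illustrative only: the version names and the `stimuli_for_condition` helper are hypothetical and were not part of the study materials.

```python
from itertools import product

N_BASE = 12  # unique background/warning pairs, as described in the text

# Hypothetical names for the four versions of each base stimulus
VERSIONS = [
    "arabic",
    "english",
    "bilingual_english_first",
    "bilingual_arabic_first",
]

# Every base stimulus crossed with every version
stimuli = list(product(range(1, N_BASE + 1), VERSIONS))


def stimuli_for_condition(version):
    """Return the stimuli that a participant in one condition would view."""
    return [s for s in stimuli if s[1] == version]


print(len(stimuli))                          # 48
print(len(stimuli_for_condition("arabic")))  # 12
```

Each participant therefore viewed 12 of the 48 experiment stimuli, in addition to the 14 dummy stimuli.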
Experiment Design
The independent variable was the language of the warning labels (English-only, Arabic-only, and bilingual), which was varied between subjects. Each participant was randomly assigned to one of three groups, with the only consideration being the participants’ gender, which we tried to keep balanced across the three groups. As a result, there were 13 participants (six women and seven men) in the English group, 17 participants (nine women and eight men) in the Arabic group, and 18 participants (nine women and nine men) in the bilingual group. There were no significant differences between the three groups in terms of age, smoking status (when combining social smokers, non-daily smokers, and daily smokers), or first language. Participants in each group viewed their corresponding 12 experiment stimuli and the same 14 dummy stimuli in one of two randomly assigned but fixed orders (the second sequence was the inverse of the first). Participants in the bilingual group were randomly assigned to either Arabic-first or English-first stimuli.
The dependent variables in this study were divided into performance measures, eye tracking measures, and subjective measures. The performance measures included participants’ short- and long-term recall of the 12 warning labels. These were gathered by means of a post-experiment interview (short-term recall) and an email sent two weeks after the experiment (long-term recall).
The eye tracking measures were calculated for each experimental stimulus. The eye tracking data obtained from an eye tracker can be expressed in terms of fixations and the rapid movements between them, called saccades (Poole & Ball, 2006). In addition, the time spent by an individual looking at an area of interest (AOI) is known as dwell or gaze time (Meernik et al., 2016). Four eye tracking metrics were calculated in this study using these building blocks: total fixation duration, total number of gazes, time to first fixation, and mean fixation duration (see Table 3). In line with this study's hypothesis that bilingual labels would be more effective and draw more attention than English-only and Arabic-only labels, it was expected that the number of gazes and total fixation duration would be higher for bilingual labels than for monolingual labels, while time to first fixation and mean fixation duration would be shorter.
Table 3. Summary of eye tracking metrics used in the study.
The metrics were calculated using experimenter-set AOIs that targeted the warning labels (see Figure 1). The size of the AOIs was equal for all stimuli, with two AOIs in the bilingual group, each traced around one warning label. Each of the two AOIs in the bilingual group was the same size as one AOI in the English or Arabic group (i.e., there was no wasted white space between the English and Arabic labels) and they were combined for the analysis. Finally, the subjective measures, which were collected by means of a post-experiment debriefing questionnaire, were used to explore people's opinions of warning label language.
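To make the four metrics concrete, the following Python sketch computes them from a time-ordered list of fixations for one stimulus. It is a minimal illustration under stated assumptions, not the iMotions computation: the `Fixation` record and the `aoi_metrics` function are hypothetical, a gaze is assumed to be an unbroken run of fixations inside the AOI, and for the bilingual group `in_aoi` is assumed to be true when a fixation falls in either of the two combined AOIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fixation:
    start_ms: float     # onset relative to stimulus onset
    duration_ms: float  # fixation duration
    in_aoi: bool        # True if the fixation fell inside the warning-label AOI


def aoi_metrics(fixations: List[Fixation]) -> dict:
    """Compute the four AOI metrics for one stimulus (fixations in time order)."""
    aoi_fixations = [f for f in fixations if f.in_aoi]
    total = sum(f.duration_ms for f in aoi_fixations)

    # Count gazes: each entry into the AOI starts a new gaze (assumed definition)
    gazes = 0
    previous_in_aoi = False
    for f in fixations:
        if f.in_aoi and not previous_in_aoi:
            gazes += 1
        previous_in_aoi = f.in_aoi

    return {
        "total_fixation_duration_ms": total,
        "number_of_gazes": gazes,
        # None when the AOI was never fixated
        "time_to_first_fixation_ms": aoi_fixations[0].start_ms if aoi_fixations else None,
        "mean_fixation_duration_ms": total / len(aoi_fixations) if aoi_fixations else None,
    }


# Invented example: the label is visited twice, with a look at the package in between
example = [
    Fixation(0, 200, False),
    Fixation(250, 100, True),
    Fixation(360, 150, True),
    Fixation(520, 300, False),
    Fixation(830, 120, True),
]
metrics = aoi_metrics(example)
print(metrics)
```

In this invented example the label receives two gazes, a total fixation duration of 370 ms, and a time to first fixation of 250 ms.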
Experiment Setup
Participants were seated at around 60 cm from the monitor. A Tobii X3-120, desktop-mounted and infrared-based eye tracker was placed underneath the monitor and used to record the eye movement of the participants at a sampling rate of 120 Hz and accuracy of 0.4 degrees visual angle. The eye tracking data were analyzed and extracted using iMotions software.
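The stimulus dimensions given in degrees of visual angle can be converted to an approximate on-screen size at this viewing distance using size = 2 × d × tan(θ/2). The Python sketch below is illustrative only; the function name is hypothetical, and the conversion assumes the stimulus is centered on and perpendicular to the line of sight.

```python
import math


def visual_angle_to_size_cm(angle_deg: float, distance_cm: float) -> float:
    """On-screen extent (cm) subtending `angle_deg` at a viewing distance of `distance_cm`."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


VIEWING_DISTANCE_CM = 60  # approximate seating distance reported in the text

height_cm = visual_angle_to_size_cm(17, VIEWING_DISTANCE_CM)
width_cm = visual_angle_to_size_cm(12, VIEWING_DISTANCE_CM)
accuracy_cm = visual_angle_to_size_cm(0.4, VIEWING_DISTANCE_CM)

print(f"stimulus: {height_cm:.1f} cm high, {width_cm:.1f} cm wide")
print(f"tracker accuracy: about {accuracy_cm * 10:.1f} mm on screen")
```

Under these assumptions, each package measured roughly 17.9 cm by 12.6 cm on screen, and the tracker's stated accuracy of 0.4 degrees corresponds to about 4 mm.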
Experiment Procedure
Participants were informed that the aim of the study was to examine the benefits and limitations of different product and label designs, so that they would not make a conscious effort to memorize the cigarette warning labels. Participants were told that they would be shown different package designs and labels for 10 s each and that they should assess them, as they would have to evaluate them at the end. The time limit was based on what previous studies have used (Kessels & Ruiter, 2012; Munafò et al., 2011). Next, the eye tracker was set up and calibrated using a nine-point grid, after which the actual experiment started. A set of crosshairs was displayed at the center of the screen in between images, and participants were asked to focus on the crosshairs to ensure a common gaze position at image onset. Participants were shown the next image immediately after the 10-s limit elapsed; they could not choose to go through the images faster.
After viewing all stimuli, participants were asked to complete an oral post-experiment interview to check how much they recalled from the cigarette warning labels. The interview was conducted by the experimenter and started by asking participants to verbally list all the cigarette warning labels they could recall. Both full and partial recalls of the warning labels were considered as recalled labels. It was enough for participants to mention one of the words in the warning label, apart from “smoking”, for that warning label to be counted as recalled. For the bilingual group, a label was considered to be recalled if they remembered it in English or in Arabic. The interview results were noted by the experimenter at the time of the interview.
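The lenient scoring rule described above can be expressed as a simple word-overlap check. The following Python sketch is a hypothetical illustration only: scoring in the study was done by the experimenter, the example label text is invented rather than taken from Table 2, and a literal word match such as this one would also count function words, which a human scorer would presumably ignore.

```python
import re


def words(text: str) -> set:
    """Lowercased set of alphabetic words (Unicode-aware, so Arabic text also tokenizes)."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def is_recalled(response: str, label: str, excluded=("smoking",)) -> bool:
    """A label counts as recalled if the response shares any non-excluded word with it."""
    return bool((words(label) - set(excluded)) & words(response))


def recall_score(responses, labels) -> int:
    """Number of labels matched by at least one of the participant's responses."""
    return sum(any(is_recalled(r, label) for r in responses) for label in labels)


# Invented label text, for illustration only
label = "Smoking causes heart disease"
print(is_recalled("something about heart problems", label))  # True
print(is_recalled("smoking is bad", label))                  # False
```

For the bilingual group, the same check would be applied against both the English and the Arabic text of each label, with a match in either language counting as a recall.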
Participants were then debriefed about the main purpose of the experiment and asked to confirm that they agreed for their data to be used. If not, participants were excused at this point and their data were not used. If they consented, participants were asked to fill out a post-experiment questionnaire about their smoking behavior, preferred language, and thoughts about bilingual warning labels. Finally, participants were asked whether they were willing to be contacted at a later point for follow-up questions. If they consented, participants were contacted two weeks after the study by email and asked to recall as many of the warning labels as they could (exactly as in the interview). The whole experiment took around 25 min.
Results
Analysis Approach
The results were analyzed using a one-way ANOVA to determine the main effect of the type of warning label: English-only, Arabic-only, or bilingual. The dependent measures (both performance and eye tracking) were averaged across all experimental stimuli for each participant. Analysis was done in IBM SPSS version 20. Bonferroni-adjusted post-hoc tests were used for pairwise comparisons. Normality was checked using a Shapiro-Wilk test and visual inspection of a normal Q-Q plot. Only the email recall performance failed both of these checks, so a non-parametric Kruskal-Wallis H test was carried out for that measure. All participants consented to the use of their data and to being contacted by email two weeks after the experiment, but only 33 participants responded to the email.
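For readers less familiar with the procedure, the following Python sketch shows how a one-way ANOVA F statistic and a Bonferroni-adjusted p-value are computed. It is an illustration only: the analysis in this study was run in SPSS, the function names are hypothetical, and the data in the example are invented rather than taken from the experiment.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Between-group sum of squares: group sizes times squared mean deviations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Invented per-participant means for three groups
f_stat, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df1, df2)
```

With three groups and N = 44 participants, the same formula gives the degrees of freedom (2, 41) reported for the tests in this section, and three pairwise comparisons are adjusted with `n_comparisons = 3`.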
Recall Performance
Short-term recall performance
There was no significant difference between groups in mean short-term recall, F(2, 41) = 6.902, p = .089. Participants could fully remember an average of 2.41 (SD = 1.44), 2.62 (SD = 1.31), and 3.12 (SD = 1.45) warnings out of 12 in the English, Arabic, and bilingual groups, respectively.
Long-term recall performance
The results of the Kruskal-Wallis test showed no significant difference between groups in the total number of warnings recalled in the long term, χ2(2) = 2.426, p = .297. The total email recall was an average of 1.22 (SD = 1.09), 2.08 (SD = 1.62), and 1.83 (SD = 1.19) warnings in the English, Arabic, and bilingual groups, respectively.
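For k groups, the Kruskal-Wallis H statistic is conventionally referred to a chi-square distribution with k − 1 degrees of freedom, which for three groups is 2; in that case the p-value has the closed form exp(−H/2). The Python sketch below illustrates the computation. It is a simplified illustration with hypothetical function names, it omits the tie correction that statistical packages apply, and the example data are invented.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def p_value_three_groups(h):
    """Upper-tail chi-square probability for df = 2, i.e. three groups."""
    return math.exp(-h / 2)


h_example = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # invented data
print(round(h_example, 3))                    # 7.2
print(round(p_value_three_groups(2.426), 3))  # 0.297
```

Applying the closed form to the reported statistic of 2.426 reproduces the reported p-value of .297.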
Eye tracking metrics
Total fixation duration
There was no significant difference between groups, F(2, 41) = 5.09, p = .157. The total fixation duration was 201.0 (SD = 94.8), 191.5 (SD = 60.5), and 286.1 (SD = 110.86) milliseconds in the English, Arabic and bilingual groups, respectively.
Number of gazes
There was a significant difference in the mean number of gazes between groups, F(2, 41) = 23.38, p < .001, ηp2 = .533. Bonferroni-adjusted post-hoc tests revealed significant pairwise differences between the bilingual group and each of the English and Arabic groups (both p < .001; Figure 2), with the bilingual group having the highest number of gazes (6.05 gazes), followed by the Arabic (3.49) and then the English (3.27) group.

Figure 2. Mean number of gazes on health warning labels for the three groups.
Time to first fixation
The mean time to first fixation showed a significant difference between groups, F(2, 41) = 16.961, p < .001, ηp2 = .453. Bonferroni-adjusted post-hoc tests revealed significant differences between the bilingual group and both the English (p < .001) and Arabic (p < .001) groups (Figure 3), with the bilingual group having the longest time to first fixation (5068.47 ms), followed by the English (2538.26 ms) and then the Arabic (1997.28 ms) group.

Figure 3. Mean time to first fixation on health warning labels for the three groups.
Mean fixation duration
The mean fixation duration showed a significant difference between groups, F(2, 41) = 4.422, p = .018, ηp2 = .177. Bonferroni-adjusted post-hoc tests showed a significant difference between the Arabic and bilingual groups (p = .016; Figure 4), with the bilingual group having the longest mean fixation duration (167.46 ms), followed by the English (117.40 ms) and then the Arabic (87.94 ms) group.

Figure 4. Mean of mean fixation duration on health warning labels for the three groups.
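The partial eta squared values reported for the eye tracking metrics can be recovered from the F statistics and their degrees of freedom, since for a one-way design ηp2 = (F × df1) / (F × df1 + df2). The following Python sketch illustrates the arithmetic; the function name is hypothetical.

```python
def partial_eta_squared(f_stat: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# F statistics reported in the text, all with (2, 41) degrees of freedom
reported_f = {
    "number_of_gazes": 23.38,
    "time_to_first_fixation": 16.961,
    "mean_fixation_duration": 4.422,
}

effect_sizes = {
    name: round(partial_eta_squared(f, 2, 41), 3) for name, f in reported_f.items()
}
print(effect_sizes)
```

The results, .533, .453, and .177, match the reported effect sizes.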
Subjective Results
There were no significant differences between the performance of smokers and non-smokers, or between those who claimed English or Arabic as their first language. From the questionnaire, it was seen that 64% of participants thought that bilingual labels were the most effective, 7% thought English-only were best, 9% preferred Arabic-only, 4% believed that it depended on the user, and 16% thought that the language did not make any difference. However, most participants (57%) thought that cigarette warning labels do not affect people's decision to buy cigarettes, and only 7% thought they do. The rest were not sure. At the same time, 84% of the participants believed graphical cigarette warnings would be more effective than text-only warning labels; 9% were not sure if they would be.
Discussion and conclusion
The aim of the study was to analyze the effects of monolingual and bilingual cigarette warning labels on young people's ability to recall warning information in the short and long term, as well as on their visual attention using eye tracking metrics. It was expected that bilingual warnings would lead to better short- and long-term recall performance than Arabic-only or English-only text warnings, with the presence of two languages (and, as a result, more text or clutter) not negatively impacting performance and instead increasing attention allocation to the warning labels.
In terms of recall performance, the results did not reveal any significant differences or clear pattern. In general, the recall rate across all types of recall was very low, which may be attributed to the relatively large number of warning labels and dummy stimuli. On average, participants could only recall around two or three warning labels in the short term, and then only one or two in the long term. Thus, the hypothesis about bilingual labels was not supported in the case of performance. Bilingual warning labels did not seem to provide any benefit in terms of recalling warning labels.
However, the eye tracking metrics provided further insight into what happened at the level of attention allocation. In general, the eye tracking results indicated that bilingual labels did not provide benefits in terms of drawing more of people's attention to warning labels. There was no significant difference in total fixation duration, meaning that even with exactly double the AOI size, participants did not look at the bilingual warning labels for a significantly longer period of time. What did happen is that there was a significantly higher number of gazes to the bilingual labels, suggesting that participants revisited the labels several times, going back and forth between the labels and the package image. Combined with the fact that mean fixation duration was longest for bilingual labels (significantly longer than for Arabic-only labels), this suggests that participants visited the labels several times, struggled to extract information, and then switched back to the package, without ever spending a long period of time on any one label. In addition, the time to first fixation was longer for the bilingual labels than for the monolingual ones, as if participants delayed looking at the bilingual labels, perhaps because the presence of more text was less appealing. All of these results suggest that the bilingual labels put participants off rather than attracted them, and made it more difficult for them to extract information, even though the font type and size were exactly the same as for the monolingual labels. These results once again fail to support our hypothesis about bilingual labels and are consistent with the performance results. The findings are also in line with research on clutter in warning labels, where people paid less attention to more cluttered labels (Bialkova et al., 2013; Wogalter et al., 1991). It does not seem that bilingual text is any different from other forms of clutter (Moacdieh & Sarter, 2015); more text, in whatever language, led to poorer attention allocation.
Moreover, it is interesting that when asked about what language(s) they thought would be best for warning labels, most participants said that Arabic and English together would be best. This is similar to a previous study on bilingual labels (Lim & Wogalter, 2006), where participants preferred to have both languages. It would seem that people's subjective preferences are not aligned with their actual performance and attention, providing further evidence for the use of objective measures such as performance and eye tracking in investigating warning labels.
In terms of intellectual merit, this study fills a gap in the literature on the benefits and limitations of bilingual warning labels and establishes that they resulted in clutter in this context, rather than tangible benefits. In terms of broad impact, the findings of this study suggest that Arabic warning labels could be the best choice for the Lebanese population, even the young population that ostensibly favors everything in English (this despite Arabic being officially most people's first language). This can potentially extend to other bilingual contexts as well, especially those in the Arab world. The addition of graphic labels, as also suggested in the survey, is likely to improve the recall rate and attention to labels, as established in previous studies (Klein et al., 2017). The results also caution against expecting multilingual warning labels to simply multiply the benefits of one warning label; at best, it seems there is no effect, and the space required would probably be better used for an image. It would appear to be best to push for graphical images to be added on Lebanese cigarette packages, with only Arabic text accompanying that image, rather than bilingual text.
Further research will look into the design of warning labels in other contexts, such as alcohol warning labels. In addition, further research can address the limitations of this work. First, this study was limited to college students, who represent only a small subset of the young Lebanese population, although we can safely assume that they represent a subset that heavily favors English. In addition, the sample size was smaller than planned due to the lockdown imposed by the COVID-19 pandemic. Future studies will look to expand on this subset. Second, future studies will make sure to recruit equal numbers of smokers and non-smokers in order to provide a rigorous comparison of the differences between the two groups, as the population in this study was mostly non-smokers. A smoker would likely have different attention allocation patterns and recall rates than a non-smoker. Third, in this study, participants had to be debriefed before receiving the two-week recall email. This may have affected the recall rate, so future experiments could delay the debriefing until after participants respond to the email. Another limitation is that displaying the packages on a screen was not realistic; however, given the specifications of the eye tracker, this was the only way it could be done.
Footnotes
Acknowledgments
The authors would like to thank Firas Bahsoun for his help with running the study, as well as all of the participants who volunteered their time.
Author's Note
Reem Jalal Eddine is also affiliated with the Faculty of Human Kinetics, University of Windsor, Windsor, Ontario, Canada. Nadine Marie Moacdieh is also affiliated with the School of Computer Science, Carleton University, Ottawa, Ontario, Canada.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
