Abstract
Objective:
The aim of this study was to investigate the impact of illiteracy on physical health and mental health.
Design:
Matching methods (nearest neighbour matching, Mahalanobis metric matching, and propensity score matching).
Setting:
Elderly people at least 65 years old in 22 provinces of China.
Methods:
The analysis used data from the Chinese Longitudinal Healthy Longevity Survey (CLHLS). The independent variable was a dummy variable, coded as 1 for illiterate or semiliterate and 0 for literate. Dependent variables were indicators of physical and mental health derived from the survey results. Matching methodologies controlled for confounding variables including age, sex, living sites, access to tap water and financial support.
Results:
Illiteracy was found to have a significant impact on physical health, exercise habits, anxiety, loneliness and happiness. On average, illiteracy decreased physical health by 19.9%, decreased exercise habits by 7%, increased anxiety by 11.56%, increased loneliness by 17.6% and decreased happiness by 11.3%.
Conclusion:
Findings confirm the past literature in which illiteracy has been found to be adversely associated with physical and mental health. The analysis uniquely found that illiteracy had a higher cost on mental health as compared to physical health for elderly people in China.
Introduction
According to the United Nations Educational, Scientific and Cultural Organisation (UNESCO), although the rate of illiteracy has been steadily decreasing in the past decades, there are still 773 million illiterate adults worldwide, most of whom are women (UNESCO, 2020). Amidst collective efforts to achieve universal literacy, the Chinese government is making progress towards the goal of illiteracy eradication. National Census data have shown that the number of illiterate people in China has fallen from 182 million in 1990 to around 55 million in 2010 (National Bureau of Statistics of China, 2010). However, most illiteracy eradication efforts, including the implementation of nine-year compulsory education in 1986, target young people aged below 24 years old (Ministry of Education, 2020). The illiteracy rate among Chinese elderly people is still high. This paper focused on the quality of life of Chinese people aged 65 years and older to investigate the impact of illiteracy on health.
Past literature has established a positive association between education and health using both theory and empirical evidence. In line with the demand for health model, more educated people are theoretically more efficient producers of health since they demand better quality and more medical care (Grossman, 1972). Grossman (2015) reviewed past quantitative studies and found that many studies reached consensus that years of schooling are strongly correlated with good health. The causal linkages between schooling and health include health knowledge, fertility choices, and unhealthy habits, such as cigarette smoking, lack of exercise and excessive alcohol consumption. Apart from these explicit mechanisms, schooling also influences health condition indirectly through higher income, more comprehensive skill sets and accumulated social capital (Berkman et al., 2014).
Illiterate people, who generally represent the population with the shortest years of schooling, have been found to have lower health status and poorer use of healthcare services than their more educated counterparts in multiple studies (Berkman et al., 2011; DeWalt et al., 2004). Being illiterate restricts one’s access to health knowledge. Specifically, illiteracy can lead to a lack of health literacy. According to Brabers et al. (2017), health literacy is ‘the personal characteristics and social resources needed for people to access, understand and use information to make decisions about their health’. In the USA, a significant association has been found between inadequate health literacy and poorer physical and mental health (Wolf et al., 2005, 2007). A multivariate US study found that individuals with inadequate health literacy had 2.7 times the odds of being depressed and that poor physical health was associated with depressive symptoms (Gazmararian et al., 2000). A study with under 40-year-olds in China found a strong and positive relationship between illiteracy and schizophrenia with a 2.08 odds ratio (Liu et al., 2013). The past literature has thus identified associational relationships between illiteracy and health; however, more research is needed to establish whether these associations are due to causal mechanisms or reverse causality.
Methods
Data source
Observational data from the Chinese Longitudinal Healthy Longevity Survey (CLHLS) of Parent-Child Dyads, conducted in 2002 and 2005, were used in this analysis (Xiao et al., 2019). The CLHLS surveys mainly collected interview data about physical health, mental health and quality of life from elderly people aged 65 years old or older in 22 provinces of China. The survey results provide information on baseline characteristics, including the aged population’s socioeconomic characteristics, family, lifestyle, and demographic profile. The main outcome of interest in this research is physical and mental health, as shown in respondents’ health conditions, daily functioning, self-perceptions of health status, quality of life, life satisfaction, mental attitude, and feelings about ageing. This dataset contained 4,240 observations and 2,957 variables. The second wave in 2005 followed up with the same sample group from the first wave in 2002, but suffered a loss of data due to some individuals passing away in the intervening period. This analysis mainly used cross-sectional data collected in 2002 to conduct matching analyses, with the exception of the outcome variable health deterioration, which reflected changes in the person’s health condition between 2002 and 2005.
Variables
In total, 4,240 Chinese residents aged 65 years old or more participated in face-to-face interviews. The 2002 wave has a complete set of data (n = 4,240), whereas the 2005 wave (n = 2,435) lost contact with 12.5% (k = 531) and suffered a death rate of 30.0% (k = 1,274) among original participants. CLHLS survey participants were asked: ‘What kind of educational qualification did you gain finally?’ – ‘illiterate or semiliterate’, ‘elementary school’, ‘junior high school’, ‘senior high school’, ‘technical secondary school’, ‘junior college’, ‘undergraduate’, ‘graduate’ or ‘PhD’. The Chinese population above 65 years old has survived the Great Leap Forward, the Henan Famine and the Cultural Revolution. During those difficult times, obtaining higher education was almost a luxury, and illiteracy is not uncommon among members of this age group. Among the 4,240 participants in the CLHLS data, around 25% (k = 1,041) reported being illiterate or semiliterate. This literacy indicator is the independent variable in the matching ordinary least squares analyses, with the illiterate treatment group being coded as 1 while the literate control group is coded as 0. This division is appropriate because matching methods require a larger control group than treatment group to ensure efficient matching (Chan, 2019).
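As an illustration only, the coding of the literacy indicator can be sketched in Python (the study itself used R). The function name and the handling of unrecognised answers as missing are assumptions made for this sketch, and the example answers are hypothetical rather than CLHLS records.

```python
# Response options quoted in the survey question; the first one defines the treatment group.
EDUCATION_LEVELS = [
    "illiterate or semiliterate", "elementary school", "junior high school",
    "senior high school", "technical secondary school", "junior college",
    "undergraduate", "graduate", "PhD",
]

def illiteracy_dummy(response):
    """Return 1 for the illiterate treatment group, 0 for the literate control group."""
    if response not in EDUCATION_LEVELS:
        return None  # assumption: unrecognised answers are treated as missing
    return 1 if response == "illiterate or semiliterate" else 0

# Hypothetical answers, not CLHLS records.
answers = ["illiterate or semiliterate", "elementary school", "PhD", "refused"]
dummies = [illiteracy_dummy(a) for a in answers]
print(dummies)
```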
To provide a more holistic health indicator, physical health is represented by a combination of self-reported health and interviewer-rated health (physical health = self-reported health + interviewer-rated health). CLHLS participants were asked ‘How do you rate your own health?’ – ‘very good’, ‘good’, ‘so so’, ‘bad’, ‘very bad’ and ‘not able to answer’. As shown in Table 2, the self-reported health variable was coded in the range -2 to 2. Moreover, CLHLS interviewers were asked to rate the health status of interviewees as ‘surprisingly good’, ‘relatively good’, ‘moderately ill’, ‘very ill’ and ‘missing’. As shown in Table 3, the interviewer-rated health variable was also coded in the range -2 to 2. The mean interviewer-rated health was 1.076 while the mean self-reported health was 0.467. Since self-reported health may suffer from cognitive biases and individual differences, a combination of self-reported health and interviewer-rated health yielded a more holistic indicator for physical health. Another indicator, health deterioration, captured changes in health as the difference between health status in 2005 and health status in 2002.
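The composite indicator can be sketched as follows. This is a minimal Python illustration, not the code used in the study: the text states only that each rating was coded in the range -2 to 2, so the specific integer assigned to each answer below is an assumption (the exact values are in Tables 2 and 3), as are the function and variable names.

```python
# Assumed codings: only the -2..2 range is stated in the text.
SELF_REPORTED = {"very good": 2, "good": 1, "so so": 0, "bad": -1, "very bad": -2}
INTERVIEWER_RATED = {"surprisingly good": 2, "relatively good": 1,
                     "moderately ill": -1, "very ill": -2}

def physical_health(self_answer, interviewer_answer):
    """Sum the two coded ratings; return None if either rating is missing."""
    s = SELF_REPORTED.get(self_answer)             # 'not able to answer' -> None
    i = INTERVIEWER_RATED.get(interviewer_answer)  # 'missing' -> None
    if s is None or i is None:
        return None
    return s + i

print(physical_health("good", "relatively good"))
```

Under these assumed codings the composite ranges from -4 to 4, and a respondent with either rating missing has no composite score.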
In addition to physical health, illiteracy has been found to be negatively associated with mental health. With rapidly decreasing illiteracy rates in modern China, literacy is taken for granted while illiteracy is almost equivalent to a disability in most social settings. Among elderly people, illiteracy may result in difficulties in daily shopping, in signing for mail packages, when ordering restaurant food, or even finding toilets if no graphics are displayed. Uncomfortable encounters due to illiteracy may cause frustration, anxiety or even depression. Using a modified version of the Geriatric Depression Scale (Greenberg, 2012), the participants were surveyed on the frequency of ‘feeling anxious, lonely, useless with age, and as happy as younger’ by choosing from ‘always’, ‘often’, ‘sometimes’, ‘seldom’, ‘never’, and ‘not able to answer’ instead of a numeric value. As shown in Tables 4 to 7, these frequencies were coded as integers while ‘not able to answer’ was coded as NA.
The indicators of physical and mental health are listed in detail in Tables 1 to 7. Other outcomes of interest include edentulism (toothlessness), death, cancer, number of serious illnesses in the past 2 years, smoking habits, alcohol drinking habits and exercise habits. Through those indicators, the impact of illiteracy on physical health, mental health and lifestyle habits was examined.
Literacy indicator.
Self-reported health indicator.
Interviewer-rated health indicator.
Anxiety indicator.
Loneliness indicator.
Feeling useless with age indicator.
Feeling as happy as younger indicator.
Analysis
Matching is a statistical tool used to ensure a similar distribution of covariates in treatment and control groups. In research contexts where randomised experimental data are lacking, matching is a promising means of reducing confounding biases in observational data. In this study, three matching methods were used: nearest neighbour matching, Mahalanobis matching and propensity score matching. Information regarding age, sex, area of residence (rural, city) and ethnicity was included as potential matching covariates. Income covariates were accounted for using responses to the survey questions ‘do you have access to tap water’ and ‘do you have enough financial support at the moment’. By controlling for these identified covariates, it was possible to reduce the effect due to background differences between treatment and control groups. In Table 8, a Welch two-sample t-test is used to test whether the covariates’ means were significantly different in the literate and illiterate groups. All differences except that for ethnicity were found to be statistically significant. Thus, in the following matching analysis, ethnicity was dropped as a matching covariate.
Welch two-sample t-test on whether the covariates’ means differ in the literate and illiterate groups.
p-value smaller than .05.
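For readers unfamiliar with the Welch test, the statistic and its degrees of freedom can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's R code: the function name and the example ages are hypothetical, and the p-value (which requires the t distribution) is omitted.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical ages, not CLHLS values.
literate_age = [66, 70, 72, 75, 80]
illiterate_age = [78, 82, 85, 88, 90, 93]
t_stat, df = welch_t(literate_age, illiterate_age)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

Unlike the pooled-variance t-test, this version does not assume equal variances in the two groups, which suits groups of unequal size.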
The analysis was conducted using R version 4.0.0, with the MatchIt R package for matching. Since MatchIt does not work with incomplete data, 879 observations with missing values were removed from the dataset, reducing the total sample size from 4,240 to 3,361. Specifically, illiterate group size decreased from 1,041 to 925 and literate group size decreased from 3,199 to 2,766. Table 9 summarises the main characteristics of the independent, dependent, and confounding variables.
Summary statistics after removing missing values.
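The complete-case step described above can be sketched as follows. This is a minimal Python illustration, not the R code used in the study; the column names and records are hypothetical, and `None` stands in for a missing value.

```python
def complete_cases(rows, columns):
    """Keep only rows with a non-missing value in every listed column."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]

# Hypothetical records; None marks a missing value.
rows = [
    {"illiterate": 1, "age": 82, "tap_water": 0},
    {"illiterate": 0, "age": None, "tap_water": 1},
    {"illiterate": 0, "age": 71, "tap_water": 1},
]
kept = complete_cases(rows, ["illiterate", "age", "tap_water"])
dropped = len(rows) - len(kept)
print(dropped, "row(s) dropped")

# The overall counts reported in the text follow the same bookkeeping.
assert 4240 - 879 == 3361
```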
Physical health
The effects of illiteracy on physical health were examined using outcome variables including physical health, physical health deterioration from 2002 to 2005, exercise habits, and smoking habits. Before matching, box plots assisted the visual detection of differences between the literate and illiterate groups (Figures 1-3).

Box plot for physical health.

Box plot for physical health deterioration.

Box plot for physical health: exercise habits.
Mental health
The effects of illiteracy on mental health were examined using outcome variables including anxiety, loneliness, feeling of uselessness with age and relative happiness levels compared to their youth. Before matching, box plots assisted in the visual detection of differences in literate and illiterate groups (Figures 4-7).

Box plot for mental health: loneliness.

Box plot for mental health: anxiety.

Box plot for mental health: uselessness.

Box plot for mental health: happiness.
Nearest neighbour matching
Nearest neighbour matching is one of the most frequently employed methods to conduct causal inferential matching (Rubin, 1973). In nearest neighbour matching, the treatment group stays constant while the control group is partially discarded to ensure a more accurate k:1 matching (Chan, 2019). Nearest neighbour matching paired 829 treatment group members with 829 control group members on variables including age, sex, area of habitation (rural or urban), access to tap water and financial support. The regression results of nearest neighbour matching are shown in Table 10. Illiteracy significantly decreased exercise habits by 6% (p = .004). In addition, illiteracy significantly decreased physical health by 20% (p = .012). A counter-intuitive result was that illiteracy decreased the reported rate of cancer by 4.3% (p = .09). Compared to its impact on physical health, illiteracy had a more significant and large-scale impact on mental health. Specifically, illiteracy increased anxiety by 11.3% (p = .009), increased loneliness by 17.9% (p = .0001), increased useless feelings by 10.9% (p = .037), and decreased happiness by 1.34% (p = .03).
Nearest matching results.
Significant Codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’
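The idea behind 1:1 nearest neighbour matching can be sketched as a greedy procedure. This Python sketch is an illustration, not the MatchIt implementation used in the study: it assumes matching without replacement on a single scalar distance measure (such as a propensity score), processes treated units from the largest score downwards, and uses hypothetical unit ids and scores.

```python
def nearest_neighbour_match(treated, control):
    """Greedy 1:1 matching without replacement on a scalar distance measure.

    treated, control: dicts mapping a unit id to its score.
    Returns a dict mapping each treated id to the id of its matched control.
    """
    available = dict(control)
    pairs = {}
    # Assumption: treated units are processed from the largest score downwards.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break  # no controls left to match
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs[t_id] = c_id
        del available[c_id]  # each control is used at most once
    return pairs

# Hypothetical scores, not CLHLS values.
treated = {"t1": 0.80, "t2": 0.55}
control = {"c1": 0.20, "c2": 0.60, "c3": 0.78, "c4": 0.45}
pairs = nearest_neighbour_match(treated, control)
print(pairs)
```

Because each control can be used only once, the order in which treated units are processed can change which pairs are formed; unmatched controls are simply discarded.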
Mahalanobis metric matching
Mahalanobis metric matching uses the Mahalanobis distance as the criterion for judging matches. Using the calliper of 0.25 suggested by statisticians (Rosenbaum and Rubin, 1985), 829 treatment group members were successfully matched with 829 control group members. The regression results of Mahalanobis matching are displayed in Table 11. Illiteracy significantly decreased exercise habits by 10.3% (p = 1.51e-6). In addition, illiteracy significantly decreased physical health by 21.1% (p = .007). The effects of being illiterate on reported cancer were shown to be not significant. Compared to its impact on physical health, illiteracy had more significant and large-scale impacts on mental health. Specifically, illiteracy increased anxiety by 12.4% (p = .0039), increased loneliness by 18.6% (p = 18.19e-5), and decreased happiness by 19.2% (p = .00178).
Mahalanobis metric matching results.
Significant Codes: 0 ‘***’ 0.001 ‘**’
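The Mahalanobis distance itself can be sketched in plain Python. This is an illustration under stated assumptions, not the MatchIt implementation: the helper names are hypothetical, the covariance matrix is assumed to be non-singular, the calliper is not handled, and the example covariates are invented.

```python
from statistics import mean

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length covariate vectors."""
    n, k = len(rows), len(rows[0])
    means = [mean(r[j] for r in rows) for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis(u, v, cov):
    """Mahalanobis distance between covariate vectors u and v."""
    d = [p - q for p, q in zip(u, v)]
    z = solve(cov, d)  # z = cov^{-1} d, without forming the inverse explicitly
    return max(0.0, sum(p * q for p, q in zip(d, z))) ** 0.5

# Hypothetical covariates (age in years, sex coded 0/1), not CLHLS values.
sample = [[70, 0], [82, 1], [75, 1], [90, 0], [68, 1]]
cov = covariance_matrix(sample)
print(round(mahalanobis(sample[0], sample[2], cov), 3))
```

Scaling by the covariance matrix means that a covariate measured in years does not dominate a binary covariate simply because of its units.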
Propensity score matching
The regression findings for propensity score matching are shown in Table 12. Illiteracy significantly decreased exercise habits by 4.8% (p = .019). In addition, illiteracy significantly decreased physical health by 18.6% (p = .0193). The effects of being illiterate on reported cancer were not statistically significant. Compared to its impacts on physical health, illiteracy had more significant and large-scale impacts on mental health. Specifically, illiteracy increased anxiety by 10.98% (p = .0108), increased loneliness by 16.4% (p = .00044) and decreased happiness by 13.5% (p = .0276).
Propensity score matching results.
Significant Codes: 0.001 ‘**’ 0.01 ‘*’
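A propensity score is the estimated probability of belonging to the treatment group given the covariates, commonly obtained from a logistic regression. The sketch below fits such a model by plain gradient ascent in Python. It is an illustration only: the study used MatchIt in R, and the function names, learning rate, number of steps and example data here are all assumptions.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)

def fit_propensity(X, treated, lr=0.1, steps=2000):
    """Fit a logistic regression by gradient ascent; return [intercept, coefficients...]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in zip(X, treated):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
            grad[0] += t - p
            for j, xj in enumerate(x, start=1):
                grad[j] += (t - p) * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def propensity(w, x):
    """Estimated probability of being in the treatment group given covariates x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical standardised covariates (age, rural residence), not CLHLS values.
X = [[-1.2, 0], [-0.5, 0], [0.1, 1], [0.4, 0], [0.9, 1], [1.5, 1], [-0.8, 1], [0.7, 0]]
treated = [0, 0, 1, 1, 1, 1, 0, 0]
w = fit_propensity(X, treated)
scores = [propensity(w, x) for x in X]
print([round(s, 2) for s in scores])
```

Treated and control units with similar scores can then be paired, so that a single number summarises all matching covariates.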
Discussion
All three matching methods identified a significant impact of illiteracy on physical health, exercise, anxiety, loneliness and happiness at α = .05. On average, illiteracy decreased physical health by 19.9%, decreased exercise habits by 7%, increased anxiety by 11.56%, increased loneliness by 17.6% and decreased happiness by 11.3%. These findings therefore align with the existing literature on the relationship between illiteracy and health, but do so with respect to elderly people in China. Importantly, illiteracy seemed to have a higher mental health cost as compared to its effects on physical health. This result calls for more support for mental health care for elderly people in China.
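The averages quoted above are the simple means of the three method-specific estimates reported in the Results (nearest neighbour, Mahalanobis and propensity score matching, in that order). The short Python check below reproduces them; the dictionary layout is chosen for this illustration only.

```python
# Percentage effects as reported for each matching method in the text
# (magnitudes only; the direction of each effect is given in the prose).
effects = {
    "physical health": [20.0, 21.1, 18.6],
    "exercise habits": [6.0, 10.3, 4.8],
    "anxiety": [11.3, 12.4, 10.98],
    "loneliness": [17.9, 18.6, 16.4],
    "happiness": [1.34, 19.2, 13.5],
}
averages = {name: sum(vals) / len(vals) for name, vals in effects.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}%")
```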
Limitations
This study has four main limitations. First, the analysis undertaken was cross-sectional in character, which limits the attribution of causation. Second, the sample size was reduced during the analysis when missing values were removed from the data. Third, the indicator of physical health was not an objective measurement, but the sum of two subjective measures: self-reported health and interviewer-rated health. Last but not least, although matching is a promising tool for causal inferential research, this method is not perfect. With respect to nearest neighbour matching, one limitation is that the matching order can harm the matching quality (Stuart, 2010). Despite this limitation, the quantile-quantile plots in Figures 8 and 9 demonstrate high-quality matchings. The propensity score method requires the treatment assignment to be ignorable in order to compute unbiased estimates, a condition that is practically difficult to achieve in most cases (Thapa, 2015). Matching quality is also sensitive to researcher-defined parameters, such as the propensity score specification. Moreover, propensity score matching requires large sample sizes and substantial overlap between the treatment and control groups (Shadish et al., 2002). The quality of Mahalanobis matching depends on the number of covariates. In general, statisticians suggest including fewer than eight covariates (Rubin, 1979; Zhao, 2004). Since this study used five covariates, the matching quality should not be harmed.

QQ plots for nearest neighbour matching.

QQ plots for nearest neighbour matching.
Footnotes
Acknowledgements
I thank my parents and friends for their support while developing this paper. Special thanks go to Wendy Chan for research support and to Robert Boruch for encouragement to publish. I also thank Akash Pallath for support with editing and proofreading.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
