Evaluating Reporting Heterogeneity in Self-Rated Health Among Adults Aged 50 Years and Above in India

Abstract

Objective: To use anchoring vignettes to evaluate reporting heterogeneity (RH) in self-rated mobility and cognition in older adults. Method: We analyzed vignettes and self-rated mobility and cognition in 2,558 individuals aged 50 years and above. We tested for assumptions of vignette equivalence (VE) and response consistency (RC). We used a joint hierarchical ordered probit (HOPIT) model to evaluate self-rating responses for RH. Results: The assumption of VE was met except for “learning” vignettes. Higher socioeconomic status (SES) and education significantly lowered thresholds for cognition ratings. After correction for RH, women, lower SES, and older respondents were significantly more likely to report greater difficulty in mobility. The influence of age, SES, and education on thresholds was less apparent for cognition. Discussion: Our study provides strong evidence of RH in self-rated mobility and cognition. We highlight the need to formally test basic assumptions before using vignettes to adjust self-rating responses for RH.

Keywords

reporting heterogeneity, differential item functioning, mobility, cognition, self-rating, anchoring vignettes, India

Introduction

The global self-rated health (SRH) question “in general, how would you rate your health today?” and similar questions on functional ability, for example, mobility (“overall in the last 30 days, how much difficulty you had with moving around?”), cognition, sleep, self-care, and so forth, are used in national surveys to assess overall health and subjective well-being. The SRH question is known to predict health outcomes such as disease and death, independent of sociodemographic and known risk factors (DeSalvo, Bloser, Reynolds, He, & Muntner, 2006; Frankenberg & Jones, 2004). Such questions assessing functional disability ask respondents to rate their perceived level of health as discrete responses such as none, mild, moderate, severe, and extreme. An inherent problem with self-rating is that respondents from different populations may use response categories differently (Brady, 1985). Two persons or groups with identical functional ability may rate their functional limitation differently on the discrete scale based on their understanding of the categories and their expectations of their functions. This difference in reporting behavior is referred to as response-category differential item functioning (DIF; King, Murray, Salomon, & Tandon, 2004) or reporting heterogeneity (RH; Bago d’Uva, Van Doorslaer, Lindeboom, & O’Donnell, 2008). DIF/RH has been studied largely in developed countries, across sexes (Grol-Prokopczyk, Freese, & Hauser, 2011), socioeconomic strata (Dowd & Zajacova, 2007), race and ethnicities (Menec, Shooshtari, & Lambert, 2007; Shetterly, Baxter, Mason, & Hamman, 1996), and countries (Jurges, 2007; Lindeboom & van Doorslaer, 2004; Murray, Tandon, Salomon, Mathers, & Sadana, 2002; Zimmer, Natividad, Lin, & Chayovan, 2000). Such RH, unless identified and adjusted for, can result in misleading and incorrect interpretations when comparing self-rating responses (Banks, Marmot, Oldfield, & Smith, 2007; King et al., 2004).

“Anchoring vignettes” is a promising strategy to overcome the problem of RH in survey questions (Murray et al., 2002; Tandon, Murray, Salomon, & King, 2003). Anchoring vignettes are brief texts describing a hypothetical character who exemplifies a certain fixed level of the trait of interest. The respondent is asked to rate the level of the trait for the vignette character as he or she would do for his or her own. The anchoring vignettes technique has been increasingly used in the last decade to improve interpersonal and cross-cultural comparability of survey questions in areas of political efficacy, work disability, job satisfaction, life satisfaction, health and health system responsiveness (Bago d’Uva, O’Donnell, & van Doorslaer, 2008; Bago d’Uva, Van Doorslaer, et al., 2008; Christensen, Herskind, & Vaupel, 2006; Hopkins & King, 2010; Kapteyn, Smith, & Van Soest, 2007; King et al., 2004; Kristensen & Johansson, 2008; Rice, Robone, & Smith, 2010). The application of anchoring vignettes requires two assumptions: vignette equivalence (VE) and response consistency (RC). VE assumes that all respondents understand the level of the trait represented in the vignette in the same way apart from random measurement error. RC assumes that each respondent uses the same response-category thresholds for rating vignettes and for rating the self-assessment question. Earlier studies have evaluated these assumptions using informal checks, such as looking for rank inconsistencies in the ordering of vignette severity, or non-parametric methods requiring fewer assumptions, such as testing for systematic differences in vignette rankings (King & Wand, 2007; Kristensen & Johansson, 2008). Newer developments in hierarchical ordered probit (HOPIT) modeling techniques allow use of information from anchoring vignettes to adjust self-rating responses for RH (Bago d’Uva, Lindeboom, O’Donnell, & van Doorslaer, 2011; Datta Gupta, Kristensen, & Pozzoli, 2009; Rice et al., 2010; Van Soest, Delaney, Harmon, Kapteyn, & Smith, 2007).

The World Health Organization (WHO) Study on global AGEing and adult health (SAGE) implemented by the International Network for the Demographic Evaluation of Populations and Their Health (INDEPTH Network) aims to improve understanding of health and well-being of older adult and elderly populations in low- and middle-income countries. In this article, we use the Indian data from SAGE to evaluate the problem of RH in self-rated mobility and cognition.

Method

Ethics Statement

The SAGE was approved by the King Edward Memorial Hospital Research Center Ethics Committee, Pune, India and the Ethics Review Committee of the WHO, Geneva. All respondents participated in the study after providing written informed consent.

Study Data

Our analysis was based on data from the first wave of the shortened version of the SAGE survey conducted in 2007 among older adults aged 50 years and above in a rural population under health and demographic surveillance system (HDSS) in Vadu, India (Kowal et al., 2012). A simple random sample of 6,000 individuals out of 14,749 adults aged 50 years and above was generated from the HDSS database for enrollment into SAGE. The short version of the SAGE questionnaire asked respondents to grade their ability to perform tasks in eight functional domains of health (mobility, self-care, pain, cognition, relationships, sleep, affect, and vision) in the preceding 30 days. Each domain included two self-rating questions, one for a lower and another for a higher level of functional ability. Respondents were randomly allocated to four groups and each group was administered a vignette set comprising two domains (affect/mobility, pain/relationships, self-care/cognition, and sleep/vision). Following the self-reported difficulties, a total of 10 vignettes in two functional domains were administered to each respondent. The order of the two functional domains in a set and the severity order of the vignettes were randomized. The names of the hypothetical persons in the vignettes were chosen so as to be of the same sex as the respondent and be locally and culturally appropriate. Before administering the vignettes, respondents were advised to think of the hypothetical person’s experience in the vignette as if it were their own. After each vignette, the exact same questions asked for self-assessment were then asked in the context of the hypothetical person in the vignette. For all assessments of self and vignette characters, the respondent was asked to rate on a 5-point ordinal scale of increasing difficulty (no difficulty, mild, moderate, severe, extreme difficulty). For this article, we used anchoring vignettes to evaluate RH in mobility (physical health) and cognition (mental health), for which the objective measures required to test the assumption of RC were available (see the appendix).

To assess mobility and cognition more objectively, physical and cognitive tests were administered to a random subset of respondents. The time taken (seconds) to walk 4 m at normal and rapid speed was measured. Handgrip strength (kilograms) was measured separately for both hands using Smedley’s hand dynamometer. Word recall was tested both immediately and after a short delay during which other cognitive tests were performed. The average of the number of correct words recalled from a list of 10 words from three trials was taken as the score for the word recall test (maximum possible score 10). The length (number of digits) of the longest series of digits repeated by a respondent without error was taken as the score for the forward and backward digit span test (maximum possible score 9). The maximum number of animals correctly named by the respondent in 1 min was taken as the score for the verbal fluency test. All cognition test scores were rescaled from 0 to 1 with a higher score reflecting better performance. The SAGE survey questionnaires were translated into the local language and pre-tested. Local graduates were trained in administration of the questionnaire and in conducting physical and cognitive tests using a detailed manual developed by the WHO. A random sample of 10% of subjects was re-tested for quality assurance. All interviews and physical and cognitive assessments were conducted in the respondents’ homes.

The SAGE data were linked to the HDSS database to include sociodemographic variables. Age was categorized as 50-59, 60-69, and 70+ years. Education was categorized into three groups: primary or less, secondary, and more than secondary. As part of HDSS, socioeconomic status (SES) of all households had been separately assessed 2 years earlier based on the Indian National Family Health Survey (NFHS) wealth index where the facilities (e.g., toilet, electricity, drinking water source, etc.) and physical assets (e.g., land and livestock ownership, household appliances like refrigerator, etc.) of each household were recorded, each asset was weighted with a factor score generated through principal components analysis, and the resulting asset scores were standardized and then summed to produce a wealth index score for each household. All households in the area were then assigned into quintile groups based on the wealth index score.
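The wealth index construction described above can be illustrated with a minimal sketch. It assumes that the asset weights are the loadings of the first principal component of the standardized asset indicators, and it uses a small invented set of households; the function names and the toy data are hypothetical and are not taken from the NFHS or HDSS procedures.

```python
import math
import statistics

def standardize(column):
    """Return z-scores for one asset indicator (population SD)."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(value - mean) / sd for value in column]

def first_component(z_cols, iterations=200):
    """First principal component loadings, found by power iteration
    on the correlation matrix of the standardized indicators."""
    n = len(z_cols[0])
    p = len(z_cols)
    corr = [[sum(a * b for a, b in zip(z_cols[i], z_cols[j])) / n
             for j in range(p)] for i in range(p)]
    w = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w

def wealth_scores(assets):
    """assets: one row per household, one numeric indicator per asset."""
    cols = [standardize(list(col)) for col in zip(*assets)]
    weights = first_component(cols)
    return [sum(w * col[h] for w, col in zip(weights, cols))
            for h in range(len(assets))]

def quintiles(scores):
    """Assign 1 (poorest) to 5 (richest) by rank of the wealth score."""
    order = sorted(range(len(scores)), key=lambda h: scores[h])
    groups = [0] * len(scores)
    for rank, h in enumerate(order):
        groups[h] = rank * 5 // len(scores) + 1
    return groups

# Hypothetical toy data: columns are electricity, toilet, refrigerator (1 = owned).
households = [
    [0, 0, 0], [0, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 0],
    [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1], [1, 1, 1],
]
scores = wealth_scores(households)
groups = quintiles(scores)
print(groups)
```

With rank-based quintiles, households with identical scores can fall on either side of a cut point; how such ties were handled in the HDSS data is not stated in the text.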

Statistical Methods

We based our vignette analysis on the statistical approach proposed by Tandon et al. (2003) and Bago d’Uva et al. (2011). If we assumed that the discrete response category ($y_{i} = k$) that respondent i chose to best describe his or her own health was generated from an underlying continuous latent variable $y_{i}^{*} \sim N(\mu_{i}^{s}, \sigma^{2})$, then the probability of observing response category k conditional on covariates X could be specified by a standard ordered probit model (Equations 1a and 1b) as follows:

$$y_{i}^{*} = \beta X_{i} + \varepsilon_{i}, \tag{1a}$$

$$y_{i} = k \Leftrightarrow \tau_{i}^{k-1} \leq y_{i}^{*} < \tau_{i}^{k}, \quad -\infty = \tau_{i}^{0} < \tau_{i}^{1} < \dots < \tau_{i}^{K-1} < \tau_{i}^{K} = +\infty, \tag{1b}$$

where $\tau_{i}^{k}$, $k = 1, \dots, K - 1$, were the thresholds the respondent used to distinguish between different categories. We further allowed the thresholds to depend on covariates through the coefficients γ^k as a way to model RH (Equation 1c):

$$\tau_{i}^{k} = \gamma^{k} X_{i}. \tag{1c}$$
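As an illustration of Equations 1a to 1c, the following minimal sketch computes the probability of each response category for one respondent when both the latent mean and the thresholds depend on covariates. The covariate vector and all coefficient values are invented for the example; they are not estimates from this study.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear(coefs, x):
    """Linear index: sum of coefficient times covariate."""
    return sum(c * v for c, v in zip(coefs, x))

def category_probabilities(x, beta, gammas, sigma=1.0):
    """P(y_i = k) for k = 1..K, with latent y* ~ N(beta'x, sigma^2)
    and covariate-dependent thresholds tau^k = gamma^k'x."""
    mu = linear(beta, x)
    taus = [linear(g, x) for g in gammas]
    if any(a >= b for a, b in zip(taus, taus[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cuts = [-math.inf] + taus + [math.inf]
    cdf = [normal_cdf((c - mu) / sigma) for c in cuts]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]

# Hypothetical respondent: x = (constant, indicator covariate).
x = [1.0, 1.0]
beta = [0.0, -0.2]                     # effect on latent difficulty
gammas = [[-1.0, 0.1], [-0.3, 0.1],    # one coefficient vector per threshold
          [0.4, 0.1], [1.2, 0.1]]
probs = category_probabilities(x, beta, gammas)
print([round(p, 3) for p in probs])
```

The sketch makes the identification problem concrete: adding the same constant to every threshold coefficient and to the corresponding element of β leaves every category probability unchanged, so self-ratings alone cannot separate the two.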

However, it is apparent that the generalized ordered probit model with varying thresholds (Equations 1a-1c) is underidentified, and estimation of β separately from γ^k is possible only when additional information on RH (γ^k) is provided by anchoring vignettes as follows:

$$V_{ij}^{*} = \alpha_{j} + \xi_{ij}, \tag{2a}$$

$$V_{ij} = k \Leftrightarrow \nu_{i}^{k-1} \leq V_{ij}^{*} < \nu_{i}^{k}, \quad -\infty = \nu_{i}^{0} < \nu_{i}^{1} < \dots < \nu_{i}^{K-1} < \nu_{i}^{K} = +\infty, \tag{2b}$$

$$\tau_{i}^{k} = \nu_{i}^{k} = \gamma^{k} X_{i}. \tag{2c}$$

Similar to self-rating responses, we assumed that the discrete response ($V_{ij}$) that individual i chose to best describe the health state depicted by vignette j was generated from an underlying continuous latent variable $V_{ij}^{*} \sim N(\mu_{ij}^{v}, 1)$. The vignette response was modeled (Equation 2a) only with an intercept and error term (the βX term was omitted) as we assumed that the fixed level of health state depicted by the vignette would be understood in the same way by all individuals (VE assumption). We further forced the response-category thresholds that the individual used for rating self (τ^k) as well as the vignette (ν^k) to be identical (Equation 2c; RC assumption). The thresholds in turn were allowed to vary by covariates (γ^k) to account for RH. Equations (1a), (1b), (2a), (2b), and (2c) were combined as the HOPIT model to identify and adjust self-rating responses for RH. The HOPIT model is a variation of the standard ordered probit model that combines information from vignette ratings and self-assessment ratings (Tandon et al., 2003).
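The way the two components share thresholds can be sketched as a joint log-likelihood. This is a simplified illustration under stated assumptions, not the Stata code used for the analysis: thresholds are taken as linear in the covariates, the data layout and all parameter values are hypothetical, and no check is made that the thresholds remain ordered (implementations often parameterize the higher thresholds as positive increments to guarantee this).

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(coefs, x):
    return sum(c * v for c, v in zip(coefs, x))

def log_prob(category, mean, taus, sd=1.0):
    """Log-probability that a latent N(mean, sd^2) variable falls in
    response category 1..K defined by the thresholds taus."""
    cuts = [-math.inf] + list(taus) + [math.inf]
    upper = normal_cdf((cuts[category] - mean) / sd)
    lower = normal_cdf((cuts[category - 1] - mean) / sd)
    return math.log(upper - lower)

def hopit_loglik(respondents, beta, gammas, alphas, sigma=1.0):
    """Joint log-likelihood of self-ratings and vignette ratings."""
    total = 0.0
    for r in respondents:
        x = r["x"]
        # The same thresholds serve both components (RC assumption).
        taus = [dot(g, x) for g in gammas]
        # Self-rating component: latent mean depends on covariates.
        total += log_prob(r["self"], dot(beta, x), taus, sigma)
        # Vignette component: mean alpha_j only, no covariates (VE assumption).
        for j, rating in enumerate(r["vignettes"]):
            total += log_prob(rating, alphas[j], taus, 1.0)
    return total

# Hypothetical data: x = (constant, indicator); ratings on a 1-5 scale.
respondents = [
    {"x": [1.0, 0.0], "self": 2, "vignettes": [1, 3, 5]},
    {"x": [1.0, 1.0], "self": 3, "vignettes": [2, 3, 4]},
]
beta = [0.0, 0.3]
gammas = [[-1.0, 0.2], [-0.2, 0.2], [0.6, 0.2], [1.5, 0.2]]
alphas = [-1.5, 0.2, 2.0]
ll = hopit_loglik(respondents, beta, gammas, alphas)
print(round(ll, 3))
```

Because the vignette means do not vary with covariates, any systematic covariate pattern in vignette ratings is attributed to the thresholds, which is what allows γ^k to be separated from β in the self-rating component.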

To test the assumption of RC, additional information on objective measures of health was required. We tested for equality of the thresholds identified using the objective measures with those identified using the vignettes ($\tau_{i}^{k} = \nu_{i}^{k}$). To test the assumption of VE, we specified a variant of the HOPIT model that included interactions (λ_j) between each covariate and all except one vignette to avoid underidentification. We used Wald tests of whether the parameters (λ_j) of all covariate–vignette interactions were equal to zero (global test for VE) and of whether the interactions for each individual covariate were equal to zero, to determine which covariates influenced VE (Bago d’Uva et al., 2011).
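The VE test can be written out as follows. This is a sketch of the standard Wald form, assuming the first vignette is the one without interactions (the text does not say which vignette served as the reference). The degrees of freedom follow from four interacted vignettes and nine covariate terms (one for sex, two for age, four for SES, and two for education), which agrees with the values of 4, 8, 16, 8, and 36 reported in Table 3.

```latex
% Vignette equation with covariate interactions
% (reference vignette: \lambda_{1} = 0, an assumed choice)
V_{ij}^{*} = \alpha_{j} + \lambda_{j} X_{i} + \xi_{ij}, \qquad j = 2, \dots, 5

% Wald statistic for H_{0}: \lambda = 0, with q restrictions
W = \hat{\lambda}^{\top}
    \left[ \widehat{\operatorname{Var}}(\hat{\lambda}) \right]^{-1}
    \hat{\lambda} \;\sim\; \chi^{2}_{q} \quad \text{under } H_{0}

% Global test: 4 interacted vignettes times 9 covariate terms
q = (5 - 1) \times (1 + 2 + 4 + 2) = 36
```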

Results

A total of 5,432 of the 6,000 sampled individuals responded to the survey (response rate 90%); the remaining 568 individuals (9%), who could not be traced due to migration and other reasons, were not included in this study. A further 345 individuals (6%) were excluded as we were unable to match their identity information with the HDSS data set. Vignettes were not administered to 1 respondent. Of the remaining 5,086 respondents, 1,307, 1,251, 1,287, and 1,241 were administered vignettes for the mobility/affect, self-care/cognition, pain/relationships, and sleep/vision domains, respectively. The analysis for this article was restricted to the 1,307 and 1,251 respondents who were administered mobility and cognition vignettes, respectively. Information on the objective measures for mobility and cognition required for testing the assumption of RC was available for a random subset of 287 and 318 respondents, respectively.
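The sample flow can be checked with simple arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative.

```python
# Counts reported in the text.
sampled = 6000
responded = 5432
unmatched_to_hdss = 345
no_vignettes = 1
vignette_groups = {
    "mobility/affect": 1307,
    "self-care/cognition": 1251,
    "pain/relationships": 1287,
    "sleep/vision": 1241,
}

not_traced = sampled - responded                 # individuals not traced
response_rate = 100 * responded / sampled        # percent of the sample
analysed = responded - unmatched_to_hdss - no_vignettes
analytic_n = (vignette_groups["mobility/affect"]
              + vignette_groups["self-care/cognition"])

print(not_traced, round(response_rate, 1), analysed, analytic_n)
```

The four vignette groups sum to the number of respondents remaining after exclusions, and the mobility and cognition groups together give the N = 2,558 used in Table 1.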

There was no significant difference in the mean age between men (65.2 years) and women (64.8 years). A significantly higher proportion of women (98%) than men (85%) had primary education or less. The proportion of men and women in the poorest and richest SES quintile was similar. More women than men lacked spousal support (42% vs. 12%, respectively, p < .001). A significantly higher proportion of men (63%) reported good or very good health than women (52%). Men reported significantly less limitation in their functional ability across all health domains than women (Table 1). Men scored significantly better than women in the handgrip strength test, but women did significantly better in the mobility test of 4-m normal walking speed (p = .040). Men performed significantly better than women in all cognitive tests (except for delayed verbal recall, where the difference in scores was not significant). Fewer than 2% of respondents reported severe or extreme difficulty in memory or learning new tasks compared with 4% for moving around and 13% for performing vigorous activity. A high proportion of rank ties (43%-57%) was seen for cognition vignette ratings. Inconsistent ratings between vignette severity Levels 4 and 5 (9%-11%) were seen for cognition vignettes. There were no inconsistent ratings or high proportion of ties for mobility vignettes (results not shown).
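One way to tabulate rank ties and inconsistent ratings is sketched below. It assumes a tie means that two vignettes of adjacent severity received the same rating and an inconsistency means that the more severe vignette received the lower rating; the exact definitions used for the figures above are not given in the text, and the sample ratings are invented.

```python
from collections import Counter

def classify_pairs(ratings):
    """ratings: one respondent's vignette ratings, ordered by designed
    severity (Level 1 to Level 5). Returns a label per adjacent pair."""
    labels = []
    for lower, higher in zip(ratings, ratings[1:]):
        if higher > lower:
            labels.append("ordered")
        elif higher == lower:
            labels.append("tie")
        else:
            labels.append("inconsistent")
    return labels

def pair_shares(sample, pair_index):
    """Share of respondents with each label for one adjacent pair
    (pair_index 3 compares severity Levels 4 and 5)."""
    counts = Counter(classify_pairs(r)[pair_index] for r in sample)
    n = len(sample)
    return {label: counts[label] / n
            for label in ("ordered", "tie", "inconsistent")}

# Hypothetical ratings for four respondents (1 = none, 5 = extreme).
sample = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 4],
    [2, 2, 3, 4, 3],
    [1, 1, 3, 3, 5],
]
shares_4_vs_5 = pair_shares(sample, 3)
print(shares_4_vs_5)
```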

Table 1.

Sociodemographic and Health Characteristics of Men and Women, Vadu, India (N = 2,558).

	Men (n = 1,367)			Women (n = 1,191)
	M or %	SD	n	M or %	SD	n	p
M age (years)	65.2	8.92	1,367	64.8	9.30	1,191	.311
Education							.000
Primary or less	85%		1,155	98%		1,163
Secondary	10%		142	1%		17
Higher secondary/more	5%		70	1%		11
No spousal support	12%		162	42%		495	.000
SES							.022
1st quintile (poorest)	11%		132	12%		124
2nd quintile	15%		171	14%		138
3rd quintile	19%		228	24%		239
4th quintile	23%		268	18%		182
5th quintile (highest)	32%		372	32%		314
SRH							.000
Very good	5%		70	3%		39
Good	58%		790	49%		584
Moderate	33%		461	43%		509
Bad	3%		44	5%		58
Very bad	.2%		2	.1%		1
Mean self-rating^a for difficulty with			1,367			1,191
Moving around	1.74	0.81		1.90	0.85		.000
Vigorous activity	2.25	1.05		2.36	1.01		.007
Sadness	1.80	0.78		1.93	0.81		.000
Worry	1.89	0.85		2.01	0.89		.001
Body aches	1.96	0.85		2.11	0.84		.000
Discomfort	2.01	0.87		2.15	0.87		.000
Relationships	1.75	0.74		1.85	0.77		.001
Conflicts	1.79	0.78		1.92	0.79		.000
Waking up	1.66	0.81		1.80	0.91		.000
Feeling rested	1.79	0.86		1.92	0.86		.000
Far vision	1.99	0.84		2.08	0.91		.009
Near vision	1.98	0.86		2.06	0.92		.019
Bathing	1.55	0.76		1.65	0.82		.011
Maintaining appearance	1.59	0.78		1.72	0.82		.000
Memory	1.63	0.71		1.74	0.76		.000
Learning	1.80	0.83		1.92	0.86		.000
Mobility test measures			145			142
Normal 4 m walk (s)	5.0	1.8		4.5	2.8		.070
Rapid 4 m walk (s)	3.4	1.4		3.3	2.1		.652
Grip strength (right hand) (kg)	24.8	9.5		15.8	8.2		.000
Grip strength (left hand) (kg)	23.9	8.8		15.1	7.9		.000
Cognition test measures^b			163			155
Immediate verbal recall	0.50	0.12		0.45	0.13		.000
Delayed verbal recall	0.40	0.20		0.37	0.17		.203
Digit span (forward)	0.49	0.17		0.42	0.15		.000
Digit span (backward)	0.33	0.16		0.25	0.16		.000
Verbal fluency	0.45	0.14		0.37	0.15		.000

Note. SES = socioeconomic status; SRH = self-rated health.

^a Self-ratings range from 1 = no difficulty to 5 = extreme difficulty.

^b Cognition test scores are rescaled on an improving scale of 0 to 1.

Testing Assumptions

Mean vignette ratings increased as vignette severity level increased except for the cognition vignette severity Level 5 (Table 2). The global tests showed that the assumption of VE was met except for the difficulty in learning (χ² = 56.59, p = .016) that seemed to be largely driven by SES (Table 3).

Table 2.

Mean Ratings^a of Vignettes for Mobility (n = 1,307) and Cognition (n = 1,251), Vadu, India.

	Difficulty—Moving around		Difficulty—Vigorous activity
Mobility	M rating	SD	M rating	SD
Vignette 1	1.55	0.855	1.70	0.939
Vignette 2	2.23	0.831	2.55	0.977
Vignette 3	2.88	0.963	3.08	0.954
Vignette 4	3.44	0.956	3.53	0.944
Vignette 5	4.18	1.184	4.12	1.212
	Difficulty—Memory		Difficulty—Learning
Cognition	M rating	SD	M rating	SD
Vignette 1	1.83	0.922	1.76	0.944
Vignette 2	2.16	0.818	2.11	0.911
Vignette 3	3.24	0.859	3.35	0.793
Vignette 4	3.48	0.881	3.47	0.959
Vignette 5	3.33	0.949	3.38	1.087

Note. Vignettes 1 to 5 are ordered by increasing level of severity.

^a Ratings range from 1 = no difficulty to 5 = extreme difficulty.

Table 3.

Wald Tests for Vignette Equivalence and Likelihood Ratio Tests for Response Consistency for Mobility and Cognition Domain, Vadu, India.

		Mobility—Moving around		Vigorous activity		Cognition—Memory		Learning
	df	χ² statistic	p	χ² statistic	p	χ² statistic	p	χ² statistic	p
Vignette equivalence
Global test	36	40.99	.261	34.67	.532	37.01	.422	56.59	.016
Men	4	1.64	.802	0.39	.983	2.72	.606	3.07	.546
Age	8	8.51	.386	6.49	.593	5.53	.700	5.42	.712
SES	16	24.44	.080	20.98	.179	25.02	.069	40.85	.001
Education	8	6.25	.620	8.87	.354	2.41	.966	5.87	.662
Response consistency
Global test	36	267.3	.000	347.1	.000	457.4	.000	533.3	.000
Men	4	10.51	.033	33.12	.000	18.25	.001	20.00	.001
Age	8	24.38	.002	80.91	.000	12.38	.135	17.34	.027
SES	16	62.86	.000	69.47	.000	39.10	.001	38.50	.001
Education	8	32.11	.000	77.07	.000	17.14	.029	20.68	.008

Note. SES = socioeconomic status.

The global tests showed that the assumption of RC was not met for either mobility or cognition vignettes (Table 3), a result largely driven by age, sex, SES, and education. Normal timed walk was a significant predictor of both difficulty in moving around and difficulty in vigorous activity, while all cognitive tests were significant predictors of perceived latent cognition (results not shown). The two sets of thresholds, predicted from entirely different sets of variables (vignettes and objective measures), seemed to be concordant (Figure 1). However, the thresholds predicted from the vignettes and objective measures were less similar for the “moving around” subdomain of mobility.

Figure 1.

Threshold locations for mobility and cognition predicted from vignettes and from objective measures.

Evaluating Self-Rating Responses for RH

Table 4 presents the results of the joint HOPIT model, in which the self-rating and vignette rating components shared the same thresholds and the thresholds varied with covariates. A positive coefficient implied increased difficulty in mobility or cognition. Men (β = −.16) and higher SES respondents (β = −.08) were significantly less likely to report difficulty in moving around. This pattern was also seen for difficulty in vigorous activity and for cognition but was not significant. Similarly, older respondents were significantly more likely to report difficulty in moving around (β = .09), with the same but non-significant pattern seen for difficulty in vigorous activity and for cognition. Higher educated respondents were also significantly less likely to report difficulty with memory (β = −.35; similar pattern but not significant for learning and mobility). There was no significant effect of SES or education on thresholds used for mobility ratings. In the cognition domain, higher SES significantly lowered the “none–mild” threshold (τ₁ = −.25), that is, higher SES respondents were more demanding in ratings of their difficulty in cognition (Table 4). Similar results were also seen for the effects of education on cognition thresholds.
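A small worked example may help in reading a threshold coefficient. It uses the memory column of Table 4 (constant −.18 and SES 5th quintile effect −.25 on τ₁) and compares two respondents with the same latent difficulty. The latent mean of 0, the unit variance, and the assumption that τ₁ is linear in the covariates are illustrative choices, not results from the study.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# From Table 4, memory column, threshold (none-mild).
tau1_constant = -0.18   # reference: woman, 50-59 years, poorest quintile, primary or less
ses5_shift = -0.25      # effect of the richest SES quintile on tau_1

# Hypothetical: both respondents have the same latent difficulty.
latent_mean = 0.0

tau1_poorest = tau1_constant
tau1_richest = tau1_constant + ses5_shift

# Probability of choosing "no difficulty" = P(y* < tau_1).
p_none_poorest = normal_cdf(tau1_poorest - latent_mean)
p_none_richest = normal_cdf(tau1_richest - latent_mean)

print(round(p_none_poorest, 3), round(p_none_richest, 3))
```

Under these assumptions, the probability of reporting “no difficulty” falls from roughly .43 to roughly .33 for the richest quintile even though the underlying difficulty is identical, which is the sense in which a lower threshold makes a respondent more demanding.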

Table 4.

HOPIT Model Parameters for Predictors for Perceived Latent Mobility (n = 1,103) and Cognition (n = 1,065), Vadu, India.

	Mobility		Cognition
	Moving around	Vigorous activity	Memory	Learning
Men	−.17	−.06	−.11	−.12
Age
60-69 years	.10	.20*	.11	.01
70+ years	.18	.27**	.15	.00
SES
2nd quintile	−.12	−.01	.02	−.10
3rd quintile	−.20	.00	−.02	.08
4th quintile	−.27	−.11	−.04	.07
5th quintile (richest)	−.32	−.15	−.11	.10
Education
Secondary	−.11	.09	−.04*	−.06
Higher	−.17	−.26	−.63*	.00
Threshold (none–mild) (τ₁)
Men	.03	.06	.08	.08
Age (60-69 years)	.15**	.03	.05	.06
Age (70+ years)	.08	.02	−.03	−.05
SES (2nd quintile)	−.07	−.03	−.15	−.07
SES (3rd quintile)	.06	−.09	−.11	−.13
SES (4th quintile)	−.01	.01	−.16	−.20*
SES (5th quintile)	.05	−.09	−.25**	−.09
Education (secondary)	.38***	.20	−.06	.12
Education (higher)	−.10	−.09*	−.70***	−.08
Constant	−.59***	−.64***	−.18	−.25*

Note. Reference categories for sex is women, for age is 50-59 years, for SES is poorest quintile, and for education is primary or less. Results for thresholds τ₂ to τ₄ are not shown in table. HOPIT = hierarchical ordered probit; SES = socioeconomic status.

*p < .05. **p < .01. ***p < .001.

Discussion

RH in self-rating responses is often seen between populations of different countries and cultures. Our focus on the Indian context allows us to study the sociodemographic factors (other than country) that drive RH in self-rating health responses among older adults. The SAGE has vignette data in eight functional domains of health; in our article, we compared self-ratings and vignette ratings in mobility (physical health) and cognition (mental health), two distinct and dissimilar aspects of elderly health known to be affected by RH (Gill, Desai, Gahbauer, Holford, & Williams, 2001; Reed, Jagust, & Seab, 1989). Our results showed strong evidence of systematic variation in reporting behavior when respondents were asked to self-rate their mobility and cognitive ability. Higher SES and higher education significantly lowered the “none–mild” response-category threshold for cognition ratings, that is, such respondents were more likely to be “demanding” in self-rating their cognitive ability compared with lower SES and less educated respondents. After correction for RH, women, lower SES, and older respondents were significantly more likely to report greater difficulty in mobility. A similar pattern was seen for cognition self-rating but was not significant. Similar results are seen in other studies (Grol-Prokopczyk et al., 2011).

The assumptions of VE and RC cannot be taken for granted as some studies have found adherence while others have found violation of these basic assumptions (Bago d’Uva et al., 2011; Datta Gupta et al., 2009; Grol-Prokopczyk et al., 2011; Hirve et al., 2012; Rice et al., 2010; Van Soest et al., 2007). Earlier studies have used informal checks and less rigorous tests for testing these assumptions. In our study, the assumption of VE was met except for the learning vignettes. A strict parametric test (equality of location of thresholds) showed that the RC assumption was not met, largely driven by age, sex, SES, and education of the respondents. However, a visual assessment showed that the predicted thresholds used for self-rating and for vignette rating were similar for both mobility and cognition. A lack of adherence to the assumptions using strict parametric tests does not discredit the overall use of the vignettes approach in identifying RH. Using fewer vignettes, using fewer response categories, dealing with missing data, or using statistical models that relax assumptions to some degree are alternate strategies that may improve the adherence to the assumptions. We used vignettes adapted from the World Health Survey of 2003; further research is needed to see whether revising the contents and wordings of the vignettes improves their performance from the perspective of VE and RC.

This article focuses on using anchoring vignettes to improve interpersonal comparability of self-rating health responses. While this technique does not explain the reasons for differential reporting behavior, it allows us to adjust responses for these differences and study individual or group characteristics that may predict them. Vignettes could potentially be used to adjust for RH between sexes or across age, education or SES groups. It was beyond the scope of this study to assess some of the methodology issues in the use of anchoring vignettes—whether the sequence of the self-rating and vignette rating influences RC (Hopkins & King, 2010) or whether matching the age and sex of the vignette character and the respondent influences RH (Grol-Prokopczyk, 2010) or whether improving the wordings of the cognition vignettes would enable respondents to better differentiate between “memory,” a lower cognitive function based primarily on short-term memory, as compared with “learning,” a higher cognitive function requiring both prospective memory and executive function. It would be interesting to test whether vignette validity is higher in certain sociodemographic contexts or whether the assumptions of VE and RC are more likely to be upheld in panel studies.

Finally, as a note of caution, anchoring vignettes may be used to correct differential reporting behavior only, not for all forms of DIF (King & Wand, 2007). If respondents understand and interpret the rating question in fundamentally different ways, then anchoring vignettes may only partly fix the inferential problems. The question still remains on how to address these other aspects of interpersonal incomparability aside from RH.

Conclusion

The anchoring vignettes approach shows that there is strong evidence of RH in self-rating responses to questions on difficulty in mobility and cognition largely driven by education, SES, age, and sex of the respondent.

Footnotes

Appendix

Text of Mobility and Cognition Vignettes

Domain question	Vignette
Mobility—Overall, in the last 30 days, how much difficulty did [you/name_] have . . . with moving around? . . . with vigorous activity (such as cycling, working in fields, etc.)?	Severity Level 1: [name_a] has no problems with walking, running, or using her hands, arms, and legs. She jogs 4 km twice a week.
	Severity Level 2: [name_b] is able to walk distances of up to 200 m without any problems but feels tired after walking 1 km or climbing up more than one flight of stairs. He has no problems with day-to-day physical activities, such as carrying food from the market.
	Severity Level 3: [name_c] does not exercise. She cannot climb stairs or do other physical activities because she is obese. She is able to carry the groceries and do some light household work.
	Severity Level 4: [name_d] has a lot of swelling in his legs due to his health condition. He has to make an effort to walk around his home as his legs feel heavy.
	Severity Level 5: [name_e] is paralyzed from the neck down. He is unable to move his arms and legs or to shift body position. He is confined to bed.
Cognition—Overall, in the last 30 days, how much difficulty did [you / name_] have . . . with concentrating or remembering things? . . . learning a new task (e.g., learning how to get to a new place, learning a new game, learning a new recipe)?	Severity Level 1: [name_a] is very quick to learn new skills at his work. He can pay attention to the task at hand for long uninterrupted periods of time. He can remember names of people, addresses, phone numbers, and such details that go back several years.
	Severity Level 2: [name_b] can concentrate while watching TV, reading a magazine, or playing a game of cards or chess. He can learn new variations in these games with small effort. Once a week, he forgets where his keys or glasses are but finds them within 5 min.
	Severity Level 3: [name_c] can find her way around the neighborhood and know where her own belongings are kept but struggles to remember how to get to a place she has only visited once or twice. She is keen to learn new recipes but finds that she often makes mistakes and has to reread several times before she is able to do them properly.
	Severity Level 4: [name_d] cannot concentrate for more than 15 min and has difficulty paying attention to what is being said to him. Whenever he starts a task, he never manages to finish it and often forgets what he was doing. He is able to learn the names of people he meets but cannot be trusted to follow directions to a store by himself.
	Severity Level 5: [name_e] does not recognize even close relatives and gets lost when he leaves the house unaccompanied. Even when prompted, he shows no recollection of events or recognition of relatives. It is impossible for him to acquire any new knowledge as even simple instructions leave him confused.

Acknowledgements

We thank Marton Ispany, Teresa Bago d’Uva, and Hanna Grol-Prokopczyk for sharing the STATA source codes for the hierarchical ordered probit (HOPIT) model and guidance for the analysis. We acknowledge the support of Kathy Kahn and Paul Kowal for coordinating this multi-country study. Thanks are due to the Vadu Health and Demographic Surveillance System (HDSS) team for their quality work and the older adult population of the Vadu Demographic Surveillance Area for their willing consent to contribute their knowledge to this study.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article uses data from World Health Organization (WHO) Study on Global AGEing and Adult Health (SAGE). The SAGE is supported by the U.S. National Institute on Aging through Interagency Agreements (OGHA 04034785, YA1323-08-CN-0020, Y1-AG-1005-01) and through a research grant (R01-AG034479). Health and Demographic Surveillance System, Vadu, is a member of the International Network for the Demographic Evaluation of Populations and Their Health (INDEPTH Network). This work was supported by the Umeå Centre for Global Health Research, Umeå University, with support from FAS, the Swedish Council for Working Life and Social Research (Grant 2006-1512) through its PhD Fellowship to the first author.
