Abstract
The reliability of an item designed to measure health belief is often confounded with response consistency at the person level. The study applied contemporary measurement methods to an inventory of common sense beliefs about diabetes and used a sample of N = 563 adults with diabetes to test the hypothesis that individuals whose beliefs are congruent with a biomedical model are more consistent in their responses. Item-level analysis revealed that the domains of Causes and Medical Management were the least reliable. Person-level analysis showed that respondents who held views congruent with the biomedical model were more consistent than people who did not.
With the increasing interest in patient-centered care, it has been recognized that patients’ beliefs, values, preferences, and needs form the basis for patient/care-provider communication, decisions about treatment options, and long-term disease management. Health psychologists and sociologists have developed different models to explain and predict how beliefs drive health-related behavior. The health belief model (HBM; Rosenstock, 1974; Rosenstock and Kirscht, 1979) states that a person’s perception of different aspects of a disease—susceptibility to and severity of the disease, for example—is related to the person’s specific preventive efforts. Thus, the HBM represents a cognitive, rational view of human health activities. In the self-regulatory model of illness (Leventhal et al., 2003), individuals are believed to create a lay model of their illness based on their beliefs, knowledge, and experiences; variations in such beliefs, knowledge, and experiences lead to differences in behaviors undertaken to prevent, control, and manage the disease.
While models for health beliefs grounded in social and psychological theories are now relatively well developed, the measurement of beliefs, especially the consistency of individuals’ beliefs, is still an understudied area. In particular, the assessment of the stability of beliefs is now possible in studies with a design that includes repeated measurements, but the sources of stability, or the lack thereof, have often not been clearly delineated. For example, Brock (1984) found that persons with more stable beliefs about susceptibility to swine flu were more likely not to take flu shots in an immunization campaign. However, in the analysis, an individual’s stability of belief is confounded with measurement error of the items. Grzywacz et al. (2011) found that lay beliefs about diabetes were not always stable across specific domains of diabetes such as causes of the disease and its medical management, as well as across different ethnic groups. Their analysis made use of a small sample of test–retest data and examined the concordance of responses in the repeated measures at the item level. Similar to Brock’s study, item-level concordance was confounded with individual consistency. Some patients tended to have large variability in their responses, while others were more consistent. Without a method to tease apart differences in response consistency from the item-level reliability, it is difficult to pinpoint the source of variation, which could lead to biases in belief measurement.
This study aimed to apply a novel approach to delineate stability of response at two levels: reliability at the item level and response consistency at the person level. To achieve this goal, our analysis used baseline and 1-month follow-up data from a large, multi-ethnic sample of older adults with diabetes, with their beliefs about diabetes being measured under the frameworks of the Explanatory Model (EM) of illness (Kleinman, 1980, 1988; Kleinman et al., 1978) and the Common Sense Model (CSM) of illness (Leventhal et al., 2003). The EM of illness posits that individuals make sense of an illness in the context of their knowledge and experience. EMs often include ideas about etiology, symptoms, physiology, treatments, and consequences. They may be only partly articulated, inconsistent, and even self-contradictory. Lay EMs may overlap with those of medical professionals but often show significant differences (Baer et al., 2004, 2008; Chavez et al., 1995). The idea of the EM arose in anthropology and has been used to contrast the views that laypersons and professionals hold of the same condition. CSMs of illness are similar to EMs. They are the representations of an illness that patients develop to help them make sense of their condition and develop responses to it. With its roots in psychology, the CSM emphasizes the active cognitive process patients use to deal with an illness. In a sense, patients go through an active hypothesis-testing process as they deal with an illness, so their CSMs can be fluid.
The two goals of this study were (1) to examine the reliability of individual health belief items after taking into account the “fluidness” (level of consistency) of individual health beliefs and (2) to characterize individuals who have high levels of response consistency. We expected beliefs congruent with the biomedical model to vary less than lay beliefs, because individuals adhering to biomedical health beliefs tend to “anchor” these beliefs with a set of mechanistic rules of clinical reality (Kleinman et al., 1978: 89). Specifically, we tested the hypothesis that participants with beliefs that are more congruent with the biomedical model will be more consistent in their belief item responses over time.
Method
Data
Participants with diabetes were recruited from various organizations and locations within multiple counties in North Carolina using site-based sampling, with a total of 596 participants being recruited at baseline. Of these, 563 provided the complete data needed for this study. Formal and informal community leaders provided support with study recruitment by introducing the study staff to recruitment locations and by verifying the legitimacy of the research project to elder participants. The number of participants from each type of recruitment location included 50 from community-based organizations (e.g. veteran, civic groups), 39 from community events, 43 from churches, 11 from flyer postings, 92 from senior housing, 65 from senior centers, and 104 from congregate meal sites. An additional 165 participants were recruited by individual community members through word-of-mouth referral, and 24 came from an existing participant database compiled from previous rural aging studies, which had used site-based sampling. We included all persons who had had a diabetes diagnosis for at least 2 years, and we did not distinguish between type 1 and type 2 diabetes.
More details are provided in Arcury et al. (2012), Kirk et al. (2012), Nguyen et al. (2012), and Quandt et al. (2012). The Institutional Review Board of the relevant institution approved the research protocol.
Measures
CSM of Diabetes Inventory
The Common Sense Model of Diabetes Inventory (CSMDI) consisted of 94 individual belief items obtained from a multi-ethnic sample of rural-dwelling older adults reflecting several belief domains (Grzywacz et al., 2011), and the response options for each item were “agree,” “disagree,” or “don’t know.” After receiving input from focus groups and, more importantly, analyzing results from previous pilot studies, the study team reviewed the item pool and selected 31 items for the final battery (these 31 items are included in Appendix 1) that effectively distinguished between groups while also adequately representing the core belief domains, including “Cause, Symptoms, Behavioral Management, Medical Management, and Consequences” (Kleinman et al., 1978; Leventhal et al., 2003).
CSMDI data were collected from two visits (separated by 1 month; neither treatment nor education intervention was administered by the study between the two visits) on the 31 CSMDI items, which belong to six domains: Symptoms, Causes, Consequences, Information, Behavioral Management, and Medical Management of diabetes. The CSMDI items were designed to capture common sense beliefs, and thus, they do not have “correct” or “incorrect” answers. In other words, the CSMDI items do not form a scale, and traditional psychometric validation procedures were not directly applicable. A preliminary validation study based on latent class analysis for CSMDI was reported in Grzywacz et al. (2011).
Biomedical belief score
In order to measure the extent to which a participant’s beliefs agreed with a biomedical model, all items were coded according to the American Diabetes Association (ADA, 2012) guidelines to reflect congruence with the current biomedical understanding of type 2 diabetes. If the belief statement contained content supported by clinical research, then “agree” responses were considered consistent with biomedical understanding (coded 1). By contrast, if the belief statement referenced content for which there was little clinical evidence, or if it contradicted clinical evidence, “agree” responses were considered not consistent with biomedical understanding (coded 0). “Don’t know” responses were likewise treated as not consistent with the biomedical model. The sum of the item codes was then used to measure the construct of biomedical belief.
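The coding rule above can be illustrated with a minimal Python sketch. It is not the study's scoring code. The set of congruent items is taken from the asterisks in the appendix table; the function name, the response labels, and the treatment of “disagree” responses (scored 0 here) are assumptions, since the text specifies only how “agree” and “don’t know” responses were coded.

```python
# Items marked as congruent with the biomedical model (asterisks in the appendix).
CONGRUENT_ITEMS = {1, 3, 5, 6, 14, 15, 16, 17, 18, 19, 21, 22, 24, 25}


def biomedical_score(responses, congruent_items=CONGRUENT_ITEMS):
    """Sum score: 1 for each 'agree' on a biomedically supported item.

    responses: dict mapping item number -> 'agree', 'disagree', or 'dont_know'.
    Assumption: 'disagree' scores 0, the same as 'dont_know'; the text only
    states the coding of 'agree' and "don't know" responses.
    """
    return sum(
        1
        for item, answer in responses.items()
        if answer == "agree" and item in congruent_items
    )


# Hypothetical respondent: agrees with items 1, 2, and 14.
example = {1: "agree", 2: "agree", 3: "dont_know", 14: "agree", 30: "disagree"}
print(biomedical_score(example))  # 2 (items 1 and 14 are congruent; item 2 is not)
```

A different convention for “disagree” responses would change only the condition inside the generator expression.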
Statistical analysis
An item response theory (IRT)-based model (Embretson and Reise, 2000; Lord, 1980) delineates the effects due to items that do not function reliably and due to differences in response consistency at the person level. Rooted in educational testing, IRT models have been used to separate potentially confounding effects of person-level ability and item-level difficulty. Briefly, IRT asserts that the likelihood of concordance between two administrations of the same belief item is a function of the reliability of the item as well as of the personal trait of response consistency in the individual’s beliefs about diabetes.
Evaluation of basic IRT assumptions
Two basic IRT assumptions—unidimensionality and local independence—were evaluated. Local independence refers to the property that responses from the same individual are independent after accounting for the individual’s underlying trait. Three different IRT models—the unidimensional model, the two-dimensional model, and the 7-factor bifactor model—were fitted to the concordance data, and their Bayesian Information Criterion (BIC) values were compared (lowest is optimal). The bifactor model (Gibbons and Hedeker, 1992) posits that for each item, there first exists a common underlying dimension and that there also exists a domain-specific dimension for the individual item belonging to the domain. For the unidimensional model, model fit statistics including the Tucker–Lewis Index (TLI), the comparative fit index (CFI), root mean squared error of approximation (RMSEA), and standardized root mean square residual (SRMR) were computed. The following criteria were used to determine the goodness of fit of the unidimensional model: TLI ≥ 0.9, CFI ≥ 0.95, RMSEA ≤ 0.06, and SRMR ≤ 0.08 (Hu and Bentler, 1998; Sharma et al., 2005). Local dependency was evaluated by examining pairwise residual correlation through the Q3 statistic (Yen, 1984), with multiple comparison controlled by the Hochberg (1988) procedure.
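As an illustration of the multiple-comparison step, the sketch below implements the Hochberg step-up procedure in plain Python. It is a generic version of the procedure, not the study's code; the function name and the example p-values are hypothetical. With 31 items, the number of item pairs screened by the Q3 statistic is 31 × 30 / 2 = 465.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg (1988) step-up procedure.

    Sort p-values in ascending order, find the largest rank k with
    p_(k) <= alpha / (m - k + 1), and reject the k smallest hypotheses.
    Returns a list of booleans in the original order of p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    reject = [False] * m
    for rank in range(m, 0, -1):  # start from the largest p-value
        idx = order[rank - 1]
        if p_values[idx] <= alpha / (m - rank + 1):
            for j in order[:rank]:
                reject[j] = True
            break
    return reject


n_items = 31
n_pairs = n_items * (n_items - 1) // 2  # number of item pairs for Q3
print(n_pairs)  # 465

# Hypothetical p-values, for illustration only.
print(hochberg_reject([0.06, 0.02, 0.01]))  # [False, True, True]
```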
Item-level analysis
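The specific procedure for IRT analysis is as follows: When an individual’s responses (agree, disagree, or don’t know) to an item were identical across the two visits, the responses were said to be in concordance. The outcome variable for each person-item pair was coded 1 for concordant responses and 0 otherwise, and the IRT model expressed the probability of concordance as a function of the item’s parameters and the person’s level of response consistency.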
In an ideal situation, a reliable item would have a high likelihood of soliciting concordant responses (outcome = 1) from the same individual after controlling for differences of response consistency at the person level. Figure 1 shows several hypothetical item response curves as functions of person-level consistency. An ideal item exhibits a generally high level of concordance in a test–retest situation regardless of differences at the person level (solid line). A more practical requirement for item reliability would be for an item to demonstrate a high level of concordance only for individuals with a “normal” level of response consistency (dashed line), whereas an item with poor reliability would have generally low concordance across different levels of individual response consistencies (dotted line).

Figure 1. Hypothetical item response curves showing ideal, reliable, and unreliable items.
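The three curve shapes can be reproduced numerically with a short sketch. It assumes a two-parameter logistic (2PL) form, in which the probability of concordance is 1 / (1 + exp(-a(θ - b))) for person consistency θ, item discrimination a, and item location b. The 2PL form and all parameter values below are assumptions chosen only to mimic the shapes described in the text; they are not estimates from the study.

```python
import math


def concordance_probability(theta, discrimination, location):
    """2PL logistic curve: probability of a concordant response given theta."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - location)))


# Hypothetical (discrimination, location) pairs for the three curve types.
items = {
    "ideal": (1.0, -4.0),       # high concordance at nearly every consistency level
    "reliable": (2.0, -1.0),    # high concordance from about average consistency up
    "unreliable": (0.3, 3.0),   # low concordance across the whole range
}

for name, (a, b) in items.items():
    # Evaluate each curve at low, average, and high person-level consistency.
    curve = [round(concordance_probability(t, a, b), 2) for t in (-2, 0, 2)]
    print(name, curve)
```

Under this parameterization, a curve that stays flat near 0.5 across the range corresponds to a discrimination close to zero.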
The item response curves for each CSMDI item were examined to identify unreliable items. Furthermore, the differential item functioning (DIF) method (Holland and Wainer, 1993) was used to examine whether a CSMDI item functioned differently for different demographic subgroups. In other words, DIF evaluates if an item exhibits bias (e.g. more difficult to generate concordance) for one subgroup versus another. The demographic subgroups assessed included gender and ethnicity (White, African American, and American Indian). Statistical hypothesis tests for DIF were set at the α = 0.05 level, and the Bonferroni procedure was used to adjust for multiple comparisons across items and groups. Items that exhibited differential functioning were removed from the analysis.
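The Bonferroni adjustment amounts to dividing the nominal α by the number of tests. The sketch below assumes one test per item within a grouping variable (31 tests), which gives a per-test threshold of about 0.0016; the function name is hypothetical, and the exact family of tests used in the study may have been defined differently.

```python
def bonferroni_significant(p_value, alpha=0.05, n_tests=31):
    """True if p_value passes the Bonferroni-adjusted threshold alpha / n_tests."""
    return p_value <= alpha / n_tests


# Assumption: one DIF test per CSMDI item, so 31 tests per grouping variable.
threshold = 0.05 / 31
print(round(threshold, 4))  # 0.0016

# A p-value below the nominal 0.05 can still fail the adjusted threshold.
print(bonferroni_significant(0.0035))  # False
```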
Individual-level analysis
To test the hypothesis that participants with a biomedical view of diabetes are more consistent in their responses to the same items, regression analysis was used, with the response consistency score as the dependent variable and the biomedical score as the primary predictor variable. The individual response consistency score was measured by an estimate of the person-level latent trait from the fitted IRT model.
The programs IRTPRO 2.1 (Scientific Software International Inc., Skokie, IL), Mplus 7 (Muthén & Muthén, Los Angeles, CA), and SAS 9.3 (SAS, Inc., Cary, NC) were, respectively, used for IRT calibration and DIF analysis, model assumption evaluation, and biomedical score analysis.
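The regression step can be sketched in plain Python. This is not the SAS analysis used in the study: it fits a single-predictor ordinary least squares line, whereas the study used a multiple regression model, and the data values and function name are made up purely for illustration.

```python
def simple_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: biomedical scores and IRT-based consistency estimates.
biomedical = [4, 6, 8, 10, 12]
consistency = [-0.8, -0.3, 0.1, 0.4, 0.9]

slope, intercept = simple_ols(biomedical, consistency)
print(round(slope, 3), round(intercept, 2))
```

A positive slope in such a fit would correspond to the hypothesized direction, in which higher biomedical scores accompany higher response consistency.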
Results
IRT assumption checking
The BIC values for the unidimensional, two-dimensional, and the bifactor model were, respectively, 21,056, 21,253, and 21,165, suggesting that the unidimensional model has the best fit among the three IRT models. The TLI, CFI, RMSEA, and SRMR values of the unidimensional model were 0.60, 0.61, 0.034, and 0.08, respectively. Both TLI and CFI were lower than the recommended thresholds (≥ 0.9 and 0.95, respectively), but RMSEA and SRMR were deemed satisfactory (≤ 0.06 and ≤ 0.08, respectively). The 7-factor bifactor model “passed” all the tests in model fit indexes. There appears to be evidence that the items are sufficiently unidimensional but also contain small domain-specific dimensions. The Q3 statistic for pairwise residual correlation showed 14 significant p-values in testing 465 pairs, and only 1 pair (between items 20 and 23) remained significant after adjusting for multiple comparison using the Hochberg procedure. Both items 20 (“Drinking lots of water helps to flush extra sugar out of the body”) and 23 (“The only thing people with diabetes need to know is to stay away from sweets”) were related to sugar consumption but the contexts in which sugar was mentioned seemed unrelated. A decision was made not to remove either item.
Item-level analysis results
A total of N = 563 participants completed CSMDI at both visits. All participants were 60 years or older, and 11% were 80 years or older; 38% were males, and the percentages of White, African American, and American Indian participants were 36%, 34%, and 30%, respectively. Only 30% of the sample had attained more than a high school education, and 61% had had diabetes for more than 10 years. Figure 2 shows the item response curves for the CSMDI grouped by domain. Of the six domains, items for Symptoms (with the exception of item 4, “Falling down is a sign of diabetes”), Consequences, and Information (with the exception of item 8 “People with diabetes understand their disease better than their doctors”) appeared to have relatively high levels of reliability. Causes and Medical Management had lower levels of overall reliability, and item 14 (“Being overweight makes people get diabetes” within the Causes domain) and item 29 (“Low blood sugar can be managed by adjusting medication” within the Medical Management domain) had low concordance even with participants who had relatively high response consistencies. Particularly, item 29 showed only a 0.5 probability of concordance across all levels of response consistency. In the Behavioral Management domain, some items had high levels of reliability (item 21 “Stress makes your blood sugar go up,” and item 22 “Managing the size of each meal helps control diabetes”), whereas others had moderate to low reliability.

Figure 2. Item response curves for CSMDI grouped by domain.
As a validation of the reliability results, we calculated the averaged concordance (agreement between responses at visits 1 and 2) for each domain—65% (Symptoms), 73% (Information), 60% (Causes), 65% (Consequences), 67% (Behavioral Management), and 57% (Medical Management). After removing item 14 (Causes) and item 29 (Medical Management), the averaged concordance of the corresponding domains, respectively, improved to 63% and 59%.
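The averaged concordance per domain can be computed directly from the test–retest responses, as in the following sketch. The function names, the record layout, and the toy responses are assumptions. The sketch pools all person-item pairs within a domain, which equals the average of item-level concordance rates when every item has the same number of respondents.

```python
def concordance(visit1, visit2):
    """1 if the two responses to an item are identical, else 0."""
    return int(visit1 == visit2)


def domain_concordance(records, domain_of_item):
    """Average concordance per domain.

    records: iterable of (item, response_at_visit1, response_at_visit2).
    domain_of_item: dict mapping item number -> domain name.
    """
    totals, counts = {}, {}
    for item, r1, r2 in records:
        domain = domain_of_item[item]
        totals[domain] = totals.get(domain, 0) + concordance(r1, r2)
        counts[domain] = counts.get(domain, 0) + 1
    return {domain: totals[domain] / counts[domain] for domain in totals}


# Toy example with three items and made-up responses.
domains = {14: "Causes", 15: "Causes", 29: "Medical Management"}
records = [
    (14, "agree", "disagree"),
    (15, "agree", "agree"),
    (29, "dont_know", "dont_know"),
    (29, "agree", "disagree"),
]
print(domain_concordance(records, domains))
```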
The DIF analysis suggested that several items (6, 13, and 23) exhibited DIF by gender at the nominal level of α = 0.05. However, after the Bonferroni adjustment, the results for these items were no longer significant. For ethnicity, eight items (2, 7, 18, 21, 26, 27, 29, and 31) exhibited DIF at the nominal level, but none were significant after adjustment. Item 2 (“Blood sugar will go up if you eat too many white foods”) was marginally significant (p = 0.0035, as compared with the threshold of 0.0016) in that the item was less discriminating for White participants than for the other two groups (American Indians and African Americans). Consequently, no item was removed from the final analysis.
Person-level analysis results
Table 1 summarizes the multiple regression model, in which individual response consistency was the dependent variable. The biomedical score was highly significant (p < 0.001), which strongly supported the belief-anchor hypothesis that individuals with beliefs congruent with the biomedical model were more consistent in their responses about beliefs regarding diabetes. Individuals with a higher educational level also tended to be more consistent, but the significance level was marginal (p = 0.054).
Table 1. Summary of the multiple regression model predicting response consistency (N = 563).
p ≤ 0.1. ***p ≤ 0.001 (all two-tailed tests).
Discussion
The confounding of response consistency and item reliability is an important issue that has not been fully resolved in research grounded in psychological and social theories. By taking advantage of a unique data set from a study of participants with diabetes that contained a relatively large sample of test–retest data, we were able to apply IRT in a novel way to delineate the two distinct constructs. Because IRT is a mature and proven measurement technology, it has the advantage of being readily accessible to researchers. Furthermore, its features offer solutions to questions that have been difficult to answer using traditional methods. For example, DIF analysis could identify items that contained bias against one specific subgroup of the population. IRT also allows efficient study designs that administer smaller subsets of survey items to each individual for the purpose of reducing respondent burden. Moreover, as our analysis has shown, estimates of individuals’ response consistency, free of confounding with item (un)reliability, could be used to examine factors that contribute to people’s ambivalence about statements of health belief in health surveys.
Our results demonstrate that (1) health belief items are not created equal and differ in their measurement properties, including reliability; (2) there are individual differences in response consistency, with some individuals having more stable health beliefs and others more fluid ones; and (3) individuals who hold more beliefs congruent with the biomedical model tend to be more consistent over time.
Individuals with a college education also tended to be more consistent than individuals with less than a high school education, although the significance level was only marginal. These findings have several implications. First, for social measurement specialists, it is important to realize that some belief items tend to generate inconsistent responses, which could lead to large measurement error. Second, for social science researchers, the study provides empirical evidence that individuals may not be able to consistently articulate their health beliefs, and their CSMs are fluid, some more so than others. Thus, assessing health belief at one time point may not be sufficient to capture all the necessary individual information for treatment and prevention purposes. The study also offers insight into the belief-anchor hypothesis. Overlap between individual CSM and the biomedical model produces more consistent responses to belief items, which can be explained by individuals using the biomedical model as a coherent belief-anchor.
Perhaps the most important clinical implication of the study is this: For people with diabetes who have lay beliefs about the disease, such beliefs may not be robust and could be influenced by health education provided by health-care professionals as well as by disease-related educational materials. In a recent study of familial risk perception of diabetes threat in relatives, Van Esch et al. (2013) reported that patients with coherent illness understanding reported positive beliefs regarding type 2 diabetes prevention in relatives. They also reported the presence of subgroups (e.g. those with high disease burden) that had elevated family risk perception. The authors argued that these findings could be used to guide patients in family risk disclosure. For the current study, our findings regarding patients with different robustness in beliefs about diabetes could be used to guide interventions designed to influence patients’ beliefs.
Some caveats need to be noted for using the IRT procedure. Although the IRT approach allows the collection of data that could be used in designing interventions, the method requires strong assumptions and ample data. Model assumptions include unidimensionality and local independence, and this article has demonstrated ways to evaluate both. Sample size is another important issue to consider prior to applying the procedure. First, criteria for checking IRT assumptions might not work well with small samples, and we suspected that the moderate goodness-of-fit indexes for the concordance data could be a result of their sensitivity to sample size. Furthermore, IRT calibration typically requires a relatively large sample (e.g. n > 500; see Reise and Yu, 1990; for a more recent review, see Orlando and Reeve, 2007). Thus, the typical test–retest sample seen in health-related applications, which could be as small as several dozen, may not suffice. However, if the purpose is to use IRT for identifying unreliable items, then a smaller sample could be justified because that purpose demands less accuracy in the item parameters than other applications, such as scoring individuals in a high-stakes situation. Finally, it should be noted that the CSMDI was designed for assessing lay belief; thus, some “unreliable” items, as identified by IRT, could still be useful in pinpointing deficits in an individual’s understanding of the disease and could be retained in the inventory.
The study has limitations. First, the participants all resided in areas that are predominantly rural, possibly limiting the generalization of the findings. Only 30% of the participants had more than a high school education, and 31% were under the poverty line. Therefore, it is likely that their beliefs were less congruent with the biomedical disease model than those of a subpopulation with more formal education and higher socioeconomic status. For example, it is not clear whether or not the CSMDI items will function the same way for an urban population, though there is no particular reason to expect that they will not. A second limitation is that the IRT procedure described here only applies to concordance across two time points (test–retest). Extension of the method will be required for handling multiple time points. Finally, this article only describes a method to distinguish the concept of response consistency (as a person trait) from item unreliability (as an instrument measurement issue). The construct of response consistency has yet to be scientifically evaluated and validated. Furthermore, it would be of interest to investigate whether individuals hold a mix of lay beliefs and biomedical beliefs, and whether or not these separate beliefs are stable over time.
Diabetes is a chronic disease that requires long-term self-care and management. Improvement in diabetes self-management can be at least partly ascribed to the concordance between the explanatory systems of health-care provider and patient. Understanding the consistency of a patient’s belief system and how it can be changed is essential to a sustainable model of health care of the disease. This article, despite its limitations, offers an important step in this direction.
Appendix
The Common Sense Model of Diabetes Inventory (CSMDI).
| Domain | Item | Description |
|---|---|---|
| S | 1* | Feeling nervous is a sign of low blood sugar. |
| S | 2 | Blood sugar will go up if you eat too many white foods. |
| S | 3* | People with diabetes have tingling in their feet due to high blood sugar. |
| S | 4 | Falling down is a sign of diabetes. |
| S | 5* | Having to go to the bathroom often at night is caused by diabetes. |
| S | 6* | Diabetes makes people feel thirsty all the time. |
| I | 7 | Family members with diabetes are good sources of diabetes information. |
| I | 8 | People with diabetes understand their disease better than their doctors. |
| I | 9 | People could better control their diabetes if they were given the right information. |
| C | 10 | Weight does not cause diabetes because thin people also get diabetes. |
| C | 11 | Diabetes cannot be hereditary because not everyone in a family gets it. |
| C | 12 | Some people get diabetes because they ate too many sweets when they were young. |
| C | 13 | Everyone is born with diabetes but it develops at different times for different people. |
| C | 14* | Being overweight makes people get diabetes. |
| C | 15* | Diabetes runs in families. |
| Co | 16* | Diabetes causes high blood pressure. |
| Co | 17* | It is difficult for people with diabetes when they have a full-time job. |
| Co | 18* | Diabetes has serious financial consequences. |
| Co | 19* | Diabetes makes it difficult for your body to fight infection. |
| BM | 20 | Drinking lots of water helps to flush extra sugar out of the body. |
| BM | 21* | Stress makes your blood sugar go up. |
| BM | 22* | Managing the size of each meal helps control diabetes. |
| BM | 23 | The only thing people with diabetes need to know is to stay away from sweets. |
| BM | 24* | Doing household chores is enough exercise for someone who has diabetes. |
| BM | 25* | The body processes sugar in fruits and vegetables differently than sugar in sweets and starches. |
| BM | 26 | Blood sugar often goes up and down for no reason. |
| MM | 27 | Taking extra medication helps to manage high blood sugar. |
| MM | 28 | People should adjust their diabetes medication depending on how they feel. |
| MM | 29 | Low blood sugar can be managed by adjusting medication. |
| MM | 30 | Medical treatment cures diabetes. |
| MM | 31 | Taking extra medication makes it okay to eat something sweet. |
S: Symptoms; I: Information; C: Causes; Co: Consequences; BM: Behavioral Management; MM: Medical Management.
Items marked with an asterisk (*) are congruent with the biomedical model.
Funding
This work was supported by NIH grant 2 RO1 AG017587-05A1, and NSF grant SES-1229549.
