Abstract
The aim of this study is to conduct a multimethod psychometric reduction of the Parents’ Beliefs about Children’s Emotions (PBCE) questionnaire using an item response theory framework with a pediatric oncology sample. Participants were 216 pediatric oncology caregivers who completed the PBCE. The PBCE contains 105 items (11 subscales) rated on a 6-point Likert-type scale. We evaluated the PBCE subscale performance by applying a partial credit model in WINSTEPS. Sixty-one statistically weak items were removed, creating a 44-item PBCE questionnaire with 10 subscales and 3 response options per item. The refined scale displayed good psychometric properties and correlated .910 with the original PBCE. Additional analyses examined dimensionality, item-level (e.g. difficulty), and person-level (e.g. ethnicity) characteristics. The refined PBCE questionnaire provides better test information, improves instrument reliability, and reduces burden on families, providers, and researchers. With this improved measure, providers can more easily identify families who may benefit from psychosocial interventions targeting emotion socialization. The results of the multistep approach presented should be considered preliminary, given the limited sample size.
Statistics from the National Cancer Institute (2010) estimate over 15,000 new pediatric cancer diagnoses each year in the United States, with rates by race and ethnicity being fairly similar across all diagnostic groups (Jemal et al., 2008). Fortunately, survival rates for pediatric cancer have greatly improved over the last half-century, such that over 80% of patients attain five-year survival following diagnosis. Given the average pediatric cancer treatment duration of three years, we can expect several thousand families to be experiencing cancer-specific psychosocial stressors at any given time (Graziano et al., 2016; Kazak et al., 2001). These data suggest an increased need for continued research on quality-of-life outcomes among pediatric oncology survivors (Eiser et al., 2000).
Extant research suggests approximately 25% of pediatric cancer survivors experience significant problems with social competence, self-esteem, and academic and employment achievement (Aldridge and Roesch, 2007). A significant subset also experience life-disrupting symptoms of depression, anxiety, and treatment nonadherence (Kazak et al., 2005). Researchers have begun evaluating the unique role of parent emotion socialization (ES) in pediatric chronic illness groups, including pediatric oncology, in predicting patients’ psychosocial outcomes and long-term adjustment; however, many of these studies remain ongoing and have not disseminated findings. ES refers to ‘the socialization of children’s understanding, experience, expression, and regulation of emotion’ (Eisenberg et al., 1998). Eisenberg et al. (1998) and Morris et al. (2013) posited parents’ ES beliefs influence children’s socioemotional outcomes by teaching children about the acceptability of emotional experiences and about emotion display rules. Most research has supported the idea of dichotomous supportive versus nonsupportive parent ES practices (e.g. Hildenbrand et al., 2013; Sanger et al., 1991); however, nearly all published empirical studies evaluating the links between parents’ ES practices and child outcomes have been limited to healthy families and have not considered the potentially important role of childhood chronic illness in moderating outcomes. Children with cancer are removed from their social context (Corsano et al., 2015; Diller et al., 2009; Gurney et al., 2009) and behavior management is often interrupted during treatment (Kazak et al., 2005), which may make ES a particularly salient predictor of outcomes in this group. Additionally, parents may view their ill child as vulnerable and their parenting role as that of making children happy (Mullins et al., 2007; Whiting, 2013), which may interfere with their ability to teach children how to appropriately manage both positive and negative emotions.
Unfortunately, no published studies have examined ES in families with children with cancer.
Empirical studies with healthy children have utilized the Parents’ Beliefs about Children’s Emotions (PBCE; Halberstadt et al., 2013) scale to measure parents’ beliefs about shaping children’s emotional experiences; however, the 105-item PBCE may place undue burden on pediatric cancer families already managing cancer-related stressors. One way researchers have begun minimizing patient and family demands associated with study participation has involved reducing the length of measures (James et al., 2002). Questionnaire length has been proposed to affect response burden, or the effort required by an individual to answer a questionnaire (Rolstad et al., 2011). Response burden subsequently affects response rates – how likely participants are to complete study instruments (Rolstad et al., 2011). The longer the questionnaire, the higher the response burden placed on participants, diminishing the number of completed instruments in any given study design (Galesic and Bosnjak, 2009). This finding is particularly relevant to researchers interested in measuring constructs in pediatric oncology, given the overwhelming demands already placed on these families (James et al., 2002).
Fortunately, there are modern measurement methodology frameworks – such as item response theory (IRT) – that can assist in refining and reducing existing measures to alleviate participant burden (Embretson and Reise, 2000; Hambleton and Jones, 1993). Classical test theory, in contrast to more modern measurement methodology, is a measurement framework that is based on the traditional ideas of reliability and validity, while forms of modern test theory (e.g. IRT) are based on the assumption that performance on a test is a function of individual ability level (e.g. trait-level of person on each of the PBCE constructs) and item qualities (e.g. difficulty of item; Embretson and Reise, 2000). Given that a goal in IRT is to maintain only those items that contribute unique variance to the measured construct, IRT has been used to reduce the length of measurement scales and to provide a more precise psychometric understanding of each item’s contribution to an overall mean score (for review, see Embretson and Reise, 2000).
Understanding psychometric properties of the PBCE questionnaire in the context of pediatric oncology may help inform measurement literature and ultimately provide a richer quantitative perspective on how ES constructs may impact psychosocial outcomes and long-term adjustment in pediatric cancer patients. In addition, test information provided within an IRT framework may provide markers to identify families who are ‘low’ or ‘high’ on each of these constructs (e.g. negative emotions are valuable, children use emotions to manipulate others, children can control their emotions) and potentially inform psychosocial interventions. Overall, the refined PBCE questionnaire will be able to gauge PBCE in families affected by pediatric cancer to support coping and adjustment to cancer. To provide better test information, improve instrument reliability, and reduce measure length to limit participant burden in families affected by pediatric oncology, the current study examined the PBCE questionnaire within an IRT framework.
Method
Caregivers were approached and recruited by research personnel during (1) pediatric oncology patients’ outpatient clinic appointments or (2) inpatient stays at a pediatric oncology treatment center located in the southwest region of the United States. To participate, adult caregivers needed to have a child with an existing oncological diagnosis (e.g. leukemia, Wilms tumor, brain tumor) for a minimum of six weeks, identify as a primary caregiver (i.e. live with child at least 50% of the time and provide at least 50% of the child’s care), and be able to read and/or speak English or Spanish. Caregivers provided consent and completed several paper and pencil measures as part of a larger study. The local institutional review board granted ethical approval prior to study initiation.
Measures
Demographics questionnaire
Caregivers completed a brief demographic questionnaire regarding age, sex, ethnicity, and race pertaining to self and child. Additionally, caregivers provided information on their child’s oncological illness (e.g. diagnosis, treatment received, date of diagnosis), family constellation, family income, and caregiver educational level.
PBCE questionnaire
The Parents’ Beliefs about Negative Emotions questionnaire (Halberstadt et al., 2008) and its successor, the PBCE scale (Halberstadt et al., 2008), are existing measures of parents' beliefs about children's emotions. The PBCE contains 105 items across 11 subscales that caregivers rate on a 6-point Likert-type scale (1 = Strongly disagree, 6 = Strongly agree). Exploratory and confirmatory factor analyses have demonstrated good factor structure for the PBCE when used with Caucasian, African-American, and Lumbee Native American parents (Halberstadt et al., 2013). Subscales of the PBCE (see Table 1 for a list of subscales and definitions) have also shown good predictive validity for parents’ ES practices (Wong et al., 2009), children’s emotional understanding, and children’s perceptions of competence with peers (Wong et al., 2008). Subscales of the PBCE scale have shown good internal consistency in other studies as well (α = .78 to .86; Halberstadt et al., 2008). Spanish translation of study questionnaires was conducted by native Spanish speakers and included forward and backward translation (forward and backward translation conducted by different translators). Caregiver completion time for the 105-item PBCE ranged between 30 and 120 minutes when completed in one administration; however, some caregivers chose to complete the PBCE measure over several days or appointments secondary to fatigue, response burden, or interruptions by members of their child’s medical team. The PBCE completion time could not be assessed for caregivers who chose to complete the measures over several days and/or appointments.
Table 1. PBCE subscales, constructs, and items.
PBCE: parents’ beliefs about children’s emotions.
Note: For the privacy subscale, the higher the score, the more parents believe that they need to know what children are feeling, and the lower the score, the more parents are accepting children’s privacy.
Statistical analyses
The IRT framework was approached in three distinct phases and utilized a graded response model (Samejima, 1997), which is appropriate for ordered categorical responses like the PBCE Likert scales. The analyses presented below follow recommendations for measurement refinement from an IRT perspective (Linacre, 2013) but deviate somewhat from initially proposed procedures, given that instrument modification is a partially data-driven and iterative process. Reise and Yu (1990) suggest IRT models can be estimated with as few as 250 participants but recommend including 500 participants to reduce the standard errors of parameter estimates. The current study had a limited recruitment phase, which prevented us from obtaining a larger sample of participants. As such, the results of the multistep approach presented here should be considered preliminary, given the sample size (N = 216).
Analyses progressed in several stages. First, for each subscale, response option performance was assessed by examining Rasch–Andrich thresholds using WINSTEPS (Linacre, 2012), which provides graphical representations of probability curves for each item. In addition, the frequency of endorsed response options was examined to identify whether items were particularly discriminating among participants. Based on these findings, refinement of the response options (e.g. collapsing categories) was considered and employed for each subscale to improve the psychometric properties and the discriminating nature of each subscale. Second, item performance was evaluated to assist in scale reduction. Scale reduction involved eliminating items whose response options did not conform to an ordered Rasch–Andrich threshold or demonstrated poor item infit/outfit statistics.
Third, model fit of the refined and reduced PBCE instrument was analyzed using a Rasch Rating Scale Model (RSM) and a Partial Credit Model (PCM) in WINSTEPS version 3.74.0 (Linacre, 2012). χ² difference analyses were then employed to compare the fit of the two models to the data. The model judged to offer the best balance of fit and parsimony was then used in all subsequent analyses.
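For reference, the two models compared in this step can be written in a common adjacent-category form. The notation below is a conventional textbook parameterization and is an assumption of this sketch rather than a transcription of the WINSTEPS output: θ_n is the trait level of caregiver n, δ_i is the location of item i, τ_j is the j-th threshold shared by all items of a subscale (RSM), δ_ij is the j-th step of item i (PCM), and M is the highest category score. Empty sums (k = 0) are taken to be zero.

```latex
% Rating Scale Model: one set of thresholds \tau_1,\dots,\tau_M shared across items
P_{\mathrm{RSM}}(X_{ni}=k)
  = \frac{\exp\left[\sum_{j=1}^{k}\left(\theta_n-\delta_i-\tau_j\right)\right]}
         {\sum_{m=0}^{M}\exp\left[\sum_{j=1}^{m}\left(\theta_n-\delta_i-\tau_j\right)\right]},
  \qquad k = 0,1,\dots,M

% Partial Credit Model: each item has its own steps \delta_{i1},\dots,\delta_{iM}
P_{\mathrm{PCM}}(X_{ni}=k)
  = \frac{\exp\left[\sum_{j=1}^{k}\left(\theta_n-\delta_{ij}\right)\right]}
         {\sum_{m=0}^{M}\exp\left[\sum_{j=1}^{m}\left(\theta_n-\delta_{ij}\right)\right]},
  \qquad k = 0,1,\dots,M
```

Under this parameterization, and assuming the usual sum-to-zero constraint on the RSM thresholds, a subscale with I items has I + M − 1 free item parameters under the RSM and I·M under the PCM, so the two models differ by (I − 1)(M − 1) parameters. The RSM is the special case of the PCM in which δ_ij = δ_i + τ_j for every item.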
Lastly, several item-level and instrument-level statistics were examined in WINSTEPS, including item fit, person fit, item discrimination, and instrument dimensionality. IRT assumes local independence of the items and unidimensionality of each factor or subscale. Currently, there is no standard procedure for establishing adequate unidimensionality; generally, evidence of a dominant factor explaining a large proportion of variance and goodness-of-fit indices are assessed (Embretson and Reise, 2000).
Results
Participants
Caregivers (N = 216, 74.59% female) ranged in age from 21 to 65 years (M = 38.76, SD = 9.99) and self-identified as mothers (75%), fathers (18%), or other guardians (7%). Caregivers identified as Caucasian (64.86%), with the remaining identifying as African-American (19.84%) or other (17.29%). The majority of caregivers identified as non-Hispanic (65.57%). Most caregivers obtained a high school diploma or commensurate certificate (40.4%), with the remaining not having finished high school (10.3%), having some college or an associate’s/professional degree (13.8%), or having obtained an undergraduate or graduate degree (35.5%). Caregivers reported household incomes of US$0–US$20,000 (25.2%), US$20,001–US$40,000 (18.1%), US$40,001–US$60,000 (11.1%), and US$60,001–US$80,000 (34.0%). Patients (N = 216) receiving pediatric oncology treatment protocols ranged in age from 1 year to 19 years (M = 9.96, SD = 4.91). The percentage of males (63%) was higher than that of females. Patients were predominantly Caucasian (64.29%), with the remaining patients identifying as African-American (20.00%) and other (15.71%). The majority of patients identified as non-Hispanic (62.04%). Most patients had a diagnosis of acute lymphoblastic leukemia (41.5%), followed by central nervous system (CNS)/brain tumor (16.3%), osteosarcoma (8.9%), neurofibromatosis (8.1%), lymphoma (8.2%), neuroblastoma (6.5%), and other types of cancer (10.6%). Per chart review, the majority of patients (50.3%) were receiving assistance in the form of government-funded insurance (i.e. Medicaid).
Refinement of the rating scale
Examination of Andrich thresholds across each subscale of the PBCE revealed disordered averages between categories 1 (strongly disagree) and 2 (somewhat disagree), 3 (slightly disagree) and 4 (slightly agree), and 5 (somewhat agree) and 6 (strongly agree). That is, participants appeared to have difficulty differentiating between response options, indicating a possible misunderstanding between adjacent response options. For example, endorsing slightly agree did not require a substantially higher level of the subscale construct (e.g. negative emotions are valuable) than slightly disagree. Similar results were found between adjacent categories 5 and 6, as well as 1 and 2. Consequently, categories 1 and 2 were collapsed to create a new response category of disagree, categories 3 and 4 were collapsed to create a new response category of neither agree nor disagree, and categories 5 and 6 were collapsed to create a new response category of agree. A PCM with the new response categories for each subscale of the PBCE (i.e. disagree, neither agree nor disagree, agree) revealed an improvement in rating scale fit.
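The category collapsing described above amounts to a simple recode of the raw responses. The following is a minimal sketch, assuming responses are stored as integers 1 to 6 with missing values as None; the names COLLAPSE, LABELS, collapse_responses, and category_counts are hypothetical and are not part of WINSTEPS or of the original analysis scripts.

```python
from collections import Counter

# Hypothetical recode table: original 6-point response -> collapsed category (0-2).
COLLAPSE = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
LABELS = {0: "disagree", 1: "neither agree nor disagree", 2: "agree"}


def collapse_responses(responses):
    """Recode 6-point responses into three ordered categories.

    Missing responses (None) are passed through unchanged; any other value
    outside 1-6 raises ValueError so that data-entry errors are not hidden.
    """
    collapsed = []
    for value in responses:
        if value is None:
            collapsed.append(None)
        elif value in COLLAPSE:
            collapsed.append(COLLAPSE[value])
        else:
            raise ValueError(f"unexpected response value: {value!r}")
    return collapsed


def category_counts(collapsed):
    """Count how often each collapsed category label was endorsed."""
    counts = Counter(code for code in collapsed if code is not None)
    return {LABELS[code]: counts.get(code, 0) for code in sorted(LABELS)}


if __name__ == "__main__":
    raw = [1, 2, 3, 4, 5, 6, None, 6]  # illustrative data, not study data
    recoded = collapse_responses(raw)
    print(recoded)
    print(category_counts(recoded))
```

Tabulating the collapsed categories in this way also makes it easy to check whether any response option remains unendorsed for a given item after recoding.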
Item performance
Examination of item fit for each subscale of the PBCE suggested appropriate infit and outfit (less than 1.33), with the exception of 14 items (see boldface entries in column (a) of Table 2). Due to disordered thresholds and poor item fit, these 14 items were eliminated. Many of the remaining 91 items on the PBCE subscales appeared similar at face value, requiring time and effort on the participant’s part to complete. For example, within the control subscale, the item, children can keep themselves from getting too happy when they want to, appeared to capture a very similar concept as the item, children can keep themselves from getting overly excited when they want to. As such, item discrimination was examined for the remaining items on each subscale of the PBCE utilizing bivariate correlations and paired samples t tests. Results of these analyses suggested 44 items added little to each subscale’s ability to measure the construct, therefore allowing those items to be eliminated from their respective subscales. Our goal in eliminating statistically weaker items was to reduce the questionnaire length without compromising reliability. Reexamination of item fit statistics for the remaining 47 items (see column (b) of Table 2) revealed appropriate infit and outfit (less than 1.33).
Table 2. Item and scale information from the IRT analysis of the PBCE questionnaire.
IRT: item response theory; PBCE: Parents’ Beliefs About Children’s Emotions; RSM: Rasch rating scale model; PCM: partial credit model.
Note: Infit and outfit values >1.33 indicate poor fit. Boldface entries indicate items with poor fit.
*Items dropped to improve model fit. Items were dropped secondary to (1) disordered Andrich thresholds, (2) poor infit/outfit statistics, or (3) item’s poor discrimination (paired samples t-test results) paired with clinical judgment.
(R) Reverse-coded item.
a Values reported based on original 6-point Likert response scale.
b Values reported based on refined 3-point Likert response scale.
c Subscale dimensionality reported for RSM.
d χ² difference values between PCM and RSM.
Subscale dimensionality
Following examination of rating scale performance and item fit, dimensionality was examined for each refined subscale (see column (c) of Table 2). With the exception of the contempt subscale (eigenvalue = 1.6 and 41.4%), unexplained variance in the first contrast for each subscale suggested unidimensionality. As a result, the contempt subscale was eliminated due to multidimensionality and poor discriminating properties, yielding a total of 44 items across 10 subscales (see Table 1 for a comparison of original versus final number of PBCE items).
Comparison of model fit
Following instrument refinement, model fit indices from a PCM and RSM were examined for each of the remaining 10 subscales of the PBCE in WINSTEPS. The PCM is a more complex IRT model because it allows thresholds to vary across items, while the RSM is considered more parsimonious because it restricts thresholds to be equivalent across items. For example, the RSM conceptualizes the measurement scale on a standard metric while the PCM does not. This means that in the RSM, differences between response options are the same – the difference between a score of 4 and 6 is the same as the difference between 2 and 4. Therefore, Rasch response scales are often easier to interpret and provide better clinical utility (Hays et al., 2000). A χ² difference test was employed to examine the fit between the two models to identify whether the PCM significantly improved model fit over the more parsimonious RSM. Results from the χ² difference test indicated the PCM did not significantly improve model fit over the RSM for the majority of the refined subscales (see column (d) of Table 2). The χ² difference test indicated the PCM significantly improved model fit over the RSM for the negative emotions are valuable (Δχ²(6) = 13.67, p = .03), all emotions are dangerous (Δχ²(5) = 11.72, p = .04), and emotions just are (Δχ²(3) = 8.99, p = .03) subscales. However, the more parsimonious RSM was retained and utilized for all subsequent analyses for the purposes of increasing the clinical utility of the refined PBCE measure. By using the RSM, all responses can be scored on a standard interval level, allowing results to be compared among those who answer the PBCE differently (Hays et al., 2000).
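The p values attached to the three significant χ² difference tests can be checked from the reported statistics and degrees of freedom alone. The sketch below is illustrative and uses only the Python standard library; the function name chi2_sf and the dictionary REPORTED are hypothetical, and the upper-tail probability is computed with the standard recurrence for the regularized incomplete gamma function rather than with any WINSTEPS routine.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1) with z = x / 2,
    starting from Q(1, z) = exp(-z) for even df or Q(1/2, z) = erfc(sqrt(z))
    for odd df.
    """
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    z = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-z)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Statistics as reported in the text: subscale -> (delta chi-square, df).
REPORTED = {
    "negative emotions are valuable": (13.67, 6),
    "all emotions are dangerous": (11.72, 5),
    "emotions just are": (8.99, 3),
}


def reported_p_values():
    """Return the p value implied by each reported chi-square difference."""
    return {name: chi2_sf(stat, df) for name, (stat, df) in REPORTED.items()}


if __name__ == "__main__":
    for name, p in reported_p_values().items():
        print(f"{name}: p = {p:.3f}")
```

Each of the three values falls below the conventional .05 level and agrees with the reported p value once rounded to two decimal places.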
Examination of item and person fit
Next, item fit and person fit were examined in WINSTEPS 3.74.0 using an RSM. Acceptable item fit suggests that the item parameters are valid (i.e. they accurately represent how participants respond to items), while adequate person fit indicates that individual trait levels (e.g. ES) are valid indicators of that person’s position on the latent continuum (e.g. negative emotions are valuable; Embretson and Reise, 2000). According to Baker (2001), infit/outfit cutoff statistics for item-fit parameters are ≥1.33, while person-fit cutoff statistics are ≥2.00 (Li and Olejnik, 1997; Seo and Weiss, 2013). Examination of item fit revealed that all remaining items on the 10 remaining subscales of the PBCE had acceptable infit and outfit statistics (i.e. they did not evidence infit and/or outfit statistics above the 1.33 cutoff) and were retained for further analyses (see column (b) of Table 2). Examination of person fit on all of the remaining PBCE subscales revealed, on average, approximately 12 (5.55%) participants with an inadequate infit or outfit value above 2.00. Given the preliminary nature of the current investigation with a limited sample size (N = 216), participants were not eliminated from any of the instruments based on person-fit statistics.
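To make the fit statistics concrete, the sketch below computes infit and outfit mean squares for a single item under a rating scale model, given trait estimates and item parameters that are assumed to be already known. It is an illustration of the standard mean-square definitions, not a reproduction of WINSTEPS estimation; the function names (rsm_probs, item_fit, flag_misfit) and the example values are hypothetical, and responses are assumed to be scored 0, 1, 2 after category collapsing.

```python
import math


def rsm_probs(theta, delta, taus):
    """Category probabilities (scores 0..M) under a rating scale model.

    theta: person trait level; delta: item location; taus: M shared thresholds.
    """
    logits = [0.0]  # cumulative logit for category 0 is defined as 0
    for tau in taus:
        logits.append(logits[-1] + theta - delta - tau)
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [weight / total for weight in weights]


def item_fit(responses, thetas, delta, taus):
    """Return (infit, outfit) mean squares for one item.

    Outfit is the unweighted mean of squared standardized residuals; infit is
    the sum of squared residuals divided by the sum of model variances.
    """
    squared_residuals = 0.0
    variances = 0.0
    standardized = 0.0
    count = 0
    for x, theta in zip(responses, thetas):
        probs = rsm_probs(theta, delta, taus)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        residual_sq = (x - expected) ** 2
        squared_residuals += residual_sq
        variances += variance
        standardized += residual_sq / variance
        count += 1
    return squared_residuals / variances, standardized / count


def flag_misfit(infit, outfit, cutoff=1.33):
    """Flag an item whose infit or outfit exceeds the cutoff used in the text."""
    return infit > cutoff or outfit > cutoff


if __name__ == "__main__":
    # Illustrative values only: three respondents, thresholds at -1 and +1.
    infit, outfit = item_fit([0, 1, 2], [-2.0, 0.0, 2.0], 0.0, [-1.0, 1.0])
    print(f"infit = {infit:.2f}, outfit = {outfit:.2f}, "
          f"misfit = {flag_misfit(infit, outfit)}")
```

Because outfit weights every respondent equally, it is more sensitive to unexpected responses from caregivers whose trait level lies far from the item, whereas infit gives more weight to respondents located near the item. The same functions could be applied per person rather than per item, with the 2.00 cutoff, to mirror the person-fit screening.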
Discussion
Data from health and psychosocial measures are likely to be compromised if patients find completing instruments a burden (Jenkinson et al., 2001). Instruments with a relatively moderate number of items can present a considerable challenge for families who are already overwhelmed by treatments, numerous medical appointments, and the psychosocial toll associated with a life-threatening illness such as pediatric cancer. Consequently, brevity is to be sought whenever possible in the design and implementation of research measures (Jenkinson et al., 2001). We conducted a psychometric evaluation to reduce and refine the 11-subscale, 105-item PBCE using IRT methodology. We identified 61 statistically weak PBCE items based on data collected from 216 families affected by pediatric cancer. These items were removed resulting in a 10-subscale, 44-item PBCE. This newly refined and shortened instrument has improved psychometric properties with considerably reduced response burden compared to the original PBCE.
As part of the measurement refinement process, an important goal of the present study was to explore the performance of the six-option response scale. While Likert scales with numerous options are usually developed to capture a variety of responses, measurement researchers suggest numerous response options are not necessarily better given that they are sensitive to nuanced measurement problems (Jacoby and Matell, 1971). In fact, several empirical studies suggest Likert scales with too many options tend to be unreliable (Jacoby and Matell, 1971; Matell and Jacoby, 1972). It appears that the large number of response options on the PBCE in the current study became too cumbersome for respondents to use, as evidenced by the sample’s lack of endorsement of some response options entirely on several items (e.g. when children show pride in what they have done, it is a good thing; it is important for children to show their positive emotions with others; it’s the parents’ job to teach children how to deal with distress and other upsetting feelings). One plausible explanation for these findings is that the purported benefits of increasing variability in response options are outweighed by participant fatigue (Lavakas, 2008). Additionally, the analytical sensitivity of the response scale may be compromised when respondents interpret the response options in different ways. In other words, what one participant may describe as slightly agree may mean the same, in absolute terms, as what another participant might describe as slightly disagree. This phenomenon is amplified when the number of potential responses is large, which decreases the interpretability and therefore clinical utility of findings (Jamieson, 2004; Lavakas, 2008). As such, rating scale performance analysis resulted in new response categories for each subscale of the PBCE (i.e. disagree, neither agree nor disagree, and agree), which revealed an improvement in rating scale fit.
Results of the present study suggest that researchers developing or refining measures should minimize the number of Likert response options in order to reduce participant fatigue, improve participants’ interpretation of response scales, improve the overall psychometric properties of the instrument, and improve the clinical utility of measures.
An additional goal of the present study was to empirically explore the structure of each of the 11 original subscales (e.g. negative emotions are valuable) of the PBCE using IRT. Following instrument refinement, results from the current study revealed a one-factor structure for each subscale of the PBCE with the exception of the contempt subscale. After examining item-fit statistics (i.e. infit and outfit values), only three items remained as part of the contempt subscale (see Table 2, column (b)). It is possible that these items, although statistically sound, were inappropriate for measuring the construct of contempt for children’s emotions. For example, item 2, sarcasm is an effective way to get children to change what they are doing, appears to have poor face validity. It is also possible that contempt for children’s emotions, as currently measured by the PBCE, is not a relevant construct for caregivers of pediatric oncology patients. This might be particularly true for a subgroup of families (approximately 25–30%) that have significant difficulties in personal and social domains (Colletti et al., 2008; Vannatta and Gerhardt, 2003). Often, these families show increased parental overprotection and perceived child vulnerability (Colletti et al., 2008). Parental overprotection involves protective parenting behavior that is considered excessive given the child’s developmental age (Colletti et al., 2008; Thomasgard and Metz, 1997). Caregivers who exhibit overprotective parenting and perceive their child to be vulnerable might refrain from showing contempt for their child’s emotion as this might be seen as detrimental to a child who is undergoing cancer treatment. Consequently, the contempt subscale was eliminated due to multidimensionality and poor discriminating properties, resulting in 10 subscales.
Following the initial instrument refinement phase, several subscales of the PBCE contained both positively and negatively worded items. All reverse-coded items (e.g. It is not helpful for parents to make fun of their children’s feelings; see Table 2 for all reverse-coded items) were removed from the PBCE because they appeared to distort the measurement model. Findings contradict earlier work in the area of scale development that indicates a preference for reverse-coded items in most summated measures (DeVellis, 2003). Originally devised as a way to minimize inattention and acquiescent responding in individuals secondary to item fatigue, measurement problems outweighed the potential benefits of the inclusion of items that are worded in the opposite direction (Lavakas, 2008). One of the problems is that reverse-coded items frequently produce unexpected factor structures (Netemeyer et al., 2003), an undesirable characteristic of subscales that are supposed to be unidimensional. Another issue when making or utilizing a scale composed of items with opposing meanings is miscomprehension (Swain et al., 2008), as it is easy for respondents to misinterpret phrases that include negation. These problems are often compounded when scales are translated for use in other languages (Wong et al., 2003). Measurement issues concerning linguistic translations are increasingly important as psychosocial research aims to include representative samples of ethnic and cultural groups reflecting the composition of the US population (Varricchio, 2004). Findings suggest researchers in measurement development or refinement refrain from using reverse-coded items, particularly in instruments with few items where fatigue is unlikely to play a role in participant response selection.
The current study had notable strengths, including being the first to examine the PBCE using IRT methodology to reduce participant burden as well as utilizing a diverse (e.g. ethnicity, language) and unique sample (i.e. caregivers of pediatric oncology patients). Despite numerous strengths, a major limitation regarding sample size is worth noting. The current study included 216 participants, which rendered analyses informative, yet exploratory and preliminary in nature. Having more participants would have increased the power and robustness of statistical tests. With more participants, differential item functioning analyses among caregiver subsets could have been explored. These analyses would have determined whether specific items (e.g. Sometimes it is good for a child to sit down and have a good cry) function differently among caregivers based on sex, race/ethnicity, patient diagnosis (e.g. leukemia vs. CNS tumors), or caregivers’ linguistic preferences (e.g. Spanish vs. English).
The role of IRT in new instrument development and refinement has grown substantially over the last decade, with a greater number of studies examining the usefulness of IRT for developing disease-specific questionnaires (Jenkinson et al., 2001). The current study adds to this growing literature with findings that suggest the newly refined, psychometrically sound PBCE can be used to acquire data relevant to ES in families affected by pediatric cancer. These data will be important in determining which families may benefit from psychosocial interventions targeted at ES.
Footnotes
Declaration of Conflicting Interests
The author(s) declare that they have no conflict regarding this article and have no involvements that might raise the question of bias in the work reported or in the conclusions, implications, and opinions stated.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
