Abstract
Self-report questionnaires based on Harter’s response format (“Some kids . . . but other kids . . . ”) are commonly used in developmental and clinical research settings, but the reliability and validity of this format in middle childhood are still under debate. The current study aimed to test the psychometric properties of Harter versus Likert response formats as applied to two attachment questionnaires in a sample of 410 Italian children aged 8 to 10 years. Participants completed the Experiences in Close Relationships–Revised Child version (n = 102, 4-point Likert-type scale; n = 104, adapted Harter version) and the Security Scale (n = 95, Harter’s format; n = 109, adapted 4-point Likert version). Results of multigroup confirmatory factor analyses indicated that the two response formats exhibited comparable reliability and factorial validity, although a slight superiority of Harter’s format emerged for the Security Scale. External validity was supported for both answer formats. Implications for developmental theory and practice are discussed.
Self-report measures are among the most widely used assessment tools in the social sciences, as they can be administered to large samples and measure constructs that would be difficult to detect with other kinds of procedures (e.g., behavioral measures). In recent years, there has been a substantial increase in the development of self-report questionnaires designed for children, for both clinical and research purposes. This increase is consistent with the evidence that children’s opinions represent a useful and valid source of knowledge (Achenbach, McConaughy, & Howell, 1987; Sturgess, Rodger, & Ozanne, 2002; Sturgess & Ziviani, 1996). Indeed, children may be the best informants especially with regard to their emotions, since they have better access to their internal states than do teachers or parents (Lagattuta, Sayfan, & Bamford, 2012; Madigan, Brumariu, Villani, Atkinson, & Lyons-Ruth, 2016).
Much attention has been devoted to investigating how item characteristics (e.g., complexity of questions, number of items, item format, presence or absence of neutral points, wording, and direction of scale items) may affect the response quality and reliability of questionnaires in adult populations (for overviews, see Hartley & Betts, 2010; Shulruf, Hattie, & Dixon, 2008). Yet only a handful of studies have investigated these issues in the assessment of children. Albeit informative, results from extant research conducted with adults cannot be generalized to children due to substantial differences in their cognitive, linguistic, communicative, and social skills.
The present study aimed to examine the reliability and psychometric properties of Harter’s (1982) response format as compared with the more popular and widely used Likert format. The former asks children to choose between two opposite statements (i.e., “Some kids . . . but other kids . . . ”) and was introduced by Harter within the Self-Perception Profile for Children (Harter, 1985/2012) to offset children’s tendency to give socially desirable responses. In the developmental literature, this format has been used less frequently than the Likert format, presumably because it is time consuming and more demanding in terms of cognitive resources. Despite its potential usefulness with younger children, the validity of this response format and its differences from other, more direct answer formats such as Likert scales have not yet been investigated with school-age children. Here, we used two self-report tools designed to assess attachment in middle childhood, namely the Security Scale (SS; Kerns, Aspelmeier, Gentzler, & Grabill, 2001) and the Experiences in Close Relationships–Revised Child version (ECR-RC; Brenning, Van Petegem, Vanhalst, & Soenens, 2014). These questionnaires are widely used to measure perceived quality of attachment relationships in children aged between 8 and 12 years. Specifically, the SS has a “some/other” format (Harter, 2012), while the ECR-RC employs a Likert-type response format. For the purpose of this study, we developed two additional versions of both questionnaires by (a) adapting the original answer format of the SS into a Likert response scale and (b) adapting the original answer format of the ECR-RC into Harter’s response format. These versions were compared using a multigroup confirmatory factor analysis (MG-CFA) approach.
Using Self-Reports in Middle Childhood
The developmental period of middle childhood (ranging from 7/8 to 11/12 years) overlaps with Piaget’s concrete operational stage (Inhelder & Piaget, 1958) and represents a crucial phase in human development, as it is characterized by prominent changes in cognitive, social, and emotion regulation abilities. During this period, together with the improvement of language and reading skills, children begin to understand different points of view and temporal relations (Selman, 1980), and there is a considerable stride in memory retention, abstract thinking, cognitive flexibility, as well as metacognitive skills (Raikes & Thompson, 2005). Despite these advancements, many children still have difficulties with logical forms of sentences (i.e., negative wordings/sentences) and tend to be very literal in the interpretation of items (De Leeuw, Borgers, & Smits, 2004; Holaday & Turner-Henson, 1989). It is only after entering the formal operational stage (from 11/12 years onward) that children develop the ability to think about abstract concepts, and logical negation becomes well-developed. This entails that the child is able to understand a statement and to simultaneously construct its negation. Thus, both stages play a critical role in the ability to understand questions or respond to items involving the choice among multiple alternatives which, in turn, may affect the accuracy of children’s responses (Borgers & Hox, 2000).
Self-reports used in research with school-age children generally employ several types of response formats, including a dichotomous, two-choice format (e.g., yes/no), the Likert-type scale format (e.g., not at all like me/very much like me), Visual analog scales, Faces Scales, and Harter’s format (i.e., “Some kids . . . but . . . other kids”). However, studies investigating potential differences in the reliability of data obtained using these response formats with children are still scarce. Empirical evidence suggests that item wording, educational level, and cognitive development can result in different levels of reliability (e.g., Borgers, Hox, & Sikkel, 2004; Borgers, Leeuw, & Hox, 1999; Borgers, Leeuw, & Hox, 2000; De Leeuw & Otter, 1995; Marsh, 1986; Pantell & Lewis, 1987). For instance, younger children and children with poor reading skills are less able to respond to negatively worded items (Borgers et al., 2000; Marsh, 1986). To reduce the effect of reading skills, some investigators read the entire questionnaire aloud (Danielson & Phelps, 2003).
Missing data are another common occurrence that deserves attention in the assessment of children through self-reports. Indeed, previous research suggests that younger children produce fewer nonresponses to ambiguous response scales compared with older children (e.g., Borgers & Hox, 2001). Although this finding could be interpreted in terms of better data quality, “this counterintuitive effect is likely to be related to children’s cognitive-developmental stage. It is assumed that younger children do not recognize the ambiguity of the response scale, which leads to more, however less reliable, responses” (Fuchs, 2009, p. 1).
Taken together, these studies suggest that although children are valuable informants in reporting their own opinions and internal states, the choice of an age-appropriate response format is paramount to obtain reliable and valid data.
Likert Versus Harter Response Format
In the context of survey research with school-age children, Likert scales are the most widely used format. They were introduced to measure attitudes (Likert, 1932) and allow researchers to quantify participants’ choices as opposed to, for example, dichotomous response options (e.g., yes/no). People are asked to report their level of agreement with a series of statements on an ordinal scale, with item response options usually varying between 3 and 7 points. Compared with the aforementioned dichotomous format, this type of scale has the advantage of providing more response options, thereby increasing the variability of item distribution. However, mixed opinions exist concerning the number of scale points to be used in Likert scales when respondents are children. Some studies suggest that scale reliability increases from 2 to 5 points (Lissitz & Green, 1975), even though offering too many response options may result in a cognitive overload for young children (Borgers & Hox, 2000). Other studies have shown that school-age children respond in a similar way to 3- and 5-point scales (Chambers & Johnston, 2002). Borgers et al. recommend offering no more than four to five response options (Borgers & Hox, 2001; Borgers et al., 2004). It has also been observed that the effect of response order decreases with age (Fuchs, 2005), that verbal labels are more easily understood than numerical ones (Borgers & Hox, 2000), and that reliability is higher when all points, rather than only the two extremes, are labeled (Borgers & Hox, 2000). Moreover, younger children have a stronger tendency to respond by selecting the extreme values of the scale compared with older children and adults (Chambers & Craig, 1998; Chambers & Johnston, 2002).
Although some investigators have adjusted their scales by reducing the number of response options to facilitate children’s comprehension (e.g., Gullone & Taffe, 2012; Lau & Lee, 2008; Wright & Asmundson, 2003), this tendency does not necessarily depend on the number of choices included in the response array, but may be related to the construct that is being measured (Borgers et al., 2004).
Harter’s format, also known as the “some/other” format (Harter, 1982), was designed to offset children’s tendency to give socially desirable responses. This format requires children to read two opposing statements, such as, “Some kids worry that their mom does not really love them BUT Other kids are really sure that their mom loves them” (Kerns et al., 2001). After choosing the child that best fits them, participants are asked to indicate whether the description is “really true” or “sort of true” for them. Since this response format is less intuitive, it takes more time to explain and requires children to process more words, thereby potentially increasing the total cognitive burden of the questionnaire. Harter (1982) argues that the administration time and cognitive burden involved in this type of answer format are worthwhile, because this format is likely to elicit more accurate self-descriptions on issues in which social desirability pressures might be present: “The effectiveness of this question format lies in the implication that half of the adolescents in the world (or one’s reference group) view themselves in one way, whereas the other half view themselves in the opposite manner” (Harter, 2012, p. 5). Therefore, this type of question legitimizes both choices (i.e., “some/other”), and simultaneously increases the number of response options (i.e., 4-point scale) provided by the typical dichotomous format (Harter, 2012).
The effectiveness of this response format in reducing socially desirable responses has been evaluated by Harter (1982), who found that the correlation between children’s perceived competence as assessed via the Perceived Competence Scale for Children and social desirability measured by the Children’s Social Desirability Scale (Crandall, Crandall, & Katkovsky, 1965) was .09, whereas the association between perceived competence as assessed via Coopersmith’s Inventory of Self-Esteem (Likert-type format) and the Children’s Social Desirability Scale was .33.
Since the Self-Perception Profile for Children (Harter, 1985/2012) was first introduced, it has been used in many studies of self-concept, including national surveys of adolescents (e.g., National Longitudinal Survey of Youth, 1997 Cohort, NLSY:97, Bureau of Labor Statistics, 2005; the 4-H Study of Positive Youth Development, Lerner, von Eye, Lerner, Lewin-Bizan, & Bowers, 2010). Furthermore, Harter’s response format was applied to questionnaires assessing other relevant developmental constructs in middle childhood, such as attachment (e.g., Preoccupied and Avoidant Coping Questionnaire, Finnegan, Hodges, & Perry, 1996; Security Scale, Kerns et al., 2001), reporting good psychometric properties (e.g., Marci, Lionetti, et al., 2018). Yet the literature on the validity of this format in comparison with the more widely used, “direct” format (i.e., Likert scale) is extremely sparse. As far as we know, such information is limited to two studies by Yeager and Krosnick (2011, 2012) involving adolescents and adults. Results of both studies suggested that the Harter-type format failed to increase validity and reliability. Indeed, answers to Harter-type questions showed lower indices of criterion validity than did answers to Likert-type questions. Thus, the authors concluded that its use should be avoided. However, it should be noted that the studies were conducted with adolescents and adults, thereby questioning the generalizability of their findings to children. Moreover, although the target items used by the authors covered different constructs (i.e., social relationships, self-esteem, deviant behavior, and academic achievement), they focused their attention on testing differences in terms of criterion validity. Hence, the extent to which this response format may differ from more “direct” formats (e.g., Likert scale, dichotomous questions) in terms of factor structure in middle childhood is still unclear.
The Present Study
The current study aimed to examine the reliability of the “some/other” response format in comparison with the Likert-type format in questionnaires assessing attachment in middle childhood. Within the attachment field, self-reports specifically designed for school-age children have been developed following Harter’s (1982) format, while tools originally designed for use with adolescents and adults and subsequently adapted for younger children use the Likert format. In this article, we focused on two widely used questionnaires designed to measure attachment in middle childhood, namely the Italian versions of the ECR-RC (Marci, Moscardino, & Altoè, 2019) and the SS (Marci, Lionetti, et al., 2018). The former has a Likert-type scale response format, in line with the original scale used with adults (ECR, Fraley, Waller, & Brennan, 2000), whereas the latter has Harter’s (1982) format. For the purposes of this study, in addition to the existing versions, we developed adapted versions of (a) the ECR-RC by applying Harter’s response format and (b) the SS by applying a Likert-type answer format. To compare the two competing answer formats of both questionnaires, we evaluated the factor structure and internal consistency of the four scales via CFAs, tested factor structure invariance across formats via MG-CFAs, and examined the equivalence between the two questionnaires in terms of convergent and concurrent validity.
Because Harter’s format is supposed to involve a higher cognitive load, we hypothesized that it would result in less optimal psychometric properties of the adapted ECR-RC and original SS than the two questionnaires in Likert-type format. With regard to convergent validity, since empirical evidence suggests that the quality of attachment relationships in middle childhood is positively related to children’s social competence (Groh et al., 2014), this type of validity was examined by testing the associations of the ECR-RC and SS (both original and adapted) with prosocial behavior as reported by teachers using the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997). Specifically, based on previous research on adolescents (see Yeager & Krosnick, 2011), we expected that the Likert-type questionnaires would be more strongly associated with the external measure considered in this study compared with the Harter-type questionnaires.
Method
Participants
Participants attending third and fourth grades were recruited from 22 classrooms of 6 public primary schools in Northeast Italy. Written informed consent was obtained from 466 families (95.5% of the total sample). Eight children were absent on the day of data collection. In addition, children who reported intellectual disabilities or certified developmental or learning disorders (n = 37) and those who were classified as having borderline IQ or lower (<85, Diagnostic and Statistical Manual of Mental Disorders–Fourth Edition–Text Revision, n = 9) were excluded from subsequent analyses. Two children did not fill out one of the target instruments (i.e., ECR-RC) and thus were not included in statistical analyses. Hence, the final sample consisted of 410 children who were native Italian speakers (55.3% girls) aged between 8 and 10 years (M = 8.8, SD = 0.56). Most children came from high- (75.1%) or medium-income (23.4%) families as measured via the Family Affluence Scale (FAS, Boyce, Torsheim, Currie, & Zambon, 2006, see Measures section for more details).
Procedure
Scale Adaptation Process
In the current study, we used the Italian versions of the ECR-RC and the SS, which were adapted by our own research group in previously published work using standard translation–backtranslation procedures. Based on these extant versions, we created the modified versions of the ECR-RC (Harter format) and SS (Likert format) directly in Italian.
In the adaptation of the ECR-RC (from Likert to Harter format), an opposite statement for each item was formulated following Harter’s method (i.e., “Some kids . . . but . . . other kids . . . ” ). Both statements were then independently evaluated by the three authors to ensure that they reflected the original meaning of the Likert format. Any disagreements were resolved through discussion. Children received the following instructions: The following statements describe two kinds of kids and the way they feel about themselves in relation to their mom. We are interested in knowing which of these kids is most similar to you. First, choose the child that best fit with you, and then mark the response that corresponds to how you evaluate yourself using “Somewhat True for Me” or “Really True for Me.” Please remember that there are no right or wrong answers.
An adapted version of the SS (from Harter to Likert response format) was created by including only one of the two statements used in each of the original Harter-type questions (i.e., Some kids . . . but other kids . . . ). For each question, the statement was selected independently by three of the authors by evaluating the extent to which each statement (a) appropriately captured security, (b) would result in a less ambiguous wording if presented alone, and (c) could limit the use of “double negatives” in the Likert-type form. After extensive discussion, each statement was adapted to a first-person Likert format (e.g., “I find it easy to trust my mom”). A psychologist with expertise in cognitive and reading development also reviewed the final version of the Likert-type questionnaire. As a result, the newly adapted questionnaire included nine positively and three negatively keyed items, a proportion that maps onto the proportion of negatively keyed questions included in the original ECR-RC (Likert format).
Instructions for the participants were as follows: Below you find a series of statements concerning your feelings in the relationship with your mom. Please indicate how much each statement is true for you using the following scale: “Not true at all for me,” “A little true for me,” “Somewhat True for Me,” or “Really True for Me.” Remember there are no right or wrong answers.
The adapted versions of both questionnaires (ECR-RC, SS) are included within the supplementary material, which is available online.
Data Collection
Data collection took place between April and November 2017. The study is part of a larger project on the socioemotional development of school-age children, which was approved by the Ethics Committee of the School of Psychology at the University of Padova (protocol #[1838-2016]). After receiving approval from the Principal of the participating schools, a letter was sent to children’s parents in order to explain the nature of the study and ask for written consent. All children whose parents signed the informed consent form were involved in the study. Written consent was obtained from both parents, while verbal assent was obtained from each child prior to data collection. Children completed questionnaires in the classroom and in a single morning session during school hours.
Before the questionnaire booklet was delivered, children were told that they were free to stop their participation at any time, and that their answers would remain anonymous. Items were read aloud by the researcher to minimize possible effects due to differences in reading ability, and children followed and filled in their responses. Given that children completed the questionnaires collectively, we randomly assigned classes to one of the following four groups: Group 1 (G1; n = 102): children who completed the mother scales of the Italian version of the ECR-RC; Group 2 (G2; n = 104): children who completed an ad hoc Harter-type adapted version of the ECR-RC; Group 3 (G3; n = 95): children who completed an Italian version of the SS with Harter’s format and the original Likert-type ECR-RC; Group 4 (G4; n = 109): children who completed an ad hoc Likert-type adapted version of the SS and the original ECR-RC.
Children also completed a sociodemographic form asking about their age, gender, and socioeconomic status (SES), and the Italian version of the Cattell Culture Free Intelligence Test–Scale 2a (CFIT; Cattell & Cattell, 1981). Furthermore, the main teacher reported on children’s prosocial behavior at school by completing the corresponding subscale included in the SDQ (Goodman, 1997; Tobia & Marzocchi, 2017).
All children were thanked for their participation with a gift certificate.
Measures
The ECR-RC (Brenning et al., 2014) was originally designed to assess romantic attachment in adults (Busonera, San Martini, Zavattini, & Santona, 2014; Fraley et al., 2000) and has been recently adapted for use with children and adolescents in the context of parent–child relationships. It measures attachment-related anxiety in terms of concern about social support and fear of abandonment and rejection (e.g., “I worry that my father/mother does not really love me”), as well as attachment-related avoidance defined as avoidance of intimacy, discomfort with closeness, and self-reliance (e.g., “I prefer not to tell my father/mother how I feel deep down”) separately for mother and father (Brenning et al., 2014). The short version consists of 12 items (6 for anxiety and 6 for avoidance) originally rated on a 7-point scale, ranging from “completely untrue” to “completely true.” For each dimension (i.e., anxiety and avoidance), scores across items are averaged to yield an anxiety and an avoidance score, with higher scores indicating a more anxious and avoidant attachment. Good psychometric properties in terms of factor structure, internal consistency, and external validity have been evidenced in the few available studies with school-age children (e.g., Brenning et al., 2014). The psychometric properties of the Italian ECR-RC (Marci, Moscardino, & Altoè, 2019) are comparable to those found for the Belgian (see Brenning et al., 2014) and the Polish (Skoczen, Głogowska, Kamza, & Włodarczyk, 2019) versions. In this study, we focused on the mother form. Given the age of our participants, the Likert version was reduced from 7 to 4 points following recommended guidelines (Borgers et al., 2004).
The SS (Kerns, Klepac, & Cole, 1996) is the most widely used tool to assess attachment during late childhood (Bosmans & Kerns, 2015). It was developed to measure perception of security toward mother and father in children aged between 8 and 12 years, and provides a continuous dimensional score of “felt security.” This score reflects the degree to which a child feels that an attachment figure is responsive and available, the child’s tendency to rely on this figure in times of stress, and the ease of communication with this figure (Kerns et al., 1996). The Italian version (Marci, Lionetti, et al., 2018) consists of 12 items (the same for mother and father) rated using the aforementioned Harter (1982) format “Some kids . . . but other kids. . . . ” Children are asked to choose the statement that best fits them between the two alternatives, and then indicate whether it is “really true” or “sort of true” for them. Each item is scored on a 4-point scale. Scores across items are averaged to provide a continuous security score, with higher scores indicating a higher level of perceived secure attachment. A recent meta-analysis of 57 studies provided evidence for the cross-cultural, convergent, and concurrent validity of the Italian SS in middle childhood (Brumariu, Madigan, Giuseppone, Abtahi, & Kerns, 2018). In this study, children completed the mother-related items of the questionnaire.
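The two-step Harter response described above can be turned into a 4-point item score along the following lines. This is only a minimal sketch: the function names, the "left"/"right" coding of the two statements, and the direction of the mapping (4 = the secure statement rated "really true") are illustrative assumptions, not the scoring key published with the SS.

```python
# Hypothetical scoring helper. The 1-4 mapping direction is an assumption
# made for illustration; consult the instrument's manual for the real key.
def score_harter_item(chosen_side: str, intensity: str, secure_side: str) -> int:
    """Convert a two-step Harter response into a 1-4 item score.

    chosen_side: "left" or "right", the statement the child picked.
    intensity:   "sort of true" or "really true".
    secure_side: the side ("left" or "right") describing higher security.
    """
    if chosen_side not in ("left", "right") or secure_side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if intensity not in ("sort of true", "really true"):
        raise ValueError("intensity must be 'sort of true' or 'really true'")
    strong = intensity == "really true"
    if chosen_side == secure_side:
        return 4 if strong else 3   # secure statement endorsed
    return 1 if strong else 2       # opposite statement endorsed


def mean_score(item_scores):
    """Average item scores into a continuous scale score."""
    return sum(item_scores) / len(item_scores)
```

Under these assumptions, a child who picks the secure statement and marks it "really true" receives 4 on that item, and the scale score is the mean of the 12 item scores.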
The FAS (Boyce et al., 2006) is a four-item measure of family wealth (e.g., “Does your family own a car?”). The sum across items is computed to provide an overall score ranging from 0 to 9, with scores from 0 to 2 denoting low affluence, 3 to 5 medium affluence, and 6 to 9 high affluence. The FAS has provided evidence of validity and reliability across different cultures and countries, including Italy (Vieno, Santinello, Lenzi, Baldassari, & Mirandola, 2009).
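The affluence bands stated above translate directly into a small classification rule. The sketch below uses a hypothetical function name and assumes the total score has already been computed as an integer between 0 and 9.

```python
def classify_fas(total: int) -> str:
    """Map a FAS total score (0-9) to the affluence band described in the text."""
    if not 0 <= total <= 9:
        raise ValueError("FAS total must be between 0 and 9")
    if total <= 2:
        return "low"
    if total <= 5:
        return "medium"
    return "high"
```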
The Cattell Culture Free Intelligence Test–Form A (CFIT; Cattell & Cattell, 1981) is a paper-and-pencil test designed for measuring fluid intelligence in children aged between 8 and 13 years. It consists of 46 multiple-choice items divided into four timed subtests (series, 12 items; classification, 14 items; matrices, 12 items; and topology, 8 items). Each subtest has a specific time constraint (from 2 to 4 minutes). One point was assigned for each correct response. Scores across items are summed to provide a total raw score ranging from 1 to 46, which is subsequently converted to a standardized IQ score. The CFIT has been shown to have good internal consistency and test–retest reliability (Cattell & Cattell, 1981).
The SDQ (Goodman, 1997; Tobia & Marzocchi, 2017) is a widely used screening questionnaire designed to measure emotional and behavioral problems in children and adolescents. It consists of 25 items distributed across five scales: Emotional Symptoms, Conduct Problems, Hyperactivity–Inattention, Peer Problems, and Prosocial Behavior. Items are rated on a 3-point Likert-type scale and are summed to yield five component scores (i.e., one for each subscale). Higher scores on the Prosocial Behavior subscale indicate a strength, while high scores on the other four subscales indicate difficulties. For this study, the Prosocial Behavior subscale was considered. The questionnaire has been widely used across different societies, including Italy (Tobia & Marzocchi, 2017), and in both clinical and nonclinical settings.
Data Analytic Strategy
All analyses were performed in R (R Development Core Team, 2017). First, we conducted preliminary analyses to evaluate differences in terms of gender and grade distribution between groups who completed the same questionnaire (G1 vs. G2 and G3 vs. G4) using the chi-square test. Furthermore, we tested differences in SES and IQ using t tests for independent samples. Then, item response distributions and descriptive statistics (mean, standard deviation, and skewness) of the target questionnaires (i.e., ECR-RC and SS) were presented and analyzed separately for the two response formats (i.e., Harter vs. Likert).
To test the psychometric properties of the two response formats (factor structure, invariance, and external validity), data analyses were carried out in three steps.
In the first step, the factor structure of the two competing formats (Harter vs. Likert: ECR-RC, G1 and G2; SS, G3 and G4) was tested separately for each group by performing CFAs (see Beaujean, 2014, for the procedure in R). The robust diagonally weighted least squares mean and variance estimator specifically designed for ordinal data was used (Brown, 2006; Rhemtulla, Brosseau-Liard, & Savalei, 2012), and missing data (which were less than 1%) were handled with the pairwise maximum likelihood estimation method for factor analytic models with ordinal data available in the R package lavaan (Rosseel, 2012).
A series of goodness-of-fit indices were computed to evaluate model fit: the chi-square to degrees of freedom ratio (χ2/df), the root mean square error of approximation (RMSEA), the comparative fit index (CFI), the Tucker–Lewis index (TLI), and the standardized root mean square residual (SRMR). Model fit was considered good if χ2/df was less than 2 (Schermelleh-Engel, Moosbrugger, & Müller, 2003), CFI and TLI were greater than .95, RMSEA was less than .06, SRMR was less than .05, and RMSE was less than .08 (Schermelleh-Engel et al., 2003). Additionally, we evaluated the magnitude of the factor loadings as well as their statistical significance (p < .05). Internal consistency was evaluated for both formats by computing polychoric Cronbach’s alphas and McDonald’s omegas. These indices provide more realistic reliability estimates than Cronbach’s alpha (see Sijtsma, 2009; Zinbarg, Revelle, Yovel, & Li, 2005). Following the procedure explained by Zumbo, Gadermann, and Zeisser (2007), we computed both coefficients on the polychoric correlation matrix, thereby obtaining a more accurate estimation of the relations between the underlying variables because the ordinal nature of the data is taken into account. “In fact, using Cronbach’s alpha—or any other reliability coefficient—under circumstances that violate its assumptions and/or prerequisites might lead to substantively deflated reliability estimates” (Gadermann, Guhn, & Zumbo, 2012, p. 1).
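To illustrate how an alpha coefficient is obtained from a correlation matrix, the sketch below applies the standardized alpha formula, α = k/(k − 1) × (1 − k/ΣR), where k is the number of items and ΣR is the sum of all entries of the k × k correlation matrix R. Supplying a polychoric rather than a Pearson matrix yields the ordinal (polychoric) alpha. Estimating the polychoric correlations themselves is not shown, and the example matrix is invented purely for illustration; it is not data from this study.

```python
def alpha_from_correlations(corr):
    """Standardized alpha computed from a k x k correlation matrix.

    Assumes a unit diagonal, so the trace of the matrix equals k.
    """
    k = len(corr)
    if k < 2 or any(len(row) != k for row in corr):
        raise ValueError("corr must be a square matrix with at least 2 items")
    total = sum(sum(row) for row in corr)  # sum of all matrix entries
    return (k / (k - 1)) * (1 - k / total)


# Hypothetical 3-item matrix with all inter-item correlations equal to .50.
example = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
alpha = alpha_from_correlations(example)  # 1.5 * (1 - 3/6) = 0.75
```

With equal inter-item correlations the result coincides with the Spearman–Brown value k·r / (1 + (k − 1)·r), which offers a quick sanity check.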
In the second step, we tested measurement invariance of the two formats by performing MG-CFAs (i.e., G1 vs. G2 and G3 vs. G4). First, we tested configural invariance by allowing all structural parameters to remain free; then, we simultaneously tested metric and scalar invariance by constraining factor loadings and thresholds to be equal across groups (Muthén & Muthén, 2001). If metric and scalar invariance hold, it implies that the meaning of the constructs (the factor loadings) and the levels of the underlying items (thresholds) can be assumed to be equal in both groups. In our specific case, metric and scalar invariance would lend support to the hypothesis that the latent constructs may be operationalized the same way across the same items, even though the latter are based on different types of response formats. Delta CFI was computed between the two most proximal models (i.e., configural vs. strong). The change in CFI (ΔCFI) should be less than or equal to .01, and the change in RMSEA (ΔRMSEA) less than or equal to .015 (Cheung & Rensvold, 2002). Internal consistency reliability was inspected by calculating polychoric alpha coefficients.
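The decision rule for comparing the configural and constrained models can be sketched as follows. The function name is hypothetical, and treating ΔCFI as the drop in CFI and ΔRMSEA as the increase in RMSEA (rather than absolute differences) is an assumption about how the cutoffs are applied.

```python
def invariance_supported(cfi_configural, cfi_constrained,
                         rmsea_configural, rmsea_constrained,
                         max_delta_cfi=0.01, max_delta_rmsea=0.015):
    """Return True if the constrained model does not fit appreciably worse.

    Assumption: fit worsening is measured as a drop in CFI and a rise in RMSEA.
    """
    delta_cfi = cfi_configural - cfi_constrained        # drop in CFI
    delta_rmsea = rmsea_constrained - rmsea_configural  # rise in RMSEA
    return delta_cfi <= max_delta_cfi and delta_rmsea <= max_delta_rmsea
```

For example, with made-up fit values, a CFI change from .980 to .975 and an RMSEA change from .040 to .045 would fall within both cutoffs, whereas a CFI drop from .980 to .960 would not.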
Third, we assessed external validity (i.e., convergent and concurrent validity) of the two response formats by adopting a structural equation model approach. In particular, convergent validity of the two questionnaires (ECR-RC, G1: Likert format, G2: Harter format; SS, G3: Harter format, G4: Likert format) was assessed by performing two structural equation models to evaluate the extent to which each latent attachment dimension (i.e., ECR-RC: anxiety and avoidance; SS: felt security) was linked to the latent prosocial behavior score (as measured via the SDQ). Using the same procedure, we tested concurrent validity by evaluating the pattern of associations of the latent SS scores (i.e., G3: Harter format; G4: Likert format) and the latent anxiety and avoidance scores as assessed via the original ECR-RC.
Results
Preliminary Analyses
Chi-square tests supported the assumption that G1 and G2 did not differ with respect to gender distribution, χ2(1, 206) = 0.016, p = .90, Cramer’s phi = .009, or grade composition, χ2(1, 206) = 2.4232, p = .120, Cramer’s phi = .108. Additionally, independent samples t tests showed that the two groups did not differ in terms of IQ, t(204) = 0.477, p = .634, dCohen = .067, or SES, t(203) = −1.3499, p = .179, dCohen = .189.
Similarly, preliminary analyses comparing G3 and G4 supported homogeneity in terms of gender distribution, χ2(1, 204) = 1.548, p = .213, Cramer’s phi = .087, and grade composition, χ2(1, 204) = 0.995, p = .319, Cramer’s phi = .069. Independent samples t tests also indicated that the two groups of children did not differ in terms of IQ, t(202) = −1.855, p = .065, dCohen = .260. However, children who completed the adapted SS Likert questionnaire (G4) reported lower SES scores, t(201) = −2.1789, p = .031, dCohen = .307, than their Harter-type counterparts (G3).
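The effect sizes reported in these preliminary analyses can be related to the test statistics with two standard conversions. The article does not state how the effect sizes were computed, so the formulas below are an assumption: phi as the square root of χ2 divided by N, and Cohen’s d approximated as 2|t| divided by the square root of the degrees of freedom, which presumes roughly equal group sizes. The function names are hypothetical.

```python
import math


def cramers_phi(chi2, n):
    """Phi for a 2 x 2 table: square root of chi-square over sample size."""
    return math.sqrt(chi2 / n)


def cohens_d_from_t(t, df):
    """Approximate Cohen's d from an independent samples t statistic.

    Assumes two groups of roughly equal size.
    """
    return 2 * abs(t) / math.sqrt(df)


# Test statistics taken from the text above.
print(round(cramers_phi(0.016, 206), 3))        # 0.009
print(round(cramers_phi(2.4232, 206), 3))       # 0.108
print(round(cohens_d_from_t(0.477, 204), 3))    # 0.067
print(round(cohens_d_from_t(-2.1789, 201), 3))  # 0.307
```

Under these assumptions the conversions agree with the reported values to three decimals for the cases shown; small discrepancies for other statistics would be expected from rounding of the published test values.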
Descriptive Statistics
Figure 1 shows item response distributions for the ECR-RC by response format (i.e., Likert vs. Harter). Overall, in both questionnaires there was a high percentage of children who consistently scored in the first category of response (i.e., 1 = not agree), resulting in a positive skew. Item response distributions by response format for the SS are depicted in Figure 2. The majority of items in both formats showed a negative skew.

Figure 1. Response distributions of items from the Experiences in Close Relationships–Revised Child version by response format.

Figure 2. Response distributions of items from the Security Scale by response format.
The percentage of nonresponses per item was very low in both questionnaires and formats (<.01%). Thus, none of the ECR-RC or SS items were problematic with respect to each response format. Further descriptive statistics for both questionnaires by format are reported in Table 1A and Table 2A in the appendix section.
Factor Structure and Internal Consistency
The models estimated on the ECR-RC (i.e., G1 and G2) yielded good fit indices for both Likert- and Harter-type versions (see Table 1). However, while all paths across items were significant in Harter’s format (all ps < .001), one item (Item 1) was marginally extraneous to the factor structure (p = .052) in the Likert format. Factor loadings for the two competing response formats appear in Table 1. Polychoric Cronbach’s alphas for the anxious attachment dimension were .78 (95% confidence interval [CI] [0.85, 0.92]) and .79 (95% CI [0.85, 0.92]) for Likert and Harter formats, respectively, whereas McDonald’s omegas were .78 (95% CI [0.64, 0.90]) and .64 (95% CI [0.57, 0.77]) for Likert and Harter formats, respectively. With regard to the avoidant attachment dimension, polychoric alphas reached the same coefficient for both Likert (α = .69, 95% CI [0.68, 0.82]) and Harter formats (α = .69, 95% CI [0.68, 0.83]), while McDonald’s omegas were .69 (95% CI [0.57, 0.77]) and .66 (95% CI [0.53, 0.77]) for Likert and Harter formats, respectively.
Fit Indices of the Confirmatory Factor Models of the ECR-RC and SS for Both Response Formats.
Note. G1: n = 104; G2: n = 102; G3: n = 95; G4: n = 109. ECR-RC = Experiences in Close Relationships–Revised Child version; SS = Security Scale; CFI = comparative fit index; TLI = Tucker–Lewis index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation; WRMR = weighted root mean square residual; CI = confidence interval; ΔCFI = difference between CFIs; ΔRMSEA = difference between RMSEAs.
With regard to the SS (G3 and G4), goodness-of-fit indices of the models are presented in Table 3. With the exception of RMSEA, the majority of considered indices for the Likert model ranged from acceptable to good. However, findings pointed to a slightly better fit to the data for Harter’s format, with an excellent fit in all of the considered indices. Of interest, the factor analytic results were quite similar for the two response formats, and all paths were significant in both models. Table 2 shows the factor loadings by response format. Polychoric alpha coefficients (Harter: .83, 95% CI [0.77, 0.88] and Likert: .84, 95% CI [0.79, 0.88]) as well as Omega coefficients (Likert: ω = .73, 95% CI [0.61, 0.80] and Harter: ω = .72, 95% CI [0.63, 0.81]) were quite similar across formats. To evaluate the degree of measurement invariance (e.g., equal factor loadings and thresholds) between answer formats, we used MG-CFAs.
Standardized Factor Loadings of the ECR-RC Across Response Formats (Harter vs. Likert).
Note. Harter’s format, n = 104; Likert format, n = 102. ECR-RC = Experiences in Close Relationships–Revised Child version; SE = standard error. All factor loadings were significant at the 5% level, except for Item 1 in Likert format (p = .052).
Adapted version.
Results for the ECR-RC supported configural invariance (without parameter restrictions) of the two competitive formats, suggesting that the factor structure was similar between the Likert and Harter forms. Hence, in the subsequent step, we held loadings and thresholds invariant across G1 and G2. Results showed that all the considered fit indexes achieved a good fit, and both ΔRMSEA and ΔCFI were lower than the recommended cutoff values (.003), suggesting that metric and scalar invariance were supported.
Multigroup analyses also supported configural invariance between the two forms of the SS (i.e., Harter and Likert), yielding a good fit in all the considered indexes. When loadings and thresholds were constrained to be equal across formats (i.e., metric and scalar invariance), although all fit indexes reached an acceptable fit, ΔRMSEA and ΔCFI slightly exceeded the recommended values (see Table 3). Thus, the results failed to provide evidence for metric and scalar invariance.
Standardized Factor Loadings of the SS Across Response Formats (Harter vs. Likert).
Note. Harter’s format, n = 95; Likert format, n = 109. SS = Security Scale; SE = standard error. All factor loadings were significant at the 5% level.
Adapted version.
External Validity
Convergent validity of the two answer formats was evaluated by testing the extent to which each latent attachment dimension (G1 and G2: anxiety and avoidance; G3 and G4: felt security) was associated with the latent prosocial behavior score.
As can be seen in Table 4, no significant associations were found between the two attachment-related dimensions assessed via the ECR-RC forms (G1: Harter; G2: Likert) and teacher-reported prosocial behavior. Regarding the SS, a similar pattern of associations was found in the two groups (i.e., G3: Harter; G4: Likert). Specifically, in both Harter and Likert-type formats, felt security toward mother was significantly and positively associated with prosocial behavior as reported by teachers.
Associations of Anxiety, Avoidance, and Felt Security to Prosocial Behavior.
Note. ECR-RC = Experience of Close Relationships–Revised Child version, Harter: n = 104, Likert: n = 102; SS = Security Scale, Harter: n = 95, Likert: n = 109.
**p < .01. ***p < .001.
With regard to concurrent validity, felt security (measured via the Harter- and Likert-type versions of the SS) was negatively associated with attachment avoidance (i.e., G3, Harter = −0.653 and G4, Likert = −0.997, respectively) and anxiety (i.e., G3, Harter = −0.560 and G4, Likert = −0.680, respectively) as assessed via the original ECR-RC.
Discussion
Over the past 20 years, the development and use of self-report measures with young children has greatly increased in both research and clinical contexts. Indeed, children have been recognized as reliable informants who are able to provide valid information about themselves. This conclusion is corroborated by research demonstrating that exclusive reliance on adult reports may, for example, underestimate children’s feelings of worry and anxiety (Lagattuta et al., 2012). Furthermore, knowledge of children’s point of view is essential because young participants “have a right to be intimately involved in the decisions being made about them” (Sturgess et al., 2002, p. 109). In this perspective, the choice of a specific answer format (e.g., number of scale points and response format) plays a critical role in ensuring the quality of collected data to accurately reflect children’s attitudes, feelings, or behaviors. The present study is among the first to directly compare the validity of Harter versus Likert response formats as applied to the same questionnaires designed for school-aged children. Indeed, although previous research has explored the validity of Harter’s format and differences between the two competitive answer formats, these studies were based on adolescent and adult samples and can therefore not be generalized to younger children. Another important contribution of this study is the use of MG-CFA to examine whether latent constructs may be operationalized the same way across the same items with different types of response formats, thereby allowing a fine-grained comparison from a multivariate perspective. Here, we used two well-known questionnaires for measuring attachment in middle childhood, namely the SS (original version with Harter’s format, adapted version with Likert scale) and the ECR-RC (original version with Likert scale, adapted version with Harter’s format).
With regard to the factor structure of both questionnaires, CFAs were conducted separately for each response format. Overall, the results supported good psychometric properties for both Likert and Harter formats of the two questionnaires. Yet, when we tested for equivalence, the results of MG-CFAs suggested that scalar invariance was supported only for the ECR-RC. With regard to the SS, contrary to our expectations, Harter’s format reached slightly better fit indices than the competitive Likert format. Subsequent MG-CFAs provided support only for configural invariance, while evidence for strong invariance (i.e., metric and scalar) was not found. Indeed, differences in RMSEA and CFI values between proximal models (configural vs. strong) exceeded the recommended values. Thus, it appears that the validity of response formats depends—at least in part—on the type of questions asked to measure attachment in children. In particular, this pattern of results suggests that there is no “gold standard” in terms of answer format to be used in attachment questionnaires for school-age children; rather, it depends on the conceptualization of the construct to be assessed, and on the types of questions/items used to evaluate such construct. However, further investigations are needed to ascertain whether the validity of response formats depends on content-related aspects (e.g., operationalization of underlying attachment dimensions), or if it is affected by the adaptation process itself (i.e., conversion from Likert to Harter format, and vice versa).
In terms of convergent validity, our findings indicated that both formats met the theoretical expectation that the parent–child relationship represents the foundation for children’s social development (Bretherton, 1987). Specifically, secure children are more socially competent, more accepted by peers, and report higher levels of positive and trustful interactions with others when compared with insecure children (Kerns et al., 1996; Sroufe, 2005). Consistent with this well-recognized pattern, a significant and positive association between perceived attachment security and teacher-rated prosocial behavior was found for both forms of the SS (G3 and G4). Secure children have internalized a model in which distress is being met with care (“secure base script,” Waters & Waters, 2006), and others are expected to be responsive and generally well-intentioned (Waters & Waters, 2006). These representations likely support children’s prosocial behavior by providing a bearing for how others’ needs might be addressed, and by promoting the motivation to meet their needs as individuals worthy of care (Gross, Stern, Brett, & Cassidy, 2017).
We did not find, however, any significant association between the avoidance and anxiety dimensions as assessed via the two ECR-RC forms (i.e., Harter and Likert) and prosocial behavior. This finding can be interpreted in light of the concerns raised by Fraley et al. (2000) about the capability of the ECR-R for adults to sufficiently capture the security region of attachment, and does not seem to be related to response format. Thus, further studies are warranted to evaluate whether the measurement of security in association with the assessment of anxiety and avoidance might be of particular added value to capture the construct of attachment in middle childhood, especially in relation to positive outcomes. Furthermore, it may be useful to sample children at risk for developing insecure attachment to evaluate whether one of the two response formats is more sensitive in capturing insecurity in middle childhood (i.e., discriminant validity).
In relation to concurrent validity, in line with previous studies (e.g., Brenning, Soenens, Braet, & Bosmans, 2011), results showed strong negative correlations between felt security (as measured by the SS) and the avoidance and anxiety dimensions (as assessed via the ECR-RC) in both response formats. This finding is consistent with a continuous-dimensional approach (Brennan, Clark, & Shaver, 1998), which suggests that individual differences in attachment can be represented along two fundamental attachment-related dimensions, namely anxiety and avoidance. In line with this approach, secure attachment is the result of low anxiety and low avoidance in the child’s relationship with parents. Of interest, these associations were more evident for the Likert format, especially in relation to the avoidance subscale. Future investigations may address whether this finding is due to a superior validity of the Likert-type response format, or to the homogeneity in answer formats (i.e., both questionnaires with Likert scales).
Taken together, our findings support the use of both Harter and Likert answer formats in terms of psychometric properties, convergent validity with another self-report measure assessing attachment-related dimensions, and concurrent validity. Contrary to previous studies with adolescent and adult samples (Yeager & Krosnick, 2011, 2012), the results do not allow us to establish the superiority of either answer format in middle childhood.
Perhaps Harter’s format might be particularly suitable for younger children, since the cognitive load involved in the two-step response process (i.e., children first choose which of the two descriptions is more like them, and then indicate whether the description is sort of true or really true for them) might be offset by the clear statements provided by Harter-type questions. Also, the absence of double negatives (i.e., negatively worded questions associated with a negative response, such as “false” or “not like me”) makes it easier for younger children to understand and respond. Given that younger children primarily think in a dichotomous way (Gelman & Baillargeon, 1983), responding to Harter’s format, which basically consists of two dichotomous questions, may facilitate the answer process without compromising the reliability of the questionnaire. However, cognitive abilities vary considerably from 8 (early middle childhood) to 12 years (late middle childhood). For example, memory capacity becomes increasingly comparable with the level of adults, with older children (aged 11-12 years) showing better performance in complete recall than younger children (aged 8-9 years; see Saywitz, 1987). This aspect has relevant implications for the number of response options to be used in questionnaires (Cole & Loftus, 1987; De Leeuw et al., 2004). Furthermore, socially desirable responses and context effects tend to be more pronounced among older children (Scott, 1997). Thus, the expression “some kids” could have a different effect among older versus younger children, as the former are more sensitive to being considered as “kids.” Overall, these different cognitive characteristics can lead to variations in the process and reliability of response within this age range. Thus, more studies are needed to shed light on how Likert and Harter formats work from early to late middle childhood.
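The two-step response process described above, two dichotomous choices combined into one 4-point item score, can be sketched as follows. The scoring direction (1 to 4, with higher scores for the side keyed as positive) and all names are assumptions made for illustration; they are not taken from the scoring instructions of the SS or the ECR-RC.

```python
def harter_item_score(chosen_side, intensity, positive_side="right"):
    """Combine the two dichotomous Harter choices into a 1-4 item score.

    chosen_side: 'left' or 'right', the description the child picked.
    intensity: 'sort of true' or 'really true'.
    positive_side: side keyed as the high-scoring description (assumed).
    """
    if chosen_side not in ("left", "right"):
        raise ValueError("chosen_side must be 'left' or 'right'")
    if intensity not in ("sort of true", "really true"):
        raise ValueError("intensity must be 'sort of true' or 'really true'")
    strength = 2 if intensity == "really true" else 1
    if chosen_side == positive_side:
        return 2 + strength   # 3 = sort of true, 4 = really true
    return 3 - strength       # 2 = sort of true, 1 = really true


print(harter_item_score("right", "really true"))   # 4
print(harter_item_score("right", "sort of true"))  # 3
print(harter_item_score("left", "sort of true"))   # 2
print(harter_item_score("left", "really true"))    # 1
```

The resulting scores occupy the same 1 to 4 range as a 4-point Likert item, which is what makes a direct comparison of the two formats on the same items possible.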
Despite its relevant contribution to the literature on the use of self-reports with children, our study has limitations that need to be acknowledged. First, our comparison was limited to attachment questionnaires. Future studies might use a broader range of items assessing other psychological constructs to evaluate if these results are limited to the attachment field, or if they can be generalized to the measurement of other constructs in middle childhood. Second, given the relatively small sample size in each group of children, replication with larger samples is warranted. In particular, since the present study focused on Italian children aged 8 to 10 years, further studies replicating these findings in late middle childhood (i.e., aged 10-12 years) across culturally diverse samples are desirable to allow greater confidence in the generalizability of results and to test invariance across response formats and ages. A third limitation is the exclusive reliance on self-report questionnaires to establish concurrent and convergent validity. Future research may include other assessment methods (e.g., interviews and behavior observations) to obtain a more comprehensive picture of children’s consistency of responses across direct and indirect measurement tools. Fourth, further studies might consider other domains (e.g., academic achievement and externalizing behavior) for which the reliability and accuracy of children’s answers can be verified by means of additional informants (e.g., parents and teachers). Last, the present study did not include a measure of social desirability to ascertain whether Harter’s format (compared with Likert format) reduces socially desirable responses in children, as it was beyond the scope of this work. Addressing this issue could be the focus of future research to better understand the role of answer format in potential response bias across middle childhood.
To conclude, our data indicate that Likert- and Harter-type response formats exhibit comparable reliability and validity across the two most widely used attachment questionnaires in school-age children, although a slight superiority of Harter’s format was found for the SS. From an applied perspective, the current study suggests that the same measures might be used in both formats based on the context. For instance, in clinical settings—where social desirability is recognized as a relevant issue within the professional–child relationship—the Harter-type format might be more suitable to reduce this potential source of bias by making children feel more comfortable in expressing their own emotional experience. Because Harter’s format carries an “impersonal structure,” it might facilitate children’s sharing of their internal states, which are particularly difficult to report during this developmental period.
Supplemental Material
Supplemental material, Supplemental_Material for Using Harter and Likert Response Formats in Middle Childhood: A Comparison of Attachment Measures by Tatiana Marci, Ughetta Moscardino, Francesca Lionetti, Alessandra Santona and Gianmarco Altoé in Assessment
Footnotes
Appendix
Descriptive Statistics of SS Items Across the Two Formats.
| Items | Harter n | Harter M | Harter SD | Harter Skew | Harter Range | Likert n | Likert M | Likert SD | Likert Skew | Likert Range |
|---|---|---|---|---|---|---|---|---|---|---|
| Item 1 | 95 | 3.73 | 0.49 | −1.51 | (2-4) | 109 | 3.71 | 0.50 | −1.34 | (2-4) |
| Item 2 | 95 | 3.66 | 0.61 | −1.87 | (1-4) | 109 | 3.01 | 1.25 | −0.66 | (1-4) |
| Item 3 | 95 | 3.27 | 1.03 | −1.08 | (1-4) | 109 | 3.34 | 0.94 | −1.25 | (1-4) |
| Item 4 | 95 | 2.99 | 1.04 | −0.60 | (1-4) | 109 | 2.97 | 1.13 | −0.59 | (1-4) |
| Item 5 | 94 | 2.65 | 1.00 | −0.10 | (1-4) | 109 | 2.59 | 1.18 | −0.02 | (1-4) |
| Item 6 | 95 | 3.68 | 0.78 | −2.52 | (1-4) | 108 | 3.94 | 0.37 | −6.34 | (1-4) |
| Item 7 | 95 | 3.28 | 0.92 | −1.07 | (1-4) | 109 | 3.48 | 0.75 | −1.41 | (1-4) |
| Item 8 | 95 | 3.61 | 0.80 | −2.13 | (1-4) | 109 | 3.80 | 0.66 | −3.34 | (1-4) |
| Item 9 | 95 | 3.00 | 1.04 | −0.45 | (1-4) | 109 | 2.68 | 1.23 | −0.18 | (1-4) |
| Item 10 | 95 | 3.36 | 0.78 | −1.10 | (1-4) | 109 | 3.61 | 0.86 | −2.09 | (1-4) |
| Item 11 | 95 | 3.40 | 0.82 | −1.30 | (1-4) | 109 | 3.61 | 0.75 | −2.02 | (1-4) |
| Item 12 | 95 | 3.58 | 0.81 | −2.00 | (1-4) | 109 | 3.80 | 0.49 | −2.84 | (1-4) |
Note. Harter’s format, n = 95; Likert format, n = 109. SS = Security Scale.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
