Abstract
Children’s unique developmental and contextual needs make it challenging to measure empathy validly and reliably. This scoping review is the first to collate currently available information about self-report, other-report, and performance-based questionnaire measures of empathy for children aged up to 11 years. Following the guidelines for Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR; Tricco et al., 2018), a literature search using PsycINFO, Scopus, and Google Scholar identified 24 relevant measures of empathy in children, with publication years spanning 1968 to 2019. Questionnaires could broadly be classified into four groups, according to the extent to which they were developed with children’s developmental needs and contexts in mind, and were based on contemporary theory and research findings. There was a distinction between performance-based measures, which elicited children’s empathy-related responses to novel content and therefore assessed situational state empathy, and self- and other-report measures, which rated children’s general empathic tendencies and thus assessed dispositional trait empathy. Results highlighted the importance of researchers having clarity on their definition of empathy and choosing measures consistent with this, and the merit of utilizing a multimodal assessment approach.
The development of, and capacity for, empathy in humans is a topic of growing research interest. The ability to express or withhold empathy for fellow living beings seems linked to our capacity to engage in a wide range of behaviors, from exceptionally selfless and thoughtful deeds, to acts of unthinkable cruelty (Baron-Cohen, 2011; Decety et al., 2016; Hawkins et al., 2017; McPhedran, 2009). Yet the study of empathy and our understanding of its development in humans have been significantly affected by difficulties associated with its accurate measurement. Historically, there was greater disagreement between researchers regarding empathy’s precise definition, and whether empathy is predominantly a cognitive (e.g., Deutsch & Madle, 1975; Hogan, 1969) or an affective experience (e.g., Mehrabian & Epstein, 1972). Over time, agreement has been established that empathy involves both cognitive and affective components (see Eisenberg & Fabes, 1990).
Recent developments in neuroscience have further shaped the contemporary understanding of empathy. Brain imaging has identified regions that reliably activate when humans in experimental settings are primed to experience an empathic response (see Tousignant et al., 2017 for an overview). These areas appear to reflect the processes of emotional contagion (i.e., having another individual’s emotions directly trigger a similar personal affective experience), self-other awareness (i.e., distinguishing oneself from the person experiencing the observed emotion), perspective-taking (i.e., cognitively appreciating the other person’s position, how they are feeling, and why), emotional regulation (i.e., having control over the intensity of one’s empathic response), and, according to many researchers, also altruistic motivation (a desire to improve others’ welfare or experiences, arising from these neurological processes) (Decety & Moriguchi, 2007; Richaud et al., 2017; Tousignant et al., 2017).
Consequently, most researchers have greatly broadened their understanding of empathy. An example of a common contemporary description of empathy is, “a complex construct emerging from several interacting components, including bottom-up and top-down processes” (Tousignant et al., 2017), with researchers typically subsequently clarifying which components are perceived as important in an empathic response, usually based on current neuroscientific findings. Preston and de Waal (2002) take a particularly broad view, arguing that empathy comprises a range of psychological phenomena including mirroring, bodily synchronization, various forms of imitation, and emotional contagion, because these are all foundational to human and animal social behavior. However, Coplan (2011) argues that a clear and precise definition of empathy and its associated processes is necessary for developing true understanding. Coplan (2011) defines empathy as “a complex, imaginative process through which an observer simulates another’s situated psychological states while maintaining clear self-other differentiation,” and describes empathy as having three essential features: affective matching, other-oriented perspective-taking, and clear self-other differentiation. Coplan argues that the many contributing psychological phenomena or processes should be viewed as distinct, and conceptualized individually. Coplan’s emphasis on the importance of definitional clarity is similarly articulated by Batson (2009), who identified eight different ways in which the concept of “empathy” could be defined and understood. Achieving definitional clarity is important to guide the accurate measurement of empathy.
Distinguishing Between Empathy and Associated Constructs
Emphasis has been placed on distinguishing empathy from associated concepts, such as sympathy, compassion, personal distress in response to others’ suffering, and theory of mind (e.g., Gallant et al., 2020; Singer & Lamm, 2009). Such distinctions are important because these different capacities are thought to have different relationships with what is arguably a major reason for interest in studying empathy—its perceived behavioral manifestations, such as kind and caring acts toward others. Sympathy and compassion differ from empathy in that they do not require shared feelings with the other person (Singer & Lamm, 2009). Instead, they involve feelings of concern, pity, or care toward another who is suffering. Compassion might be defined as the capacity to feel sympathetic concern toward unfortunate others, also associated with a motivation to alleviate their suffering (Preckel et al., 2018). As a primary response to others’ suffering, personal distress can be viewed as arising from emotional contagion, in that an individual feels the negative feelings experienced by another person, but struggles to regulate these emotions. The emotional response remains self-focused and dysregulated, and research suggests that individuals who are prone to high personal distress reactions are less likely to help suffering others (Eisenberg & Fabes, 1990; Preckel et al., 2018; Singer & Lamm, 2009). Theory of mind, the ability to infer others’ mental states including thoughts, feelings, beliefs, and desires (Gallant et al., 2020), is thought to be conceptually linked to cognitive and affective empathy. However, theory of mind does not require shared emotions with the other person, and research investigating its links with measures of cognitive and affective empathy indicates a distinction between these constructs, and different developmental trajectories (Gallant et al., 2020; Wang & Wang, 2015).
Prosocial behavior is commonly viewed as a manifestation of empathy toward others. However, individuals might engage in prosocial behavior for a range of reasons, including their moral reasoning or principles, to receive praise or recognition, or to achieve another personal goal. Some individuals might also be highly empathic but limited in how this manifests behaviorally. For example, young children with more inhibited temperaments have been observed to express less empathy toward distressed others (Young et al., 1999), but the extent to which this reflects their internal experience is unclear. Despite the widespread assumption that empathy drives various socially beneficial behaviors, some evidence suggests that sympathy may be more closely associated with higher prosocial behavior, and lower aggressive behavior (e.g., Vossen et al., 2015). Helpful behavior toward suffering others is more likely if individuals can maintain an other-oriented focus (Batson, 2009), which is influenced by the capacity to maintain a self-other distinction and good self- and emotional-regulation skills (Decety & Jackson, 2006).
The Development of Empathy in Children
Hoffman (1975, 2000) proposed a four-stage model of the development of empathy in children, which remains relevant today. Hoffman proposed that mature empathy requires individuals to perceive themselves as independent from others, physically and in terms of their emotional states, personal identities, and contextual life factors. Empathizing individuals must also know how they would feel in particular situations, and have a sense of how others would feel in these situations. They must understand that although facial cues and nonverbal behaviors may indicate how a person is feeling, people may also mask their feelings depending on the context. Although infants and young children do not have these sophisticated capacities, Hoffman (1975, 2000) noted that they can be empathically aroused through more primitive mechanisms such as conditioning, association, and mimicry. Hoffman proposed that these more primitive mechanisms combine with children’s social-cognitive development, such that empathy in children develops across four stages: unclear self-other differentiation, awareness of self and others as distinct physical entities, awareness of self and others as having distinct emotional states, and awareness of self and others as having distinct personal histories, identities, and lives. Although Hoffman proposed age ranges for each of these stages, he noted that these may vary widely between children.
Hoffman’s (2000) first stage of empathic development in children, Global Empathy, is characterized by an unclear distinction between self and other, such that another’s distress evokes a child’s own genuine distress. This is first reflected in newborns’ tendency to cry reactively when hearing another newborn’s cry, and develops over the first year of life. Infants might feel personal distress in response to others’ negative emotions and experiences, and seek comfort as though they themselves have been hurt. Hoffman (2000) argued that this stage takes place between birth and approximately 12 months. From around 6 months of age, typically developing infants develop a growing understanding that they are physically distinct from others. This gives rise to the second stage of empathy development, Egocentric Empathy, which occurs between the ages of around 1 and 3 years. During this stage, children can physically distinguish between themselves and others; however, they still have little conception of others’ internal states, and may seek to comfort distressed others according to what they themselves would find comforting.
Children aged between 3 and 8 years enter Hoffman’s (2000) third stage of empathy, Empathy for Another’s Feelings. During this stage, children become increasingly aware of others’ emotions, and that these might differ from their own. Children become more sensitive to others’ emotional cues, and their language development improves their ability to label their own and others’ emotions, and express empathy for others (Schonert-Reichl, 2011). Children also develop their empathy toward those with whom they have no direct interaction (for example, people affected by a natural disaster overseas).
Hoffman’s (2000) fourth stage of empathic development, Empathy for Another’s Condition, takes place during late childhood and early adolescence. Young people’s increasing cognitive maturity and complexity allows them to feel empathy for others whose identities, histories, and lives may be very different from their own, including entire disadvantaged groups. Hoffman’s theory is supported by empirical research showing that human empathy development progresses from more rudimentary to sophisticated mechanisms, in line with and alongside other domains of human development (Schonert-Reichl, 2011).
Given the significant development in humans’ empathic capacities from birth to adulthood, approaches for empathy measurement differ according to developmental stage. This paper focuses on the measurement of empathy in children aged up to 11 years. This is because the unique cognitive, language, and social-emotional developmental needs of this younger age group must be accommodated when measuring empathy, as must their context. For this age group, the many questionnaires designed for older adolescents and adults are generally not appropriate.
Measurement of Empathy: Controversies
In addition to the definitional issues reviewed above, debate continues regarding how empathy might most validly be measured. For children aged up to 11 years, possible options for measuring empathy include self-report (for children aged over approximately 3 years), other-report (parent or teacher), performance-based questionnaire measures, observations of behaviors in structured or naturalistic scenarios (and interpreting these behaviors—such as facial expressions, vocalizations, or helping behavior—as a proxy for empathy; e.g., Strayer & Roberts, 1997), physiological measures (e.g., heart rate, facial muscle movement, and skin conductance; Eisenberg & Fabes, 1990; Neumann et al., 2015), and neurological measures (e.g., magnetic resonance imaging [MRI], functional magnetic resonance imaging [fMRI]; Neumann et al., 2015; Zhou et al., 2003). Each of these approaches has its advantages and limitations (see de Minzi et al., 2016; Zhou et al., 2003, for overviews).
In psychology research, the most popular approach to measuring empathy in children aged up to 11 years has been the use of self- and parent-report questionnaires. This form of measurement is convenient and relatively low cost. The first empathy questionnaire adapted specifically for children was developed by Bryant (1982). During this earlier phase, empathy questionnaires for children were adaptations of questionnaires originally developed for adults (e.g., Bryant, 1982; Dadds et al., 2008; Garton & Gringart, 2005). Although these measures served an important purpose by filling a gap in the literature, over time, their wording has become dated or their validity more questionable, given our evolving understanding of empathy. Because questionnaire measures have a long history of widespread use in research and applied settings, this review focuses exclusively on self-report, other-report, and performance-based questionnaire measures of empathy.
Since 2006 in particular, novel empathy questionnaires for children and young people have increasingly emerged. Beginning with the development of the Basic Empathy Scale (BES; Jolliffe & Farrington, 2006), most of the more recent questionnaires have endeavored to contain original, contemporary items with more convincing face validity, in light of current conceptualizations of empathy (e.g., Overgaauw et al., 2017; Raine & Chen, 2018; Reid et al., 2013; Richaud et al., 2017; Rieffe et al., 2010; Vossen et al., 2015). Questionnaire developers have also attended carefully to psychometric properties of their instruments. The creation of these questionnaires has been invaluable to the advancement of the study of empathy in children. However, results have varied regarding the extent to which each questionnaire convincingly appears to measure empathy (and associated constructs); the quality of each questionnaire’s psychometric properties; and the extent to which findings using the measure make intuitive sense. There is now a wide range of self-report, other-report, and performance-based measures of empathy in children aged up to 11 years, but there is no centralized means of gaining information about the appropriateness and psychometric properties of each available measure. Providing such a resource is the aim of this review.
Various limitations have been identified regarding the use of self-report questionnaires when measuring empathy in children. Developmentally, children may not have the language or cognitive ability to accurately convey an internal capacity like empathy (Wang & Wang, 2015). Children’s self-reports of empathy might also be influenced by social desirability biases (Pechorro et al., 2017), and self-reports may differ depending on whether any involved experimenter (e.g., reading out the questions) or protagonist in a questionnaire is the same or opposite sex to the child participant (see Lennon et al., 1983). There may also be concern about the extent to which child-reported empathy correlates as expected with variables of interest, and whether such correlations reflect genuine relationships between well-operationalized and measured constructs, or other factors, like social desirability biases.
Parent- or teacher-report measurement approaches might address these concerns. Overwhelmingly in psychology research, parent-report has been utilized more than teacher-report, perhaps for convenience reasons. However, this approach has its own challenges. Parents may have little experience of child behavior norms, and they may also be positively biased to assume a greater empathic capacity in their own children than is actually present. Research shows some evidence of subjectivity in parent ratings of child empathy; for example, mothers’ ratings can be higher than fathers’ (e.g., Dadds et al., 2008). Also, when comparing child-reported versus parent-reported empathy scores with theoretically related constructs, the correlation patterns are often different depending on the rater, and inter-rater agreement can be low (de Minzi et al., 2016). In many cases, parent ratings may be a reasonable reflection of their child’s empathic capacity. However, the validity of parent ratings will be affected by factors such as parent mental health, the parent-child relationship, the amount and breadth of interaction between parent and child, and parents’ familiarity with typical child development.
Furthermore, although many questionnaire measures distinguish between different facets of empathy—such as cognitive and affective empathy—as well as having overall scores, some do not. In such cases, a high or low overall empathy score may not assist with understanding the underlying reason for the score. For example, a low overall score might reflect a child with positive intentions, but difficulties with the skills underlying cognitive empathy; or it might reflect a child with little interest in, or care toward, the thoughts and feelings of others, despite having the technical ability to decode relevant cues accurately (see Murphy, 2019a). These different presentations would require different types of support, and would likely lead to different developmental trajectories. They would also lead to different theoretical conclusions in research studies.
In a significant recent development, Murphy and Lilienfeld (2019) examined the widespread assumption that self-reported cognitive empathy serves as a proxy for actual cognitive empathy ability. The authors conducted a meta-analysis of 85 studies, both published and unpublished, in which self-reported cognitive empathy scores and performance on cognitive empathy tasks (e.g., the Reading the Mind in the Eyes Task; Baron-Cohen et al., 2001) were both reported; 11 of the included studies focused on children. The authors found that self-reported cognitive empathy scores accounted for only approximately 1% of the variance in performance on wide-ranging behavioral assessments of cognitive empathy. This contribution was not significantly different from that of affective empathy. These results held true for both adult and child populations studied.
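To put the 1% figure in context, a proportion of variance explained can be converted to the magnitude of a correlation coefficient. The short sketch below assumes the reported figure can be read as a squared correlation (r² of about .01); the function name is hypothetical, and the calculation is illustrative only rather than a reproduction of Murphy and Lilienfeld's analysis.

```python
import math


def variance_explained_to_r(r_squared: float) -> float:
    """Return the magnitude of the correlation implied by a proportion of variance explained."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie between 0 and 1")
    return math.sqrt(r_squared)


# Assumption: "approximately 1% of the variance" is read as r squared = 0.01.
r_magnitude = variance_explained_to_r(0.01)
print(round(r_magnitude, 2))
```

Under this reading, 1% of variance corresponds to a correlation of about .10 in magnitude.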
In line with cautions provided by previous researchers (e.g., Davis & Kraus, 1997; Ickes, 1993; Realo et al., 2003), Murphy and Lilienfeld (2019) suggested that their meta-analysis results could in part be explained by phenomena including the Dunning-Kruger effect (Kruger & Dunning, 1999), in which less skilled individuals overestimate their performance and more skilled individuals underestimate their performance. Limitations in individuals’ metacognitive abilities may also play a role; few individuals are likely to have the metacognitive ability to correctly judge their capacity to recognize and understand others’ thoughts and feelings (Murphy & Lilienfeld, 2019), and this may be particularly true for children. Personal qualities like narcissism and overconfidence might also have an influence—previous research with adults has indicated that these are related to self-reported mindreading ability, but not actual mindreading performance (Ames & Kammrath, 2004). Although these findings relate to adults, similar considerations are relevant to children, and these findings point to the value of including scales measuring potentially relevant variables such as narcissistic tendencies, confidence, or social desirability biases, when attempting to measure empathy in children via self- or other-report.
Murphy and Lilienfeld’s (2019) results are concerning, but should be interpreted with caution. Their review focused only on the relationship between self-reported cognitive empathy and behavioral measures of cognitive empathy performance. Twenty percent of the included papers were unpublished, and no studies were excluded on the basis of quality. There was evidence of differences in results between published and unpublished studies, though these differences did not substantially affect the interpretation of overall trends in the meta-analysis. Included measures of cognitive empathy performance were wide-ranging, from judgments about thoughts and feelings of dyadic partners, to the same judgments of individuals in videotaped and other interpersonal interactions, to basic face and voice recognition tasks. It is unclear to what extent all these measures were tapping into the same construct of cognitive empathy rather than, at times, skills feeding into this capacity. The included self-report measures of cognitive empathy were also diverse, and there was an emphasis on studies that used older measures such as the Interpersonal Reactivity Index (IRI; Davis, 1983). Such older empathy questionnaires may not always exclusively measure empathy as intended because our understanding of empathy and its distinction from constructs such as sympathy, personal distress, and compassion has evolved over time. Finally, relatively few of the included studies focused on children.
Despite these issues, the authors’ meta-analytic findings paralleled the conclusions of previous researchers who had investigated the relationship between self-reported ability, and actual interpersonal performance or empathic accuracy (e.g., Davis & Kraus, 1997; Hall et al., 2009). On the basis of their findings, Murphy and Lilienfeld strongly cautioned against the exclusive use of self-report questionnaires to measure cognitive empathy, instead recommending the use of batteries of behavioral cognitive empathy measures, at least until such time that there is greater evidence for the validity of self-report cognitive empathy questionnaires. These conclusions are relevant to the broader measurement of empathy in children.
Aims and Rationale of This Review
Despite the limitations identified, there is value in providing a comprehensive review of the range and quality of existing self-report, other-report, and performance-based questionnaire measures available for measuring empathy in children. This might assist researchers in selecting questionnaires that align with their theoretical definitions and priorities for measurement of empathy. It might also assist researchers to have greater awareness of the research investigating the appropriateness, scope, and available alternatives, for a particular empathy measure. This could reduce the likelihood of research results that are unintuitive or difficult to interpret. Outlining features of existing measures could also assist in improving questionnaire measurement of empathy, and could inform future meta-analyses examining the ability of these questionnaires to accurately capture this construct.
This scoping review aimed to collate information about all self-report, other-report, and performance-based questionnaire measures that have been used to specifically measure empathy in children aged up to 11 years. The question framing this scoping review was: What is known from the existing, peer-reviewed, published literature, regarding questionnaire measures that exclusively measure empathy in children aged up to 11 years, and these questionnaires’ psychometric properties? It is hoped that this review will assist in furthering research knowledge regarding whether or not self- and other-report questionnaire measures of empathy appear to provide valid and reliable information about children’s empathic abilities, or could be improved to do so.
Method
As this article describes a scoping review, no ethical approval was required.
Search Strategy
The approach taken for this scoping review was informed by the guidelines in the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR; Tricco et al., 2018). A literature search using PsycINFO and Scopus was conducted on January 27, 2020. Following consultation with a specialist librarian, the following search terms were used: (empath*) AND (self-report or self-rat* or questionnaire or measur* or survey or scale or parent-report or “parent rating”) AND (quotient or index or assessment) AND (child* or adolescen*) AND (validat* or validity or reliability or psychometric*). No limit was applied regarding the dates of publications. For PsycINFO, a general search restricted to peer-reviewed journal articles was conducted. In addition, for Scopus, due to its much broader database, the search was restricted to titles, abstracts, and keywords only. The search results were collated, and duplicates and non-English language articles were removed. Article titles and abstracts were then screened to identify papers that investigated the psychometric properties of self-report, other-report, and performance-based questionnaire measures of empathy in children aged up to 11 years. Articles without this focus were removed. Studies that used a relevant measure as part of broader research, without focusing on empathy measurement in children, or the measure’s psychometric properties, were excluded. Additional measures and articles were identified through checking reference lists and wider reading.
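The structure of the search string above (terms combined with "or" inside each concept group, and groups combined with AND) can be sketched programmatically. This is a minimal illustration only: the variable and function names are hypothetical, straight quotation marks stand in for the typeset ones, and each database's own query syntax and field restrictions would still need to be applied separately.

```python
# Concept groups copied from the reported search terms; names are illustrative.
SEARCH_GROUPS = [
    ["empath*"],
    ["self-report", "self-rat*", "questionnaire", "measur*", "survey",
     "scale", "parent-report", '"parent rating"'],
    ["quotient", "index", "assessment"],
    ["child*", "adolescen*"],
    ["validat*", "validity", "reliability", "psychometric*"],
]


def build_query(groups: list[list[str]]) -> str:
    """Join terms with 'or' inside each group and join the groups with 'AND'."""
    return " AND ".join("(" + " or ".join(terms) + ")" for terms in groups)


query = build_query(SEARCH_GROUPS)
print(query)
```

Keeping each concept in its own list makes it easier to document, and later rerun, exactly which terms were searched.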
Measures retained were those that exclusively focused on the measurement of empathy in children; were self-report, other-report, or performance-based measures; and had been used with children aged up to 11 years. Measures were excluded if they measured empathy as part of a broader range of skills related to social-emotional functioning; focused on callous-unemotional (CU) traits; were not formal measures; had only been used with those aged 12 years or older; or had been developed and used exclusively within non-English-speaking countries. Two experienced researchers were consulted when required regarding whether or not identified measures met eligibility criteria.
This search strategy identified most articles relating to each eligible measure and its psychometric properties. However, subsequent literature searches using PsycINFO, Scopus, and Google Scholar were also conducted, using each eligible measure’s name as the search term, to identify further articles investigating relevant measures and their psychometric properties. All measures identified for final inclusion were cross-checked using Google Scholar on July 1, 2021 by examining all peer-reviewed publications that cited each relevant journal article (i.e., each article that met inclusion criteria, plus the original journal article for each empathy measure), to ensure that all eligible studies evaluating each measure were included.
Data obtained using this search strategy were charted using an Excel spreadsheet. The charting form had been developed prior to the formal search being conducted, based on a broad review of the relevant literature. Information charted included the name of the measure, its authors, year of publication, country of origin, response format (self-report, other-report, or performance-based), age range (up to and including 11 years) for which the measure has been used in research, number of questionnaire items, scoring method, score range, scales, item examples, and evidence for factor structure, reliability, and validity. Data charting was completed independently by the author, but results were shared with two independent researchers with expertise in the area. As this scoping review aimed to establish and explore the evidence available regarding self-report, other-report, or performance-based questionnaire measures of empathy in children, all articles meeting the above criteria were included, and no method was used for further critical appraisal.
Results
Stage I: Identification of Measures
The process of the scoping review is illustrated in Figure 1. Fifty-six relevant peer-reviewed journal articles were identified that evaluated the psychometric properties of 24 distinct self-report, other-report, and performance-based measures of empathy in children aged up to 11 years. These 24 measures are listed in Table 1.

Figure 1. Flow Diagram of Scoping Review Process to Identify Relevant Measures of Empathy (Self- and Other-Report, and Performance-Based) in Children Aged Up to 11 Years.
Table 1. Key Characteristics of Empathy Measures for Children Aged Up to 11 Years.
Note. FASTE = Feshbach and Roe Affective Situations Test for Empathy; QMEE = Questionnaire Measure of Emotional Empathy; IRI = Interpersonal Reactivity Index; BEI = Bryant’s Index of Empathy for Children and Adolescents; FTS = Feeling and Thinking Scale; BES = Basic Empathy Scale; GEM = Griffith Empathy Measure; CEAQ = Children’s Empathic Attitudes Questionnaire; STEP = Southampton Test of Empathy for Preschoolers; DPES = Dispositional Positive Empathy Scale; EQ-C = Empathy Quotient-Child; KEDS = Kids’ Empathic Development Scale; AMES = Adolescent Measure of Empathy and Sympathy; EToMS = Empathy and Theory of Mind Scale; EmQue-CA = Empathy Questionnaire for Children and Adolescents; EQCEA = Empathy Questionnaire for Children and Early Adolescents; GES = General Empathy Scale; CASES = Cognitive, Affective, and Somatic Empathy Scales for Children; FERET = Facial Emotion Recognition and Empathy Test; KR-20 = Kuder-Richardson Formula 20; CE = Cognitive Empathy; IC = Intention to Comfort; AE = Affective Empathy.
Stage II: Charting and Description of Measures
Each identified measure was classified into one of four broad categories, based on the extent to which the measure was specifically developed with children’s developmental needs and contexts in mind, and based on contemporary theory and research findings regarding empathy development in children. The time at which a measure was developed, and the research context of that period, were major influences on its classification.
The first category consisted of measures that were originally developed for adults, or were primarily based on existing adult scales, but had been used with children aged 11 years or younger in research literature. These questionnaires included the Questionnaire Measure of Emotional Empathy (QMEE; Mehrabian & Epstein, 1972) and the IRI (Davis, 1980), both of which were developed for adults but used in research with children. In addition, Bryant’s Index of Empathy for Children and Adolescents (BEI; Bryant, 1982) (based on the QMEE), the Feeling and Thinking Scale (FTS; Garton & Gringart, 2005) (based on the IRI), and the Griffith Empathy Measure (GEM; Dadds et al., 2008) (based on the BEI) were included in this category.
The second category consisted of measures whose authors generated original items that were developmentally appropriate for children; however, these questionnaires were developed without explicit reference to contemporary theory and research findings relating to empathy. For example, no measure made a meaningful distinction between cognitive and affective empathy, and none clearly linked item formation to current developmental theory or neuroscientific research regarding the components of empathy. The questionnaires in this category were the Feshbach and Roe Affective Situations Test for Empathy (FASTE; Feshbach & Roe, 1968), the Borke Empathy Scale (Borke, 1971), The Young Children’s Empathy Measure (Poresky, 1990), Ukegawa’s Empathy Scales (Ukegawa, 1996), the Dispositional Positive Empathy Scale (DPES; Sallquist et al., 2009), and the Empathy Quotient-Child (EQ-C; Auyeung et al., 2009).
The third category included more contemporary measures which made a distinction between cognitive and affective empathy in their measurement, but item construction did not explicitly reference current developmental theory or neuroscientific findings regarding empathy development in children. The measures in this category were the BES (Jolliffe & Farrington, 2006), the Children’s Empathic Attitudes Questionnaire (CEAQ; Funk et al., 2008), the Southampton Test of Empathy for Preschoolers (STEP; Howe et al., 2008), the Questionnaire to Assess Affective and Cognitive Empathy in Children (Zoll & Enz, 2010), the Kids’ Empathic Development Scale (KEDS; Reid et al., 2013), and the General Empathy Scale (GES; Mikac et al., 2017).
The fourth category included those questionnaires whose development was most clearly informed by contemporary theory and research findings regarding the measurement of empathy, its distinction from highly related constructs via the measurement process, its development in children, and relevant neuroscientific research evidence. The questionnaires in this category were the Empathy Questionnaire (EmQue; Rieffe et al., 2010), the Adolescent Measure of Empathy and Sympathy (AMES; Vossen et al., 2015), the Empathy and Theory of Mind Scale (EToMS; Wang & Wang, 2015), the Empathy Questionnaire for Children and Adolescents (EmQue-CA; Overgaauw et al., 2017), the Empathy Questionnaire for Children and Early Adolescents (EQCEA; Richaud et al., 2017), the Cognitive, Affective, and Somatic Empathy Scales for Children (CASES; Raine & Chen, 2018), and the Facial Emotion Recognition and Empathy Test (FERET; Coskun, 2019).
Table 1 provides an overview of practically relevant information for all identified measures, listed by year of publication. In interpreting the internal reliability coefficients, the guidelines suggested by DeVellis (2012) were followed, whereby <.60 is unacceptable, .60 to .65 is undesirable, .65 to .70 is minimally acceptable, .70 to .80 is respectable, .80 to .90 is very good, and >.90 is excellent but indicates possible redundancy. For interpreting inter-rater reliability, McHugh’s (2012) suggestions were followed, whereby <.20 is no agreement, .21 to .39 is minimal agreement, .40 to .59 is weak agreement, .60 to .79 is moderate agreement, .80 to .90 is strong agreement, and >.90 is almost perfect agreement (see also Santomauro et al., 2020). In interpreting correlation coefficients, Dancey and Reidy’s (2007) guidelines were followed, whereby 0 is no relationship, ±.10 to .30 is a weak relationship, ±.40 to .60 is a moderate relationship, ±.70 to .90 is a strong relationship, and ±1 is a perfect relationship.
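The interpretive bands above can be expressed as a simple lookup. The following is a minimal sketch rather than part of the review's method: the function names are hypothetical, and the handling of values that fall exactly on a boundary (for example, .65, .20, or .90) is an assumption, because the published bands share or skip endpoints.

```python
def interpret_internal_consistency(alpha: float) -> str:
    """Label an internal reliability coefficient using the DeVellis (2012) bands.

    Assumption: a value on a shared boundary (e.g., .65) goes to the higher band,
    except .90, which stays "very good" because "excellent" is defined as >.90.
    """
    if alpha < 0.60:
        return "unacceptable"
    if alpha < 0.65:
        return "undesirable"
    if alpha < 0.70:
        return "minimally acceptable"
    if alpha < 0.80:
        return "respectable"
    if alpha <= 0.90:
        return "very good"
    return "excellent (possible redundancy)"


def interpret_inter_rater(kappa: float) -> str:
    """Label an inter-rater reliability coefficient using the McHugh (2012) bands.

    Assumption: values up to and including .20 count as no agreement, and
    exactly .90 counts as strong agreement.
    """
    if kappa <= 0.20:
        return "no agreement"
    if kappa < 0.40:
        return "minimal agreement"
    if kappa < 0.60:
        return "weak agreement"
    if kappa < 0.80:
        return "moderate agreement"
    if kappa <= 0.90:
        return "strong agreement"
    return "almost perfect agreement"


if __name__ == "__main__":
    # Illustrative values only; these are not coefficients reported in Table 1.
    for value in (0.58, 0.72, 0.93):
        print(value, interpret_internal_consistency(value), "|", interpret_inter_rater(value))
```

The correlation bands from Dancey and Reidy (2007) are left out of the sketch because they leave gaps (for example, between .30 and .40) that would need a further rounding assumption.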
The quality and extent of information regarding each measure’s psychometric properties varied widely. The sections below summarize the evidence on factor structure, reliability, and validity that was available from eligible studies directly evaluating the psychometric properties of each measure in children aged up to 11 years.
Older Questionnaires Designed for Adults, or Primarily Based on Existing Adult Scales, but Used With Children
QMEE (Mehrabian & Epstein, 1972). This 33-item self-report questionnaire was developed to measure affective empathy in adults; its psychometric properties for children were investigated in one study (Kalliopuska, 1983). The adapted child version uses a 3-point Likert-type scale response format. Example items include, “I am annoyed by unhappy people who are just sorry for themselves,” and “Lonely people are probably unfriendly.” The measure had minimally acceptable internal consistency when used with children. Children’s responses weakly correlated positively with social desirability scale scores. Scores increased with age until 11 years, and girls scored higher than boys. Empathy scores also correlated positively with children’s stage of development for moral judgment, as measured by responses to Kohlberg’s moral judgment dilemmas (Kohlberg, 1981).
IRI (Davis, 1980). The IRI is a 28-item self-report questionnaire developed for adults which measures four components of what was then considered empathy: Perspective-Taking, Fantasy, Empathic Concern, and Personal Distress. The Perspective-Taking and Empathic Concern scales align with contemporary theories of empathy. The psychometric properties of the IRI with children have been investigated in two studies (Holgado Tello et al., 2013; Litvack-Miller et al., 1997). It has a 5-point Likert-type scale response format. Example items include, “When I’m upset at someone, I usually try to ‘put myself in his shoes’ for a while” (Perspective-Taking), “I really get involved with the feelings of the characters in a novel” (Fantasy), “I would describe myself as a pretty soft-hearted person” (Empathic Concern), and “Being in a tense emotional situation scares me” (Personal Distress). For children, Litvack-Miller et al. (1997) found a four-factor structure for their shortened version, broadly consistent with the adult scales; but Holgado Tello et al. (2013) found five different first-order factors and one second-order factor, consistent with other research conducted with the IRI in Spain (Carrasco Ortiz et al., 2011). Subscale internal reliabilities ranged from unacceptable to respectable, and test–retest reliability was moderate. Litvack-Miller et al. (1997) found positive correlations between most factors, apart from Fantasy and Empathic Concern, which were weakly negatively correlated. Females generally scored higher than males (Holgado Tello et al., 2013; Litvack-Miller et al., 1997). Older children scored higher than younger children for Empathic Concern; this was also the most reliable predictor of self- and teacher-reported prosocial behavior, followed by Perspective-Taking (Litvack-Miller et al., 1997). There was support for measurement invariance across sex (Holgado Tello et al., 2013).
BEI (Bryant, 1982). The BEI’s items were derived from the QMEE (Mehrabian & Epstein, 1972); it represents one of the earliest efforts to create an empathy questionnaire for children, and only measures affective empathy. This 22-item self-report measure has used various response formats, including Yes/No, and 3- to 9-point Likert-type scales. Item examples include, “People who kiss and hug in public are silly,” and “Sometimes I cry when I watch TV.” Five studies have examined its psychometric properties with children (Bryant, 1982; de Wied et al., 2007; Huang & Tran-Chi, 2020; Lasa Aristu et al., 2008; Lucas-Molina et al., 2016). Although the BEI was designed to only provide a total score, some subsequent research has argued for it having two (Empathic Sadness, Attitude; de Wied et al., 2007) or three (Understanding of Feelings, Feelings of Sadness, Tearful Reaction; Lasa Aristu et al., 2008; Lucas-Molina et al., 2016) underlying factors, and corresponding subscales. Scale and subscale internal reliabilities have ranged from unacceptable to very good, and test–retest reliability was strong. BEI scores correlated positively with scores on other empathy measures, and were generally uncorrelated with theoretically distinct constructs such as social desirability and reading achievement scores (Bryant, 1982). Higher BEI scores were associated with greater self-reported acceptance of others’ physical proximity, and decreased teacher-rated physical aggression toward classmates for boys (Bryant, 1982). Girls generally scored higher than boys (Bryant, 1982; de Wied et al., 2007; Lucas-Molina et al., 2016; cf. Huang & Tran-Chi, 2020). Scores increased with age (Bryant, 1982).
FTS (Garton & Gringart, 2005). The FTS items were sourced from the IRI (Davis, 1980); it represented another early effort to develop an empathy questionnaire for children. The 12-item self-report measure utilizes a 5-point Likert-type scale. Only the original paper examined the measure’s psychometric properties (Garton & Gringart, 2005). The measure has two factors: Affective Empathy and Cognitive Empathy. Item examples include, “I often feel worried about people that are not as lucky as me, and feel sorry for them” (Affective Empathy), and “I sometimes try to understand my friends better by pretending I am them” (Cognitive Empathy). Internal consistency for the two scales was unacceptable to minimally acceptable. Females scored higher than males.
GEM (Dadds et al., 2008). The GEM drew from the BEI (Bryant, 1982), with the aim of developing a parent-report questionnaire for children’s empathy, at a time when no such measure existed. The 23 items are completed using a 9-point Likert-type scale. The GEM is derived from a questionnaire that measured only affective empathy, but purports to measure both affective and cognitive empathy (Dadds et al., 2008). It has been argued that the GEM cognitive empathy factor may be an artifact of the reverse-coded items of this scale (Murphy, 2019b). The affective empathy scale is limited to items describing emotion contagion (Dadds, 2019; Murphy, 2019a, 2019b). Item examples are “My child seems to react to the moods around them” (Affective Empathy) and “My child would eat the last cookie, even when they know someone else wants it” (Cognitive Empathy). Although the GEM has been used widely in research, three studies have specifically focused on its psychometric properties (Dadds et al., 2008; Malcolm-Smith et al., 2015; Murphy, 2019a). Internal consistency of the scale and subscales has ranged from unacceptable to very good. Test–retest reliability was strong. Dadds et al. (2008) found that the affective and cognitive empathy scales were uncorrelated, and cognitive empathy was moderately positively correlated with verbal IQ. There was convergence between different raters (self, mother, and father). GEM and BEI total scores were positively associated, and lower GEM scores were associated with greater tolerance for others’ distress in a computer game. Total and cognitive empathy scores increased with age, and girls scored higher than boys. 
Using the Strengths and Difficulties Questionnaire (SDQ), many relationships between GEM scores and children’s social-emotional and behavioral functioning were in the anticipated direction, but there were also counterintuitive findings: for example, higher affective empathy scores were associated with increased conduct, emotional, and peer problems for girls. Malcolm-Smith et al. (2015) noted that GEM scores correlated positively with parent age, but did not correlate with other child empathy measures. In a meta-analysis of five studies, Murphy (2019a) found that GEM cognitive and affective empathy scores accounted for little variance in callous-unemotional (CU) trait levels (cf. Dadds, 2019; but see Murphy, 2019b for reaffirmation).
Older Questionnaires for Which Original Items for Children Were Generated
FASTE (Feshbach & Roe, 1968). The FASTE is an 8-item performance-based measure in which children are shown a sequence of slides featuring a child protagonist accompanied by a narrative description that avoids affective labels, with each item targeting one of four emotions. The child is then asked how they feel, and how they think the child in the story feels; responses are scored according to the match to the portrayed emotion. Different scoring approaches have been suggested (Feshbach & Roe, 1968; Levine & Hoffman, 1975). Three studies were identified that evaluated the psychometric properties of the FASTE in its original form (Feshbach & Roe, 1968; Hughes et al., 1981; Lennon et al., 1983). Inter-rater reliability for the measure indicated almost perfect agreement. Children scored higher on easier items when interviewed by a same-sex experimenter, suggesting that social desirability may influence performance (Lennon et al., 1983). Feshbach and Roe (1968) also noted that children scored higher when items depicted a character of their same sex. Children’s scores increased with age, and males and females performed comparably (Hughes et al., 1981).
Borke Empathy Scale (Borke, 1971). The Borke Empathy Scale was originally a 15-item performance-based measure of empathy in young children; subsequent adaptations have increased the number of items (Adams et al., 1993; Borke, 1973). Children’s emotion understanding is first confirmed using drawn faces depicting four emotions. In Part 1, children are told stories about child characters whose emotions can readily be inferred. The child selects one face from the four shown that best depicts how the character feels. In Part 2, the child is told stories about themselves behaving toward another child in a way that would cause that child to feel a particular emotion. Participants select the face that best represents how the other child would feel. The child’s accuracy is scored. Three studies examined this scale’s psychometric properties (Adams et al., 1993; Borke, 1971, 1973). Internal consistency across the two parts and total score of the measure was respectable, and split-half reliability was strong. Score reliability and accuracy increased with age (Adams et al., 1993; Borke, 1971, 1973). Females scored higher than males in one study (Borke, 1973), and there was no difference in other studies (Adams et al., 1993; Borke, 1971). Children from different cultural groups (the United States and Taiwan) performed similarly (Borke, 1973).
The Young Children’s Empathy Measure (Poresky, 1990). In this 8-item performance-based measure, the experimenter reads out four scenarios involving a child, targeting but not labeling one of four basic emotions. The participant is asked how the character feels, and how they feel about this. Responses are scored on a scale of 0 to 4 based on the match to the intended emotion. Only the original paper examined this measure’s psychometric properties (Poresky, 1990). The measure has a cognitive scale and an affective scale. Internal consistency was minimally acceptable, and inter-rater reliability was almost perfect. The cognitive and affective scales correlated positively. Total empathy scores also correlated positively with a measure of empathy for pets. Empathy scores were positively correlated with maternal ratings and home visitors’ assessments of children’s reassurance and cooperation. Children who had a strong bond with their pet had higher empathy scores than children with no pet. Scores correlated positively with children’s age, and were uncorrelated with children’s receptive vocabulary as measured with the Peabody Picture Vocabulary Test.
Ukegawa’s Empathy Scales (Ukegawa, 1996). This 19-item measure was the only teacher-report children’s empathy questionnaire identified in this review. It was developed for preschool teachers and uses a 5-point Likert-type scale. Only the original paper evaluated the psychometric properties of this scale (Ukegawa, 1996). It has three factors: Concern for Surroundings, Sensitivity, and Shared Emotion. Item examples include, “During free play, he or she is unconcerned even if a schoolmate is in trouble” (Concern for Surroundings), “Even upon hearing a pitiful story, he or she will not feel sympathy for the character in the story” (Sensitivity), and “When a friend is praised, he or she also becomes happy” (Shared Emotion). Internal reliability was excellent, split-half reliability was strong, and inter-rater reliability was moderate. Ukegawa (1996) grouped 71 children according to their total score into low, medium, or high empathy groups; no significant differences between groups were found in an experimental task measuring altruistic behavior.
DPES (Sallquist et al., 2009). This 7-item parent-report questionnaire only measures dispositional positive empathy, which is argued to be distinct from empathy for negative emotions (Sallquist et al., 2009). It uses a 4-point Likert-type scale. Example items include, “My child typically becomes happy when seeing others in happy situations on TV or in a movie,” and “My child often feels happy for other children who receive good news.” Only the original paper investigated the scale’s psychometric properties (Sallquist et al., 2009). It had a single-factor structure and very good internal consistency. Across two time points, DPES scores positively correlated with paternal and maternal reports of children’s empathy/sympathy, certain measures of positive emotion and social competence, and socioeconomic status.
EQ-C (Auyeung et al., 2009). This parent-report questionnaire extended from the Empathy Quotient for adults (Baron-Cohen & Wheelwright, 2004), and covers a range of developmentally appropriate social-emotional skills and caring behaviors interpreted as reflecting empathic capacity. Parents complete the 27 items using a 4-point Likert-type scale, and each response is scored between 0 and 2 points. Example items include, “My child would not cry or get upset if a character in a film died,” and “My child enjoys cutting up worms, or pulling the legs off insects.” Five studies have evaluated the psychometric properties of this measure with typically developing children aged up to 11 years (Auyeung et al., 2009; Girli et al., 2017; Phallaphi et al., 2018; Sassa et al., 2012; Wakabayashi, 2013). Although the measure yields a total score only, Girli et al. (2017) identified two factors in a shortened Turkish version, cognitive empathy and emotional empathy, and Phallaphi et al. (2018) found two factors in a shortened Indonesian version, relating to more negative behavior, and more positive behavior. Internal reliability was very good to excellent, and test–retest reliability was strong. Auyeung et al. (2009) found that EQ-C and Systemizing Quotient-Child (SQ-C) scores were uncorrelated, whereas Sassa et al. (2012) and Wakabayashi (2013) found a small positive correlation. Age and EQ-C scores were uncorrelated (Sassa et al., 2012; Wakabayashi, 2013). Using MRI, Sassa et al. (2012) found a significant positive correlation between EQ-C scores and areas of the brain functionally relevant to empathizing. Among typically developing children, girls scored higher than boys (Auyeung et al., 2009; Girli et al., 2017; Sassa et al., 2012; Wakabayashi, 2013), followed by children with autism spectrum conditions.
More Contemporary Questionnaires That Distinguish Between Cognitive and Affective Empathy in Their Measurement
BES (Jolliffe & Farrington, 2006). The authors of the BES were innovative in generating original items reflecting the contemporary understanding of empathy as having both cognitive and affective components, when creating this youth self-report questionnaire. The 20 items are completed using a 5-point Likert-type scale. There are two scales, Cognitive Empathy and Affective Empathy. Item examples include, “I find it hard to know when my friends are frightened” (Cognitive Empathy), and “When I’ve been with a friend who’s sad, I’m sad” (Affective Empathy). Five studies have investigated the psychometric properties of this questionnaire for children aged 11 years or younger (Bensalah et al., 2016; Cavojová et al., 2012; Geng et al., 2012; Sánchez-Pérez et al., 2014; Zych et al., 2020). Most of these studies confirmed the scale’s intended two-factor structure, sometimes after adaptations (Cavojová et al., 2012; Geng et al., 2012; Sánchez-Pérez et al., 2014; Zych et al., 2020). Internal consistency for the cognitive and affective empathy subscales and total scale was minimally acceptable to very good, and test–retest reliability was moderate. The cognitive and affective empathy subscales positively correlated (Cavojová et al., 2012; Geng et al., 2012; Sánchez-Pérez et al., 2014). These subscales also correlated positively with measures of social skills (Sánchez-Pérez et al., 2014), prosocial behavior (Geng et al., 2012), social-emotional competencies and moral engagement (Zych et al., 2020), and other empathy and theory of mind measures (Cavojová et al., 2012). Sánchez-Pérez et al. (2014) adapted the BES to include a parent-report format, and found that parent ratings were higher than child ratings. For self-report ratings, cognitive empathy scores increased with age and were positively associated with SES; affective empathy scores were higher for girls than boys, and were positively associated with family satisfaction and negatively correlated with aggressive behavior. 
For parent-report ratings, cognitive and affective empathy were positively associated with SES, affective empathy was negatively correlated with weak parental management, and cognitive empathy was positively associated with family climate. Females scored higher than males (Cavojová et al., 2012; Zych et al., 2020), and there was some evidence that scores increased with age (Geng et al., 2012; cf. Cavojová et al., 2012).
Bensalah et al. (2016) were distinct in arguing that a three-factor structure of Emotional Contagion (CONT), Cognitive Empathy (CE), and Emotional Disconnection (DIS) best fit the BES (French child version). This proposed factor structure aligned more closely with contemporary neuroscientific findings regarding empathy. However, there were issues with the authors’ interpretation of these factors, particularly in light of research indicating that positive or negative item wording can impact factor structure (Woods, 2006).
CEAQ (Funk et al., 2008). This questionnaire measured “empathic attitudes,” which the authors described as the more cognitively based component of empathy assessed by self-report. Children complete the 16 items using a 3-point Likert-type scale. Item examples include, “Seeing a kid who is crying makes me feel like crying,” and “I would feel bad if the kid sitting next to me got in trouble.” Two studies have investigated the psychometric properties of this scale (Funk et al., 2008; Vilte et al., 2016); both supported a unidimensional factor structure. Internal consistency was respectable to very good, and test–retest reliability was moderate. CEAQ scores correlated positively with BEI and SDQ parent-reported prosocial behavior scores, and slightly negatively with SDQ parent-reported conduct problems (Funk et al., 2008). CEAQ scores also correlated positively with scores on a social desirability index (Funk et al., 2008). Girls scored significantly higher than boys (Funk et al., 2008; Vilte et al., 2016). Scores were otherwise invariant across sex (Vilte et al., 2016).
STEP (Howe et al., 2008). This performance-based measure uses eight video vignettes of children experiencing emotional situations, and measures preschoolers’ ability to understand and share in the emotional response of each story protagonist. Each vignette includes four emotion understanding items (STEP-UND) and four emotion sharing items (STEP-SHA), totalling 64 items. For each item, children’s responses are scored 0 to 2 based on the match to the intended emotion. A STEP-UND example item is, “How did Thomas feel when he saw the big dog?” and a STEP-SHA example item is, “How did you feel when Thomas saw the big dog?” Only the original article evaluated the measure’s psychometric properties (Howe et al., 2008). Internal consistency was respectable to very good for the two scales. The STEP-UND and STEP-SHA scores positively correlated with each other, and with parent-reported dispositional empathy and teacher ratings of prosocial behavior on the SDQ. Children with higher STEP-UND and STEP-SHA scores were more likely to show relevant facial expressions during testing. STEP-UND scores increased with age.
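The item arithmetic described for the STEP can be checked with a short sketch. It assumes that scale scores are simple sums of the 0 to 2 item scores, which the text above does not state; the function and constant names are hypothetical and are not taken from Howe et al. (2008).

```python
VIGNETTES = 8          # video vignettes (from the text)
ITEMS_PER_SCALE = 4    # items per vignette on each scale (from the text)
MAX_ITEM_SCORE = 2     # each response is scored 0 to 2 (from the text)

# 8 vignettes x (4 understanding + 4 sharing) items = 64 items in total.
TOTAL_ITEMS = VIGNETTES * ITEMS_PER_SCALE * 2
# Under the summing assumption, each scale has a maximum of 8 x 4 x 2 = 64 points.
MAX_SCALE_SCORE = VIGNETTES * ITEMS_PER_SCALE * MAX_ITEM_SCORE


def score_step(und, sha):
    """Sum item scores per scale (assumed scoring rule, for illustration only).

    und, sha: one list per vignette, each holding four item scores in 0..2.
    """
    for name, scale in (("STEP-UND", und), ("STEP-SHA", sha)):
        if len(scale) != VIGNETTES or any(len(v) != ITEMS_PER_SCALE for v in scale):
            raise ValueError(f"{name}: expected {VIGNETTES} vignettes x {ITEMS_PER_SCALE} items")
        if any(not 0 <= s <= MAX_ITEM_SCORE for v in scale for s in v):
            raise ValueError(f"{name}: item scores must lie between 0 and {MAX_ITEM_SCORE}")
    return {
        "STEP-UND": sum(sum(v) for v in und),
        "STEP-SHA": sum(sum(v) for v in sha),
    }


if __name__ == "__main__":
    # Made-up responses: full marks on understanding, partial marks on sharing.
    und = [[2, 2, 2, 2] for _ in range(VIGNETTES)]
    sha = [[1, 0, 2, 1] for _ in range(VIGNETTES)]
    print(TOTAL_ITEMS, MAX_SCALE_SCORE, score_step(und, sha))
```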
Questionnaire to Assess Affective and Cognitive Empathy in Children (Zoll & Enz, 2010). This self-report questionnaire aimed to capture both the cognitive and affective components of empathy, and includes items derived from three measures—the BEI (Bryant, 1982), Leibetseder et al.’s E-Skala (2001), and the FTS (Garton & Gringart, 2005). The questionnaire comprises 22 items, with an additional six items from Eisenberg’s Child-Report Sympathy Scale (Zhou et al., 2003). Children rate their agreement using a 5-point Likert-type scale. Item examples include, “When I am angry or upset at someone, I usually try to imagine what he or she is thinking or feeling” (Cognitive Empathy) and “On the phone I can tell if the other person is happy or sad by the tone of their voice” (Affective Empathy). This questionnaire was made available via an unpublished manuscript that did not report psychometric properties, but these have been investigated in two studies (Mason et al., 2019; Roth, 2020). Both studies confirmed the two factors of Cognitive and Affective Empathy in shorter questionnaire versions, with Mason et al. (2019) also noting a third factor, Concern for Others, comprising the sympathy scale items. Internal consistency ranged from respectable to very good. The two subscales and total scale all correlated positively with one another (Mason et al., 2019; Roth, 2020).
KEDS (Reid et al., 2013). The KEDS is a performance-based, multidimensional assessment measure of empathy in children, and aims to concurrently assess affective, cognitive, and behavioral components of empathy. It comprises one sample item and 12 items with faceless pictographic stimuli, each including an affect inference question, a cognitive question and prompt, and an other-referenced behavioral question, per blank-faced character. Children score between 0 and 2 for each item component. Item examples include, “How do you think this girl feels?” (Affect), “Can you tell me why this girl feels sad?” and “Please tell me more about what is happening in the picture” (Cognition), and “What would you do if you were that girl?” (Behavior). There is concern regarding the extent to which the KEDS truly measures affective, cognitive, and behavioral empathy (Bensalah et al., 2016). For Affect items, rather than assessing children’s affective response to another’s emotional state, children infer the feeling of a blank-faced character using the context provided. The Cognition questions may also assess verbal and visual comprehension and expressive language. Behavior items involve the child “putting themselves into others’ shoes,” but responses may be influenced by social desirability, and may also reflect cognitive empathy as they do not appear to involve caring or empathic behavior.
Two studies have investigated this measure’s psychometric properties with children (Leana-Taşcılar et al., 2018; Reid et al., 2013). Results reflect the uncertainty regarding what the KEDS scales truly measure, particularly with regard to the Affect scale. Reid et al. (2013) identified four factors, labeled Simple, Complex, Aggression, and Authority. Internal consistency across subscales and scales ranged from undesirable to excellent. Affect, Cognition, and Behavior scores positively correlated with total scores (Leana-Taşcılar et al., 2018; Reid et al., 2013). Evidence was inconsistent between studies regarding correlations between Affect, Cognition, and Behavior; the scales mostly correlated positively, but Reid et al. (2013) found that the Affect scale was uncorrelated with the other scales. Leana-Taşcılar et al. (2018) found that KEDS Cognition and Total scores correlated positively with a Turkish child empathy measure. Reid et al. (2013) found that girls scored higher than boys for Total and Cognition scores. Scores increased with age, apart from Behavior in Reid et al.’s (2013) study, and Affect in Leana-Taşcılar et al.’s (2018) study.
GES (Mikac et al., 2017). The GES was intended as a cognitive and affective empathy questionnaire for early adolescents. However, factor analysis revealed that it did not distinguish between cognitive and affective empathy, so it is considered a general empathy measure. Young people complete the 14 items using a 5-point Likert-type scale. Item examples include, “Sometimes I imagine how the children without parents feel,” and “I hate it when someone treads flowers or plucks plants.” Only the original paper investigated this measure’s psychometric properties (Mikac et al., 2017). It has a unidimensional factor structure, and very good internal consistency. GES scores correlated positively with performance and peer-rated measures of emotional intelligence, negatively with peer-rated direct and total aggression scores, and were uncorrelated with indirect aggression. Girls scored higher than boys, and children living in a small town scored higher than those living in a large town. Children aged 12 and 14 years scored lower than younger age groups (Mikac et al., 2017).
More Contemporary Questionnaires That Distinguish Empathy From Related Constructs and Map More Closely on to Current Developmental Theory and Neuroscientific Research
EmQue (Rieffe et al., 2010). This measure aimed to assess the developmental skills for empathy and associated behaviors in very young children via parent report, based on Hoffman’s (1987) theory of empathic development. Parents complete the 19 items using a 3-point Likert-type scale. The measure has three subscales: Emotion Contagion (EC), Attention to Others’ Feelings (AOF), and Prosocial Actions (PA). Item examples include, “My child also needs to be comforted when another child is in pain” (EC), “When an adult gets angry with another child, my child watches attentively” (AOF), and “When I make clear that I want some peace and quiet, my child tries not to bother me” (PA). Four studies have investigated this scale’s psychometric properties (Grazzani et al., 2017; Lazdauskas & Nasvytienė, 2021; Lucas-Molina et al., 2018; Rieffe et al., 2010). These studies confirmed the proposed three-factor structure, often using an adapted, shorter questionnaire. Internal reliability for the three scales ranged from unacceptable (EC) to very good. Test–retest reliability ranged from moderate to strong depending on the time interval. The three subscales generally correlated positively (Grazzani et al., 2017; Lazdauskas & Nasvytienė, 2021; Rieffe et al., 2010). Rieffe et al. (2010) found various correlations as expected between EC, AOF, and PA scores, and certain measures of emotion regulation, prosocial behavior, emotion recognition and understanding, empathy, and theory of mind. There was evidence that AOF and PA scores, but not EC scores, increased with age (Lucas-Molina et al., 2018; Rieffe et al., 2010), and for measurement invariance across sex and over time (Lazdauskas & Nasvytienė, 2021; Lucas-Molina et al., 2018). Boys and girls scored similarly (Grazzani et al., 2017).
AMES (Vossen et al., 2015). The AMES is a self-report questionnaire that distinguishes between affective empathy, cognitive empathy, and sympathy. A 5-point Likert-type scale is used to complete the 12 items. Item examples include, “I can often understand how people are feeling even before they tell me” (Cognitive Empathy), “When a friend is scared, I feel afraid” (Affective Empathy), and “I feel sorry for someone who is treated unfairly” (Sympathy). This measure’s psychometric properties have been investigated in three studies involving children aged up to 11 years (Li et al., 2019; Vossen et al., 2015; Zengin et al., 2018). All confirmed the proposed three-factor structure, and internal consistency and test–retest reliability ranged from undesirable to very good. The three subscales positively correlated with each other, and with the total score (Li et al., 2019; Vossen et al., 2015). After correcting for social desirability, Vossen et al. (2015) found that all subscales and particularly Sympathy correlated positively with the Empathic Concern subscale of the IRI (Davis, 1980), and all subscales and particularly Cognitive Empathy correlated positively with the IRI Perspective-Taking subscale. Affective Empathy and Sympathy were negatively correlated with self-reported physically aggressive behavior, and Cognitive Empathy was uncorrelated. All AMES subscales correlated positively with SDQ self-reported Prosocial Behavior. Girls generally scored higher than boys (Li et al., 2019; Vossen et al., 2015).
EToMS (Wang & Wang, 2015). The EToMS is a parent-report questionnaire that measures two related constructs: empathy and theory of mind. Theory of mind is further distinguished according to prosocial (“nice”) or antisocial (“nasty”) tendencies. Parents complete the 16 items using a 5-point Likert-type scale. This questionnaire was included in this review due to the overlap between the constructs of cognitive empathy and theory of mind (Coundouris et al., 2020). Item examples include, “Enjoys stories with deceptive plots” (Nice Theory of Mind), “Is good at maintaining a lie when questioned about it” (Nasty Theory of Mind), and “Tries to offer help when someone is hurt, upset, or feeling ill” (Empathy). Only the original paper evaluated this questionnaire’s psychometric properties (Wang & Wang, 2015). The proposed three-factor structure was confirmed, and internal consistency ranged from respectable to very good. There were positive correlations between Nice ToM and Empathy, and between Empathy and performance on a ToM task. There were also positive correlations between Nice ToM and children’s observed prosocial white lie telling behavior, and their performance on a ToM task; and between Nasty ToM and observed strategic lie-telling behavior. Nasty ToM and Empathy were uncorrelated, as were Nice ToM and Nasty ToM. Mother and father reports did not differ, and girls scored higher than boys on the Empathy scale.
EmQue-CA (Overgaauw et al., 2017). The EmQue-CA measures affective empathy, cognitive empathy, and intention to comfort, the latter being a dimension lacking from most other child empathy questionnaires. The self-report measure consists of 14 items which children complete using a 3-point Likert-type scale. Example items include “If my mother is happy, I also feel happy” (AE), “When a friend is angry, I tend to know why” (CE), and “If a friend has an argument, I try to help” (IC). Two studies have investigated the psychometric properties of this questionnaire in the target age range (Lazdauskas & Nasvytienė, 2021; Overgaauw et al., 2017). These studies confirmed the suitability of the proposed three-factor model, and found internal consistency to range from undesirable to excellent, and test–retest reliability to be strong. The three scales correlated positively with each other (Lazdauskas & Nasvytienė, 2021; Overgaauw et al., 2017), and with the Empathic Concern and Perspective-Taking scales of the IRI (Overgaauw et al., 2017). The subscales also generally correlated as expected with certain self-report measures of social-emotional competence and bullying. Older age was associated with higher AE and CE scores for girls, but lower subscale scores for boys. Girls scored higher than boys on all subscales. Lazdauskas and Nasvytienė (2021) found measurement invariance across sex.
EQCEA (Richaud et al., 2017). The EQCEA is the only empathy questionnaire for children which endeavors to map itself on to neuroscientific research which identifies five components to empathy: Emotional Contagion (EC), Self-Other Awareness (SA), Perspective-Taking (PT), Emotional Regulation (ER), and Empathic Action (EA). The EQCEA has a 3-item scale for each of these components. The 15 items of this self-report measure are completed using a 4-point Likert-type scale. Example items include, “When I see someone crying who I don’t know, I feel like crying” (EC), “Even though I am happy, I notice when a friend is angry” (SA), “When I argue with someone, I try to understand what he or she is thinking” (PT), “When I get angry, I find it difficult to calm down” (ER), and “We must share with those who have less than us” (EA). Only the original paper evaluated this questionnaire’s psychometric properties (Richaud et al., 2017). The proposed 5-factor structure was supported, and internal reliability was respectable. All items discriminated between low- and high-scoring respondents. All EQCEA scales correlated positively with total scores for two prosocial behavior self-report measures. The PT scale correlated positively with the IRI perspective-taking scale (Davis, 1980). All EQCEA scales correlated negatively with a self-report measure of verbal and physical aggression. ER correlated negatively with a self-report measure of emotional instability.
CASES (Raine & Chen, 2018). The CASES is unique in including somatic empathy; it also distinguishes between positive and negative empathy in its measurement. It has three broad scales: Cognitive Empathy (CE), Affective Empathy (AE), and Somatic Empathy (SE). Each of these scales is evenly divided into positive and negative items. Children complete the 30-item self-report questionnaire using a 3-point Likert-type scale. Item examples include, “I know why my friends are cheerful even when they don’t say why” (CE-Positive), “If I saw my friend being made a fool of, I would feel uncomfortable” (AE-Negative), and “I would sweat if I saw someone getting their tooth pulled out” (SE-Negative). Three studies have investigated the psychometric properties of this scale in children aged up to 11 years (Chen et al., 2021; Liu et al., 2018; Raine & Chen, 2018). These studies found support for a two-factor structure (Positive-Negative empathy), a three-factor structure (Cognitive-Affective-Somatic empathy), and a model of six first-order factors (positive and negative Cognitive, Affective, and Somatic empathy). Internal consistency was found to be excellent for the total scale, respectable to very good for the broader subscales, and undesirable to respectable for the narrower subscales. Liu et al. (2018) found strong positive correlations between CE, AE, and SE, and between positive and negative empathy. CASES scores were negatively correlated with child- and parent-reported CU traits across different time points (Chen et al., 2021; Liu et al., 2018; Raine & Chen, 2018). Research indicated complex but generally negative associations between empathy scores and certain measures of aggression, externalizing behavior, and societal disadvantage (Chen et al., 2021; Raine & Chen, 2018). Total empathy correlated negatively with experiences of physical victimization, and positively with experiences of social manipulation and verbal victimization, and with IQ (Liu et al., 2018). Females consistently scored higher than males (Chen et al., 2021; Liu et al., 2018; Raine & Chen, 2018). Liu et al. (2018) found partial support for sex invariance for the scale.
FERET (Coskun, 2019). The FERET was developed based on the premise that emotion recognition is a precursor to empathy in children. This is a predominantly visual, 6-item test which depicts faces through hand-drawn illustrations. For each item, children must recognize the emotion depicted in an illustration of a face, think about how they would feel if a close friend of theirs showed this emotion, and then choose their response from three illustrated faces. Responses are scored between one and three points based on their correlation with the Consensual Mood Structure (Watson & Tellegen, 1985), producing an overall score. Only the original article explored this measure’s psychometric properties; a unidimensional factor structure was supported, and internal consistency was very good. All items discriminated between higher and lower scoring respondents.
Discussion
This review identified all self-report, other-report, and performance measures of empathy for children aged up to 11 years, published in English-language, peer-reviewed journals, and summarized the existing literature regarding each measure’s conceptualization of empathy, item content, and psychometric properties. Collating the existing information about empathy measures for this age group provides an accessible synthesis of these instruments to assist researchers to identify gaps in the literature and shape directions for future research, particularly with regard to investigating measures’ psychometric properties.
Twenty-four empathy measures were identified, dating from 1968 (e.g., FASTE) to 2019 (e.g., FERET). The review revealed a wide spectrum in terms of each measure’s working definition of empathy, nature of items, and quality of supporting evidence regarding questionnaire development and psychometric properties. For example, the IRI (Davis, 1980) defines empathy as including the components of personal distress and fantasy, and the BEI (Bryant, 1982) measures only affective empathy. These measurement approaches are no longer consistent with our contemporary understanding of empathy (Coplan, 2011; Tousignant et al., 2017). In contrast, some more recently developed measures show sophistication in defining empathy and differentiating this from related constructs through their measurement methods (e.g., the CASES, Raine & Chen, 2018; the EQCEA, Richaud et al., 2017; and the AMES, Vossen et al., 2015). Richaud et al. (2017) are unique in having developed a child empathy questionnaire that reports reasonable psychometric properties and corresponds to the five empathy components identified in neuroscientific research: Emotional Contagion, Self-Other Awareness, Perspective-Taking, Emotional Regulation, and Empathic Action (Decety & Moriguchi, 2007; Tousignant et al., 2017). Given this variability in available instruments, when selecting an empathy measure, researchers should be mindful of their own definition of empathy, and how well this corresponds to how empathy is defined and understood by a particular measure’s authors.
Broadly, the empathy questionnaires included in this review took two distinct approaches in measuring empathy. Some measured situational state empathy, or a child’s response to specific eliciting situations. Others measured dispositional trait empathy, or a child’s rated empathic tendencies over time and across contexts. Performance-based measures generally involved children responding to items involving novel visual stimuli and narratives, and therefore assessed situational state empathy (e.g., Borke, 1971; Coskun, 2019; Feshbach & Roe, 1968; Howe et al., 2008; Poresky, 1990; Reid et al., 2013). These measures tended to seek children’s responses to standardized fictitious scenarios, which were interpreted as reflecting their empathic ability. However, performance on such measures is more vulnerable to the influence of personal and environmental variables on the day (e.g., temperament, shyness, attentiveness, receptive and expressive language ability, sex of the experimenter, and social desirability). Also, the extent to which performance on a situational state empathy measure corresponds to a child’s stable, underlying empathic disposition is unclear (de Minzi et al., 2016). In comparison, self- and other-report measures of empathy in children assessed dispositional trait empathy. These measures allowed greater opportunity to rate a child’s general empathic disposition across time and different contexts. Especially given that scores across different empathy measures and measurement approaches may not correlate well (de Minzi et al., 2016; Zhou et al., 2003), it is important for researchers to carefully consider what aspect of empathy they wish to measure, and how this influences any conclusions they can draw, when selecting measurement tools.
The measures included in this review spanned a publication period of 51 years. Given our evolving understanding of empathy and its components, and our increasing technological capacity which makes some questionnaire features outdated (e.g., using projector slides), some older measures have fallen out of widespread use with children (e.g., Borke, 1971; Feshbach & Roe, 1968; Mehrabian & Epstein, 1972; Poresky, 1990). However, some older measures continue to be used in recent research, such as Davis’ (1980) IRI (e.g., Holgado Tello et al., 2013) and Bryant’s (1982) BEI (e.g., Huang & Tran-Chi, 2020; Lucas-Molina et al., 2016). It is important that researchers consider whether older and more established and routinely used questionnaires are most appropriate for exploring their research questions, or whether lesser known but more contemporary measures may at times be better-suited. Increased investigation of the psychometric properties of newer measures will further inform future researchers’ decisions regarding selection of empathy measures for their projects.
There was a preponderance of self-report measures of empathy in children. Murphy and Lilienfeld (2019) emphasize the importance of multimodal assessment of empathy if researchers seek greater confidence that they are validly measuring this construct. Although it is inadvisable to use an excessive number of measures for one construct, researchers might opt to use both a parent- and a self-report questionnaire, or supplement such questionnaires with a behavioral or performance measure. One benefit of a multimodal assessment approach is that it addresses the problem of self-report not necessarily validly reflecting individuals’ empathic ability, due to influencing factors such as metacognitive abilities, social desirability, overconfidence, or narcissistic tendencies (Ames & Kammrath, 2004; Murphy & Lilienfeld, 2019). This problem is not currently sufficiently addressed in self-report empathy questionnaire development and evaluation research. A multimodal approach toward empathy assessment in children also increases the likelihood that researchers collect meaningful and accurate data, because relying on a single measure risks compromising the value of the data if that measure’s psychometric properties are not robust (e.g., Malcolm-Smith et al., 2015).
In recent research, there has been greater emphasis on developing novel measures of empathy in children. Investigation of a measure’s psychometric properties is also more commonly conducted by the measure’s authors and their collaborators, rather than independent researchers. Sometimes, a measure’s psychometric properties have solely been evaluated by the authors of the original paper (e.g., Garton & Gringart, 2005; Howe et al., 2008; Richaud et al., 2017; Wang & Wang, 2015). The recent emphasis on developing new measures is useful because these measures are more connected with contemporary research findings. Newer measures more commonly endeavor to distinguish between different aspects of the empathy construct (e.g., Raine & Chen, 2018; Richaud et al., 2017), or theoretically related constructs such as sympathy, prosocial behavior, and theory of mind (e.g., Overgaauw et al., 2017; Vossen et al., 2015; Wang & Wang, 2015). These newer measures would benefit from increased independent research that evaluates their psychometric properties and addresses the gaps in previous research.
Of the research investigating measures’ psychometric properties, there has been a relative emphasis on investigating scale and subscale internal consistency, differences in scores according to age and sex, factor structure, and correlations between scales and subscales. Less attention has been given to investigating test–retest reliability, or evaluating how scores on an empathy measure correlate with scores on other theoretically related or unrelated measures. Importantly, relatively few studies have investigated the relationship between children’s empathy questionnaire scores and measures of real-world functioning, such as observed prosocial behavior or relationship quality. Such relationships were also commonly investigated by comparing scores on rating scales completed by the same reporter. Some measures did have more compelling evidence of scores correlating as expected with measures of real-world functioning. For example, first- and fourth-grade boys’ self-reported BEI scores correlated negatively with teachers’ ratings of their physical aggression (Bryant, 1982), and children’s scores on the performance-based Young Children’s Empathy Measure were positively correlated with maternal ratings and home visitors’ assessments of these children’s reassurance and cooperation. In contrast, Ukegawa (1996) grouped children according to their total empathy score into low, medium, and high empathy groups, but found no group differences in altruistic behavior on an experimental task. More research that investigates the relationship between empathy questionnaire scores and independent measures of real-world functioning would help in building our understanding of the extent to which empathy questionnaire scores relate to real-world empathic behavior (see Murphy & Lilienfeld, 2019), as well as which measures have the most robust supporting evidence for their validity.
No measures included in this review had norms. If psychometrically sound, these measures may therefore be appropriate for research use, but are not currently appropriate for use in clinical contexts. Without norms and further extensive validation, these measures cannot be used to make individual decisions or judgments about children, or inform clinical diagnoses. Future research could focus on establishing norms for existing measures, or developing new child empathy measures based on contemporary theory and research, that include norms. Although sex differences were not always investigated, a pattern of girls scoring higher than boys was identified for many measures. This pattern was more common for measures for older children. Girls and boys often scored similarly on measures designed for younger children (e.g., Borke, 1971; Feshbach & Roe, 1968; Grazzani et al., 2017; Sallquist et al., 2009), which may point toward some impact of socialization on children’s performance (Michalska et al., 2013), or sex-based differences in empathy development over time (Christov-Moore et al., 2014). Results are in line with research indicating that females tend to score more highly on empathy measures than males (Christov-Moore et al., 2014), and this pattern should be taken into account when developing norms.
As this was a scoping review, its focus was on compiling all available evidence within the parameters established (e.g., English-language, peer-reviewed publications focused on evaluating measures’ psychometric properties), rather than evaluating the quality of the evidence. The studies included in this review were published in a range of peer-reviewed journals, and research quality has not been considered when reporting findings. Research findings have also been summarized rather than reported in full. Those wishing to select the most appropriate empathy measure for their research should refer back to the original articles to ensure that they are satisfied with the quality of the evidence.
In accordance with the PRISMA-ScR guidelines (Tricco et al., 2018), it was necessary for this scoping review to have parameters, and to maintain a focus on the measurement of empathy specifically. However, this resulted in some limitations. For example, the inclusion only of studies that specifically examined child empathy measures’ psychometric properties meant that studies with relevant data, but that addressed a wider research question, were not included (e.g., Campbell et al., 2015; Dadds et al., 2009; Wang et al., 2017). In addition, restricting inclusion to English-language articles meant that some high-quality research and measures have doubtless been overlooked. It was also beyond the scope of this review to include the many existing measures that assess empathy in children alongside various other constructs. Finally, as only peer-reviewed research was included, there may be some publication bias, in that studies with significant or interesting results may have been more likely to be published, and therefore included in this review, than studies with nonsignificant results.
This review was undertaken in recognition of the fact that there is currently no centralized, cohesive source of information regarding self-report, other-report, and performance measures of empathy for children aged up to 11 years. Children in this age group have unique developmental and contextual needs and characteristics that make empathy measures designed for adults inappropriate. When choosing an empathy measure for children, researchers are encouraged to particularly consider those measures that are aligned with, and informed by, contemporary research regarding child development and the neuroscience of empathy. Researchers are also encouraged to utilize a multimodal assessment approach.
Footnotes
Acknowledgements
I wish to thank Professor Virginia Slaughter, Professor Vanessa Cobham and Dr Andrew Hill for their input during the process of this scoping review, and Professors Virginia Slaughter and Vanessa Cobham for their feedback on earlier versions of this paper.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
