Abstract
Psychometric and utility studies on Social Emotional Assessment Measure (SEAM), an innovative tool for assessing and monitoring social-emotional and behavioral development in infants and toddlers with disabilities, were conducted. The Infant and Toddler SEAM intervals were the study focus, using mixed methods, including item response theory modeling and classical test theory. Results using a Rasch one-parameter logistic model indicated model fit statistics were consistent for age and item difficulty as well as for ability and item characteristics. Classical test theory analyses generally confirmed the developmental structure; mean scores increased with age and were significantly correlated across 6-month increments. Reliability studies indicated strong internal consistency and moderate interrater agreement between teachers. Test–retest reliability results completed by parents online indicated significant agreement. Overall, 90% of parents reported the SEAM asked appropriate questions and took less than 10 min to complete.
Increased attention on the early identification and prevention of social-emotional problems has been fueled by growing prevalence rates of children with mental health concerns and emotional and behavioral disorders (Briggs et al., 2012; Knitzer & Perry, 2009). Research studies indicate a high percentage of preschool-age children are receiving a psychosocial diagnosis (i.e., 10%–25%), with figures consistent with rates for school-age children (Jellinek & Murphy, 1999; Qi & Kaiser, 2003; Webster-Stratton, 1997). Gilliam, Meisels, and Mayes (2005) report that preschool-age children are expelled more frequently for behavior problems than school-age children, suggesting that substantial numbers of younger children are at risk for, or already have, social-emotional behaviors that require intervention (Crnic, Hoffman, Gaze, & Edelbrock, 2004; Perry, Holland, Darling-Kuria, & Nadiv, 2011). In addition, preschool teachers do not feel prepared to intervene with behavioral challenges (Hinshaw-Fuselier, Zeanah, & Larrieu, 2009; President’s New Freedom Commission on Mental Health, 2003). Early identification and intervention for social-emotional delays are crucial for children’s later social and academic success.
Children with developmental disabilities have an estimated four to five times more social-emotional deficits than children who have not been identified with developmental problems (Crnic et al., 2004; Kelly, Zuckerman, & Rosenblatt, 2008; Merrell & Holland, 1997). Crnic et al. (2004) estimated that 42% of toddlers with developmental disabilities have behavior problems in the clinical range (or that require intervention). Differences in young children’s interactions may contribute to patterns that disrupt the development of trust and responsiveness, both with parents and other primary caregivers (Kelly et al., 2008). Fortunately, a number of studies suggest that relationship-focused interventions can be used effectively to support developmental and social-emotional functioning in young children with disabilities (Kelly et al., 2008; Mahoney & Perales, 2005; Mahoney, Perales, Wiggers, & Herman, 2006; Warren & Brady, 2007; Warren, Brady, Sterling, Fleming, & Marquis, 2010).
For timely identification and effective intervention, functional behavioral measures and procedures are needed that yield accurate and important information about children’s and caregivers’ social-emotional strengths and needs (Hinshaw-Fuselier et al., 2009; Squires & Bricker, 2007). Assessment tools that provide more than a label or a set of numbers are critical—tools that yield functional and pertinent information so teachers and other child care personnel can identify social-emotional strengths and deficits, develop quality goals and objectives, and generate intervention content necessary to develop positive behavior supports (Bagnato, Neisworth, & Pretti-Frontczak, 2010). In addition, measures are needed that focus on performance and competence for preschool-age children and their caregivers (Meisels & Atkins-Burnett, 2000; Raver & Zigler, 1997). Finally, measures that foster positive partnerships with families are recommended, with families actively participating in assessment procedures. Access, coordination with other professionals, and family collaboration are recommended components of assessment (Copple & Bredekamp, 2009; Division for Early Childhood, 2007; Hemmeter, Joseph, Smith, & Sandall, 2001; National Association for the Education of Young Children & National Association of Early Childhood Specialists in State Departments of Education [NAEYC & NAECS-SDE], 2003; Neisworth & Bagnato, 2005; Sandall, Hemmeter, Smith, & McLean, 2005).
The Social Emotional Assessment Measure (SEAM; Squires & Bricker, 2007; Squires, Bricker, Waddell, Funk, & Clifford, in press) was developed to address the need for psychometrically sound social-emotional tools for young children. As a curriculum-based measure, the SEAM was designed to assist in the prevention and early identification of social-emotional difficulties and behavior disorders, as well as to build positive partnerships with families and optimize positive parent–child interactions in the first years of life. Administered in natural settings, the SEAM yields in-depth information on children’s skills and deficits in social-emotional and behavioral areas. Such information can be used by early interventionists and family members to jointly develop high-quality and meaningful social-emotional goals and objectives. Children’s progress toward important social-emotional goals (e.g., sharing toys) also can be monitored by family members and early interventionists using SEAM items.
The SEAM is composed of three age intervals: Infant (2–18 months), Toddler (18–36 months), and Preschool (36–66 months). A sample of the SEAM is shown in Figure 1. The psychometric properties of the Infant and Toddler SEAM intervals, for children ages 3 years and below, were the focus of this study.

Figure 1. Sample benchmark and items from SEAM Toddler interval.
Research questions focused on (a) the reliability of the Infant and Toddler SEAM intervals, including internal consistency, test–retest, and interrater reliability; (b) the validity of the Infant and Toddler SEAM intervals, specifically content and concurrent validity; (c) the item functioning for the Infant and Toddler SEAM intervals; and (d) the utility of the SEAM as rated by parents and early interventionists. Validity studies included concurrent comparisons with other measures that have been shown to measure social-emotional development, and comparison of scores across groups of children who were likely to have different levels of ability in the area of social-emotional competence. Reliability studies were conducted to examine consistency of scores across raters (agreement between professional raters within classrooms), across time, and across benchmarks. Item analysis and vertical scaling research were completed using item response theory (IRT) and included (a) testing for item performance with respect to developmental sequence and (b) vertical scaling of age-level form. Descriptive statistics were used to examine the relationship of scores with child age. Utility data were gathered from practitioners and parents/caregivers using satisfaction questionnaires to evaluate user satisfaction.
Method
Measures
SEAM
The SEAM (Squires & Bricker, 2007; Squires et al., in press) targets key child social-emotional behaviors and includes items that were derived from 10 key child benchmarks. Benchmarks include item-level skills such as “participating in healthy interactions,” “expressing a range of emotions,” and “cooperating with daily routines and requests.” Items are listed in order of difficulty along with 3 to 7 related examples within each of the 10 benchmarks. (For a list of benchmarks, see Table 1.)
Table 1. Child Benchmarks With Sample-Associated Assessment Items for the Infant and Toddler Interval of Social Emotional Assessment Measure.
SEAM content was identified from the literature on social-emotional development of young children raised in mainly Western cultures, although care was taken to include items that had a broad applicability across cultures (Heo, Squires, & Yovanoff, 2008) and potential for cultural, functional, and metric equivalence across groups of children and families (Artiles, Trent, Hoffman-Kipp, & López-Torres, 2000; Peña, 2007). Sources included foundational readings about social and emotional competence, including Dodge and Garber (1991), Raver and Zigler (1997), Sroufe (1996), and Witherington, Campos, and Hertenstein (2001). Texts on social and emotional curriculum and intervention with diverse families, such as Foley and Hochman (2006) and Landy (2002), were also sources. Finally, infant/toddler assessments, including the Child Behavior Checklist 1.5-5 (Achenbach & Rescorla, 2000) and Infant Toddler Social Emotional Assessment (ITSEA; Carter, Little, Briggs-Gowan, & Kogan, 1999), were consulted. Certain concepts repeatedly emerged as those deemed essential or critically important to the mental health competence of young children (Squires & Bricker, 2007; Summers & Chazan-Cohen, 2012). These benchmarks and items were reviewed and revised, based on feedback from experts in infant mental health, early childhood, early intervention/early childhood special education, psychology, and behavior disorders. The SEAM was developed as a unique functional assessment to be used by a variety of practitioners without a background in mental health and/or behavioral interventions.
Each SEAM assessment item is designed to address its accompanying benchmark and, importantly, can also be selected as an intervention goal. A set of specific criteria were used: Items must be (1) functional, (2) meaningful, (3) observable/measurable, (4) easily embedded into daily activities, (5) written in jargon-free language, and (6) able to serve as an intervention goal. These six criteria resulted in the elimination of many child-focused items that frequently appear on diagnostic measures, such as child clings to parent and child hits and bites other children. Targeting negative responses as intervention goals is not usually appropriate; rather, the reciprocal positive response was designed as the intervention target. For example, for Toddler Benchmark 6, “demonstrating independence,” accompanying items include (6.1) “toddler tries new tasks before seeking help,” (6.2) “toddler can separate from parent in familiar environment with minimal distress,” and (6.3) “toddler explores new environments while maintaining some contact.” To assist caregivers in completing the SEAM, each item was also followed by examples of how the behavior might be exhibited by children of various ages within the interval.
Each interval has a consistent format, including a cover sheet with space for entering identifying information and instructions for completion. The four response options are “very true,” “somewhat true,” “rarely true,” and “never true,” plus one column to indicate “concerns” with the targeted behavior and a last column to indicate whether this behavior should be an “intervention goal.” Psychometric properties of the SEAM are the focus of this article and are addressed in the “Results” section.
Ages and Stages Questionnaires: Social-Emotional (ASQ:SE)
The ASQ:SE (Squires, Bricker, & Twombly, 2002) is a series of parent-completed screening questionnaires targeting social, emotional, and behavioral competence in young children from 4 to 65 months of age; it was used as a concurrent validity measure. Targeted behavioral areas include self-regulation, compliance, adaptive functioning, affect, and interactions with others. Each questionnaire has an empirically derived cutoff score; if a child scores on or above this cutoff, further assessment is recommended. Internal consistency ranges from 67% to 91%, test–retest reliability is 94%, concurrent validity ranges from 81% to 95%, sensitivity ranges from 71% to 85%, and specificity ranges from 90% to 98% (Squires et al., 2002). It was hypothesized that the SEAM would be negatively correlated with the ASQ:SE, as high scores on the ASQ:SE are indicative of challenging behaviors.
The Devereux Early Childhood Assessment for Infants and Toddlers (DECA-IT)
The DECA-IT (Mackrain, LeBuffe, & Powell, 2007) was used for children ages 2 to 12 months, as a measure of concurrent validity. The DECA-IT is an assessment designed for parent or teacher completion and measures protective and risk factors in social and emotional development. Reliability studies by the authors indicated internal consistency ranged from .90 to .94, test–retest reliability from .83 to .94, and interrater reliability from .68 to .72. Concurrent validity has been studied for the toddler age range only and using the original DECA as the criterion measure. It was hypothesized that the SEAM would be positively correlated with the DECA-IT, as both assessments measure competence behaviors.
ITSEA
The ITSEA (Carter & Briggs-Gowan, 2006) is a standardized norm-referenced assessment that evaluates social-emotional competence in young children from age 1 to 3 years and is considered one of the few instruments with adequate psychometric properties targeting infant/toddler social-emotional development (Printz, Borg, & Demaree, 2003). Four domains (externalizing, internalizing, dysregulation, and competence) include 17 subscales. In addition, maladaptive, atypical behavior, and social relatedness indices are included to assess more serious problems. Studies indicate that national standardization data have been gathered (Carter & Briggs-Gowan, 2006) and report high internal consistency (the majority of Cronbach’s α values are above .70), acceptable test–retest reliability (intraclass correlations = .61–.91), evidence for concurrent validity (problem scores correlated significantly with Child Behavior Checklist 2/3; r = .28–.78), and acceptable factor loadings on the designated subscales (Briggs-Gowan & Carter, 1998; Carter et al., 1999).
Three subscales (i.e., Compliance, Negative Emotion, and Prosocial Behavior) were used as a measure for SEAM concurrent validity for children from 12 to 36 months, based on recommendations from the first author (Carter) as critical for social-emotional assessment (Beeber et al., 2007). It was hypothesized that results from the SEAM would be positively correlated with the Compliance and Prosocial subscales, and negatively correlated with Negative Emotion on the ITSEA.
Utility surveys
Satisfaction, ease of understanding, completion time, acceptability, and usefulness were rated on the utility survey for parents. Similar items were included on the practitioner utility survey. Examples of items include, “How long did it take for you to complete the SEAM?” and “Were any items difficult to understand?”
Institutional review board (IRB) approval was obtained including procedures for informed consent before recruitment began. Two primary methods were used for gathering SEAM data: (a) data collected directly online from parents and (b) data gathered using “paper and pencil” forms. The online data were gathered through a research website advertised on parenting websites and electronic bulletin boards. Completion of developmental screening as well as the SEAM were options on the online site, and all data were collected anonymously. Parents could request consultation or assistance about concerns if they e-mailed research personnel directly.
The pencil/paper data were gathered from early childhood programs serving both typically developing children and children with developmental delays. Program administrators who had expressed interest in using the SEAM or who had received prior assessment training from research personnel were contacted and asked to participate in the research study. Once program administrators agreed, research personnel conducted on-site or teleconference training on research objectives and procedures. Research packets were then distributed to practitioners who approached families about participating. Those families agreeing to participate then received a packet with research forms and measures. Parents who completed pencil/paper data received a US$10 gift certificate for participating; early interventionists received a US$15 gift certificate for collecting research forms from several families (e.g., 2–8 families). Online participants received social-emotional games activity sheets appropriate for their child.
The paper and pencil data, including demographic and utility surveys, were completed in one of several ways: individually by parents without practitioner assistance, by parents during an interview with a practitioner, and (for interrater reliability) by a practitioner with at least 20 hr of weekly contact with a child. Practitioners included (a) early childhood classroom teachers and assistants working primarily with infants and toddlers who were typically developing and (b) early interventionists/early childhood special educators working with families and their children who are at risk or eligible for Part C services.
Data were collected in 49 states across the United States and from Canada. The numbers from each site ranged from 1 to 279, with the largest number coming from Oregon. The sample included a total of 1,685 SEAMs; 1,334 were collected online and 351 from “paper and pencil” versions. Of the sample, 58% of children were male and 42% were female. The children represented in the sample were predominately White (76.9%). Other ethnicities included Multiracial (5.6%), Hispanic/Latino (4.8%), African American (4.2%), Asian (3.7%), American Indian/Alaskan Native (1.2%), Native Hawaiian/Pacific Islander (0.1%), and other nonspecified ethnicity (1.4%). Fifty-seven percent of children were typically developing, while 43% of children were identified with a disability or developmental delay.
Data on family income and education level were also collected. The majority of caregivers reported incomes greater than US$50,000 (56%), whereas 44% reported incomes below that level. The greatest percentage of participating caregivers (60%) had a bachelor’s or postgraduate/graduate degree, whereas 20% had some college, 16% had a high school diploma or General Education Development (GED), and 4% had not completed high school.
Analyses
Two different types of analytical approaches were used: (a) IRT modeling and (b) classical test theory. These analytical approaches and data management procedures are described below.
IRT modeling
IRT models are probabilistic, providing probabilities that a person with a specific ability will provide a pattern of responses to a set of “calibrated” items. One hallmark of IRT models is that person ability and item difficulty are placed on the same scale. The interaction of a person’s ability and item difficulty theoretically predicts the observed response such that higher levels of person ability are required for a correct response to a difficult item relative to an easier item. This enables a simple comparison of ability to item difficulty for estimating how likely a person is to correctly perform a certain skill. Model fit statistics and item functioning analyses were computed to examine the structure of the individual benchmarks as well as ordering of items to evaluate fit validity (i.e., Do the data fit the Rasch model usefully well for the purposes of measurement?) and construct validity (i.e., Does the item difficulty hierarchy make sense?).
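As an illustration of how ability and difficulty share one scale, the dichotomous Rasch model can be written as follows. The symbols θ_n (ability of person n) and b_i (difficulty of item i), both in logits, are notation introduced here for illustration and do not come from the original text. Because SEAM items are rated on more than two categories, this is only the simplest special case of the model family.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

When θ_n equals b_i, the probability of a positive response is .50; it rises toward 1 as ability exceeds difficulty and falls toward 0 as difficulty exceeds ability.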
For all IRT analyses in this study, a Rasch one-parameter logistic (1PL) partial credit model (PCM) for polytomous scoring (Masters & Wright, 1997) was selected, and the estimation software Winsteps 3.73 (Linacre, 2011) was used. One attractive feature of the Rasch model is the fact that the total summative score is a “sufficient” statistic for estimating ability, which is an underlying assumption of the SEAM. The PCM was determined to be appropriate for the SEAM assessment because no assumptions are made about the relative difficulty of steps between ratings (e.g., different items on the SEAM may require varying amounts of growth in order for a child’s rating to move from “rarely true” to “somewhat true”).
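The following is a minimal sketch of how category probabilities arise under a partial credit model, written in plain Python rather than Winsteps. The function name, the step difficulty values, and the 0 to 3 coding of the four ratings are hypothetical choices made for illustration; none of them are estimates from this study.

```python
import math


def pcm_probabilities(theta, step_difficulties):
    """Return category probabilities for one item under the partial credit model.

    theta: person ability in logits.
    step_difficulties: step difficulties delta_1..delta_m for the item,
        giving m + 1 ordered rating categories.
    """
    # Category x has log-weight sum_{j <= x} (theta - delta_j); category 0 has 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical item with four ordered ratings (three steps), assumed coded 0-3.
# The steps are unequally spaced, which the PCM allows item by item.
HYPOTHETICAL_STEPS = [-1.0, 0.0, 1.5]

probs = pcm_probabilities(theta=0.5, step_difficulties=HYPOTHETICAL_STEPS)
expected_score = sum(category * p for category, p in enumerate(probs))
print([round(p, 3) for p in probs], round(expected_score, 3))
```

Because each item carries its own list of step difficulties, the distance between adjacent ratings can differ from item to item, which is the property the paragraph above cites as the reason for choosing the PCM.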
Classical test theory
Descriptive statistics were used to examine the relation between SEAM total scores and child age, as well as practitioner satisfaction. Reliability analyses including interrater, test–retest, and internal consistency were conducted to determine consistency across raters, time, and benchmarks. Validity studies were completed to determine how well the SEAM measures social-emotional skills. Correlations between SEAM scores and other related measures were examined to evaluate concurrent validity.
Data management
As mentioned previously, participant completion methods and participant characteristics (i.e., disability status) differed in ways that may have affected test scores. At issue was whether it was appropriate to run analyses on the entire data sample or whether analyses should be conducted separately based on different administration methods and ability status. Thus, differential item functioning (DIF) analyses using IRT modeling were conducted to determine (a) whether SEAM items functioned invariantly across the conventional paper–pencil method of administration and electronic versions completed online by accessing the SEAM research website and (b) whether any of the items functioned differently (i.e., included measurement bias) for either of the ability groups, which included both typically developing children as well as children with identified special needs.
DIF analyses are used to detect when certain items within a test appear to be functioning differently for different populations, that is, when they introduce construct-irrelevant variance (Messick, 1989): test takers of similar abilities may exhibit different responses to items due to an extraneous variable (e.g., mode of completion or ability status). DIF analyses are useful in identifying items that may result in unexpected ratings for children of the same ability level, thus reducing the strength of inferences that can be drawn from scores (Embretson & Reise, 2000).
The results from the preliminary DIF analysis indicated that there were only minor differences in item functioning for both analyses (administration method and ability status). For administration method, evidence for significant DIF was demonstrated in 3 of the 35 items (8.6%) in the infant version and 2 of the 35 items (5.7%) in the toddler version. These results suggested that most of the items functioned invariantly and were not affected by extraneous artifacts inherent in the method of completion by the rater. Similar results were found for ability status, such that for the infant version, 5 of 35 items (14.3%) demonstrated DIF, while only 2 of the 35 items (5.7%) on the toddler version had evidence of significant DIF. As with administration methods, these results suggested minimal bias between groups. These findings supported the rationale for analyzing the data set as a whole.
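The proportions reported above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text (35 items per interval and the number of items flagged in each analysis); the dictionary layout and function name are illustrative choices.

```python
ITEMS_PER_INTERVAL = 35  # items per interval, as reported in the text

# Number of items with significant DIF, as reported in the text.
flagged = {
    ("administration method", "infant"): 3,
    ("administration method", "toddler"): 2,
    ("ability status", "infant"): 5,
    ("ability status", "toddler"): 2,
}


def percent_flagged(n_flagged, n_items=ITEMS_PER_INTERVAL):
    """Percentage of items flagged for DIF, rounded to one decimal place."""
    return round(100.0 * n_flagged / n_items, 1)


percentages = {key: percent_flagged(count) for key, count in flagged.items()}
for (analysis, interval), pct in percentages.items():
    print(f"{analysis}, {interval}: {pct}%")
```

The computed values match the percentages given in the paragraph (8.6%, 5.7%, 14.3%, and 5.7%).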
Results
IRT Modeling
Model fit statistics
Item fit statistics are generated as an indication of how well the selected model (in this case, the Rasch model) fits the obtained data. Responses to items from people of varying estimated abilities should be consistent with the estimated item difficulty, such that participants with estimated high ability should be able to demonstrate more difficult skills, while participants with lower ability should only be able to do easier items. Items that fit the model well are assigned fit statistics that range in value from 0.5 to 1.5. Items with fit statistics less than 0.5 are considered “overly predictable,” whereas items with fit statistics greater than 1.5 contain more noise than useful information and are considered degrading to the measure (Linacre, 2011). Confirming adequate model fit is a necessary step for ensuring credibility of results when performing an IRT analysis. For this analysis, we examined item fit within each SEAM benchmark. As shown in Table 2, results indicated that item-level fit statistics were well within the acceptable range for the majority of benchmarks, except for Item 5.1 (fit statistic = 2.01) from Benchmark 5 in the infant interval and Item 2.1 (fit statistic = 8.09) from Benchmark 2 in the toddler interval. These results provide evidence of unidimensionality for each benchmark and support the use of the Rasch PCM as a means to evaluate item functioning.
Table 2. Item Fit Statistics.
Note. For item ordering, letters are used instead of numbers to facilitate visual analysis of the results (e.g., for Infant Benchmark 1, a = 1.1, b = 1.2, c = 1.3, and d = 1.4). Italicized items indicate that item ordering within benchmark was significantly different from original item ordering. Bolded values and items indicate that significant misfit was detected.
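The 0.5 to 1.5 screening rule described in the model fit paragraph can be sketched as a simple classifier. The function name and category labels are illustrative, and the text does not say whether the reported values are infit or outfit mean-squares, so the sketch treats them generically. Only the two misfitting values reported in the text are used; the value 1.0 is a hypothetical well-fitting item.

```python
def classify_fit(fit_statistic, lower=0.5, upper=1.5):
    """Classify an item fit statistic using the range described in the text."""
    if fit_statistic < lower:
        return "overly predictable"
    if fit_statistic > upper:
        return "misfit (more noise than information)"
    return "acceptable"


# The two misfitting items reported in the text, plus one hypothetical item.
examples = {
    "Infant Item 5.1": 2.01,
    "Toddler Item 2.1": 8.09,
    "hypothetical well-fitting item": 1.0,
}

labels = {name: classify_fit(value) for name, value in examples.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```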
Item functioning
Item functioning was evaluated to better understand the contribution of individual items within each benchmark of the SEAM. As previously mentioned, IRT offers a range of latent trait measurement models for explaining the relation between item responses and two classes of unobserved variables: (a) person ability and (b) item characteristics (Embretson & Reise, 2000; Hambleton & Swaminathan, 1985). Item characteristics (e.g., difficulty and sensitivity) are estimated with the person’s responses to the set of measurement items, and each person’s ability level is estimated based on his or her set of responses and the estimated item characteristics.
One of the purposes of the IRT analysis was to examine the ordering of the items within each benchmark. On the experimental version of the SEAM, items within benchmarks were ordered from easier to more difficult to facilitate an examination of a child’s progress in social-emotional competencies. This initial item ordering was based on developmental quotient, or the relative difficulty of skills from the literature on social-emotional development of young children. Results from the estimated item difficulty suggested that, in general, the majority of items within each SEAM benchmark were in fact hierarchically organized and confirmed the predetermined ordering of these social-emotional skills (see Table 2), or only minor changes were indicated (e.g., switching the order of two adjacent items). Items were left in the original order when only minor changes were indicated. For four benchmarks in the Infant interval and three benchmarks in the Toddler interval, original item ordering was found to be largely different from the order indicated by the IRT results. For these benchmarks where “misorder” was detected, the researchers carefully examined each cluster of items to determine whether the disagreement appeared to be due to (a) true item misorder or (b) possible misinterpretation of the item by respondents. In most instances, the items were reordered according to the IRT results; however, for a few benchmarks, original item ordering was kept, and minor editing was done (either to the item itself, its accompanying examples, or both) in an attempt to increase the clarity of the item.
Classical Test Theory Analyses
Descriptive Statistics
Correlation of mean SEAM scores with age
Two analyses were computed using a subset of the data sample that included children who were known to be typically developing to (a) calculate mean SEAM scores across 6-month intervals and (b) calculate correlation of mean SEAM scores with age for the Infant and Toddler intervals. As shown in Table 3, there was a consistent increase in mean scores across the 6-month age intervals in both the Infant and Toddler intervals. Correlations of mean scores with age for the Infant (r = .354) and the Toddler (r = .391) intervals were moderate and significant at p < .01, suggesting that children’s scores did increase with age, but with some variation in the relation between age and SEAM scores, such that children of the same age may have received different total scores on the SEAM.
Table 3. Mean SEAM Scores and Correlations With Age Across 6-Month Intervals.
Abbreviations: SEAM = Social Emotional Assessment Measure.
*p < .05. **p < .01.
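To make the interpretation of the age correlations concrete, the sketch below computes a Pearson correlation on a small hypothetical data set and then squares the two coefficients reported in the text to show the share of score variance associated with age. The ages and scores are invented for illustration and are not study data; only the two r values come from the text.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(ss_x * ss_y)


# Hypothetical ages (months) and SEAM total scores, for illustration only.
hypothetical_age_months = [3, 6, 9, 12, 15, 18]
hypothetical_seam_total = [40, 55, 48, 70, 62, 81]
r_example = pearson_r(hypothetical_age_months, hypothetical_seam_total)

# Correlations reported in the text; r squared is the shared variance.
reported_r = {"Infant": 0.354, "Toddler": 0.391}
shared_variance = {name: round(r ** 2, 3) for name, r in reported_r.items()}
print(shared_variance)
```

Squaring the reported coefficients gives roughly .125 and .153, so age is associated with about 13% to 15% of the variance in scores. This is consistent with the statement that children of the same age may receive different total scores.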
Reliability
Test–retest
Test–retest reliability data were collected by online parent participants. After completing the SEAM online, parents were immediately given the option to complete a second SEAM, blind to results of the first SEAM. Results indicated strong, significant agreement for both intervals, as shown in Table 4.
Table 4. Summary of SEAM Findings (Correlational Analyses).
Abbreviations: SEAM = Social Emotional Assessment Measure; DECA = Devereux Early Childhood Assessment for Infants and Toddlers (Mackrain, LeBuffe, & Powell, 2007); ITSEA = Infant and Toddler Social Emotional Assessment (Carter & Briggs-Gowan, 2006); ASQ:SE = Ages and Stages Questionnaires: Social-Emotional (Squires, Bricker, & Twombly, 2002).
*p < .05 (two-tailed). **p < .01 (two-tailed).
Interrater
Interrater reliability data were collected from teacher dyads working at a high-quality child care center serving primarily children of University of Oregon faculty and staff parents. Master teachers and assistant teachers from the infant and toddler classrooms participated. Results for the interrater agreement Pearson product–moment and intraclass correlations are presented in Table 4 for four teacher dyads: one dyad for the Infant SEAM (n = 12 children) and three dyads for the Toddler SEAM (Toddler Class 1, n = 7 children; Toddler Class 2, n = 7 children; and Toddler Class 3, n = 8 children). For the Infant SEAM, the Pearson product–moment correlation (r = .776) was significant at p < .01. For the Toddler SEAM, the correlation for Toddler Class 2 (r = .948) was also significant at p < .01. Correlations for Toddler Classes 1 and 3 were not significant but were moderately strong (r = .668 and r = .640, respectively). Intraclass correlations were also analyzed to examine the consistency of differences between scores across raters. As shown in Table 4, results of the intraclass correlations were strong and significant for teachers in the infant classroom and for toddler teachers in Classrooms 1 and 2, but were weak and nonsignificant for the teachers in Toddler Classroom 3.
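The text does not state which form of the intraclass correlation was computed. As one possibility, the sketch below implements the single-rater, two-way consistency form, often labeled ICC(3,1), for a teacher dyad. That choice of form is an assumption, and the scores are hypothetical rather than study data.

```python
def icc_consistency(ratings):
    """ICC(3,1): two-way model, consistency definition, single rater.

    ratings: list of rows, one row per child, one score per rater in each row.
    """
    n = len(ratings)      # number of children
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Between-children sum of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    # Residual sum of squares after removing child and rater effects.
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical total scores from a master teacher and an assistant teacher
# for seven children (illustration only).
hypothetical_dyad = [
    [82, 78], [65, 70], [90, 88], [74, 69], [58, 63], [77, 80], [85, 81],
]
icc_example = icc_consistency(hypothetical_dyad)
print(round(icc_example, 3))
```

Under this form, a constant difference between two raters (for example, one teacher always scoring a few points higher) does not lower the coefficient. An absolute-agreement form would penalize such an offset.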
Internal consistency
Internal consistency of the SEAM was addressed by examining the relation between average benchmark scores using correlational analyses and Cronbach’s coefficient alpha (Cronbach, 1951). As shown in Table 5, Pearson product–moment correlation coefficients between benchmarks ranged from .28 to .67 for the Infant SEAM and from .32 to .68 for the Toddler SEAM. In addition, the correlational analyses between benchmarks and overall SEAM scores were consistent, ranging from .69 to .85 for the Infant interval and from .78 to .90 for the Toddler interval. All correlations were significant, suggesting congruence between benchmarks within each age interval, as well as between benchmarks and total SEAM scores. Cronbach coefficient alphas were also calculated for each age interval. The standardized alpha for the Infant SEAM was .90, and .91 for the Toddler SEAM, indicating strong internal consistency.
Table 5. Correlations Between Infant and Toddler SEAM Benchmarks and With Overall SEAM Scores.
Abbreviations: SEAM = Social Emotional Assessment Measure. All correlations are significant at p < .01. Total number of infant SEAMs included in the analyses between benchmarks ranged from 1,130 to 1,134 and was 1,153 for benchmark correlations with SEAM total score. Total number of toddler SEAMs included in the analyses between benchmarks ranged from 467 to 472 and was 490 for benchmark correlations with SEAM total score.
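For readers less familiar with the internal consistency coefficients reported above, the sketch below shows the usual raw and standardized forms of Cronbach's alpha. It assumes alpha is computed over the 10 benchmark scores, which the text suggests but does not state outright. The mean inter-benchmark correlation of .47 is a hypothetical value chosen from within the reported range, and the small score matrix is invented.

```python
def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(scores):
    """Raw alpha. scores: one row per child, one column per component."""
    k = len(scores[0])
    component_vars = [sample_variance([row[j] for row in scores]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(component_vars) / total_var)


def standardized_alpha(k, mean_r):
    """Standardized alpha from k components and their mean intercorrelation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical benchmark-level scores for five children on three components.
hypothetical_scores = [
    [2, 3, 3], [1, 1, 2], [3, 3, 3], [2, 2, 1], [0, 1, 1],
]
alpha_example = cronbach_alpha(hypothetical_scores)

# Ten benchmarks (from the text) with a hypothetical mean correlation of .47.
alpha_std_example = standardized_alpha(k=10, mean_r=0.47)
print(round(alpha_example, 3), round(alpha_std_example, 3))
```

With 10 components, a mean intercorrelation near .47 yields a standardized alpha close to .90. This suggests the reported alphas are in line with the moderate benchmark intercorrelations shown in Table 5.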
Validity
Concurrent validity
DECA-IT. Correlations for DECA-IT scores with the Infant SEAM (n = 13) were strong and significant (r = .754). Results are shown in Table 4.
ITSEA. Correlations were strong and significant for both the Infant (n = 27) and Toddler (n = 120) intervals, for both the Compliance and Prosocial domains (see Table 4). The Ns differed slightly across domains because cases were included only if all items within a domain were scored, thus allowing a total domain score to be calculated. The correlation with the ITSEA Compliance domain was r = .628 for the Infant interval and r = .564 for the Toddler interval. The correlation with the Prosocial domain was r = .651 for the Infant interval and r = .652 for the Toddler interval. As expected, correlations between the Negative Emotion subscale and both the Infant SEAM and the Toddler SEAM were in a negative direction. Although the Infant SEAM results were strong and significant (r = −.415), the Toddler SEAM results showed a weak correlation (r = −.261) with the Negative Emotion subscale.
ASQ:SE. As shown in Table 4, correlations with the ASQ:SE were in a negative direction for both the Infant (n = 860) and Toddler (n = 162) intervals for the total sample. The correlation between the ASQ:SE and the Infant interval (r = −.557) was strong and significant, and the correlation between the ASQ:SE and the Toddler SEAM (r = −.516) was medium/strong and also significant. This was the expected outcome: the SEAM measures competence, so its scores increased with competence, whereas the ASQ:SE measures challenging behaviors, so its scores increased as concerns and negative behaviors increased.
Utility
The authors collected utility data from 339 parents who completed the SEAM. More than 90% of parents felt that the SEAM asked appropriate and useful questions. Ninety percent of parents felt that items were clearly worded. Parents indicated they were alerted to new child skills (59% agreed or strongly agreed, while 22% had no opinion); 90% indicated that completing the SEAM did not bring up any concerns about their children that they felt they needed to talk to someone about. Parents reported an average completion time of 9 min, indicating a reasonable time for parent completion.
Researchers also conducted a written utility survey with 35 practitioners from Part C Early Intervention programs. Of this group, demographic information was collected from 34 practitioners. Practitioners had an average of 8 years of experience working with children, birth through age 5. The majority held either a bachelor’s (47%) or postgraduate/graduate degree (47%), with 6% holding an associate’s degree. Practitioners used a 4-point scale to rate their skill level related to providing mental health services to infants and toddlers and their families, with 1 = very low and 4 = very high. Four participants (12%) gave themselves a “1” rating, 19 participants (56%) gave a “2” rating, 10 participants (29%) gave a “3” rating, and 1 participant (3%) gave a rating of “4.”
Six percent of practitioners used only the Infant interval of the SEAM, 47% used only the Toddler interval, and 47% used both. Each practitioner completed between 1 and 19 SEAMs, with the majority completing 1 to 4. The majority of practitioners (91%) completed the SEAM with families during home visits, whereas others completed the SEAM in a child care center (6%) or in other ways (11%), such as having a foster parent complete it on his or her own at home. Written comments on preferred completion methods revealed a preference for completing the SEAM with caregivers during home visits, through a conversational or interview style that permitted discussion of questions.
Ninety-two percent of practitioners agreed or strongly agreed that SEAM items were clear and easy to understand. Seventy-nine percent (n = 33) agreed or strongly agreed that completing the SEAM gave them meaningful information about a child’s social-emotional abilities and needs. Sixty percent (n = 33) agreed or strongly agreed that they would use the SEAM again, 30% had no opinion, and 9% disagreed. Sixty-two percent (n = 29) agreed or strongly agreed that they planned to address some of the skills parents indicated as intervention goals on the SEAM, and 35% had no opinion.
Discussion
The SEAM was developed to assess young children’s social-emotional competence and is designed to be completed by parents in a home setting during daily routines and activities. Through completion of the SEAM, parent/caregiver concerns as well as infant/toddler needs can be identified as intervention targets. Functional and meaningful activities can then be designed to enhance child skills related to social-emotional competence as well as improve the quality of parent–child interactions and the ability of families to provide instructional support to their child (Guralnick, 2012; Kaufman, Perry, Hepburn, & Duran, 2012).
The preliminary psychometric investigation of the Infant and Toddler SEAM intervals has yielded promising results. IRT analyses were conducted using a Rasch PCM, and results indicated that scores derived from SEAM assessments completed online compared with those completed by the traditional pencil and paper methodology had only minor differences in item functioning, suggesting minimal bias between completion methods. Online and pencil and paper data were thus combined for subsequent IRT and traditional data analyses.
Model fit statistics using the Rasch 1PL model indicated that SEAM item responses within each benchmark were consistent for age and estimated item difficulty; children with greater ability were able to complete more items than less-skilled children. Item functioning analyses investigating the relation of item responses to child ability and item characteristics (difficulty and sensitivity) indicated adequate model fit for both Infant and Toddler intervals.
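The logic behind this finding can be sketched in a few lines of Python. Under a dichotomous Rasch (1PL) model, the probability of a positive response depends only on the difference between child ability and item difficulty, both expressed on a logit scale. The function names, ability values, and item difficulties below are hypothetical, and the sketch covers only the dichotomous case, not the partial credit model used for the polytomous SEAM response options.

```python
from math import exp


def rasch_probability(ability, difficulty):
    """Probability of a positive response under the dichotomous Rasch (1PL) model."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


def expected_items_passed(ability, difficulties):
    """Expected number of items passed: the sum of the item-level probabilities."""
    return sum(rasch_probability(ability, b) for b in difficulties)


# Hypothetical item difficulties (logits), ordered from easiest to hardest.
item_difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Two hypothetical children, one less skilled and one more skilled.
for ability in (-1.0, 1.0):
    expected = expected_items_passed(ability, item_difficulties)
    print(f"ability {ability:+.1f}: expected items passed = {expected:.2f}")
```

Because the response function is strictly increasing in ability, a more able child always has a higher expected count of completed items on the same item set, which is the pattern the fit statistics are checking for.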
Classical test theory analyses generally confirmed the developmental structure of the SEAM; Infant and Toddler mean scores increased with age and were significantly correlated with age across 6-month increments. Reliability studies indicated strong internal consistency, measured by coefficient alpha between benchmarks within each age interval, and moderate agreement among teachers completing the SEAM on the same children. Interrater reliability results between preschool teachers (head and assistant) within the same classroom for SEAMs completed on the same child were strong and significantly related for infants (r = .776) but not for toddlers in two of three classrooms (r = .640 and .668). These results suggested variability in some teacher responses and differences in how children were viewed by teachers in the classroom; that is, some teachers were very positive and rated children as having strengths, whereas a second teacher in the same classroom may have viewed their behavior in a more negative light. It is also possible that teachers interpreted the response options differently (i.e., the distinction between “very true” and “somewhat true” may have varied between teachers). Intraclass correlational analyses confirmed that differences in scores were consistent for two of the toddler classrooms (i.e., one teacher may have rated children consistently higher than the other teacher), but in the third classroom, there appeared to be little consistency in the differences between scores, such that sometimes one teacher rated some children higher, and sometimes the other teacher rated other children higher. These inconsistencies are probably due to individual practitioner variability rather than SEAM scoring rubrics but deserve future investigation.
Test–retest reliability results for SEAMs completed by parents online were strong (r = .987 for Infant and .968 for Toddler), although these figures were likely inflated by the completion timeline: parents could complete the second SEAM immediately after the first and might have remembered their previous answers rather than completing the assessment anew. Future research with online participants will need to set an interval of 1 to 2 weeks between the first and second completions of the SEAM.
Utilizing mixed methods, specifically classical and IRT analyses, allowed for multiple analytic approaches to study the psychometric properties of the SEAM. IRT approaches also allowed for inclusion of a large number of children with disabilities in the data set, as results were not sample dependent (i.e., a larger number of children with low abilities did not affect IRT results, in contrast to sample-dependent classical testing approaches). We feel these mixed methods provided multiple lenses to investigate how the SEAM worked with a variety of children, parents, and practitioners in home and classroom settings.
Limitations
A major study limitation is the nature of the sample. Rather than being randomly selected, practitioners were recruited directly by the research team and agreed to participate. In addition, parents who completed pencil and paper SEAMs were recruited through these practitioners and may have been those who were better able to read and understand the SEAM and/or those with more trusting relationships with their practitioners, thus biasing SEAM results. Online participants, the major portion of the sample (79%), were also self-selected in that they responded to online recruiting ads placed on parenting websites and may have had concerns about their young children or have been motivated to seek out new information. The sample was highly educated in general, with 60% having a bachelor’s or postgraduate degree and only 4% without a high school degree; the majority of parents reported income levels greater than US$50,000. Likewise, utility results from parents and practitioners may have been more positive due to the self-selected nature of the population. Although SEAMs were collected from 49 states, future research studies should target a nationally representative sample from the United States with greater ethnic and economic diversity to ensure the generalizability of these results.
A second sample limitation relates to the cultural and economic diversity of participating families. Many parents had high levels of education and income, and only 23% were non-White. Given the increasing diversity of young children in the United States, this may not be representative of families served in many early childhood settings. Coupled with issues related to cultural diversity and social-emotional competence, and the powerful effect that culture has on early social-emotional development (Chen & Rubin, 2011), more study is needed related to the equivalence of SEAM items across diverse families, and adaptations that need to be made across cultures (Peña, 2007).
A third limitation pertains to concurrent validity analyses. There is no “gold standard” for assessing young children’s social-emotional competence, especially for very young children. A variety of assessments were selected for this study; however, screening instruments such as the DECA-IT and ASQ:SE are not ideal for these comparisons, due to the measurement error inherent in screening instruments. For children from birth to 12 months, however, there are few measures with any psychometric integrity; these were selected because of their psychometric evidence and content validity. In addition, the ASQ:SE was chosen as a measure recommended to be used with the SEAM in a “linked systems” approach for social-emotional intervention (Squires & Bricker, 2007). In future studies, results from clinical appraisals by skilled diagnosticians and transdisciplinary teams should be used alongside comparisons with existing instruments to assess the concurrent validity of the SEAM.
Implications for Practice
For success in later school and work settings, children’s competence in social, emotional, and behavioral arenas is critical. Assessment tools that yield functional and pertinent information about young children’s strengths and needs will assist parents and practitioners to identify and intervene with delays and difficulties at an early point in time. Early intervention is necessary to improve developmental trajectories, enhance parent–child interactions, and improve academic readiness.
Practitioners in this study reported that a newly developed tool, the SEAM, helped identify social-emotional concerns and facilitated in-depth discussions with parents about difficulties in the home environment and previously unvoiced concerns. These practitioners also felt that the SEAM assisted them in more focused discussions with parents and eventually might improve the quality of their intervention goals. Overall, 90% of parents reported the SEAM asked appropriate and useful questions and took a reasonable time (less than 10 min) to complete.
In addition, the importance of providing training to practitioners before they begin to use social-emotional assessments was highlighted. Some practitioners felt that they were not prepared for the highly emotional conversations they had with parents and had not devoted enough thought beforehand to anticipate responses and the impact of bringing up deep-seated concerns. Therefore, careful preparation of practitioners, including role-playing and discussion of potential responses, is of critical importance. Being knowledgeable about typical social-emotional development and the content and administration process for assessments will enhance the effectiveness of assessments and assist in forging collaborative parent–practitioner relationships.
Practitioner-friendly instruments such as the DECA, Ounce Scale, and SEAM may be used by teachers in a variety of settings to partner with family members in the assessment of early social-emotional competence. These tools can help pinpoint areas of strength as well as early behaviors that are of concern to parents, leading to early prevention of difficulties before they become problematic. As parents often are not able to discuss social-emotional and behavioral concerns with their pediatricians, early care environments may be an optimal forum for these discussions (Rhodes et al., 2012). The initial psychometric studies investigating the validity, reliability, and utility of the SEAM assessment indicated promising results. Future studies are needed with larger, more diverse samples of parents and practitioners as well as studies focused on the quality of goals developed after completing the SEAM with parents, and ultimately of child progress based on SEAM-targeted goals. Early identification and intervention with social-emotional delays will contribute to improved developmental trajectories and outcomes for young children and their family members.
Footnotes
Authors’ Note
The contents do not necessarily represent the policy of the U.S. Department of Education, and you should not assume endorsement by the Federal Government.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The first author, Jane K. Squires, disclosed a conflict of interest as author of a text containing the Social Emotional Assessment Measure (SEAM), and receiving royalties for its publication.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The contents of this publication were partially developed under Grant R324A070255 from the U.S. Department of Education.
