Abstract
Emotional management (EM) is a crucial skill for achieving relevant biopsychosocial goals, and there has been an increased demand for the effective measurement of EM ability. The current study aimed to explore the psychometric properties of the Portuguese version of the brief Situational Test of Emotional Management (STEM-B) using item response theory in a sample of 899 participants. The global fit indicated the model had a good adjustment, with most items aligning vertically across the logit scale and presenting an adequate range of item difficulty and fit. Differential item functioning analysis showed no differences in difficulty between genders, but some items differed according to education level and age group. Overall, findings suggest the STEM-B is a psychometrically sound measure for specific testing of EM skills that has the potential to be used across cultures and fields.
Introduction
Conceptual models of emotional intelligence (EI) typically suggest a set of components or skills aimed at adaptive intra- and interpersonal functioning and management of social interaction (Bar-On, 2006; Cherniss, 2004; Mayer, Salovey, & Caruso, 2000). The current study focuses on emotional regulation or management, identified as the fourth branch of the Mayer–Salovey EI model, which consists of different behaviors or regulation strategies intended to modify the emotional experience or expressions (i.e., minimizing negative and maximizing positive emotional states) or facilitate goal-directed behaviors (Aldao, Nolen-Hoeksema, & Schweizer, 2010; Dixon-Gordon, Bernecker, & Christensen, 2015; Gratz & Roemer, 2004; Tamir & Ford, 2012). The Mayer–Salovey theory proposes a hierarchical model in which each branch depends on skills from the lower branches, meaning that emotion management is at the top of the hierarchy, built upon skills related to perceiving emotions, using them to facilitate the thought process, and understanding the emotional experience (Mayer, Roberts, & Barsade, 2008). Within a social interaction, emotional management (EM) skills can be used to regulate one’s own emotions and influence the emotional states of others (Dixon-Gordon et al., 2015). Therefore, their adaptiveness or maladaptiveness may be assessed based on the outcomes or consequences of the use of different emotion regulation strategies (e.g., increasing or decreasing well-being, social support, distress, and psychological symptoms).
With the popularization of the concept over the past two decades, EI, and EM skills in particular, has received significant attention due to its important impact on outcomes like professional performance, teaching, health care, well-being, leadership, and business (Austin, 2010; Mayer, Salovey, & Caruso, 2004). Regarding the latter, there has been a significant dissemination of findings on EI being a decisive set of skills in the entrepreneurial field to achieve professional success by improving leadership and management skills, which stimulated the multiplication of EI programs (Kelly & Kaminskienė, 2016; Martina, Denisa, & Mariana, 2015; Zhang, Cao, & Wang, 2018). But despite the popularity and research interest around EI, the current debate on theoretical models and adequate methodological approaches to measure EI and its components is still far from reaching a consensus or providing guidance on how the (mal)adaptiveness of responses or strategies and their outcomes should be determined. This latter aspect is fundamental to mental health research and intervention, including the fields of psychopathology and clinical psychology, as they may exacerbate or diminish anxiety states and depressive symptoms or establish consistent patterns that lead to personal invalidation and interpersonal problems—which are ultimately a core feature of personality disorders and one of the most detrimental outcomes of any psychological disorder (Aldao et al., 2010; Brüne, 2001; Power & Dalgleish, 2007).
The conceptual overlap and the terminological interchangeability of the constructs of emotion management and emotion regulation in the current literature complicate the definition of this process, as well as the extrinsic and intrinsic strategies (e.g., behaviors, goal orientation, and motives) that operationalize the process. This often poses an additional obstacle to researchers attempting to systematize the current knowledge of the field, compare or replicate previous research studies, test the predictive assumptions of current conceptualizations, or advance the theoretical understanding of existing EI models (Anguiano-Carrasco, MacCann, Geiger, Seybert, & Roberts, 2015; Austin, 2010; Fan, Jackson, Yang, Tang, & Zhang, 2010; MacCann, Joseph, Newman, & Roberts, 2014).
The EI field currently divides itself into two types of EI that are seemingly unrelated constructs, trait EI and ability EI, the first relating to the personality domain and the second to the intelligence domain. This separation has significant implications to assessment methodology (Austin, 2010; MacCann & Roberts, 2008; Petrides, 2013). Many of the traditional or more widely disseminated tools are made available for commercial use and are presented in ways susceptible to biases or manipulation by the respondents—for example, when items are overly transparent and allow a respondent to provide a socially desirable response, such as skewing responses toward a more positive impression or malingering—or response formats that are also susceptible to faking and may influence the relationship of EI scores and relevant variables such as personality, intelligence, academic achievement, or well-being (Anguiano-Carrasco et al., 2015; Davis & Humphrey, 2012; Fan et al., 2010; Libbrecht & Lievens, 2012). These issues raise questions relating to the very placement of the construct in the general framework of multiple intelligence models (Husin, Santos, Ramos, & Nordin, 2013; MacCann et al., 2014; Pardeller, Frajo-Apor, Kemmler, & Hofer, 2017). It is possible that some assessment strategies introduce confounding effects with verbal or other cognitive abilities (when the variance in the tests of EI cannot solely be attributed to the construct), or become biased when applied in particular populations (e.g., clinical populations) or contexts (e.g., academic and high-stakes job applications, where faking or social desirability becomes more likely). For instance, the Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT), one of the most disseminated and debated measures of EI, resorts to different scoring procedures when assessing different EI components (multiple choice and rate the extent), presenting issues such as the high correlation with IQ. 
Other measures related to trait EI tend to correlate more with personality traits and are evaluated by self-report inventories, while ability EI correlates strongly to intelligence tests and crystallized abilities, as they are typically presented as problem-solving tests (MacCann & Roberts, 2008; Roberts, Schulze, & MacCann, 2008). The remarkable demand for psychometrically sound tools for the assessment of EI ability has led to the investment in new assessment paradigms and tests to overcome shortcomings—mostly related to the validity or consistency—observed in several studies involving more mainstream assessment instruments (Allen, Rahman, Weissman, MacCann, Lewis, & Roberts, 2014; Allen, Weissman, Hellwig, MacCann, & Roberts, 2014; Austin, 2010; Conte, 2005; MacCann & Roberts, 2008; Mestre, MacCann, Guil, & Roberts, 2016).
Several authors advocate the advantages of maximum performance testing for EI abilities over typical performance testing, as they primarily assess processes related to cognitive abilities and intelligence over personality traits and motivation (Anguiano-Carrasco et al., 2015; Austin, 2010; Libbrecht & Lievens, 2012; MacCann & Roberts, 2008). This recent trend has provided evidence for the situational judgment test (SJT) paradigm as a valid approach to the assessment of emotional abilities. The SJT approach was used in some MSCEIT tests and has been further developed by MacCann & Roberts (2008) in two new measures of EI ability: the Situational Test of Emotional Understanding (STEU) and the Situational Test of Emotional Management (STEM). Studies by Austin (2010) and Libbrecht and Lievens (2012) provided initial evidence of construct validity of both measures.
The current study focuses on the psychometric properties of the brief version of the Situational Test of Emotional Management (STEM-B). Investigation of this measure (and of EI measures in general) in independent samples is still lacking, including those of possible differences or possible biases in ability testing as a function of belonging to different sociodemographic groups (Allen, Weissman, et al., 2014; Barchard & Hakstian, 2004; Fan et al., 2010; MacCann & Roberts, 2008; Sharma, 2017). The STEM-B is a performance test depicting several interpersonal scenarios (some taking place in professional contexts, some in personal contexts, or some without a specific context) to which respondents must choose the most efficient strategies among four response options. The availability of a brief measure of EM skills that is easily accessible is of utmost importance to diversify the alternatives for EI ability measurement and when working with specific populations or when time constraints are a concern for researchers. This is also particularly relevant for researchers working with non-native English speakers, to whom the options are even more scarce and can delay the development of cross-national studies. We approach the analysis with a Rasch modeling technique in a large sample of Portuguese-speaking participants from the general population. Rasch modeling is a specific approach within the item response theory (IRT) framework that is deemed more adequate in EI research, especially measures within the SJT framework yielding categorical data (Allen, Rahman, et al., 2014; Anguiano-Carrasco et al., 2015; Roseman, 2001). 
We establish concurrent validity with a measure of emotional understanding skill (STEU-B, Allen, Weissman, et al., 2014; da Motta, Castilho, Pato, & Barreto Carvalho, 2020) and also explore the relationships with gender, age, and education, as these factors are expected to impact EI-related ability tests (American Psychological Association, American Educational Research Association, National Council on Measurement in Education, 2014; Fan et al., 2010; MacCann et al., 2014; Sharma, 2017). Finally, we provide a scoring procedure that takes into consideration the overall ability of the respondent (person measure) when calculating the probability of the respondent correctly answering items located at different difficulty levels along this variable (item measure) found in Rasch model (RM) analyses. This procedure should allow for a more linear score scale for the evaluation of the latent construct (Boone, Staver, & Yale, 2014; Granger, 2008; Wright, 1997).
Method
Participants and Procedures
The first stage consisted of obtaining permission from the authors and the American Psychological Association to translate and use the STEM-B (described in the Measures section). Next, we followed established guidelines for the translation and adaptation of the test (Beaton, Bombardier, Guillemin, & Ferraz, 2000; Hambleton, Merenda, & Spielberger, 2004; Sousa & Rojjanasrirat, 2011). A Portuguese bilingual psychologist translated the instructions and items, and a bilingual technician residing in the United States of America back-translated the test. A senior psychologist revised the translation and found no deviations from the content of the original version of the test. Prior to its administration to a wider sample, the STEM-B was administered to five participants, who did not report difficulties regarding the items’ clarity or comprehension.
All participants were informed about the research goals and were assured of the anonymity and voluntary nature of participation, in accordance with the ethical principles for research involving human participants. Only participants who provided their written informed consent were administered a research protocol including the STEM-B. Underage participants were contacted through local schools, and a signed informed consent was obtained from their legal representatives prior to participation.
A sample of 899 adolescents and adults from the general population of mainland Portugal and the Azores islands participated in this study. Of the participants, 40% were males (n = 360) and 60% were females (n = 539), aged between 14 and 72 years (M = 22.46; SD = 9.98). Most participants were single (n = 748; 83.2%), 125 were married/in a civil union (13.9%), 21 participants were divorced (2.3%), two were widowed (.2%), and three participants (.3%) did not report their civil status. The largest group of participants (n = 388; 43.2%) completed elementary school, 344 participants (38.3%) completed or were currently attending mandatory education (high school), and the remaining 151 participants (16.7%) completed higher education (college, master’s, or doctoral degree) or an alternative curriculum (e.g., attendance at professional schools). Sixteen participants did not report any information about their education (1.7%).
Analyses
Rasch analysis was performed with WINSTEPS Rasch Analysis (version 3.93, SWREG Inc., 2017), and the remaining statistical procedures were carried out using IBM SPSS Statistics (version 23 for Microsoft Windows, IBM Inc. Armonk, NY).
Because the STEM-B uses a multiple-choice format, items were dichotomized into correct or incorrect responses and analyzed through RM (Rasch, 1961). In RM, calculations transform the person and item parameters to a common unit of “measure” (θ or theta) that is distributed along a continuum. Each unit of θ corresponds to a log-odds unit or “logit,” a scale that theoretically ranges over ±infinity but typically falls between ±5 (Prieto & Velasco, 2006), where 0 locates the average item difficulty for the measure. Thus, the RM analysis has the advantage of estimating items’ parameters independently from the participants who enter the sample and providing a robust assessment of construct validity and representation for more consistent scoring and interpretation.
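To make the logit scale concrete, the dichotomous Rasch model can be sketched in a few lines. Under this model, the probability of a correct response is exp(θ − b) / (1 + exp(θ − b)), where θ is the person measure and b the item difficulty, both in logits. The function names and the example difficulties below are illustrative assumptions, not values from the STEM-B calibration.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    theta and difficulty are both expressed in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def expected_raw_score(theta: float, difficulties: list[float]) -> float:
    """Expected number of correct answers for a person with measure theta."""
    return sum(rasch_probability(theta, b) for b in difficulties)


# Hypothetical item difficulties (in logits), for illustration only.
example_difficulties = [-1.0, 0.0, 1.0]

# A person located exactly at an item's difficulty has even odds of success.
print(rasch_probability(0.0, 0.0))
# Each additional logit of ability multiplies the odds of success by e.
print(rasch_probability(1.0, 0.0))
print(expected_raw_score(0.0, example_difficulties))
```

When θ equals the item difficulty the probability is .50, which is why a measure of 0 can be read as the point where a person has even odds on an item of average difficulty.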
Infit mean square (Infit MNSQ) and outfit mean square (Outfit MNSQ) are calculated to estimate the model fit to the data, both for items and persons. The infit refers to the weighted MNSQ, providing information about items and possible structural problems (Baker, 2001; Prieto & Velasco, 2006). According to Linacre (2011), infit statistics values between .5 and 1.5 are productive for measurement; values greater than 2.0 can degrade the measurement, values between 1.5 and 2.0 are unproductive for measurement construction, and values smaller than .5 are less productive for measurement but not degrading. The outfit is the unweighted MNSQ and is useful for diagnosing outliers (Linacre, 2011).
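As an illustration of how the two statistics differ, the following sketch computes infit and outfit mean squares for one item from observed dichotomous responses and the model probabilities. It uses the standard definitions (outfit as the mean of squared standardized residuals, infit as the information-weighted version) and the interpretation bands quoted above; the function names and the example data are assumptions made for illustration, and the treatment of values falling exactly on a cut-off is a choice made here.

```python
def fit_mean_squares(responses, probabilities):
    """Return (infit, outfit) mean squares for dichotomous responses.

    responses: observed 0/1 answers; probabilities: Rasch-expected P(correct).
    """
    squared_residuals = [(x - p) ** 2 for x, p in zip(responses, probabilities)]
    variances = [p * (1.0 - p) for p in probabilities]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variances)) / len(responses)
    # Infit: squared residuals weighted by the model variance (information).
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


def interpret_mnsq(value):
    """Map a mean-square value onto the bands described in the text."""
    if value > 2.0:
        return "degrading"
    if value > 1.5:
        return "unproductive"
    if value >= 0.5:
        return "productive"
    return "less productive"


# Hypothetical example: the second response is an unexpected success
# (P = .10), which inflates outfit more than infit.
infit, outfit = fit_mean_squares([1, 1], [0.5, 0.1])
print(round(infit, 2), round(outfit, 2))
print(interpret_mnsq(infit), interpret_mnsq(outfit))
```

Because outfit gives every response equal weight, a single highly unexpected answer moves it more than infit, which is why it is the statistic used to diagnose outliers.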
Differential item functioning (DIF) analyses comparing item difficulty across gender, educational level, and age were carried out to address possible item bias due to these sociodemographic characteristics when holding participants’ ability constant. No differences are expected between males and females regarding items’ logit values; however, it is expected that items vary in the degree of difficulty between participants with higher education or older age (characteristics more typically linked with lower or higher ability of EM).
Thus, it is important to explore whether items can properly assess the EM abilities of these subgroups. DIF analysis allows us to estimate the item difficulty parameter for the two different groups and then compare the two values (item by item). Whenever a statistically significant DIF item is found with Rasch–Welch and Mantel–Haenszel tests of item difficulty, the difference in the estimated difficulty parameters is examined according to the following criteria: the logit differences below .43 are considered negligible, the logit differences above .43 are considered moderate to large, and the logit differences above .64 logits are considered large (Zwick, Thayer, & Lewis, 1999). Finally, the differential group functioning compares all item difficulty parameters between the two groups.
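The size criteria above can be expressed as a small helper. The sketch below assumes the DIF contrast is the difference between the two groups' difficulty estimates for the same item, and that a Welch-type t statistic divides this contrast by the joint standard error of the two estimates; the function names, the sign convention, and the handling of values exactly at a cut-off are assumptions made for illustration.

```python
import math


def dif_contrast(difficulty_focal: float, difficulty_reference: float) -> float:
    """Difference in item difficulty (logits) between two groups."""
    return difficulty_focal - difficulty_reference


def welch_t(contrast: float, se_focal: float, se_reference: float) -> float:
    """Contrast divided by the joint standard error of the two estimates."""
    return contrast / math.sqrt(se_focal ** 2 + se_reference ** 2)


def classify_dif(contrast: float) -> str:
    """Label the absolute contrast using the cut-offs quoted in the text."""
    size = abs(contrast)
    if size > 0.64:
        return "large"
    if size > 0.43:
        return "moderate to large"
    return "negligible"


# Hypothetical difficulty estimates and standard errors for one item.
contrast = dif_contrast(0.75, 0.25)
print(contrast, classify_dif(contrast))
print(welch_t(contrast, 0.3, 0.4))
```

The sign of the contrast only indicates which group finds the item harder; the classification depends on its absolute value.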
These analyses were complemented with classical test theory analyses of concurrent criterion validity by exploring the relationship between STEM-B scores and emotional understanding measures and the relationship between STEM-B scores with gender and age.
Measures
Participants completed a sociodemographic sheet and were administered a research protocol in paper and pencil format that included the STEM-B and STEU-B.
Situational Test of Emotional Management – Brief
The STEM (Allen, Rahman, et al., 2014; MacCann & Roberts, 2008, Portuguese version by Da Motta et al., 2016a) was developed based on the situational judgment paradigm and through semistructured interviews and expert evaluations of response options. Because results may be influenced by response formats, the final multiple-choice format was chosen using a quasi-experimental design that compared the multiple-choice to rate-the-extent formats (Allen, Rahman, et al., 2014; MacCann & Roberts, 2008). This procedure aims at separating the test and construct effects: the multiple-choice format was considered more appropriate because it converges with the emotional and cognitive processes that are more related to intelligence tests (e.g., choosing the best among several response options), when compared with rate-the-extent formats that require more divergent thinking processes (e.g., considering several equally good options and selecting one response). In previous studies by Allen, Rahman, et al. (2014) and Allen, Weissman, et al. (2014), an 18-item version was developed from the 44-item STEM, and findings indicated the shorter version preserved the characteristics of the longer version. The STEM-B offers scoring procedures based on expert ratings or dichotomous (0 = incorrect/1 = correct) scoring (Allen, Rahman, et al., 2014). Dichotomous scoring was used to avoid the impact of judges’ severity on scoring. Because the psychometric properties of this measure are the focus of this article, the detailed information about construct validity, internal consistency, and criterion-related validity will be presented in the results section.
Situational Test of Emotional Understanding – Brief
The STEU-B (MacCann & Roberts, 2008, Portuguese version by Da Motta et al., 2016a, 2016b) is a 19-item version of the 42-item STEU, a test devised to assess the third branch of EI or the ability to understand the emotional experience in the self and in others. The STEU-B was obtained from IRT analysis of the STEU, performed in the studies by Allen, Rahman, et al. (2014) and Allen, Weissman, et al. (2014), which provides the maximum amount of information with the least testing time. The items depict different interpersonal scenarios, with different degrees of difficulty, to which participants are asked to choose the most likely emotional response resulting from each situation (e.g., Clara receives a gift. Clara is most likely to feel? (a) happy; (b) angry; (c) frightened; (d) bored; (e) hungry). The STEU-B is the shortest available omnibus test of ability-based EI assessment, with increased utility under assessment time restrictions (Allen, Weissman, et al., 2014; da Motta et al., 2020). Because the STEU-B assesses the third branch of EI skills (emotional understanding), it is expected that the results of the STEM-B will show some degree of dependence on this skill and that the STEU-B and STEM-B scores will be correlated.
Results
The assumption of unidimensionality was assessed through principal component analysis (or contrasts) of standardized residual variance. In this analysis, it was expected that the unexplained variance (the variance that is not explained by the Rasch measures) consisted of random noise. To evaluate this, we calculated contrasts of the unexplained variance: the first contrast typically holds most of the residual variance but is expected to be below an eigenvalue of 2 (otherwise, it could indicate the presence of a second dimension) and the eigenvalues are expected to decrease at each subsequent contrast. The eigenvalues of unexplained variance obtained on the first and second contrasts were 1.77 and 1.40, respectively, and because both values were below 2, it is possible to state that the test presents no multidimensionality problems (Linacre, 2011; Raîche, 2005). The assumption of local independence was supported by low residual correlations between items, with values ranging from r = −.12 to r = .25 (Linacre, 2011). The largest observed residual correlations (Yen’s Q3) were used to identify dependent pairs of items, and all item pairs presented weak correlations (r = −.15 to r = −.31), indicating local independence. The total raw variance explained by the scale was 25.2%.
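The logic of these two checks can be illustrated on simulated data. The sketch below generates responses from a unidimensional Rasch model, computes residuals from the model probabilities, and then inspects (a) the eigenvalues of the correlation matrix of standardized residuals, whose largest value plays the role of the first contrast, and (b) the range of residual correlations between item pairs (Q3). It is an assumption-laden illustration, not the software's procedure: it uses the generating parameters rather than estimated ones, and the sample size, item difficulties, and random seed are arbitrary choices.

```python
import numpy as np


def simulate_rasch(thetas, difficulties, rng):
    """Simulate 0/1 responses and return them with the model probabilities."""
    p = 1.0 / (1.0 + np.exp(-(thetas[:, None] - difficulties[None, :])))
    x = (rng.random(p.shape) < p).astype(float)
    return x, p


def residual_diagnostics(x, p):
    """Eigenvalues of standardized residuals and the range of Q3 values."""
    raw = x - p
    standardized = raw / np.sqrt(p * (1.0 - p))
    # Eigenvalues are in item units: they sum to the number of items.
    eigenvalues = np.linalg.eigvalsh(np.corrcoef(standardized, rowvar=False))[::-1]
    # Q3: correlations between the raw residuals of each pair of items.
    q3 = np.corrcoef(raw, rowvar=False)
    off_diagonal = q3[~np.eye(q3.shape[0], dtype=bool)]
    return eigenvalues, off_diagonal.min(), off_diagonal.max()


rng = np.random.default_rng(0)
thetas = rng.normal(0.0, 1.0, size=873)      # hypothetical person measures
difficulties = np.linspace(-1.5, 1.7, 18)    # hypothetical item difficulties
x, p = simulate_rasch(thetas, difficulties, rng)
eigenvalues, q3_min, q3_max = residual_diagnostics(x, p)

print("first contrast eigenvalue:", round(float(eigenvalues[0]), 2))
print("Q3 range:", round(float(q3_min), 2), "to", round(float(q3_max), 2))
```

With 18 items the eigenvalues sum to 18, so a first-contrast eigenvalue below 2 means the largest residual component has less than the strength of two items, which is the rationale for the cut-off used above.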
Global Fit Statistics
The model tested included all participants and the 18 items from the STEM-B. All items presented adequate adjustments, with average infit and outfit values of .99 (SD = .10) and 1.02 (SD = .22), respectively. The maximum value of the outfit was 1.57, which suggested the absence of outliers or items with poor adjustment. The item measures ranged from −1.53 to 1.73 logits, and the measure’s standard error was low (between .07 and .09, M = .08; SD = .01). Person fit showed appropriate infit (M = 1.00, SD = .96) and outfit (M = 1.02, SD = .46) average values. The maximum values of outfit indicated the existence of abnormal response patterns. The inspection of extreme infit and outfit values revealed that 26 participants (2.9%) presented response patterns matching “careless” or “lucky guessing” patterns (Linacre & Wright, 1994), which would result in a poorer model fit. The identified participants were youth (less than 18 years old), who may have chosen the responses randomly or with little effort. As misfitting response patterns are a possible cause of measure distortions, the model was reanalyzed after excluding the 26 participants presenting problematic response patterns.
Global Fit Statistic of STEM-B (N = 873).
Note. Max = maximum; Min = minimum; STEM-B = Brief Situational Test of Emotional Management.
Item-Person Map
Figure 1 depicts the distribution of persons and items across the measure (θ). The majority of items align vertically across the logit scale. Only two sets of items are located in parallel, at the same level of the scale; these items target a similar level of EM ability and could be excluded, if necessary. Because those items did not present any problems within the model and referred to distinct interpersonal scenarios theoretically relevant to the construct at hand, they were maintained.
Item-person map of Situational Test of Emotional Management - Brief (N = 873).
Total Score, θ, Standard Error, Infit and Outfit by Item, and Point–Biserial Correlations between Item and Total Score of STEM-B (N = 873).
Note. θ = measure; SE = standard error; r = point–measure correlation (between observations and θ); STEM-B = Situational Test of Emotional Management - Brief.
Differential Item Functioning
We explored DIF for matched ability levels or possible changes in item difficulty parameters (bias) according to the following sociodemographic characteristics: gender (males vs. females, with females as the reference group), years of education (mandatory education or below vs. higher education, with mandatory education as the reference group), and age (youth vs. adults, with adults as the reference group). It was not expected that differences in item difficulty (e.g., items favoring a group over another) would occur, as changes in difficulty parameters could be an indication that the item may be biased. As shown in Figure 2(a), no statistically significant differences in item difficulty were found in pairwise comparisons between male and female participants, except for item 8. That item’s DIF reached the significance threshold and presented a contrast of .39 logits favoring females (see Supplemental Table 1), a DIF value within a negligible threshold according to Zwick et al. (1999). Two items showed statistically significant differences in measures in pairwise comparisons between participants with mandatory education or less and participants with higher education. As shown in Figure 2(b), DIF contrast values of −.68 for item 15 and −1.05 for item 18 were found (see Supplemental Table 2). The magnitude of this difference was considered moderate to large, as the items’ DIF contrasts showed values above .64 logits (or below −.64 logits, as the items favored participants with fewer years of education). Last, group comparisons regarding age (below or above 18 years old) yielded nearly half the items reaching the DIF significance threshold in terms of difficulty measures, as presented in Figure 2(c). As depicted in the plot, items’ differences seem to balance each other out, as four items favored youth and four items favored older participants (see Supplemental Table 3).
Differential group functioning analysis was nonsignificant, indicating that observed differences in item difficulty between age-groups did not impact the total score of the STEM-B measure.
Changes in item difficulty between male and female participants (a), level of education (b), and age (c) (N = 873); *p ≤ .01, **p ≤ .05.
Concurrent Validity
To assess concurrent validity, we correlated the STEM-B results with emotional understanding, the ability from the third branch of EI, which is related to the strategic use of emotions. We also explored whether scores correlated with gender and age. Correlation coefficients were statistically significant and moderate regarding emotional understanding (r = .530; p ≤ .001) but weak regarding gender (r_pb = .069; p = .019, where 0 = males and 1 = females, indicating females had slightly higher STEM-B scores) and age (r = .247; p ≤ .001). Together, these three variables explained nearly a third of the variance in STEM-B raw scores (R² = .297; F = 187.843, p ≤ .001), with the strongest predictor being emotional understanding ability (STEU-B, β = .495, p ≤ .001), followed by age (β = .138, p ≤ .001).
Scoring
Table of Sample Norms (500/100) and Frequencies Corresponding to Complete Test.
Note. Maximum statistically different levels of performance (strata) = 2.9. Wright’s Sample-Independent Person (Test) Reliability based on maximum strata = .89.
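The two figures in this note are numerically consistent with the usual conversion between a separation-type index G and reliability, R = G² / (1 + G²), when the reported strata value is used as G. Whether the software obtained the reliability in exactly this way is an assumption; the sketch below only checks the arithmetic, and the function names are illustrative.

```python
import math


def reliability_from_separation(separation: float) -> float:
    """Convert a separation-type index G into a reliability coefficient."""
    return separation ** 2 / (1.0 + separation ** 2)


def separation_from_reliability(reliability: float) -> float:
    """Inverse conversion: G = sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1.0 - reliability))


reported_strata = 2.9  # value given in the note
print(round(reliability_from_separation(reported_strata), 2))
```

Read this way, a value of 2.9 statistically distinct levels corresponds to a reliability of about .89, which matches the second figure in the note.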
Discussion
The current study aimed to evaluate the psychometric properties of the STEM-B in a large sample from the Portuguese population. We used RM (IRT), an approach that is useful for ability measurement models because it allows the creation of a measurement unit from a sample’s performance that considers the item’s difficulty within the same construct. In other words, RM allows an estimation of STEM-B item parameters independently from the sample. These approaches acquire relevance in the field of emotional regulation and management assessment, in which replication has been inconsistent and varied due to the great diversity of methods and measures found across studies, raising significant obstacles to the advancement of the field.
Because an assumption of RM is local independence of items (responses are not similar and do not depend or correlate with each other), reliability estimates may differ from more typical indices of reliability which are prone to be artificially inflated by the number of items and item correlations (e.g., Cronbach’s alpha, Kuder–Richardson formula, or KR-20, Cortina, 1993; Tan, 2009). The reliability (KR-20) obtained in the current study was lower than those from the studies of the 44-item version (Austin, 2010; MacCann & Roberts, 2008) and the brief version by Allen, Rahman, et al. (2014) and Allen, Weissman, et al. (2014) but still within acceptable standards. The high item reliability showed the item difficulty hierarchy could be located across the latent variable with a significant precision in RM analysis.
Findings on the reliability and unidimensionality of the scale provide statistical validity for the STEM-B interpretation of scores used in the current study, and DIF analyses provide further data on the wide applicability and concurrent validity of the scale’s scores with relevant EI constructs. Item difficulty remains unchanged for males and females, suggesting the application of the STEM-B is adequate to a wide public, whereas differences in item functioning are present only when comparing groups by education level and age, demonstrating that difficulty changes for a few items. DIF cancellation was observed for age: items differed for younger and older participants, but items favoring one group or the other were balanced, so these differences did not compromise the overall test score (Wyse, 2013). Older participants may be more likely to perform better in the most difficult scenarios not only because they have completed the central nervous system maturation process and may have had the chance to achieve higher levels of education but also because they have had the opportunity to be exposed to a wide range of life experiences that can be pivotal to the development of the capacity to choose the best options available when solving problems in different interpersonal contexts. Considering that EM skills encompass several cognitive and noncognitive processes, it is likely that these skills are largely dependent on different brain functions, as reflected in the findings of functional magnetic resonance imaging studies on social cognition (Van Overwalle, 2009). However, the placement of these constructs within the EI theoretical models and their relationship to intelligence or other brain functions should be addressed in future studies, as it falls outside the scope of the current work (Kaufman & Kaufman, 2001; Roberts et al., 2001a, 2001b).
The lower person reliability and separation values may be a result of the reduced number of items, or of the sample presenting a narrow ability range (being constituted by participants from the general population and no other possible extreme groups, for instance). This latter issue was also raised in the study by Libbrecht & Lievens (2012) when using the longer version of the measure (STEM) to explore its relationship to other EI ability measures. Nevertheless, the separation values indicate the possibility of distinguishing at least two levels of performance (e.g., high/low performance). Similar findings were obtained in a study of the longer version of the STEM, which included a sample with higher average age and years of education (MacCann & Roberts, 2008). For this reason, using a shorter form of the measure with a simple scoring procedure (correct/incorrect) can be a useful approach when working with vulnerable populations (e.g., clinical samples and in- or outpatients), when EM is a complementary variable, or when researchers face assessment time constraints or participant fatigue is a major concern. When such constraints are not present, it would still be recommended to use a longer version when the discrimination of individuals by more than two levels of performance or the assessment of performance in a wider range of scenarios is needed.
While the multiple response format may help circumvent faking responses or social desirability, it is important to keep in mind that current EI theories have not been sufficiently developed to guide the appraisal and choice of correct responses in this kind of test, and the correctness of responses is mainly based on empirical or expert evaluations (psychologists, counselors, or psychiatrists). The current study is bound to this limitation but proposes a more rigorous scoring procedure that accounts for the differences in difficulty across the STEM-B’s 18 items and provides a linearized scale more suitable for further parametric statistical testing (Boone, 2016; Boone et al., 2014). Further limitations of the current study refer to concurrent validity being established with an EI-related measure and to the lack of a test–retest analysis. Because we did not include other measures of cognitive abilities to further address discriminant and convergent validity, future studies should aim to study convergence and divergence with other important cognitive skills (e.g., fluid or crystallized intelligence) and aim to explore the measure’s temporal stability.
Future efforts should focus on clarifying the algorithms for resolving the complex and nuanced decisions involved in the social interactions on which emotional management skills are based and in which they are manifested. The use of video vignettes or other multimedia support opens the possibility of developing more dynamic ways to assess the different emotional strategies involved in EM, as participants may be able to process verbal and nonverbal cues in a more fluid manner.
It is reasonable to conclude that the STEM-B is a psychometrically adequate measure for specific testing of EM skills with wide potential utility. Measures of this nature can be useful in academic or professional fields and can aid the evaluation of socio-emotional skill deficits in clinical and educational settings. More importantly, the STEM-B is a measure accessible to researchers, with clear scoring procedures that can be used to respond to current assessment needs in the EI ability field. The dissemination of a Portuguese version of the STEM-B as a novel and cost-effective measure for professionals working with Portuguese-speaking communities opens new avenues for transcultural studies involving Portuguese native speakers worldwide (Lewis, Simons, & Fennig, 2016) and can help foster empirical and theoretical advancements on EI models and related constructs.
Supplemental Material
Supplemental Material, Supplemental_Tables for Rasch Measurement of the Brief Situational Test of Emotional Management in a Large Portuguese Sample by Carolina da Motta, Célia B. Carvalho, Michele T. Pato and Paula Castilho in Journal of Psychoeducational Assessment
Footnotes
Acknowledgments
We thank the undergraduate students and Joana Rodrigues for their assistance in data collection and preparation, and Anabela Behnke and Teresa Carvalho for proofreading and providing valuable input during the revision process.
Author Contributions
All team members were involved in the general tasks concerning the elaboration of all studies in this article and the processing of the presented manuscript. Célia Barreto Carvalho, Paula Castilho, and Michele T. Pato (academic advisors) contributed to the design, construction of the evaluation protocol, manuscript reviews, and all methodological aspects concerning this article. Carolina da Motta (PhD grant holder) contributed to data collection and statistical analysis, and to manuscript preparation, revisions, and proof-editing.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the first author’s PhD Grant (SFRH/BD/110308/2015), sponsored by FCT (Foundation for Science and Technology), Portugal, and cosponsored by ESF (European Social Fund), Brussels, through Portuguese POPH (Human Potential Operational Program).
Data Accessibility Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Compliance with Ethical Standards
Supplemental Material
Supplemental material for this article is available online.
References
