Abstract
Bullying involvement among youth has consistently been linked to potentially serious consequences for both perpetrators and victims. To help clarify the nature and scope of youth bullying involvement, empirically validated assessment instruments measuring victimization and perpetration behaviors are needed for use in research and practice. The present study investigated the latent factor structure of the 22 victimization and perpetration items within the 2009-2010 Health Behavior in School-Aged Children (HBSC) self-report survey. Structural validity analyses were conducted using a representative sample of U.S. youth in Grades 5 to 10 (N = 11,449) obtained from the national administration of the HBSC self-report survey. Results suggested that a two-factor latent structure composed of bullying victimization and perpetration was the most theoretically and psychometrically sound measurement model for these data. In addition, multigroup measurement and structural invariance analyses showed that this model functioned equivalently across student sex, grade level, and the three largest racial/ethnic groups (White, Hispanic/Latinx, and Black/African American), supporting the measure’s use with diverse student populations.
Keywords
The issue of youth bullying involvement has received a great deal of research and media attention over the past few decades (Olweus, 2010) and remains a salient topic of study (Hymel & Swearer, 2015). Part of the impetus for studying this phenomenon is the well-established link between bullying involvement and numerous deleterious effects on youths’ social, academic, and psychological health (Jimerson, Nickerson, Mayer, & Furlong, 2012). For example, students who bully others—referred to hereafter as perpetrators—show more aggressive and dominant behavior patterns, exhibit fewer prosocial behaviors, and have elevated rates of externalizing behavior problems (Cook, Williams, Guerra, Kim, & Sadek, 2010; Veenstra, Lindenberg, Oldehinkel, De Winter, Verhulst, & Ormel, 2005). In contrast, students who are the targets of bullying—referred to hereafter as victims—show higher levels of internalizing behavior problems and are markedly less likely to attend and perform well in school (Cook et al., 2010; Haynie et al., 2001; Kochenderfer & Ladd, 1996; Swearer, Espelage, Vaillancourt, & Hymel, 2010).
The many negative effects associated with bullying involvement have been found across demographics, including sex, race/ethnicity, school grade level, and nationality (Nansel, Craig, Overpeck, Saluja, & Ruan, 2004; Nansel et al., 2001; Renshaw, Roberson, & Hammons, 2016). At the same time, some longitudinal studies of prevalence rates have shown that bullying involvement has been decreasing in the United States over the last several years, perhaps due in part to increased awareness of this public health issue and a rise in antibullying legislation (see stopbullying.gov). Although this trend offers some reason for optimism, estimated rates of bullying involvement remain unacceptably high, with some estimates suggesting that as many as 60.4% of youth report some degree of involvement as a perpetrator and/or victim (Renshaw, Hammons, & Roberson, 2016).
Given the elevated risks associated with bullying involvement, an important precondition for effective prevention and intervention is that researchers and school professionals use technically adequate instruments to measure youths’ perpetration and victimization. Although recent reviews of instruments for assessing bullying involvement show that more than 40 measures are currently available (Vessey, Strout, DiFazio, & Walker, 2014), these reviews also indicate major discrepancies in conceptual approaches to bullying measurement as well as uneven quality of validity evidence (Vivolo-Kantor, Martell, Holland, & Westby, 2014). This variability in validity evidence makes it difficult to reach consensus about the nature and scope of youths’ bullying involvement (e.g., Renshaw, Hammons, & Roberson, 2016). Furthermore, these issues pose a challenge for practitioners selecting appropriate assessment tools for screening or for monitoring the progress of prevention and intervention efforts (Bradshaw, 2015). To remedy these issues, additional studies aimed at developing conceptually coherent and psychometrically robust measures of youths’ bullying victimization and perpetration are essential.
Although no single instrument has been studied comprehensively, Vessey et al. (2014) have suggested that the revised Olweus Bully/Victim Questionnaire (OBVQ; Olweus, 1996) is perhaps the most technically adequate and commonly used bullying measure to date. Whereas some other self-report bullying scales used for general prevalence estimation include only a few items—such as the two items concerning school victimization in the Youth Risk Behavior Surveillance System (YRBSS; Centers for Disease Control and Prevention, 2014) or the single victimization item in the National Crime Victimization Survey (NCVS; Bureau of Justice Statistics, 2015)—the OBVQ assesses both perpetration and victimization using multiple items targeting many classes of bullying behavior (i.e., verbal, physical, relational, racial, cyber, and general). This breadth makes the OBVQ one of the most conceptually representative instruments available.
In addition, given the evidence that bullying perpetration and victimization are associated with overlapping yet distinct negative outcomes (Nansel et al., 2004; Renshaw, Roberson, & Hammons, 2016), the ability to derive separate scores for perpetration and victimization is an important feature of any instrument intended to enhance the precision of risk estimates. Assessing bullying perpetration in conjunction with victimization also allows for the identification of students who are involved both as perpetrators and as victims (i.e., perpetrator–victims), whom research has identified as having even poorer outcomes than students involved in bullying as perpetrators only or victims only (Cook et al., 2010; Farmer et al., 2010; Veenstra et al., 2005). Although several existing measures use single, domain-general items to assess overall victimization and perpetration behavior—like those on the YRBSS or NCVS—research suggests that items targeting specific bullying behaviors yield higher prevalence rates, suggesting improved identification sensitivity (Renshaw, Hammons, & Roberson, 2016). Furthermore, it seems reasonable to suggest that measures composed of items targeting several specific bullying behaviors would have both greater usability (see Glover & Albers, 2007) and treatment utility (see Hayes, Nelson, & Jarrett, 1987) for practitioners using data obtained from these instruments to inform bullying prevention and intervention in schools.
Given the context sketched above, the purpose of the present study was to investigate the structural validity of an underresearched measure that targets both specific bullying victimization and perpetration behaviors: the self-report bullying items within the Health Behavior in School-Aged Children (HBSC) survey, sponsored by the World Health Organization (2014). These HBSC self-report items were modeled after the OBVQ and expanded in content to target parallel classes of bullying victimization and perpetration behaviors (e.g., physical victimization and physical perpetration). Although several studies investigating bullying involvement have been conducted using isolated items from HBSC self-report surveys (e.g., Nansel et al., 2004; Renshaw, Hammons, & Roberson, 2016), no study has yet investigated the structural validity of the entire suite of bullying-related items. Specifically, we tested the latent structure and invariance of responses to the HBSC self-report bullying items. Given that an item-content evaluation suggested the HBSC self-report survey targets at least two overarching classes of bullying behavior—victimization and perpetration—the following hypotheses were proposed to guide the study: (a) responses to the 22 HBSC bullying items would be best described by a two-factor measurement model representing bullying victimization and perpetration, and (b) this model would be reasonably invariant across key demographics (i.e., student sex, race/ethnicity, and grade level).
Method
Participants
The current study used the publicly available 2009-2010 HBSC self-report survey dataset, which is composed of a stratified random sample of U.S. youth. A detailed description of the student sampling process and other information concerning the design and administration of the survey can be found in the publicly available codebook (Iannotti, 2013). A summary of participant demographic characteristics for the original full sample (N = 11,449), the full sample with cases containing missing data on one or more of the bullying items removed (n = 9,979), and two random split-half subsamples derived for data analytic purposes is provided in Table 1. Participants in the original full sample identified approximately equally as male and female, were enrolled in Grades 5 to 10 (grade-level representation range = 13.7%-20.3%), and approximated the racial/ethnic proportions of U.S. youth, with the most prevalent identities being White (49.6%), Hispanic/Latinx (17.7%), and Black/African American (15.3%). The largest share of respondents indicated living in a suburban area (36.2%), followed by urban (30.4%), rural (25.7%), and unclassified areas (7.8%). Respondent scores on a composite measure of family affluence ranging from 0 = low to 9 = high were mostly in the upper-middle range of the scale (M = 5.96, Mdn = 6, Q1 = 5, Q3 = 7). Compared with the original full sample, relative proportions of each demographic were largely maintained across all other iterations of the sample (see Table 1).
Table 1. 2009-2010 HBSC Demographic Proportions for Sex, Grade Level, and Race/Ethnicity Across All Samples.
Note. HBSC = Health Behavior in School-Aged Children.
Measure
The HBSC self-report survey is administered to a nationally representative sample of U.S. students in Grades 5 to 10 every 4 years. The survey assesses a broad range of health-related behaviors among youth, including drug/alcohol use, body image, attitudes about school, peer/family relationships, physical health, bullying involvement, and more. To operationalize “bullying” for respondents, a prompt with a formal definition—adapted from the OBVQ—was provided at the beginning of the item set:
We say a student is BEING BULLIED when another student, or a group of students, say or do nasty or unpleasant things to him or her. It is also bullying when a student is teased repeatedly in a way he or she does not like or when they are deliberately left out of things. But it is NOT BULLYING when students of about the same strength or power argue or fight. It is also not bullying when a student is teased in a friendly and playful way.
The HBSC self-report survey contains 22 items targeting specific classes of bullying behaviors—11 items for victimization (see Table 2) and 11 parallel items for perpetration (see Table 3)—covering teasing, social exclusion, physical aggression, spreading lies, various kinds of harassment, and multiple kinds of cyber bullying. The victimization and perpetration items were arranged in two 11-item subsets, each prefaced with a similar item stem (i.e., “How often have you [been bullied/bullied another student(s)] at this school in the past couple of months in the ways listed below?”). Response options for all bullying involvement items were arranged along a 5-point, relative frequency-based scale (1 = I haven’t been bullied at school the past couple of months, 2 = it has only happened once or twice, 3 = 2 or 3 times a month, 4 = about once a week, and 5 = several times a week).
Table 2. 2009-2010 HBSC Items Targeting Bullying Victimization.
Note. HBSC = Health Behavior in School-Aged Children.
Table 3. 2009-2010 HBSC Items Targeting Bullying Perpetration.
Note. HBSC = Health Behavior in School-Aged Children.
Data Analyses
The structural validity of responses to the HBSC self-report bullying involvement items was analyzed in multiple stages using the R statistical environment (R Core Team, 2016). Prior to the primary analyses, data inspection and cleaning procedures were undertaken, beginning with the handling of missing data. First, 647 cases (5.7% of the original full sample) were deleted listwise because they had missing data across all bullying items of interest, suggesting that these respondents did not complete this section of the survey at all. Of the remaining cases, 823 (7.2% of the original full sample) had partially missing data for the relevant variables and were also deleted listwise. Because cases with partially missing responses accounted for less than 10% of the sample, listwise deletion was deemed acceptable and unlikely to substantially bias statistical estimates (Langkamp, Lehman, & Lemeshow, 2010). The remaining cases were then divided into two random split-halves for analytic purposes. Subsample 1 (S1; n = 4,989) was used to conduct exploratory factor analysis (EFA), whereas Subsample 2 (S2; n = 4,990) was used to conduct confirmatory factor analysis (CFA) and multigroup invariance analyses. Descriptive statistics were calculated for both subsamples to examine the distributions of the target items and are provided in Table 4. Skewness and kurtosis estimates were consistently greater than |2.0| for all items, indicating substantial nonnormality. Due to the relative-frequency nature of the response options for the bullying items, responses were treated as ordered categorical data for all primary analyses.
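The case-handling steps above can be sketched in Python as follows. This is a minimal illustration under assumed inputs, not the authors' R code: each respondent is assumed to be a list of 22 item responses with None marking a missing value, and the helper name listwise_split and the seed are hypothetical. With the reported counts, an integer split of the 9,979 complete cases yields subsamples of 4,989 and 4,990.

```python
import random
from typing import Dict, List, Optional, Sequence

Row = Sequence[Optional[int]]  # one respondent's 22 item responses; None marks a missing value


def listwise_split(rows: List[Row], seed: int = 1) -> Dict[str, object]:
    """Listwise deletion followed by a random split-half (hypothetical sketch)."""
    n = len(rows)
    n_all_missing = sum(all(v is None for v in r) for r in rows)
    complete = [r for r in rows if all(v is not None for v in r)]
    n_partial = n - n_all_missing - len(complete)  # some, but not all, items missing

    shuffled = list(complete)
    random.Random(seed).shuffle(shuffled)  # reproducible random assignment to halves
    half = len(shuffled) // 2
    return {
        "pct_all_missing": 100 * n_all_missing / n,
        "pct_partial": 100 * n_partial / n,
        "S1": shuffled[:half],  # EFA subsample
        "S2": shuffled[half:],  # CFA and invariance subsample
    }


# Synthetic rows that reproduce only the reported case counts (response values are placeholders)
rows = ([[None] * 22] * 647
        + [[1] * 21 + [None]] * 823
        + [[1] * 22] * 9979)
out = listwise_split(rows)
print(round(out["pct_all_missing"], 1), round(out["pct_partial"], 1))
print(len(out["S1"]), len(out["S2"]))
```

The synthetic rows carry no real response information; they exist only so the reported percentages (5.7% and 7.2%) and subsample sizes can be checked against the counts.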
Table 4. Descriptive Statistics for All Bullying Items in Subsamples 1 and 2.
Note. Q1, Q3 = first and third quartile score; Skew. = skewness; Kurt. = kurtosis.
EFAs were conducted using the psych (Revelle, 2016) and nFactors (Raiche, 2010) R packages. Rather than analyzing the more traditional Pearson correlation matrix, the polychoric correlation matrix was analyzed, as simulation studies have demonstrated the superiority of this approach in estimating population correlations when data are ordinal (Holgado-Tello, Chacón-Moscoso, Barbero-García, & Vila-Abad, 2010). Furthermore, given the nonnormality of these data, a principal axis factoring extraction method was selected, as it is more robust to violations of normality than maximum likelihood. To enhance factor interpretation and account for associations among extracted factors, an oblique rotation method (promax) was used. The number of factors to retain for the preferred measurement model was determined from an array of analyses and decision rules, beginning with visual inspection of the scree plot of eigenvalues and parallel analysis (Garrido, Abad, & Ponsoda, 2013). Both analyses indicated that a two-factor solution best described these data. In line with this result, the first model was constrained to two factors, but three- and four-factor solutions were also tested to explore potential alternative structures. Statistical assumptions of the EFA were evaluated in terms of sampling adequacy (Kaiser–Meyer–Olkin [KMO] > .90), sphericity (Bartlett’s χ2 p < .05), and extracted item communalities (h2 ≥ .50). In addition, the rotated pattern matrix for each EFA was inspected for primary factor loadings ≥ .30, cross-loadings < .30, and theoretical interpretability.
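Parallel analysis, one of the retention rules named above, compares observed eigenvalues with eigenvalues from random data of the same dimensions and retains the leading factors whose observed eigenvalues are larger. The Python sketch below illustrates the idea only: unlike the reported analysis, it uses Pearson rather than polychoric correlations and unreduced (principal component) eigenvalues, and the simulated two-factor data (22 items, loadings of .80, factor correlation of .50) and function name are hypothetical.

```python
import numpy as np


def parallel_analysis(data: np.ndarray, n_iter: int = 100, quantile: float = 95.0, seed: int = 0) -> int:
    """Horn's parallel analysis on Pearson correlations (simplified sketch).

    Retains leading factors whose observed eigenvalues exceed the chosen
    percentile of eigenvalues obtained from random normal data of the same shape.
    """
    n, p = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]  # descending order
    rng = np.random.default_rng(seed)
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        sim = rng.standard_normal((n, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, quantile, axis=0)
    n_factors = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:  # stop at the first eigenvalue that does not beat chance
            break
        n_factors += 1
    return n_factors


# Hypothetical two-factor data: 11 "victimization" and 11 "perpetration" items
rng = np.random.default_rng(42)
n_obs = 2000
phi = np.array([[1.0, 0.5], [0.5, 1.0]])  # assumed factor correlation
factors = rng.multivariate_normal([0.0, 0.0], phi, size=n_obs)
loadings = np.zeros((22, 2))
loadings[:11, 0] = 0.8  # assumed standardized loadings
loadings[11:, 1] = 0.8
X = factors @ loadings.T + rng.standard_normal((n_obs, 22)) * np.sqrt(1 - 0.8 ** 2)
print(parallel_analysis(X))
```

In the psych package, a comparable analysis is available through fa.parallel, which can also work from polychoric correlations; the sketch above keeps everything in NumPy so the logic of the decision rule is visible.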
CFA and multigroup invariance analyses were conducted using the lavaan (Rosseel, 2012), semTools (semTools Contributors, 2016), and psych (Revelle, 2016) R packages. Following the EFAs, the preferred measurement model was analyzed with a CFA using a robust diagonal weighted least squares estimator (DWLS; DiStefano & Morgan, 2014). Data–model fit was assessed using a variety of indices, including scaled-shifted variants of the model chi-square (χ2; Asparouhov & Muthén, 2010), comparative fit index (CFI; Brosseau-Liard & Savalei, 2014), and root mean square error of approximation (RMSEA; Brosseau-Liard, Savalei, & Li, 2012), as well as the standardized root mean square residual (SRMR). The following criterion values were used to indicate at least adequate data–model fit: χ2 with an associated p > .05 (Kline, 2016), CFI ≥ .90, RMSEA with a 90% confidence interval (CI) ≤ .08, and SRMR ≤ .08 (Hu & Bentler, 1999). Of these, the χ2 test is perhaps the least useful for judging fit quality, as it tends to reject models with even trivial misspecification when data are nonnormal and the sample size is large, as in the present study. This index is nonetheless reported herein for completeness.
In addition, it is noteworthy that research supporting these decision rules has mostly been conducted using maximum likelihood estimation with normally distributed, continuous data. Research on appropriate cutoff values for robust DWLS estimation with nonnormal ordinal data is limited, but some evidence suggests that more conservative criteria of CFI ≥ .95 and RMSEA ≤ .05 may be more appropriate (DiStefano & Morgan, 2014). Due to the lack of consensus regarding fit index cutoff values under the conditions of this study, the traditional criteria were used to identify adequate fit and the stricter criteria were used to indicate strong fit, but conclusions should be regarded as tentative. Scale reliability was evaluated at the latent level using coefficient H (Mueller & Hancock, 2008) and at the observed level using categorical ω (Dunn, Baguley, & Brunsden, 2014; Kelley & Pornprasertmanit, 2016), with values ≥ .70 considered adequate for both. H was calculated using the formula provided by Mueller and Hancock (2008), whereas categorical ω was computed using the MBESS package in R (Kelley, 2016).
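The two tiers of fit criteria described above can be expressed as a small Python function. This is a hedged sketch: the helper name rate_fit is hypothetical, the RMSEA "adequate" rule is interpreted here as the upper bound of the 90% CI being ≤ .08, and no strong-fit cutoff is applied to SRMR because none is stated. The example passes the two-factor CFA values reported for S2; because χ2 is reported only as p < .001, .001 is used as an upper bound.

```python
def rate_fit(chi2_p: float, cfi: float, rmsea: float, rmsea_ci_upper: float, srmr: float) -> dict:
    """Label each fit index against the adequate (traditional) and strong (stricter) cutoffs."""
    ratings = {}
    # Exact-fit test: adequate only if the model is not rejected (p > .05)
    ratings["chi2"] = "adequate" if chi2_p > .05 else "inadequate"
    # CFI: >= .95 strong, >= .90 adequate
    if cfi >= .95:
        ratings["CFI"] = "strong"
    elif cfi >= .90:
        ratings["CFI"] = "adequate"
    else:
        ratings["CFI"] = "inadequate"
    # RMSEA: adequate if the 90% CI upper bound is <= .08; strong if the point estimate is also <= .05
    if rmsea_ci_upper <= .08:
        ratings["RMSEA"] = "strong" if rmsea <= .05 else "adequate"
    else:
        ratings["RMSEA"] = "inadequate"
    # SRMR: only the traditional .08 cutoff is specified
    ratings["SRMR"] = "adequate" if srmr <= .08 else "inadequate"
    return ratings


# Two-factor CFA in S2 (chi-square p reported only as < .001, so .001 is an upper bound)
cfa = rate_fit(chi2_p=.001, cfi=.971, rmsea=.053, rmsea_ci_upper=.055, srmr=.064)
print(cfa)
```

The mixed labels correspond to the "adequate to strong" summary of this model: CFI meets the stricter cutoff, RMSEA and SRMR meet only the traditional ones, and χ2 rejects exact fit, as expected with a sample of this size.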
Once a good-fitting CFA model was identified, multigroup CFA was conducted with the same subsample (S2) to investigate measurement and structural invariance across three demographic factors: grade level, sex, and race/ethnicity. Tests of measurement and structural invariance involved adding increasingly restrictive parameter constraints to the model. Invariance constraints were added in the following order: (a) configural/baseline (equal latent structure), (b) weak (adds equal factor loadings), (c) strong (adds equal intercepts), (d) homogeneity of latent variance (adds equal residuals and latent variances), and (e) homogeneity of latent means (adds equal latent means). Levels (a) through (c) concern measurement invariance, whereas levels (d) and (e) concern structural invariance (Vandenberg & Lance, 2000). Invariance between each less constrained model and the next more constrained model was judged to hold when the decrease in CFI was ≤ .01 (Vandenberg & Lance, 2000). However, little research is available to guide invariance evaluation when using a robust DWLS estimator with nonnormal ordinal data. Another common test of invariance is the change in model χ2 from less to more restrictive models. However, given disagreement in the literature regarding the appropriateness of this test with estimation methods similar to robust DWLS (e.g., WLSMV; Asparouhov & Muthén, 2006; cf. French & Finch, 2008), we relied on the change in CFI rather than the change in χ2 for the current investigation. Invariance results should therefore also be interpreted with caution and considered tentative. Finally, once an appropriate factor structure was established with S2, subscale scores were calculated at the observed level to investigate their descriptive properties.
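The sequential ΔCFI rule described above can be sketched in Python. The sketch assumes one CFI value per nested model, ordered from configural to latent means; the function name and the CFI values in the example are hypothetical and are not taken from Tables 7 and 8. Stopping at the first failed step is a design choice reflecting that each level presupposes the one before it.

```python
from typing import List, Tuple

LEVELS = ["configural", "weak", "strong", "latent variance", "latent means"]


def check_invariance(cfis: List[float], max_drop: float = 0.01) -> List[Tuple[str, float, bool]]:
    """Compare each nested model with the previous one by the drop in CFI.

    cfis[i] is the CFI of the model at LEVELS[i]. A level holds when
    CFI(previous) - CFI(current) <= max_drop; testing stops at the first failure.
    """
    results = []
    for i in range(1, len(cfis)):
        drop = cfis[i - 1] - cfis[i]  # negative when the constrained model fits better
        holds = drop <= max_drop
        results.append((LEVELS[i], round(drop, 3), holds))
        if not holds:
            break
    return results


# Hypothetical CFI values for one demographic comparison
steps = check_invariance([0.970, 0.972, 0.968, 0.950, 0.951])
for level, drop, holds in steps:
    print(level, drop, holds)
```

A negative drop simply means the more constrained model fit slightly better, which the rule treats as support for invariance.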
Results
Latent Factor Structure
EFA (S1)
Results of the first analysis, which constrained a two-factor solution, showed strong KMO sampling adequacy (.94), a significant Bartlett’s test of sphericity, χ2(231) = 79,308, p < .001, and adequate extracted item communalities (h2 range = .56-.94). Table 5 summarizes the factor loadings and psychometric characteristics of this measurement model. The two factors collectively accounted for 75.8% of the variance in these data. Pattern matrix loadings for all bullying victimization (λ range = .74-.89) and bullying perpetration (λ range = .77-.95) items exceeded the minimum threshold of .30, with no cross-loadings of .30 or greater. In addition, observed internal consistency estimates were strong for both factors (ω > .90), and the two factors were substantially correlated (ϕ = .53).
Table 5. EFA Pattern Matrix Results for the Two-Factor Measurement Model.
Note. Bold factor loadings distinguish items in the perpetration and victimization scales.
ω = categorical omega; CI = confidence interval; r = average interitem correlation. The 95% CIs for categorical omega were calculated from 10,000 draws of a bias-corrected and accelerated bootstrapping procedure.
Next, a second EFA constraining a three-factor solution was tested. Results from this model showed strong communalities among all items (h2 range = .56-.94) and the three factors collectively accounted for 80.5% of the variance. Inspection of the pattern matrix revealed the three factors consisted of (a) 11 items related to perpetration (λ range = .82-.93), (b) seven items related to traditional (i.e., noncyber) victimization (λ range = .55-1.00), and (c) four items pertaining to cyber victimization (λ range = .69-.72). Although these loadings were strong, interpretation of the factors was not as clear as with the two-factor model, given that five items cross-loaded on other factors: pVerbal, pCompOut, pCellOut, vReligious, and vComp (see Tables 2 and 3 for item content).
Finally, a four-factor EFA solution was tested. As with both previous models, the four-factor solution showed uniformly strong communalities (h2 range = .61-.96) and collectively accounted for 82.4% of the variance. Pattern matrix inspection revealed that the first three extracted factors represented the same factors as in the three-factor model—traditional victimization (λ range = .55-.96), perpetration (λ range = .82-.95), and cyber victimization (λ range = .72-.77)—plus a fourth factor where only the items vRacial (λ = .50) and vReligious (λ = .47) had factor loadings above .30. This model also showed three items with cross-loadings greater than .30: pCellOut, vRacial, and vReligious (see Tables 2 and 3 for item content). Considering these EFA results together, the preponderance of evidence converged in support of retaining the two-factor structure as the preferred measurement model for subsequent CFA and multigroup invariance analyses.
CFA (S2)
To corroborate the two-factor measurement model indicated by EFA, CFA was conducted with S2 by specifying each of the 22 observed bullying items to load on its respective latent factor (perpetration or victimization) and allowing both latent factors to freely covary. Findings showed that the model was characterized by strong latent factor loadings for perpetration (λ range = .72-.97) and victimization (λ range = .74-.97), as well as a large standardized covariance between the factors (ϕ = .69). Model fit indices also indicated adequate to strong data–model fit, χ2(208) = 3174.054, p < .001; CFI = .971; RMSEA = .053 (90% CI = [.052, .055]); SRMR = .064 (see Table 6 and Figure 1). In addition, both latent factors had strong latent reliability and average variance extracted (AVE): victimization H = .98 and AVE = .72, perpetration H = .99 and AVE = .82.
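Both latent indices reported above are simple functions of the standardized loadings. Coefficient H is H = 1 / (1 + 1 / Σ[λ²/(1 − λ²)]), and AVE is conventionally the mean squared standardized loading. The Python sketch below implements these two formulas; the function names are hypothetical, and the equal loadings in the example are invented (Table 6 holds the actual item-level values), so the output is not meant to reproduce the reported H and AVE.

```python
from typing import Sequence


def coefficient_h(loadings: Sequence[float]) -> float:
    """Coefficient H from standardized loadings (each |loading| < 1)."""
    s = sum(lam ** 2 / (1 - lam ** 2) for lam in loadings)
    return s / (1 + s)  # algebraically equal to 1 / (1 + 1/s)


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """Mean squared standardized loading."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


lambdas = [0.85] * 11  # hypothetical: 11 items with equal standardized loadings
print(round(coefficient_h(lambdas), 4), round(average_variance_extracted(lambdas), 4))
```

Because each item contributes λ²/(1 − λ²), a few very strong loadings raise H sharply, which is why H can approach .99 even when AVE is considerably lower.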
Table 6. Factor Loading Coefficients, Latent Variance, and Latent Covariance for the Two-Factor HBSC Bullying CFA Model.
Note. Unstandardized and standardized factor loadings refer to the pattern coefficients. rs refers to the structure coefficients. HBSC = Health Behavior in School-Aged Children; CFA = confirmatory factor analysis.

Figure 1. Full measurement model of the two-factor HBSC Bullying Measure.
Measurement and structural invariance analysis (S2)
Results from the multigroup measurement and structural invariance analyses showed good data–model fit at all levels of constraint across student race/ethnicity, grade level, and sex (see Tables 7 and 8). ΔCFI was uniformly less than .01 for every level of invariance across all demographics. In addition, the strictest invariance level, homogeneity of latent means, was the best fitting model for all three demographics, providing further evidence for measurement and structural invariance across groups.
Table 7. Model Fit of the Two-Factor CFA for Each Demographic Group Used for Multigroup Invariance Analyses.
Note. χ2, RMSEA, and CFI estimates are presented with scaled-shifted corrections applied. All χ2 values were significant at the p < .001 level. CFA = confirmatory factor analysis; RMSEA = root mean square error of approximation; CI = confidence interval; CFI = comparative fit index; SRMR = standardized root mean square residual; λVict. = λ victimization; λPerp. = λ perpetration; St. Cov. = standardized covariance between the victimization and perpetration latent variables.
Table 8. Measurement and Structural Invariance Results for the Two-Factor Model Across Demographics.
Note. χ2, RMSEA, and CFI estimates are presented with scaled-shifted corrections applied. All χ2 values were significant at the p < .001 level. RMSEA = root mean square error of approximation; CI = confidence interval; CFI = comparative fit index.
Observed Subscale Descriptive Statistics
In addition to the latent-level analyses described above, basic descriptive statistics were also calculated with S2 at the observed level for total scores on the victimization and perpetration subscales (see Table 9). Results indicated that these scores were characterized by strong internal consistency (ω > .90) as well as moderate (victimization r = .47) and large (perpetration r = .60) average interitem correlations. In addition, the scores showed substantial nonnormality (skewness and kurtosis > |2.0|) and relatively low bullying involvement frequency for both perpetration (Mdn = 11) and victimization (Mdn = 12). Because each subscale sums 11 items scored from 1 to 5, the lowest possible total is 11; a perpetration median of 11 therefore indicates that at least half of the students in S2 selected the lowest response option on every perpetration item.
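The observed-level summaries above can be computed as in the following Python sketch, which assumes a respondents × items array of 1-5 responses for one subscale. The function name is hypothetical, the average interitem correlation uses Pearson correlations (the source does not state which type underlies r), and the five-respondent, three-item toy data are invented to show how a median at the scale floor arises.

```python
import numpy as np


def subscale_summary(items: np.ndarray) -> dict:
    """Observed-level summary for one subscale (rows = respondents, columns = items scored 1-5)."""
    totals = items.sum(axis=1)
    floor = items.shape[1] * 1  # lowest possible total: every item answered 1
    r = np.corrcoef(items, rowvar=False)
    k = r.shape[0]
    avg_r = (r.sum() - k) / (k * (k - 1))  # mean of the off-diagonal correlations
    return {
        "floor": int(floor),
        "median_total": float(np.median(totals)),
        "prop_at_floor": float(np.mean(totals == floor)),
        "avg_interitem_r": float(avg_r),
    }


# Hypothetical 3-item scale: three of five respondents report no involvement at all
toy = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1], [2, 1, 3], [3, 2, 4]])
summary = subscale_summary(toy)
print(summary)
```

For an 11-item HBSC subscale the floor would be 11, so a median equal to the floor, as reported for perpetration, means a majority or exact half of respondents sit at that floor.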
Table 9. Observed Descriptive Statistics of the HBSC Bullying Measure.
Note. Q1, Q3 = first and third quartile score; r = average interitem correlation; ω (95% CI) = categorical omega with 95% confidence interval calculated with 10,000 draws from a bias-corrected and accelerated bootstrapping procedure. HBSC = Health Behavior in School-Aged Children; Skew. = skewness; Kurt. = kurtosis; CI = confidence interval.
Discussion
Although several instruments assessing youth bullying involvement are currently available, to date, none appear as conceptually broad or as balanced across victimization and perpetration domains as the measure included within the HBSC self-report survey. The present study was the first to investigate the structural validity of responses to the 22 items that make up this measure, using the publicly available, nationally representative sample of U.S. youth in Grades 5 to 10 from the 2009-2010 administration of the survey. We hypothesized that results from factor analyses would support a two-factor measurement model that represented bullying victimization and perpetration, and that this model would be reasonably invariant across key demographics.
After testing the tenability of two-, three-, and four-factor EFA models, the results converged on the conclusion that the hypothesized two-factor model of bullying involvement best described the data. CFA results provided additional evidence in support of this two-factor measurement model, showing adequate to strong data–model fit across a variety of absolute and incremental fit indices. Furthermore, findings from the follow-up multigroup CFA supported the measurement and structural invariance of responses across student sex, race/ethnicity, and grade level. However, due to insufficient subgroup sample sizes for several racial/ethnic groups, invariance for race/ethnicity was only established for the three largest racial/ethnic groups in the United States (i.e., White, Hispanic/Latinx, and Black/African American students) and should therefore not be generalized beyond these groups based on this study alone.
Taken together, findings from the present study largely supported our hypotheses and suggest that responses to the HBSC self-report bullying items demonstrate structural validity for measuring student bullying perpetration and victimization behaviors. Given the measurement and structural invariance evidence derived from a representative sample of U.S. students in the present analysis, no adjustments to scoring or changes to administration procedures appear necessary, at least when using the measure to assess bullying involvement among 5th- to 10th-grade students identifying as White, Hispanic/Latinx, or Black/African American. Furthermore, because this study used such a large and demographically diverse dataset, the descriptive statistics derived from the observed subscale scores (see Table 9) could potentially be used as normative data for informing future research or practice, such as for identifying students at risk of bullying involvement via universal screening or for monitoring the effectiveness of prevention or intervention efforts at the level of classrooms, grade levels, schools, districts, regions, or the nation.
Although it is possible to administer either the perpetration or victimization scale independently, we suggest that this is not the optimal use of the measure. Given the differentiated outcome trajectories of victims, perpetrators, and perpetrator–victims, we recommend using both scales in conjunction to identify all three bullying involvement groups, rather than only one. In addition, it is noteworthy that the covariance between the perpetration and victimization latent variables was large (see Table 6), which may indicate a considerable number of perpetrator–victims in our sample, a group at risk of the most detrimental outcomes. Although a thorough analysis of prevalence rates is beyond the scope of this measure development study, interested readers are directed to Renshaw, Roberson, and Hammons (2016) and Renshaw, Hammons, and Roberson (2016) for more detailed investigation of prevalence rates and risk outcomes among different bullying involvement groups using the 2009-2010 HBSC sample.
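Using both scales together allows each student to be assigned to one of the involvement groups discussed above. The Python sketch below shows one way to do this; the cutoff (any item at or above 3, "2 or 3 times a month") is a hypothetical choice for illustration only, since the present study does not specify a classification rule, and the function name is likewise hypothetical.

```python
from typing import Sequence


def involvement_group(victim_items: Sequence[int], perp_items: Sequence[int], cutoff: int = 3) -> str:
    """Classify one student from the 11 victimization and 11 perpetration responses (1-5).

    cutoff is a hypothetical threshold: a student counts as involved in a role
    if any item on that scale is at or above it.
    """
    victim = any(v >= cutoff for v in victim_items)
    perpetrator = any(p >= cutoff for p in perp_items)
    if victim and perpetrator:
        return "perpetrator-victim"
    if perpetrator:
        return "perpetrator only"
    if victim:
        return "victim only"
    return "uninvolved"


# Hypothetical student: one victimization item at 4, one perpetration item at 3
student_victim = [1] * 10 + [4]
student_perp = [3] + [1] * 10
print(involvement_group(student_victim, student_perp))
```

Any cutoff choice changes the resulting group sizes, so in practice the threshold would need its own justification and validation.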
Results from this study should also be considered in light of a few notable limitations. As mentioned earlier, research supporting the use of the robust DWLS estimator for nonnormal ordinal data under various model and sample size conditions is still nascent. Although the very large sample might allay some concern about the stability of the estimates, substantive conclusions drawn from these results should be considered tentative. It is also not known to what extent the descriptive results presented herein (see Table 9) reflect contemporary bullying involvement among U.S. youth, as we are currently 6 years removed from the responses of participants from the 2009-2010 sample used in this study. In addition, as an artifact of splitting the full sample into random halves, invariance analyses could only be conducted for the three most prevalent racial/ethnic groups, which limits the generalizability of findings for most racial/ethnic minority groups within the United States. Although future administration of the HBSC in the United States is not guaranteed, this does not preclude this measure’s use in later health-behavior and bullying prevalence studies in other contexts. Further research is therefore warranted to replicate these analyses, whether in the context of formal national HBSC administration or not, to capture changing trends in bullying involvement and expand the validity evidence of the measure to more diverse student populations.
Future research investigating the concurrent and predictive validity of responses to the HBSC bullying measure with other physical, psychological, and academic well-being indicators is also encouraged. This would allow for a greater understanding of which classes of bullying behavior represented by this measurement model are salient indicators of risk. Relatedly, this measure could also be covalidated with preexisting bullying instruments to test the comparative validity of responses to competing measures for predicting valued student outcomes. Future studies within this line of research might also benefit from using data collection methods beyond self-report (e.g., school discipline records, informant-report behavior ratings), as doing so may reduce bias arising from reliance on a single data collection method (see Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). Similarly, randomizing the order in which items are presented to participants may also help curb biased responding, as some evidence suggests respondents may be primed to answer differently depending on the phrasing of previous questions (Huang & Cornell, 2015). Finally, in addition to basic psychometric validation studies, research investigating the treatment utility (see Hayes et al., 1987) of scores derived from the HBSC bullying measure is needed to inform use of the instrument for actual prevention and intervention purposes.
Footnotes
Acknowledgements
The authors are grateful to Ronald J. Iannotti for making available the Health Behavior in School-Aged Children survey datasets and their codebooks for public use.
Authors’ Note
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
