Abstract
Violence assessment can potentially be improved by Item Response Theory, in particular ordinal Mokken Scale Analysis. The research question is as follows: Does Mokken Scale Analysis of secondary pupils’ experience of violence result in a homogeneous, reliable, and valid unidimensional scale that fits all the requirements of Mokken scaling? The method used is secondary analysis of Dutch national data collected from secondary school pupils in 2008 by means of a digital school safety survey. The first random sample (n1 = 14,388) was used to develop the scale; the second sample (n2 = 14,350) was used to cross-validate the first results. Pupils’ experience of violence is assessed by 29 items reflecting six types of antisocial or aggressive behavior. A Mokken scale of 25 items meets the requirements of monotone homogeneity and double monotonicity. Item ordering is invariant between boys and girls, between pupils born in the Netherlands and those born elsewhere, and between pupils who feel at home in the Netherlands and those who do not. These results are cross-validated in Sample 2. The latent construct concerns pupils’ experience of violence in terms of severity, varying from verbal and mild physical violence (relatively most frequent), to combinations of social, material, and severe physical violence, to very severe and serious sexual violence (relatively least frequent). Some limitations and further developments are discussed.
One of the main reasons researchers investigate experiences of violent behavior is that such behavior has various negative consequences for the person or group being victimized. Violence-related negative effects may be expressed in feelings of anxiety, depression, low self-esteem, and loneliness (Duke, Pettingell, McMorris, & Borowsky, 2010). An experience of violence may also involve physical pain and injury or material damage, for example, vandalism (Bayh, 1975). Moreover, frequent victimization by peers at school is associated with poor academic performance (Schwartz, Gorman, Nakamoto, & Toblin, 2005). Violence perpetrated in and around school, including on the Internet, is interpreted as a threat to safety and social cohesion (Chen & Astor, 2011; Finkelhor, Omrod, Turner, & Hamby, 2011; Siu, 2011). Violence-related experiences in and around schools are therefore a major concern in the educational policy of countries such as the United States (Mayer & Furlong, 2010), Canada (Beauvais & Jenson, 2002), New Zealand (Office of the Children’s Commissioner, 2010), and the Netherlands (Ministry of Education, Culture, and Science, 2011).
Researchers assess violent behavior or pupils’ experience of violence within the context of different behavior areas and by means of different procedures. Antisocial behavior and experiences can be measured by means of a self-reporting instrument such as the Olweus Bully/Victim Questionnaire (Solberg & Olweus, 2003). Wang, Iannotti, Luk, and Nansel (2010) have extended this instrument to examine the co-occurrence of victimization in subtypes of bullying, for example, physical and verbal bullying, social exclusion, spreading rumors, and cyber bullying. Mynard and Joseph (2000) have developed a multidimensional psychometric scale describing different types of peer victimization. Nylund, Bellmore, Nishina, and Graham (2007) have classified secondary school pupils by their victimization experiences and shown that victim groups are best understood according to level of severity rather than type. Felix, Furlong, and Austin (2009) clustered pupils with similar characteristics, resulting in five victimization subgroups. Mooij (2011a) has focused on relationship patterns between personal and other characteristics of secondary school pupils and their motives as a victim, perpetrator, or witness of six types of violent behavior, in relation to the complementary social roles of other pupils, teachers, other school staff, and pupils’ relatives. In the same vein, Mooij (2011b) has examined social interaction patterns between the personal and school characteristics of secondary school teachers and their experience of violence in differentiated ways.
Because violence is measured in different areas, in different ways, and within different groups, and because the results are used in different designs, it is somewhat difficult to compare the research findings of multiple studies or to assess their relevance for identifying or reducing the negative effects described above. Michie and Cooke (2006) have summarized the problems involved in violence measurement as multidimensionality in assessment; nonempirical ordering of violent acts; inclusion of undiscriminating items; and differential precision of measurement across the range of seriousness. They themselves used the MacArthur Community Violence Screening Instrument (MCVSI) to interview 250 male prisoners between 18 and 40 years of age. They applied Item Response Theory (IRT) and found that the instrument’s items were not ordered correctly in terms of the severity of the underlying trait; additional items were required to improve discrimination.
Given the diversity of violence concepts and assessment procedures, it is worth looking more closely at IRT. IRT models generally show how the items in a scale function relative to one another and where each item is situated on the continuum of an underlying construct from low to high severity or difficulty (cf. Schafer, 1996). When the items form a unidimensional scale, it is appropriate to combine them to reach a total score that can be related to the scores of other variables. IRT may therefore solve some of the problems associated with the measurement of violence. An example is given by Regan, Bartholomew, Kwong, Trinke, and Henderson (2006), who have evaluated the structure of the physical violence scale, part of the Conflict Tactics Scales (CTS). These researchers have determined the ordering of items used to assess 14 acts of physical violence within heterosexual relationships in a sample of women and men.
There has been little IRT-based research on violent behavior and the associated experiences (Regan et al., 2006). Information about various types of item scaling and IRT analysis has been presented by Van Schuur (2003). He began by elaborating a “Guttman scale” and then introduced “Rasch” and “Mokken” Scale Analysis. Both Rasch and Mokken analysis assume random or probabilistic deviations from a perfect Guttman scale. Van Schuur stated that, in probabilistic IRT models, the probability of a positive response to a dichotomous item depends on one or more respondent parameters and one or more item parameters. The respondent and item parameters can be distinguished and estimated separately (cf. also Mokken, 1997; Molenaar & Sijtsma, 2000). If the probabilistic IRT model holds, the item parameters can be estimated separately from the respondents’ scale values. This property is called “specific objectivity” or “respondent and item measurement invariance.” This scale characteristic makes it possible to compare respondents across groups or over time; it facilitates continuity between tests meant to measure the same concept at different but overlapping levels; and it supports computerized adaptive testing. A difference between Rasch and Mokken analysis is that Rasch analysis uses parametric logistic modeling, whereas Mokken analysis is based on nonparametric or ordinal modeling (Molenaar & Sijtsma, 2000). Rasch analysis allows item scoring on a continuous latent scale, whereas Mokken analysis uses addition of item scores to obtain a Mokken Scale sum score. Therefore, compared with Rasch analysis, Mokken analysis has less strict assumptions and can be used with a lower number of items (Van Schuur, 2003).
Given these potential advantages, it is worthwhile to investigate violence assessment using a probabilistic IRT model, in particular the Mokken Scale model. Inspired by Marshall and Hucker (2006), Nitschke, Osterheider, and Mokros (2009) have developed a Mokken scale based on 11 items to discriminate sexual sadists from sexual nonsadists. Nitschke et al. (2009) emphasized that research using unidimensional Mokken scaling is more efficient than research applying multidimensional scaling classifications. This issue can be explored empirically by performing secondary analysis of violence data on secondary school pupils collected by means of a national monitoring system (Mooij, 2011a, 2011b). In this research, teachers in the witness role, and pupils in the victim, offender, and witness roles, agreed perfectly in their frequency ranking of six types of violence (verbal, material, social, mild physical, severe physical, sexual). These results suggest that the different types of violence can be ordered more efficiently in terms of severity while indicating only one underlying construct. If this were shown to be true also for items indicating extreme violence (e.g., rape, using a weapon, stealing), then the resulting Mokken scale may well facilitate follow-up secondary analyses, assist in the design of future research and data analyses, and advance practical methods to be used by school personnel to actually reduce or prevent violence in and around schools. The research question is therefore as follows: Does Mokken Scale Analysis of secondary school pupils’ experiences of violence result in a homogeneous, reliable, and valid unidimensional scale that fits all the requirements of Mokken scaling?
Method
Data and Samples
The research question will be answered by means of secondary analyses of Dutch national data collected in a digital survey of school safety (“school safety monitor”). The data are part of a biennial evaluation of school safety in secondary education (cf. Mooij, De Wit, & Fettelaar, 2011). The Internet-based survey was developed in 2005 to help the Ministry of Education, Culture, and Science record and evaluate trends in school safety among pupils, school staff, and school leadership. The results supply both the national government and participating schools with cross-sectional and longitudinal information about school safety, experiences and incidents of violence, and school measures to reduce and prevent violence, at both the national and the school level (cf. Mooij et al., 2011).
The secondary analysis makes use of some of the data collected in 2008 from a total of 78,840 pupils who participated in the survey. These pupils attended 219 secondary schools distributed throughout the Netherlands. They used an individual login code to complete a web questionnaire. The login code enabled randomized collection of different versions and parts of the questionnaire. The pupils completed the questionnaire in their classrooms under teacher supervision (see further Mooij, 2011a).
Two separate random samples of 20% (n = 15,768 each) were drawn from the total group of 78,840 pupil respondents. These samples were considered to represent the population of pupils adequately and to support efficiency in the Mokken analyses to be carried out. Moreover, the second sample offers an initial indication of the scale’s external validity. Three variables (gender, country of birth, and feeling at home in the Netherlands) were included as grouping variables to check the Mokken results for specific objectivity, that is, for invariance of respondent and item measurement. Pupils with one or more missing values were excluded from the analyses. The first sample (n1 = 14,388) was used to develop the desired scale, and the second sample (n2 = 14,350) functions as an independent check or cross-validation of the first result of scale construction.
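The sampling and exclusion steps described above can be sketched in a few lines. This is a minimal illustration, not the procedure actually used: the function names, the seeds, the stand-in data, and the coding of missing answers as `None` are all assumptions, and the source does not say whether the two samples were allowed to overlap.

```python
import random


def draw_sample(records, fraction, seed):
    """Draw a simple random sample without replacement (assumed design)."""
    rng = random.Random(seed)
    size = round(len(records) * fraction)
    return rng.sample(records, size)


def drop_incomplete(records):
    """Listwise deletion: keep only respondents without missing answers.

    Missing answers are assumed to be coded as None.
    """
    return [r for r in records if None not in r]


# Hypothetical stand-in for the 78,840 pupil records (two items each).
population = [(i % 2, None if i % 11 == 0 else i % 3) for i in range(78840)]

# Two independent 20% draws, each followed by listwise deletion.
sample_1 = drop_incomplete(draw_sample(population, 0.20, seed=1))
sample_2 = drop_incomplete(draw_sample(population, 0.20, seed=2))
```

With 78,840 records, a 20% draw contains 15,768 respondents before deletion, which matches the sample size reported in the text.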
Variables
Violence experienced at school is assessed by scoring different antisocial or aggressive activities related to verbal, material, social, mild physical, severe physical, and sexual behaviors (Mooij, 2011a, 2011b). The specific items assessed with respect to each type of violence are given in Table 1.
Table 1. Types of Violent Behavior and Specification Into Items
All 29 items were scored for the September 2007-January 2008 period. Scoring occurred by choosing one out of seven answer alternatives (from never to always). The scores obtained for each violence item were dichotomized (0 = never, 1 = once or more). Item scores per type of violent behavior were included in principal factor analysis and Alpha scale analysis. For each type of violence, the factor results indicate the occurrence of a homogeneous group of items. Reliable Alpha scale coefficients on the dichotomized items of the six types of violence are presented in Table 2.
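The dichotomization and the Alpha coefficient can be illustrated with a short sketch. It assumes that the seven answer alternatives are coded 1 (never) to 7 (always), which the source does not state, and it applies the standard Cronbach's alpha formula to made-up responses rather than to the survey data.

```python
from statistics import pvariance


def dichotomize(score, never_code=1):
    """Map 'never' to 0 and every other answer category to 1.

    The code used for 'never' (1) is an assumption.
    """
    return 0 if score == never_code else 1


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents (lists of item scores)."""
    k = len(rows[0])
    item_variances = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_variance = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up answers of six pupils to three items of one violence type.
raw = [[1, 1, 1], [2, 1, 1], [5, 3, 1], [7, 4, 2], [1, 1, 1], [3, 2, 1]]
binary = [[dichotomize(score) for score in row] for row in raw]
alpha = cronbach_alpha(binary)
```

For dichotomous items this coefficient coincides with the Kuder-Richardson 20 formula.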
Table 2. Alpha Reliabilities of Scales on Types of Violent Pupil Behavior
A robust Mokken scale must be relatively immune to variables potentially related to the subject of the scale. The first relevant variable here is gender: Boys or men experience other and more severe types of violence than girls or women (Marshall, 1992; Regan et al., 2006; Wang et al., 2010). Furthermore, the variables, country of birth and feeling at home in the country in which one is actually living, are shown to be important for the experience of violence in and around school (Beauvais & Jenson, 2002; Felix et al., 2009). Not being born in the country and not feeling at home in the country are related to experiences of more violent behavior compared with being born and feeling at home in the country (Beauvais & Jenson, 2002). These three variables are included in the analyses to check invariance of measurement between groups when developing a Mokken scale. Gender is coded as boy or girl, whereas country of birth and feeling at home are coded as 0 = in the Netherlands and 1 = in another country.
Mokken Scale Analysis (MSA)
Molenaar and Sijtsma (2000) and Van Schuur (2003) have introduced and elaborated on the nonparametric or ordinal Mokken Scale Analysis (MSA; cf. also Mokken, 1971; Nitschke et al., 2009; Sijtsma & Molenaar, 2002). These authors define and use different homogeneity statistics to specify the requirements for this type of scaling. First, the coefficient of homogeneity $H_{ij}$ of a pair of items is expressed as the ratio of the covariance between Items $X_i$ and $X_j$ to its maximum obtainable value, given the marginal distributions of these two items. $H_{ij}$ thus indicates the internal consistency or reliability of each pair of items. Second, and likewise, the homogeneity of an entire scale, $H$, is the ratio of the sum of all pairwise covariances to the sum of all pairwise maximum covariances or, equivalently, 1 minus the ratio of the number of observed Guttman errors to the number expected under independence. The $H$ index then indicates the internal consistency of the whole scale. Third, $H_i$ is the ratio of the sum of all pairwise covariances with respect to Item $i$ to the sum of all pairwise maximum covariances concerning this item. $H_i$ thus reflects the scalability of one item in relation to the whole set of items.
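The three homogeneity coefficients can be computed directly from these covariance ratios. The sketch below is an illustration for dichotomous items only; the function name and the toy data are assumptions, and it is not the MSP implementation. It requires that every item has some variance, because otherwise the maximum covariance is zero.

```python
from itertools import combinations


def scalability(rows):
    """Return (H_ij per pair, H_i per item, H) for 0/1 item scores."""
    n = len(rows)
    k = len(rows[0])
    p = [sum(r[j] for r in rows) / n for j in range(k)]  # item means
    cov, cov_max = {}, {}
    for i, j in combinations(range(k), 2):
        p11 = sum(r[i] * r[j] for r in rows) / n  # both items positive
        cov[i, j] = p11 - p[i] * p[j]
        # Largest covariance the two marginals allow.
        cov_max[i, j] = min(p[i], p[j]) - p[i] * p[j]
    h_ij = {pair: cov[pair] / cov_max[pair] for pair in cov}
    h_i = [
        sum(cov[pair] for pair in cov if i in pair)
        / sum(cov_max[pair] for pair in cov if i in pair)
        for i in range(k)
    ]
    h = sum(cov.values()) / sum(cov_max.values())
    return h_ij, h_i, h


# A perfect Guttman pattern: every coefficient equals 1.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
h_ij, h_i, h = scalability(guttman)

# One respondent who endorses only the rarest item lowers the coefficients.
h_ij_err, h_i_err, h_err = scalability(guttman + [[0, 0, 1]])
```

In the second call the pair formed by the first and third item has a negative covariance, so its $H_{ij}$ drops below 0 and the scale $H$ falls below 1.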
Statistics $H$, $H_i$, and $H_{ij}$ are essential in constructing and testing the first aspect of a Mokken scale. To control for chance values, the null hypothesis being tested is that $H$, $H_i$, or $H_{ij}$ is 0 in the population. A Mokken scale is said to exist if each $H_{ij}$ > 0 and each $H_i$ > 0.30. This furthermore implies that $H$ > 0.30. If these conditions are met, monotone homogeneity is said to occur. The interpretation is that (a) the items reflect a unidimensional latent construct; (b) local independence, or respondent and item measurement invariance, holds; and (c) for each item it is true that the more the respondent can be described in terms of the latent construct, the greater the chance that the response to the item will be positive (“person ordering is item-free”).
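The decision rule for monotone homogeneity can be written as a small check. The function name is hypothetical, and the example values are illustrative rather than taken from the tables; the significance tests mentioned above are not included.

```python
def is_monotone_homogeneous(h_ij, h_i, lower_bound=0.30):
    """Apply the scale criteria: every H_ij > 0 and every H_i > lower_bound.

    h_ij maps item pairs to pairwise coefficients; h_i lists item coefficients.
    """
    pairs_ok = all(value > 0 for value in h_ij.values())
    items_ok = all(value > lower_bound for value in h_i)
    return pairs_ok and items_ok


# Illustrative coefficients for a three-item scale.
accepted = is_monotone_homogeneous(
    {(0, 1): 0.34, (0, 2): 0.88, (1, 2): 0.50}, [0.50, 0.60, 0.70]
)
# One item with H_i below 0.30 is enough to reject the scale.
rejected = is_monotone_homogeneous(
    {(0, 1): 0.34, (0, 2): 0.88, (1, 2): 0.50}, [0.25, 0.60, 0.70]
)
```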
If a scale is accepted as being monotone homogeneous, the set of items needs to be checked further for double monotonicity. Double monotonicity requires that the ordering of items should be uniform across groups of respondents or, in other words, that item response functions do not intersect (“item ordering is person-free”). Molenaar and Sijtsma (2000) define and use coefficient $H^T$ to test whether curves do intersect. $H^T$ indicates the degree to which item ordering is the same for each respondent in the population. Negative values of $H^T$ point to violations of the nonintersection requirement. The criterion is that the number of negative $H^T$ values should not be larger than 10% of the total number of respondents; simultaneously, $H^T$ for the whole group should be greater than 0.30. To find out which items intersect, Molenaar and Sijtsma (2000) use the $P_{1,1}$ matrix (see also Sijtsma & Molenaar, 2002). This is a square symmetrical matrix with items ordered according to their scores on the latent construct. The cell elements contain the proportion of respondents who give a positive response to both items in the pair. These proportions are hypothesized to increase from left to right and from top to bottom, which indicates double monotonicity.
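The matrix of joint positive responses can be built and inspected as follows. This is a simplified sketch under stated assumptions: items are ordered from least to most frequently endorsed, the diagonal is left undefined, and violations are counted as plain decreases along a row without any significance test. The function names are hypothetical, and the $H^T$ coefficient itself is not computed here.

```python
def p11_matrix(rows):
    """Proportions of respondents positive on both items of each pair.

    Items are ordered from least to most frequent (assumed ordering).
    Returns the item order and the matrix; diagonal cells are None.
    """
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    order = sorted(range(k), key=lambda j: means[j])
    matrix = [[None] * k for _ in range(k)]
    for a, i in enumerate(order):
        for b, j in enumerate(order):
            if a != b:
                matrix[a][b] = sum(r[i] * r[j] for r in rows) / n
    return order, matrix


def count_order_violations(matrix):
    """Count decreases from left to right within each row.

    The matrix is symmetric, so columns show the same pattern as rows.
    """
    violations = 0
    for row in matrix:
        values = [v for v in row if v is not None]
        violations += sum(
            1 for left, right in zip(values, values[1:]) if right < left
        )
    return violations


guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
order, matrix = p11_matrix(guttman)
violations = count_order_violations(matrix)
```

For the perfect Guttman pattern in the example no row decreases, so no intersections are flagged.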
During the scale construction process, items violating scale requirements should be discarded. Van Schuur (2003) proposes removing items one at a time, to check the effects on the other items. Referring to Molenaar and Sijtsma (2000), he uses $H_i$ in a combination of different indicators to develop statistic Crit. Each time, the item with the highest Crit value should be discarded first; Crit values ≤ 80 can be accepted. Finally, Van Schuur (2003) bases reliability ρ of a whole set of items in a Mokken scale, or Reliability Rho, on the diagonal cells of the $P_{1,1}$ matrix (cf. also Molenaar & Sijtsma, 1988; Sijtsma & Molenaar, 1987). By interpolation, these cells estimate the probability of a positive response being given to the same item if it were completed twice.
MSA uses a bottom-up hierarchical clustering procedure, starting with the pair of items with the highest $H_{ij}$. The next best item in the scale is then included, and so on. MSA can be carried out with respect to dichotomous and polytomous items (Van Schuur, 2003) by using the computer program for “Mokken Scaling Analysis for Polytomous items” (MSP; see Molenaar & Sijtsma, 2000). This program was used in the present analyses. However, the calculation of double monotonicity for polytomous items, or $H^T$, has not been implemented in this program. MSA was therefore conducted on items that were made dichotomous.
Results
Scale Construction Using MSP (Sample 1)
Applying the MSP program with respect to all 29 violence items listed in Table 1 demonstrates monotone homogeneity. However, the Crit value of Item 10 (“ignoring”) is 150, which is above 80 and therefore evidently blocks double monotonicity. Elimination of this item followed by MSA of the remaining 28 items resulted in Item 19 being discarded (“punching someone on purpose”). In successive analyses, Item 11 (“excluding”) and Item 21 (“fighting with someone”) were also eliminated. The MSA results for the remaining 25 items are shown in Table 3.
Table 3. Sample 1: Mokken Violence Scale of 25 Items (Items 10, 19, 11, and 21 Eliminated)^a
Note. N = 14,388; coefficient $H$ = 0.56; ρ = 0.94. Matrix of $H_{ij}$ values per item pair: minimum 0.34; maximum 0.88. $H^T$ coefficient for entire group: 0.54. Number of negative $H^T$ values: 341 (2.7%).
a. Primary data were polytomous; data were dichotomized in Mokken Scale Analysis.
b. Only the two highest values are given.
Table 3 shows that the lowest $H_i$ value is 0.50 (Item 25 “making sexual comments”), which is higher than the required minimum value of 0.30. The $H$ value of the whole scale is 0.56, exceeding the required minimum value of 0.30 and indicating a strong scale (Molenaar & Sijtsma, 2000, p. 12). The $H_{ij}$ values (not shown in Table 3) vary from 0.34 to 0.88 and are thus all above the required value of 0. Taken together, the values of the three homogeneity coefficients demonstrate monotone homogeneity.
In addition, the highest Crit value is 33 (Item 8 “destroying things”), which is below 40 and thus well within the acceptance limit of 80. Coefficient $H^T$ is 0.54, which is above 0.30, whereas the number of negative $H^T$ values amounts to 2.7%, that is, below 10%. These scale results therefore meet the requirements for double monotonicity. The overall reliability of the scale is 0.94.
The scale results in Table 3 thus fulfill the Mokken scale requirements of both monotone homogeneity and double monotonicity. The 25 items are arranged from a low degree of violence (“calling someone names”) to a high degree of violence (“rape”): see the items and their respective means in Table 3. This ordinal Mokken scale then assesses a latent construct reflecting pupils’ experience of violent behavior in terms of severity.
The next step of the analysis involves checking the scale results of Table 3 to clarify whether scaling is invariant when measuring different groups of respondents. The dichotomous variables, gender, country of birth, and feeling at home in the Netherlands, are used to differentiate between the pupils. The distributions of these variables within random Sample 1 are presented in Table 4. Gender is distributed about equally within the sample, whereas having the Netherlands as the country of birth and feeling at home in the Netherlands both score above 90%.
Table 4. Sample Distributions According to Gender, Country of Birth, and Feeling at Home
Conducting MSA with respect to all six groups of pupils identified in Sample 1 resulted in small differences in statistical indicators compared with the overall scale. However, within each subgroup the scale results are in line with MSA requirements. Summary statistics of the overall Mokken scale and the six subgroup Mokken scales are given in Table 5.
Table 5. Sample 1: Summary Statistics for Gender, Country of Birth, and Feeling at Home in Country
Validating the Mokken Scale With Sample 2 Data
An independent check of the exploratory results of random Sample 1 can be carried out by applying MSA to the data taken from random Sample 2. The results of the cross-validation show that the same four items discarded in Sample 1 are also discarded in Sample 2 in successive steps. An overview of the MSA results is given in Table 6. The table shows that $H$ is 0.55; $H_i$ values are at least 0.49, which exceeds 0.30; and $H_{ij}$ values range from 0.33 to 0.89. Sample 2 outcomes thus meet the requirements of monotone homogeneity. Generally speaking, the MSA Sample 2 results in Table 6 closely match the MSA Sample 1 results in Table 3.
Table 6. Sample 2: Cross-Validation of Mokken Scale (cf. Table 3)
Note. N = 14,350; coefficient $H$ = 0.55; ρ = 0.94. Matrix of $H_{ij}$ values per item pair: minimum 0.33; maximum 0.89. $H^T$ coefficient for entire group: 0.51. Number of negative $H^T$ values: 333 (2.5%).
Only the highest values are given.
Items 13 and 14 produce the two highest Crit values, 23 and 25, which do not exceed the level of acceptability (80). Coefficient $H^T$ for the whole sample is 0.51 and the percentage of negative $H^T$ values is 2.5. The results for Sample 2 in Table 6 therefore also meet the requirements of double monotonicity. A minor difference between Sample 1 results and Sample 2 results is the ordering of Item 8 (“destroying things”) and Item 14 (“blackmailing”). In Table 3, the order of these two items is 8, 14; in Table 6 this is reversed. The difference between the respective means is negligible, however.
Like Sample 1, Mokken scaling for Sample 2 has to be invariant when measuring different subgroups. Table 4 provides information on the distribution characteristics of Sample 2 regarding the dichotomous variables, gender, country of birth, and feeling at home in the Netherlands. As in Sample 1, the outcomes of MSA with respect to all six groups of Sample 2 result in some minor differences compared with the overall Mokken scale. Within each subgroup, scale results are in line with MSA requirements. Table 7 summarizes the statistics of the overall scale and the six subgroup scales for Sample 2.
Table 7. Sample 2: Summary Statistics for Gender, Country of Birth, and Feeling at Home in Country
Discussion
Mokken Scale Analysis of secondary school pupils’ information about their experiences of violence results in a homogeneous and reliable Mokken scale that meets the requirements of both monotone homogeneity and double monotonicity. This affirmative answer to the research question demonstrates the existence of a unidimensional, cumulative scale for pupils’ experiences of violence. Moreover, the first exploratory results of Sample 1 (n = 14,388) are cross-validated by the independent check carried out by Sample 2 (n = 14,350). The latent construct underlying the ordinal Mokken scale measuring pupils’ experience of violence in terms of severity is composed of 25 items. These are ordered from “occurring most frequently” (Item 25, “calling someone names”) to “occurring least frequently” (Item 1, “rape”): See Tables 3 and 6. This result implies that the six Alpha scales representing different types of violence (see Tables 1 and 2) can be replaced, if one so desires, by one Mokken scale indicating experience of violence in terms of severity. This latent construct varies from verbal and mild physical violence, which occurs relatively most frequently, to combinations of social, material, and severe physical violence, to very severe and serious sexual violence, which occurs relatively least frequently.
This empirical ordering appears to be plausible and in line with what can be expected from secondary school pupils. Moreover, in both Sample 1 and Sample 2, this ordering demonstrates invariance between boys and girls; between those born in the Netherlands or elsewhere; and between those who feel at home in the Netherlands and those who do not. These verifications of measurement invariance for different groups support the applicability of the Mokken scale for different subgroups of respondents who are known to experience violence in varying degrees of severity. To further explore the present results, MSA was conducted using the 15 items that were chosen most frequently by a total of 74,260 pupils. Most of these items indicate verbal and mild physical types of violence, or lower levels of experience of violence in terms of severity (cf. Table 3). While using the same statistical criteria as before, the outcomes show that six items had to be removed to get an adequate Mokken Scale consisting of nine items. This scale combines verbally and physically disturbing behavior and clarifies that different Mokken scales can be obtained by analyzing different sets of items.
Furthermore, the measurement invariance for different subgroups of respondents concerns the requirements of both monotone homogeneity and double monotonicity for each subgroup, for example, boys and girls (cf. Tables 5 and 7). MSA focuses on the consistency and unidimensionality of the cumulative order of items in a scale within boys and within girls. This gender invariance differs from evaluation by a structural equation model (SEM) analysis or multiple regression analysis in which gender acts as an “exogenous variable” influencing a Mokken sum score. SEM and multiple regression analysis concentrate on analysis of potential differences with respect to between-gender variation in scoring on a latent construct or an overall Mokken scale score. For example, multiple regression analyses of these same data reveal among other things that boys score significantly higher than girls on the Mokken sum score concerning experience of violence in terms of severity (cf. Mooij, 2012).
One limitation of the study, however, is that the analysis had to be performed on items that were dichotomized. If a check of double monotonicity for polytomous items were available in MSP or another program, MSA involving polytomous items might throw more light on the consequences of working with dichotomized answer categories, as was necessary in the present analyses. Despite this limitation, the present Mokken results support the approach used by Michie and Cooke (2006) and Nitschke et al. (2009) to overcome existing problems in violence measurement by applying IRT-based scaling. The present results confirm the adequacy of their intentions and offer suggestions for future explorations of the relevance and use of Mokken Scale Analysis in this type of research.
Given the national data used, the Mokken results can be generalized to the population of Dutch secondary school pupils (cf. Mooij et al., 2008). Generalizability to other groups of respondents, education systems, or countries should be based on adequate research, to provide more information and empirical evidence. In this respect, one important advantage of the scaling results in Tables 3 to 7 is that the information on the 25 items can be transformed into one sum score estimating each respondent’s experience of violence in terms of severity. This sum score can be used in follow-up secondary analyses, for example, to learn more about its meaning and practical uses (cf. Mooij, 2012). Furthermore, the score indicating the severity of violence experienced by a respondent can be included in a theoretical multilevel framework, to both explain and predict the severity of violence that a pupil may experience in an educational context. At individual level, the relevant variables are gender, educational attainment, social and emotional characteristics, and age; at class level, the relevant variables indicate teachers’ social or disciplinary policy and curricular differentiation; and at school level, variables such as social norms, school cohesion, school responsiveness to violence, and severity of disciplinary policy interact in various ways with pupil-level and class-level variables, producing a fairly complete, evidence-based picture of interrelated processes and effects (cf. Kettler, 2011; Mooij, Smeets, & De Wit, 2011; You, Ritchey, Furlong, Shochet, & Boman, 2011).
Acknowledgements
The national school safety monitor survey for secondary education was developed at the request of the Dutch Ministry of Education, Culture, and Science, which consented to secondary analyses of the data and publication of the results. The author is grateful to Rick Schaap, Halewijn Drent, and Daan Fettelaar, MSc, for their assistance in performing the necessary analyses.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
