Abstract
There are four sources of large-scale self-report survey data on victim rates across countries: EU Kids Online, the Global School Health Survey, the Trends in International Mathematics and Science Study, and Health Behaviour in School-aged Children. Although their methodologies differ somewhat, all use pupil self-report data, and all have been used to examine cross-national differences in relation to other country characteristics and correlates. Here, we examine measures of internal validity (consistency within a survey) and external validity (agreement across surveys) for these data sets. We first report on internal validity, using the means available within each survey (correlations across strict or lenient frequency criteria, across types of bullying, across ages, and across genders). Generally, these correlations are high. We then report on external validity, in the sense of how far the four surveys agree where they overlap in countries. Here, agreement ranges from moderate to zero. These low external validity rates raise concerns about using these cross-national data sets to judge which countries are higher or lower in victim rates. A range of possible explanations for the findings is discussed.
Bullying is generally defined as a subset of aggressive behaviour, that is, behaviour intended to cause harm to someone. It is usually considered to meet two further criteria: repetition and an imbalance of power (Olweus, 1999). These criteria are well accepted in most of the literature on offline or traditional bullying, although some researchers have challenged them in the case of cyberbullying (Bauman, Underwood, & Card, 2013). Although bullying can occur in the workplace and in other contexts, most research has been on school bullying, and the volume of this research has increased very rapidly in the last decade (Zych, Ortega-Ruiz, & del Rey, 2015).
There is a variety of ways of assessing bullying, but large-scale surveys have generally used self-report questionnaires (Cornell & Bandyopadhyay, 2010; Smith, 2014). These typically report prevalence rates for being a victim of bullying, and often also for being a perpetrator, that is, taking part in bullying others. The repetition criterion is usually assessed by means of a frequency scale for responses; for example, how many times or how often one has been bullied over a specified time period (such as one school term). Usually, being bullied only once or twice is not counted when reporting prevalence rates, whereas more frequent experiences such as ‘two or three times a month’ are; being bullied weekly or more often is counted as more severe, that is, as meeting a stricter criterion (Solberg & Olweus, 2003). The imbalance of power criterion is sometimes assessed by means of a definition of bullying preceding the questions; for example, the Olweus Bully/Victim Questionnaire makes it clear that in bullying it is difficult for the victim to defend him- or herself. Sometimes, however, no definition is given, and responses may then depend on the respondent’s interpretation of the word ‘bullying’.
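To make the lenient and strict criteria concrete, the following minimal Python sketch computes prevalence from answers to a single frequency item. The five response labels, the `SCALE`, `LENIENT` and `STRICT` names, and the `prevalence` helper are hypothetical and only loosely modelled on Olweus-type scales; each survey uses its own wording and scale points.

```python
# Hypothetical response options for an Olweus-type frequency item,
# ordered from least to most frequent. Real surveys word these differently.
SCALE = [
    "not at all",
    "once or twice",
    "two or three times a month",
    "about once a week",
    "several times a week",
]

# Lenient criterion: two or three times a month or more often.
LENIENT = set(SCALE[2:])
# Strict criterion: about once a week or more often.
STRICT = set(SCALE[3:])


def prevalence(responses, qualifying):
    """Percentage of valid responses that fall in the qualifying categories.

    Responses outside SCALE (e.g. missing answers) are excluded from the
    denominator, which is itself a methodological choice.
    """
    valid = [r for r in responses if r in SCALE]
    if not valid:
        raise ValueError("no valid responses")
    hits = sum(1 for r in valid if r in qualifying)
    return 100.0 * hits / len(valid)


# Invented answers from 20 pupils plus one missing response.
responses = (
    ["not at all"] * 12
    + ["once or twice"] * 4
    + ["two or three times a month"] * 2
    + ["about once a week"]
    + ["several times a week"]
    + [None]
)

print(prevalence(responses, LENIENT))  # 20.0
print(prevalence(responses, STRICT))   # 10.0
```

Pupils bullied only ‘once or twice’ count under neither criterion, following the convention described above.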
As bullying has become recognised as an international phenomenon (Jimerson, Espelage, & Swearer, 2010; Zych et al., 2015), interest has increased in making comparisons between cultures, in terms of both prevalence rates and structural characteristics of bullying. Smith, Kwak, and Toda (2016) present a selection of studies comparing bullying in eastern and western cultures, along with some of the methodological issues involved in such comparisons. Most such studies have compared two particular cultures in detail; for example, comparisons of bullying in the UK or USA with ijime in Japan (Barlett et al., 2014; Kanetsuna, Smith, & Morita, 2006; Matsunaga, 2010). However, some sources of data on bullying come from surveys carried out in a comparable way in a large number of countries.
A number of surveys of children and young people have extensive data across a wide range of countries, including victim rates (among other topics). These are: the EU Kids Online survey; the Global School Health Survey (GSHS); the Trends in International Mathematics and Science Study (TIMSS); and Health Behaviour in School-aged Children (HBSC). All four use self-report data. In addition, the Programme for International Student Assessment (PISA) includes just one item in its school questionnaire, on whether teachers regard bullying as a problem in their school. This is very different from the more extensive pupil-report data in the other four surveys, so we do not consider the PISA data further in this report.
These four survey sources have given researchers the opportunity to analyse large samples of multinational data on school-aged children. A number of published articles have taken one of these sources and used it to examine cross-national differences in victim or bully rates. Besides classifying countries as high or low in prevalence, researchers have sometimes analysed how country prevalence rates relate to other country characteristics. Examples of previous research using these surveys are as follows. EU Kids Online: Livingstone, Haddon, Görzig, and Olafsson (2011), as well as Helsper, Kalmus, Hasebrink, Sagvari, and De Haan (2013), used cyberbullying experiences as one measure of risk, and clustered European countries in terms of levels of children’s internet use, online opportunities and risks experienced, and particular strategies of parental mediation of the internet. GSHS: Fleming and Jacobsen (2010) reported on the comparative prevalence of being bullied among middle-school students in low- and middle-income countries; and Wilson, Dunlavy, and Berchtold (2013) examined determinants of bullying victimization among 11–16-year-olds in 15 low- and middle-income countries. TIMSS: Lai, Ye, and Chang (2008) used the 2003 data to compare rates of being bullied in middle schools in the Asia-Pacific region, and examined correlates including parenting, attitudes to school and academic achievement. HBSC: Nansel et al. (2004) used the 1997/1998 data to examine the relationship between victim or bully roles and psychosocial adjustment. Craig et al. (2009) used the 2005/2006 data to provide a cross-national profile of bullying and victimization in 40 countries, and Due et al. (2009) and Elgar, Craig, Boyce, Morgan, and Vella-Zarb (2009) used such differences to examine possible effects of country wealth and of economic inequality.
Although the four surveys sample different sets of countries, there is considerable overlap, especially between certain surveys. Altogether 49 countries are featured in more than one survey (see Table 2). However, when prevalence rates by country are compared across surveys, there are some obvious discrepancies. For example, Sweden appears as a high-ranking country for victim rates in EU Kids Online, but generally as one of the lowest-ranking in HBSC; whereas Portugal is very low-ranking in EU Kids Online but very high-ranking in HBSC. Such discrepancies suggest a need to examine systematically how these surveys compare in measuring cross-national differences.
Here we pursue two aims in relation to these four surveys. First, we examine the internal validity of each survey by assessing the consistency of country differences across age, gender, or frequency criterion (depending on what was available for each survey). Prevalence rates do vary by age and gender, and are of course affected by frequency criterion; but there is little a priori reason to suppose that such differences will interact strongly with country differences. For example, although victim rates generally decrease with age, this has been found to hold in different countries, irrespective of their overall prevalence rates (Smith, Madsen, & Moody, 1999). Low internal validity would call for caution, whereas high internal validity should give confidence that something related to country differences in victim experiences is being assessed reliably by that particular survey.
Second, we examine the external validity of the four surveys by comparing victim prevalence rates across surveys, pairwise, for countries in common. Since all four surveys purport to report pupil self-report prevalence rates for victims, and have been used to compare countries on this and in relation to other indicators, they should show a substantial level of agreement across countries. High agreement would give confidence in the country differences reported, whereas low agreement would suggest caution and call for further examination of how differences between the surveys might lead to a lack of consensus.
Method
The common features of the four surveys are that they used self-report data from school-age children, and all contained questions on the frequency of being bullied. Full characteristics of the four surveys are given in Table 1. All sampled students in the early to mid-adolescent period, although age ranges vary; TIMSS reports separately for two age groups and HBSC for three. HBSC also reports rates by gender at each age. Sample sizes are large, with a minimum of 1,000 in each country. Three surveys were school-based, whereas EU Kids Online used a face-to-face interview in survey format. TIMSS gave no definition of bullying; EU Kids Online gave a definition but without power imbalance; and both GSHS and HBSC gave an Olweus-type definition. All specified a time frame, but this varied from one month to one year. Only EU Kids Online asked explicitly about cyberbullying; GSHS and TIMSS asked about various (different) types of bullying, whereas HBSC asked only a global question. All had a frequency scale for responses, but with differing scale points.
The EU Kids Online survey (www.eukidsonline.net) was conducted in 2010 across 25 European countries. The Global School Health Survey (GSHS) (www.who.int/chp/gshs/factsheets/en/index.html) is affiliated with the WHO and focuses more on low- and middle-income countries. Data collection is carried out in approximately 79 participating countries, with the year of data collection varying by country. The Trends in International Mathematics and Science Study (TIMSS) survey (http://timssandpirls.bc.edu/timss2011/international-results-mathematics.html) is given every four years in about 63 countries, both developed and developing. The Health Behaviour in School-aged Children (HBSC) surveys (www.hbsc.org) are given every four years in about 42 countries, mainly in Europe and North America.
We selected surveys from a comparable period. For EU Kids Online this was 2010. We used the 2009/2010 data from HBSC and the 2011 data from TIMSS. For GSHS the survey dates vary by country and sometimes by locality; we took the most recent survey up to 2012, using national or metropolitan data. All actual survey years are shown in Table 2. Altogether, 49 countries had data in at least two surveys; these are also shown in Table 2. All the data used are readily available in publications (Livingstone et al., 2011; Currie et al., 2012; Mullis et al., 2012) and on the survey websites.
Internal validity: To assess validity within each survey, we examined across-country correlations for measures that were readily available. These were correlations across:
(1) Different frequency criteria. For GSHS we took three measures: LN (lenient) = bullied on 1 or 2 days or more during the past 30 days; ST (strict) = bullied on 3 to 5 days or more during the past 30 days; and SS (scale score) = bullied in at least 1 of 7 different ways during the past 30 days. For TIMSS we also took three measures: LN (lenient) = bullied about monthly or weekly in the last year; ST (strict) = bullied about weekly in the last year; and SS (scale score) = a composite measure reported by TIMSS (sign reversed for consistency).
(2) Different ages. For TIMSS, 4th vs 8th grade; for HBSC, ages 11, 13, and 15 years.
(3) Different genders. Available at each age for HBSC.
(4) Different types of bullying. For EU Kids Online, online vs offline bullying.
External validity: To assess external validity, we compared the victim prevalence rates across the four surveys, pairwise, for countries in common. We were able to carry out four comparisons: TIMSS vs HBSC (25 countries for TIMSS 4th grade, 15 countries for 8th grade); EU Kids Online vs HBSC (23 countries); EU Kids Online vs TIMSS (20 countries for TIMSS 4th grade, 10 countries for 8th grade); and TIMSS vs GSHS (9 countries for TIMSS 4th grade, 12 countries for 8th grade). We could not compare GSHS with EU Kids Online (no countries in common) or GSHS with HBSC (only one country in common).
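The pairwise comparison can be sketched as follows: restrict two surveys to the countries they share, then correlate the paired prevalence rates. This is a minimal Python illustration under assumed inputs; the dictionaries map placeholder country labels ("A" to "E") to invented prevalence percentages, not to values from any of the surveys, and three shared countries are far fewer than in the real comparisons.

```python
import math


def countries_in_common(survey_a, survey_b):
    """Return the shared countries (sorted) and the paired rates for each survey."""
    common = sorted(set(survey_a) & set(survey_b))
    return common, [survey_a[c] for c in common], [survey_b[c] for c in common]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented victim prevalence rates (%) keyed by placeholder country labels.
survey_a = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 15.0}
survey_b = {"B": 8.0, "C": 6.0, "D": 12.0, "E": 9.0}

common, xs, ys = countries_in_common(survey_a, survey_b)
print(common)                     # ['B', 'C', 'D']
print(round(pearson(xs, ys), 3))  # -0.929
```

Countries A and E drop out because each appears in only one survey; this is why the overlap, rather than each survey's full coverage, determines n for every comparison.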
We calculated both Pearson’s and Spearman’s correlations using SPSS version 22. Pearson’s correlations are appropriate when considering actual prevalence rates, as in previous publications using these surveys. Spearman’s correlations are more appropriate if the interest is in comparing the rank order of countries across surveys, rather than actual prevalence rates. Significance is reported as * = p < 0.05; ** = p < 0.01.
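The difference between the two coefficients can be illustrated with a minimal Python sketch. Spearman’s coefficient is computed here as Pearson’s coefficient on ranks, with tied values given the mean of their ranks; this is a standard definition, but the helper names and the five invented country rates are hypothetical. The data deliberately include one country with extreme rates in both surveys.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson's r computed on average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented prevalence rates (%) for five countries in two surveys.
# The last country is extreme in both; the other four are in reverse order.
survey_x = [5.0, 6.0, 7.0, 8.0, 40.0]
survey_y = [10.0, 9.0, 8.0, 7.0, 30.0]

print(round(pearson(survey_x, survey_y), 2))   # 0.98
print(round(spearman(survey_x, survey_y), 2))  # 0.0
```

Here the single extreme country produces a Pearson’s r of about 0.98, while the rank order of the other four countries is reversed between the two surveys, so Spearman’s coefficient is zero. Which coefficient is more informative depends on whether the question concerns prevalence levels or the rank order of countries.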
Results
Internal Validity Assessments
(1) Correlations across frequency criteria: For GSHS, the correlations (Pearson’s; Spearman’s) among the three measures, LN (lenient), ST (strict), and SS (scale score), were: LN vs ST (n = 15), r = 0.93**; 0.93**; LN vs SS (n = 16), r = 0.92**; 0.92**; and ST vs SS (n = 16), r = 0.90**; 0.90**.
For TIMSS the correlations among the 3 measures, LN (lenient), ST (strict), and SS (scale score), were calculated separately by age. At 4th grade, the correlations (all n = 37) were LN vs ST, r = 0.94**; 0.96**; LN vs SS, r = 0.99**; 0.99**; and ST vs SS, r = 0.95**; 0.97**. At 8th grade, the correlations (all n = 28) were LN vs ST, r = 0.96**; 0.96**; LN vs SS, r = 0.98**; 0.98**; and ST vs SS, r = 0.93**; 0.97**.
(2) Correlations across ages: For TIMSS, 4th to 8th grade correlations were calculated separately by frequency criterion (all n = 22): for LN, r = 0.91**; 0.89**; for ST, r = 0.92**; 0.94**; for SS, r = 0.94**; 0.93**.
For HBSC, correlations (all n = 31) were: 11 vs 13 years, r = 0.92**; 0.90**; 13 vs 15 years, r = 0.92**; 0.85**; 11 vs 15 years, r = 0.83**; 0.83**.
(3) Correlations across gender: For HBSC, correlations were calculated at each age (all n = 31): the boy-girl correlation at 11 years was r = 0.92**; 0.91**; at 13 years, r = 0.89**; 0.87**; and at 15 years, r = 0.86**; 0.83**.
(4) Correlations across online/offline bullying: For EU Kids Online, the correlation (n = 25) was r = 0.76**; 0.70**.
External Validity Assessments
Here we used the SS frequency scores for TIMSS and for GSHS. Correlations for TIMSS vs HBSC (n = 25 for TIMSS 4th grade, n = 15 for 8th grade) and for EU Kids Online vs HBSC (n = 23) are shown in Table 3. Correlations for EU Kids Online vs TIMSS (n = 20 for TIMSS 4th grade, n = 10 for 8th grade) and for GSHS vs TIMSS (n = 9 for TIMSS 4th grade, n = 12 for 8th grade) are shown in Table 4.
Discussion
Our aims were to assess the internal and external validity of four large cross-national databases that provide pupil self-report data on the prevalence of victim experiences. Our main conclusions are that internal validity is very satisfactory, but that external validity is rather unsatisfactory. We first substantiate these conclusions. We then consider possible explanations of the findings, and finally consider some implications and note strengths and limitations of our study.
Internal validities are very satisfactory: The values we obtained for internal validity are very high, and it makes very little difference whether Pearson’s or Spearman’s correlations are considered. Also, many small variations in the correlations are in line with expectations. For frequency criteria, all the correlations for GSHS and for TIMSS were very high. Across age levels, the correlations are very high for TIMSS and for HBSC. For HBSC, with three age levels, the correlation is slightly lower for the larger age gap between 11 and 15 years, as would be expected. For gender, the correlations for HBSC are high at all three age levels. Finally, for EU Kids Online, the correlation between separate measures of online and offline victim experiences is high, and at a level to be expected from much other research on cyberbullying and traditional bullying (Kowalski, Giumetti, Schroeder, & Lattanner, 2014).
These high correlations across countries are obtained whether the measures compared are different points on a frequency scale, different ages or genders of respondents, or responses to items on online and offline victimization. They strongly suggest that each survey is measuring something about cross-national differences in victim rates in a reliable, or internally valid, way.
External validities are rather unsatisfactory: But what is being reliably measured in each survey? Ostensibly, it is the frequency of victim experiences. But if this is being measured in each of the four surveys, we would expect high levels of agreement between them. Given some differences in ages and samples, one might not expect agreement as high as that found in the internal validity assessments above (around 0.8 or 0.9). However, it would seem reasonable to expect agreement of around 0.6 or 0.7 if these surveys are indeed measuring a broadly similar construct.
This was not what we found. Of the 23 Pearson’s correlations reported in Tables 3 and 4, only two are statistically significant, and only one of the Spearman’s correlations is. The highest correlation is 0.57, and the lowest –0.28 (Pearson’s) or –0.36 (Spearman’s). In fact, four out of six (Pearson’s) or all six (Spearman’s) correlations between EU Kids Online and TIMSS are negative.
There are some meaningful patterns in the correlations across ages. TIMSS and HBSC both have different age levels. Taking 4th grade to be about 10 years and 8th grade about 14 years, it is consistent that TIMSS 4th grade has its highest correlation with HBSC at 11 years (Pearson’s only), and similarly that TIMSS 8th grade has its highest correlation with HBSC at 13 years (Table 3). Similarly, EU Kids Online, with an age range of 9–16 years, would have a median of around 12–13 years, and indeed has its highest correlations with HBSC at 13 years (Pearson’s only) and its lowest at 15 years (Table 3). GSHS has an age range of 11–18 years, with most respondents generally in the 13–17 year range. Taking a median of around 15 years, it is consistent that the correlation of GSHS with TIMSS is substantial at 8th grade (around 14 years) even though it is near zero at 4th grade (Table 4). Because there was limited country overlap between GSHS and TIMSS (n = 9 at TIMSS 4th grade and n = 12 at 8th grade), the correlations of 0.53 (Pearson’s) or 0.48 (Spearman’s) were not statistically significant.
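The point about statistical significance can be checked with a short calculation. A common test of a correlation r over n countries uses t = r × √(n − 2) / √(1 − r²) with n − 2 degrees of freedom; equivalently, r is significant at p < 0.05 (two-tailed) only if it exceeds a critical value that shrinks as n grows. The minimal Python sketch below hard-codes two-tailed 5% critical t values from standard tables for the country overlaps in our comparisons (9 to 25 countries). It is only an approximation for Spearman’s coefficient, statistical packages may use other methods for small samples, and the function names are hypothetical.

```python
import math

# Two-tailed critical t values at alpha = 0.05 from standard tables, keyed by
# degrees of freedom (n - 2) for the country overlaps reported here
# (n = 9, 10, 12, 15, 20, 23, 25).
T_CRIT_05 = {7: 2.365, 8: 2.306, 10: 2.228, 13: 2.160, 18: 2.101, 21: 2.080, 23: 2.069}


def t_statistic(r, n):
    """t statistic for testing whether a correlation r over n pairs differs from 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(n):
    """Smallest |r| that reaches p < 0.05 (two-tailed) with n pairs."""
    df = n - 2
    t = T_CRIT_05[df]
    return t / math.sqrt(t * t + df)


def is_significant(r, n):
    """True if |r| over n pairs is significant at p < 0.05, two-tailed."""
    return abs(t_statistic(r, n)) > T_CRIT_05[n - 2]


# GSHS vs TIMSS 8th grade: r = 0.53 over 12 countries.
print(round(t_statistic(0.53, 12), 2))  # 1.98, below the critical t of 2.228
print(round(critical_r(12), 3))         # 0.576
print(is_significant(0.53, 12))         # False
```

With only 12 countries, a correlation must exceed about 0.58 to reach significance, so a correlation of 0.53 can be substantial in size yet non-significant.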
The pattern of correlations for online and offline bullying is also of interest. Between EU Kids Online and HBSC (Table 3), correlations are higher for offline than for online bullying. This is meaningful: although HBSC does not specify types of bullying, it quotes an Olweus-type definition whose examples do not mention online or cyberbullying (Table 1). However, the correlations between EU Kids Online and TIMSS (Table 4) show rather little difference between online and offline bullying, even though TIMSS lists six types of victim experience related to physical, verbal and exclusion-based bullying and does not mention cyberbullying.
Overall, it can be concluded that there is moderate agreement between TIMSS and HBSC (Table 3), with the most age-appropriate correlations being 0.39 (Pearson’s; 0.28 Spearman’s) and 0.57. There is also moderate agreement between TIMSS and GSHS (Table 4), the most age-appropriate correlation being 0.53 (Pearson’s; 0.48 Spearman’s). There is moderate agreement between EU Kids Online offline bullying and HBSC (Table 3), at 0.42 (Pearson’s; 0.32 Spearman’s), taking the most age-appropriate correlation. However, there is near-zero or negative agreement between EU Kids Online and TIMSS (Table 4). Correlations of EU Kids Online with TIMSS 4th grade are in fact all negative (across 20 countries); and even if we take TIMSS 8th grade as the more age-appropriate comparison and consider only offline bullying, the correlation is near zero at 0.06 (Pearson’s; –0.09 Spearman’s) (across 10 countries).
Possible explanations for limited validity: Although the four surveys differ in methodology, they all claim to yield cross-national differences in self-reported rates of being a victim of bullying. In this sense they can be considered constructive replications (Lykken, 1968); that is, they are replications not at the detailed procedural level, but at the level of statements such as ‘country X has higher victim rates than country Y’. The low external validity correlations we found suggest considerable discrepancies at this level. How can we explain these discrepancies? We consider here a range of methodological issues (see also Table 1): age range; sampling issues; dates of survey; administration procedures; questionnaire issues; definitions of bullying; time reference period; types assessed; frequency scales; and linguistic issues.
Age range: This varies between the surveys, but all cover the secondary school period (with EU Kids Online also including 9–10-year-olds). Within TIMSS and HBSC, correlations across ages were very high. Between surveys, taking the most age-appropriate correlations does yield higher agreement; however, the pattern of correlations does not suggest that even more exact age matching would greatly change levels of agreement (Table 3).
Sampling issues: all four surveys had reasonably large samples of at least 1,000 students and often many more. However it is not so clear how nationally representative the samples are. EU Kids Online sampled households randomly, but only used data from children who used the internet at all; their estimate of the percentage of children this applied to varied from 59% to 98%. In 14 of their 25 countries nearly all children used the internet, but in those countries where an appreciable proportion of children did not use the internet in 2010, this produces a sampling bias relative to the other three surveys. HBSC states that ‘in the vast majority of countries, a nationally representative sample is drawn. Where national representativeness is not possible, a regional sample is drawn’ (Roberts et al., 2009). According to the TIMSS website, it selects a random sample of students that represents the full population of students in the target grades; guidelines call for a minimum of 150 schools to be sampled per grade, with a minimum of 4,000 students assessed per grade. A minimum participation rate of 50% of schools from the original sample is required for a country’s data to be included in the international database. The GSHS website states that ‘the GSHS uses a standardized scientific sample selection process’; the available data sets are sometimes signalled as national, sometimes by area such as metropolitan area.
Dates of survey: Three surveys, EU Kids Online, TIMSS and HBSC, used data from a comparable time frame, 2009–2011. GSHS, however, has data sets from between 2003 and 2012. Awareness-raising and intervention programmes can affect the prevalence of bullying over time, and prevalence has shown some modest decreases in many countries over the last 20 years (Rigby & Smith, 2011). However, any such concerns should significantly affect only the GSHS comparisons.
Administration procedures: How the surveys were administered varied. EU Kids Online conducted a face-to-face interview with the child, followed by a self-completion interview for sensitive questions; there was a slightly shorter version for 9–10-year-olds and a full version for 11–16-year-olds. Anonymity and privacy were promised to encourage honest answers, but the fact that the survey was conducted in homes, often with parents in the vicinity, may have influenced the answers of some children, perhaps towards social desirability; indeed, Görzig (2012) found that parental presence slightly depressed risk reporting. This interview approach differs from that of TIMSS, GSHS and HBSC, which used school-based surveys with a self-administered questionnaire.
Questionnaire issues: These surveys had long questionnaires covering many aspects of health and/or education. There are issues around the accuracy of self-report data; for example, question order has been found to influence rates of reported victimization (Huang & Cornell, 2015). While such issues could be expected to lead to differences in overall prevalence rates between surveys, it is not clear that they would affect countries differentially. However, long questionnaires may also contribute to high dropout or non-response rates, and these may vary between countries. Non-response is reported by GSHS, and its section on bullying often had substantial proportions of ‘missing responses’; these are around 15% but vary considerably by country. This could be a serious issue in comparing countries; some preliminary analysis of EU Kids Online data has found an appreciable country-level correlation between non-response rate and cyberbullying prevalence (Anke Görzig, personal communication, 06/02/2016).
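Why differential non-response could distort country comparisons can be illustrated with simple arithmetic. Suppose, purely as an assumption (the direction and size of any such effect in these surveys are unknown), that pupils who have been bullied leave the bullying items blank at a different rate from other pupils. The prevalence observed among those who answer then differs from the true prevalence, by different amounts in different countries. The minimal Python sketch below uses invented rates for three hypothetical countries; `observed_prevalence` is a hypothetical helper.

```python
def observed_prevalence(true_rate, missing_if_victim, missing_if_not):
    """Prevalence among respondents who answer, given group-specific missing rates.

    All arguments are proportions between 0 and 1.
    """
    answering_victims = true_rate * (1 - missing_if_victim)
    answering_others = (1 - true_rate) * (1 - missing_if_not)
    return answering_victims / (answering_victims + answering_others)


# Invented scenarios: (true prevalence, missing among victims, missing among others).
countries = {
    "Country A": (0.20, 0.10, 0.10),  # non-response unrelated to victimization
    "Country B": (0.20, 0.40, 0.10),  # victims more likely to leave items blank
    "Country C": (0.17, 0.10, 0.10),
}

for name, params in countries.items():
    print(name, round(100 * observed_prevalence(*params), 1))
# Country A 20.0
# Country B 14.3
# Country C 17.0
```

Country B has the same true prevalence as Country A but appears lower than Country C, so the rank order of countries changes even though only the pattern of missing responses differs.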
Definition of bullying: Both GSHS and HBSC give a very similar definition, based on that of Olweus, that makes clear the importance of imbalance of power. EU Kids Online has a similar definition, but without differentiating bullying from two students of about equal strength fighting or quarrelling (i.e. the power imbalance criterion). TIMSS did not give any definition of bullying and just asked about various nasty experiences. The use of the term ‘bully’ and the imbalance of power criterion have been found to affect overall prevalence rates (Kert et al., 2010; Ybarra et al., 2012) but there is no evidence that they affect countries differentially.
Time reference period: Both EU Kids Online and TIMSS asked about experiences over the last year/twelve months; in contrast HBSC asked about the past couple of months, and GSHS about the last 30 days. While time reference period will affect overall prevalence rates, there is no reason to suppose it will affect country differences; indeed although EU Kids Online and TIMSS have the same reference period, their agreement is the lowest amongst the survey comparisons.
Types of bullying assessed: Although EU Kids Online had a main focus on online experiences, including bullying, the other three surveys made no mention of online bullying in the definition or types presented. This aspect was controlled for in Tables 3 and 4 by giving separate correlations with EU Kids Online for online and offline bullying (as well as total bullying). There are other, more subtle differences. GSHS asks explicitly about various kinds of bias- or prejudice-based bullying (race or colour; religion; sex; appearance). All four surveys mention social exclusion (in the definition, or in the types asked about), but none of them mentions rumour-spreading, which is generally seen as a main form of indirect bullying (Smith, 2014). Such differences would only matter for country comparisons if countries differ in the kinds of bullying most frequently experienced. This has been found to be the case for some eastern countries such as Japan and South Korea, which do not come into our comparisons (Table 2), but has not been reported for the western countries that make up most of our country samples (Smith et al., 2016).
Frequency: The scale items for frequency vary slightly between surveys, GSHS being the most different. However, the GSHS scores can be made comparable by taking one or two days in the last 30 days to be equivalent to once or twice a month (the ‘lenient’ criterion), and three to five days or more in the last 30 days to be roughly equivalent to once (or twice) a week (the ‘strict’ criterion). Given the very high within-survey correlations across these frequency measures (internal validity), we doubt that these minor differences have much impact on cross-survey comparisons.
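The conversion can be expressed as a small lookup. The seven response categories below are an assumption about the wording of the GSHS item on days bullied in the past 30 days and should be checked against the questionnaire used in each country; the function name `gshs_criteria` is hypothetical. The thresholds follow the equivalences stated above.

```python
# Assumed response categories for the GSHS item on days bullied in the
# past 30 days, ordered from least to most frequent.
GSHS_DAYS = [
    "0 days",
    "1 or 2 days",
    "3 to 5 days",
    "6 to 9 days",
    "10 to 19 days",
    "20 to 29 days",
    "all 30 days",
]


def gshs_criteria(category):
    """Classify a GSHS day-count answer under the lenient and strict criteria.

    lenient: 1 or 2 days or more, treated as about once or twice a month
    strict:  3 to 5 days or more, treated as about once a week or more
    Raises ValueError for answers not in GSHS_DAYS (e.g. missing responses).
    """
    idx = GSHS_DAYS.index(category)
    return {"lenient": idx >= 1, "strict": idx >= 2}


print(gshs_criteria("1 or 2 days"))  # {'lenient': True, 'strict': False}
print(gshs_criteria("6 to 9 days"))  # {'lenient': True, 'strict': True}
```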
Linguistic issues: Finally, it is worth considering the language used in the questionnaires, and particularly any word used for bully or bullying. This issue is usually ignored, and the language used in the questionnaire is generally not reported. But how words are translated will affect meaning, and translations may vary across surveys. This may matter less for factual, physical words like ‘stolen’ or ‘hit’, but words like ‘hurtful’ or ‘nasty’ may have shades of meaning that vary in translation. In particular, this has been shown to be the case for words used to translate ‘bully’ (Smith et al., 2002). This would affect GSHS and HBSC, which use the terms ‘bullying’ or ‘being bullied’ in their definitions but do not report how they were translated. Some translations have been shown to pick up sorts of behaviour different from bullying. For example, in Italian, ‘prepotenze’, a traditional term, has a different meaning from ‘il bullismo’, which is now often used as a direct translation of the English term.
In summary, while many issues affect these surveys and the prevalence rates they report, fewer seem to be strong contenders for explaining inconsistencies in cross-national differences. We believe the most important are representativeness of sampling across a country, non-response rates, and the linguistic terms used to translate ‘bullying’. These might vary across countries in different ways in the four surveys. Other factors, such as types of bullying assessed, age range, date of survey, definition of bullying, time reference period, and frequency scale, will affect overall prevalence rates, but there is little evidence to suggest that they would affect country rates differentially. Finally, the fact that EU Kids Online administered its survey face-to-face, unlike the other three surveys, might have differential effects if countries vary in willingness to divulge sensitive information to others (as, for example, in the long-term orientation dimension of culture proposed by Hofstede, Hofstede, & Minkov, 2010).
Implications of the Findings
Although these four surveys show evidence of internal validity, their levels of agreement with each other, where they have countries in common, range from modest to zero or even negative. Agreement is at best modest for TIMSS and HBSC, TIMSS and GSHS, and HBSC and EU Kids Online. It is around zero or negative for EU Kids Online and TIMSS. It was not possible to compare GSHS with HBSC or with EU Kids Online because of insufficient country overlap.
An obvious implication is that we should be cautious about judging how countries appear in terms of high or low prevalence rates for being bullied. This would be especially so if only one survey is relied on; claims for such differences would be more convincing if two or even three surveys agreed on a country’s relative position.
More research is also needed into why there is a lack of high agreement amongst the surveys. In future work we plan to also check levels of agreement on other aspects, such as rates of bullying others (as opposed to being bullied), gender differences, and age differences. At present it is difficult to pinpoint why agreement between EU Kids Online and TIMSS should be so low, even negative. Meanwhile, on the evidence so far, prime candidates for lack of agreement appear to be the representativeness of the samples nationally; non-response rates; and translation of words such as ‘bullying’.
There are implications, too, for future surveys. TIMSS, GSHS and HBSC need to revise their definitions and examples to include online or cyberbullying, and, although this does not affect comparability, it is surprising that none of the surveys includes rumour-spreading as an example of bullying (see footnote 1). More details of non-response rates (easily available only for GSHS), and of how terms such as ‘bullying’ were translated into different languages (notably for GSHS and HBSC), would also help in examining comparability.
Strengths and Limitations
Some strengths and limitations of our study should be noted. To our knowledge, this is the first attempt to compare four data sources, all of which purport to give cross-national data on self-reports of being a victim of bullying. Furthermore, we have provided evidence that each survey does appear to be measuring something about cross-national victim rates reliably, given high measures of internal consistency or validity. However, for some of the comparisons between surveys, the number of countries in common is low. This is particularly so for comparisons between GSHS and TIMSS, with only 9 and 12 countries in common at TIMSS 4th and 8th grade, respectively, and for EU Kids Online and TIMSS 8th grade, with only 10 countries to compare. For the other comparisons, the number of overlapping countries ranged from 15 to 25. Another limitation is that for GSHS, matching for date of survey was not as good as for the other three surveys. There are also limitations to consider regarding the surveys themselves; for example, the validity of self-report questionnaires has been questioned (Cornell & Bandyopadhyay, 2010). However, these issues are most relevant when we consider explanations for the lack of agreement among the surveys.
Footnotes
1
At the time of finalising this article, the HBSC Report with data from 2013/2014 was published (Inchley et al., 2016). This survey did contain two questions on cyberbullying, although only for being a victim and not a perpetrator. One question was on being cyberbullied by messages, the other on being cyberbullied by pictures.
Acknowledgments
We would like to thank Anke Görzig for comments, advice on sampling procedures used by EU Kids Online, and disaggregated data for Belgium (Flanders) as part of Belgium, and England as part of UK; Pierre Foy for advice on TIMSS; Antonella Brighi for sponsoring the visit by Barbara Marchi to Goldsmiths College; Robert Slonje for help with data collection; and Mike Griffiths for assistance with statistical analyses.
