Abstract
Missing data are a persistent problem in psychological research. Peer nomination data present a unique missing data problem, because a nominator’s nonparticipation results in missing data for other individuals in the study. This study examined the range of effects that systematic nonparticipation can have on the correlations among peer nomination variables when nominators with various levels of popularity and social preference are missing. Results showed that, compared to completely random nominator missingness, systematic missingness of raters based on popularity had a significant impact on the correlations between various peer nomination variables. Systematic missingness based on social preference had a smaller impact. These results demonstrate varying (and potentially large) effects of systematically missing nominators on studies using nomination data. It is important that researchers using peer nomination data explore whether nominators are missing in any systematic way and report the results in each study. Future research into the nature of systematic nominator missingness could make it possible to use advanced methodologies, such as multiple imputation, to minimize the problems associated with systematic missingness.
Introduction
In recent decades, researchers of adolescent peer relationships have noted that it is becoming increasingly difficult to obtain high participation rates in peer nomination research. Factors such as institutional review board (IRB) demands for active consent procedures (Brown & Larson, 2009), student absences from school (Crick & Ladd, 1989), and political or economic pressure to maximize in-class time devoted to improving achievement test scores (Marks, Babcock, Cillessen, & Crick, 2013) have made it harder to collect sociometric data from a large proportion of students in a given classroom or grade. Indeed, one discouraged researcher recently wrote that “logistical issues render it essentially impossible to obtain … high participation rates among adolescents” (Fournier, 2009, p. 1154).
Low participation rates are especially problematic in peer nomination research when participants with particular traits or behavioral tendencies are underrepresented in the sample. Nonparticipation may occur for a number of reasons, including school absences, lack of parental consent, or lack of assent by the individual. Unfortunately, previous research has shown that nonparticipation in sociometric research is often systematic. Noll, Zeller, Vannatta, Bukowski, and Davies (1997), for example, found that children and adolescents whose parents did not provide consent were lower in sociability/leadership, higher in aggression/disruptiveness, less academically successful, and less liked, and had fewer friends, than peers who had received parental consent. Fournier (2009) found that nonparticipating adolescents received lower ratings of social status and peer acceptance than participating adolescents. These studies indicate that students are more likely to participate in sociometric research if they are more socially and academically well-adjusted. These differences mirror nonparticipation findings in other areas of school-based research, which tend to find that children and adolescents who receive active parental consent are more likely to be White, more academically successful, and from more socially and economically affluent families (Detty, 2013). Given that systematic missingness of nominators exists in many studies, the current study aimed to explore the effects of systematic participant missingness on the validity of peer nomination data by simulating different types and amounts of systematic nonparticipation from a large dataset.
Problems of Systematic Nonparticipation
The psychometric problems associated with systematic participant missingness are well documented across numerous contexts in psychological and clinical research (Schafer & Graham, 2002). Missingness in peer nominations, however, is even more complex than the missingness typically considered in social science research. Beyond reducing power (Liu & Salvendy, 2009) and introducing statistical bias (Schafer & Graham, 2002), systematic participant missingness compromises the validity of peer nominations both directly and indirectly. Unlike nonparticipation in other survey research, peer nomination nonparticipation does not necessarily reduce sample size and power directly. Many studies allow nominators to choose any of their classmates or grade mates as nominees, including those who do not provide nominations themselves (see Noll et al., 1997, for a discussion). In these cases, analyses involving peer nomination counts will have the same N regardless of how many participants actually provided nominations, because N is based on the number of nominees rather than the number of nominators. Nonparticipation nevertheless affects statistical power indirectly: because each nominator provides a choice or nonchoice for each nominee, more nominators provide more systematic variance.
Direct effects of systematic nonparticipation on validity are common to all survey research. Researchers cannot come to accurate conclusions about the generalizability of findings if certain types of participants are not studied. In peer nomination research, for example, it is difficult to measure the role of rejection in peer relationships if rejected children and adolescents are less likely to be represented in the sample (see Noll et al., 1997).
In addition to such problems of external validity, systematic nonparticipation also impacts the internal validity of peer nomination measures. Because each nominator provides information about all nominees, systematic nonparticipation affects the nomination counts received by other participants. For example, if socially rejected children are likely to be friends with each other, having fewer rejected nominators means that rejected nominees will receive fewer friendship nominations and will be inaccurately viewed as having fewer friends than they actually have. As another example, if adolescents labeled as “bistrategic” (i.e., those high in both aggression and prosocial behavior; Hawley, 2003) act prosocially toward their popular friends and aggressively toward their unpopular peers, systematic missingness among unpopular participants may result in an underestimation of bistrategic adolescents’ levels of aggression.
Despite the problems that systematic nonparticipation in peer nomination research can potentially cause, no study has investigated the effects of systematic participant missingness on the validity of peer nominations. Gerrits, van den Oord, and Voogt (2001) did attempt to compare intercorrelations among nominators to intercorrelations among non-nominators in a sample of young children. However, this methodology is not a direct test of the effects of systematic nonparticipation, as the lack of participants’ data affects both those who participated as nominators and those who did not participate as nominators.
In summary, research has shown that nonparticipants in social science research often differ systematically from participants. Prior work on systematic missingness has addressed designs in which a person’s absence from a study means that the same person’s measurements are missing. No study, however, has examined systematic missingness in peer nomination data, in which one person’s absence from the study affects the nomination counts of other people.
The Current Study
The current study aimed to explore the effects of systematic missingness on the validity of peer nomination data by simulating different types and amounts of systematic nonparticipation from a large dataset. This study simulated four forms of systematic nonparticipation by removing various proportions of the most popular, least popular, most preferred, and least preferred adolescents from a larger sample of nominators. Because this research focused on four types of participant missingness and simulated extreme systematic missingness, its purpose was to show how systematic missingness can potentially affect the validity of peer nominations, not how much systematic missingness actually affects peer nominations in any particular study. The nature of missingness makes it difficult to assess directly, without simulation, how systematic nonparticipation affects any single sample. This study used popularity and social preference as the basis for missingness because (a) these variables are very commonly assessed in peer nomination research, and (b) previous studies have shown that nonparticipants in social research are typically lower in social adjustment than respondents (Fournier, 2009; Noll et al., 1997).
The measures of validity in the current study were the intercorrelations among eight commonly assessed peer nomination variables (popularity, social preference, friendship, overt aggression, relational aggression, overt victimization, relational victimization, and prosocial behavior). Systematic nonparticipation is not the only type of missingness that can affect peer nominations. Random nonparticipation reduces the internal reliability of peer nomination measures (Marks et al., 2013); that is, fewer nominators tend to produce less reliable peer nomination measures, just as exams with fewer questions tend to produce less reliable exam scores. Lower reliability limits the maximum possible magnitude of correlations between variables (Liu & Salvendy, 2009; Spearman, 1907), so this study did not compare systematic nonparticipation with the full sample. Instead, it compared correlations under systematic nonparticipation with correlations under completely random missingness at the same participation rates. For example, to evaluate the effects of removing the most popular 10% of participants from the sample of nominators, the proper comparison is a condition in which 10% of nominators are removed completely at random. The null condition in this study was, thus, the distribution of correlations under conditions of missingness completely at random.
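Two classical test theory formulas make this reasoning concrete. Spearman’s attenuation formula relates an observed correlation to the correlation between true scores through the reliabilities $\rho_{XX'}$ and $\rho_{YY'}$ of the two measures, which caps the observed magnitude. The Spearman-Brown formula gives the reliability $\rho_k$ of a composite of $k$ parallel components, each with reliability $\rho_1$. Applying it to nominators treats them as roughly parallel items, a simplifying assumption that this study did not test.

$$
r_{XY}^{\mathrm{obs}} = r_{XY}^{\mathrm{true}}\sqrt{\rho_{XX'}\,\rho_{YY'}}
\qquad\Longrightarrow\qquad
\left| r_{XY}^{\mathrm{obs}} \right| \le \sqrt{\rho_{XX'}\,\rho_{YY'}}
$$

$$
\rho_k = \frac{k\,\rho_1}{1 + (k - 1)\,\rho_1}
$$

Under these assumptions, removing nominators lowers $k$, which lowers $\rho_k$ and therefore the ceiling on observed correlations. This is why random removal at the same participation rate, rather than the full sample, is the appropriate comparison.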
Overall, the first goal of this study was to quantify how many correlations were significantly affected by systematic nonparticipation at various rates of participant missingness. The second goal was to explore the magnitude of the effects of systematic nonparticipation on the correlation coefficients.
Method
Participants
Data were analyzed for 1,630 adolescents (M age = 13.15 years, SD = 0.78; 50.4% boys) participating in the seventh wave of the Nijmegen Longitudinal Study (NLS) on Infant and Child Development. The NLS began with a community-based sample of 129 one-year-old infants; the much larger sample in the current study consists of the full classrooms that the NLS participants attended at the time of data collection. The NLS participants’ classmates needed to be included in order to obtain valid peer nomination measures for the NLS participants themselves. Participants were in 32 seventh-grade and 31 eighth-grade classrooms (class size M = 26.45, SD = 3.54, range 15–31); Grades 7 and 8 are the first and second years of secondary education in the Netherlands. The majority of the participants were native Dutch (96.2%), and 84.4% had parents who were both born in the Netherlands. Participants were recruited using a passive consent procedure, in agreement with the policies of the schools and as approved by the IRB.
The sample included 1,512 nominators. Of the 1,630 adolescents, 87 (5.34% of the full sample) were absent on the day of data collection, and 2 did not receive parental consent to participate. Another 29 participants did not give any nominations on any of the variables involved and were therefore treated as missing for the purposes of this study.
Measures and Procedure
Participants completed measures on netbook computers. Each sociometric question was presented on a separate screen at the top of the page, followed by a roster with the names of all classmates. Participants could nominate classmates by clicking on their names. The order of names was randomized for each nominator, but kept constant across questions. Participants could name an unlimited number of peers of either sex for each item (for more details, see van den Berg & Cillessen, 2013).
The current analyses focus on eight sociometric constructs assessed by 19 items: popularity (“who is most popular?” and “least popular?”; Cronbach’s α M = 0.94, Cronbach’s α SD = 0.02); social preference (“who do you like most?” and “like least?”; α M = 0.70, α SD = 0.21); friendship (“who is your number one best friend?” and “who are your other best friends?”; α M = 0.23, α SD = 0.33); overt aggression (3 items; e.g., “who hits, kicks, or bullies others?”; α M = 0.95, α SD = 0.03); relational aggression (2 items; e.g., “who gossips about others?”; α M = 0.85, α SD = 0.07); overt victimization (3 items; e.g., “who is bullied?”; α M = 0.95, α SD = 0.04); relational victimization (2 items; e.g., “who is neglected or excluded by others?”; α M = 0.94, α SD = 0.05); and prosocial behavior (3 items; e.g., “who cooperates with others?”; α M = 0.81, α SD = 0.09). Internal reliability of each variable was assessed within each classroom using the “pasting” procedure outlined in Babcock, Marks, Crick, and Cillessen (2014) for use with multi-item peer nomination measures. Means and standard deviations of alphas were calculated across classrooms.
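The “pasting” approach to multi-item alpha can be sketched as follows. This is our reading of the procedure, not code from Babcock et al. (2014), and the layout is an assumption: each item’s nominations within a classroom form a nominee-by-nominator 0/1 matrix, the matrices for a construct’s items are placed side by side so that each (item, nominator) pair acts as one “item,” and nominees are the cases. The function names are hypothetical.

```python
from statistics import pvariance


def paste_items(item_matrices):
    """Hypothetical 'pasted' layout: place the nominee-by-nominator 0/1 matrices of a
    construct's items side by side, so each (item, nominator) pair becomes one column."""
    columns = []
    for matrix in item_matrices:              # matrix[nominee][nominator] in {0, 1}
        for j in range(len(matrix[0])):
            columns.append([row[j] for row in matrix])
    return columns


def cronbach_alpha(columns):
    """Standard Cronbach's alpha; rows (nominees) are cases, columns are 'items'.
    Population variances are used throughout; the ratio is the same with sample
    variances as long as one type is used consistently."""
    k = len(columns)
    n_cases = len(columns[0])
    item_var_sum = sum(pvariance(col) for col in columns)
    totals = [sum(col[i] for col in columns) for i in range(n_cases)]
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))
```

Computing `cronbach_alpha(paste_items(...))` once per classroom and then taking `statistics.mean` and `statistics.stdev` of the results would mirror the across-classroom summaries reported above.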
The popularity and social preference items were keyed in the opposite direction where applicable (i.e., a “most popular” nomination was coded as 1 and a “least popular” nomination was coded as −1; nonchoices were coded as 0). All other nominations were keyed in the same direction (i.e., all nominations coded as 1; nonchoices coded as 0). Raw scores for each item for each person were calculated by summing the nominations as keyed above, which functionally added and, at times, subtracted nomination counts, depending on the direction the nomination item was keyed. The scores were then standardized using within-classroom z-score transformations (i.e., subtracting a given classroom’s mean raw score, then dividing by the classroom’s standard deviation for each construct).
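The following is a minimal sketch of this keying and standardization for one construct in one classroom. The item names, the `(nominator, nominee, item)` tuple layout, and the use of the population rather than the sample standard deviation are assumptions; the text does not specify which standard deviation was used.

```python
from statistics import mean, pstdev

# Hypothetical item keys: +1 for "most"/positive items, -1 for "least" items.
KEYS = {"most_popular": 1, "least_popular": -1, "like_most": 1, "like_least": -1}


def raw_scores(nominations, students, items, keys):
    """Sum the keyed nominations each student received on a construct's items.
    nominations: iterable of (nominator, nominee, item); nonchoices contribute 0."""
    scores = {s: 0 for s in students}
    for _nominator, nominee, item in nominations:
        if item in items:
            scores[nominee] += keys[item]
    return scores


def within_class_z(scores):
    """z = (raw - classroom mean) / classroom SD (population SD assumed)."""
    vals = list(scores.values())
    m, sd = mean(vals), pstdev(vals)
    if sd == 0:  # assumption: no variance left means everyone gets z = 0
        return {s: 0.0 for s in scores}
    return {s: (x - m) / sd for s, x in scores.items()}
```

Because missing nominators simply contribute no tuples, rerunning these two functions on the nominations that remain is all that “recalculating” the scores requires.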
Nominators Missing from the Initial Dataset
This study had a participation rate of 93%. Although this is a high participation rate within the context of the sociometric literature, one should not ignore potential systematic differences between those who were missing and those who were present as nominators. Compared to participants who nominated their peers, the missing nominators had significantly different mean z-scores (at α = 0.01) on two variables: social preference and relational aggression. Missing nominators were significantly lower in social preference (M nominators = 0.02, M missing = −0.25) and higher in relational aggression (M nominators = −0.02, M missing = 0.27) than those who nominated peers. Because the total-group standard deviation was almost exactly 1, these mean differences (0.27 and 0.29) approximate Cohen’s d and would be considered small by conventional standards.
Baseline of Random Missingness
To provide a null baseline, this study built a distribution of correlations under random missingness against which the systematic missingness conditions could be compared. The bootstrapping algorithm removed between 5% and 95% of nominators in increments of 5%. At each percentage level, 5,000 repetitions built a distribution for each of the 28 correlations under conditions of missingness completely at random (Schafer & Graham, 2002). In each repetition, the remaining participants’ nominations were used to recalculate the raw nomination scores and the corresponding within-classroom z-scores, from which the correlation matrix among the eight peer nomination variables was computed. Repeating this process bootstrapped a null distribution of correlations under missingness completely at random. The authors used this approach so that null hypothesis decisions would be free from many of the assumptions of more traditional statistical tests, which this sort of simulation study likely violates (Howell, 2007).
Systematic Missingness
The two variables used to determine which participants were missing were popularity and social preference. For each variable, the systematic missingness conditions removed either the participants with the highest scores or those with the lowest scores, for a total of four missingness types. Ties in popularity or social preference scores occasionally occurred; a random selection algorithm broke all ties.
The full-sample within-classroom z-scores were the basis for systematic missingness removal. The selection algorithm removed the nominations given by those with the highest or the lowest z-scores (depending on condition) in the full sample of participants pooled across classrooms. Raw scores and the within-classroom z-scores for nominations received were recalculated for each of the eight variables, and finally an 8 × 8 correlation matrix between all variables was generated. Levels of systematic missingness were based on the same percentages of missing nominators as the baseline random missingness (i.e., in 5% intervals from 5% to 95%).
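The selection step for one systematic condition can be sketched as follows. The function name and arguments are hypothetical, and rounding the number removed with Python’s `round` is an assumption, since the text does not state a rounding rule. After selection, scores and correlations are recomputed from the remaining nominators exactly as for random removal.

```python
import random


def systematic_removal(full_z, nominators, proportion, remove_highest, seed=0):
    """Select the nominators to drop for one systematic-missingness condition.

    full_z: {student: full-sample within-classroom z on the removal variable
             (popularity or social preference)}, pooled across classrooms.
    remove_highest: True drops the most popular/preferred, False the least.
    """
    rng = random.Random(seed)
    n_remove = round(proportion * len(nominators))  # rounding rule is an assumption
    # A random secondary sort key breaks ties between equal z-scores.
    tiebreak = {s: rng.random() for s in nominators}
    order = sorted(nominators, key=lambda s: (full_z[s], tiebreak[s]), reverse=remove_highest)
    return set(order[:n_remove])
```

Sorting on a `(z, random draw)` pair keeps the ranking by z-score intact and randomizes only the order within tied groups, which matches the tie-breaking described above.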
Results
Statistical Significance of Effects
To understand the effects of the systematic removal of participants, we first examined the effects of random removal. The bootstrapped distributions of correlations observed under random missingness determined the range of effects that random nonparticipation had on the correlations. This study used distribution limits corresponding to the 0.005 and 0.995 quantiles, which bound the central 99% of the 5,000 correlation coefficients obtained under random missingness at a given level of nonparticipation. Such an interval supports the inference that a correlation falling outside it at a given level of nonparticipation had less than a 1% chance of arising from random missingness alone. In other words, a systematic missingness correlation that falls outside the null distribution interval differs significantly from what random nonparticipation produces (p < 0.01). This technique bootstraps the distribution of the statistic under a null condition, thus allowing a test of the null hypothesis that systematic nonparticipation affects the correlations no differently than random nonparticipation does. Random removal of nominators curvilinearly decreased the absolute values of the correlations as the proportion of the sample removed increased, except where the full-sample correlation was already near zero, in which case there was no mean effect across random missingness proportions. The width of the corresponding intervals increased as the proportion of missingness increased. The intervals were not necessarily symmetric around the mean of the null distribution because the sampling distribution of correlation coefficients is asymmetric (Fisher, 1921).
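This decision rule can be sketched with the standard library. Two choices here are assumptions, because the text does not specify them: the interval is equal-tailed, and quantiles are computed with `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics. The function names are hypothetical.

```python
from statistics import quantiles


def null_interval(random_rs, alpha=0.01):
    """Equal-tailed (alpha/2, 1 - alpha/2) quantiles of the random-removal correlations.
    With alpha = 0.01 there are n = 200 cut points, so the first and last are the
    0.005 and 0.995 quantiles."""
    cuts = quantiles(random_rs, n=round(2 / alpha), method="inclusive")
    return cuts[0], cuts[-1]


def is_significant(r_systematic, random_rs, alpha=0.01):
    """True if the systematic-removal correlation falls outside the null interval."""
    lower, upper = null_interval(random_rs, alpha)
    return not lower <= r_systematic <= upper


def count_significant(systematic, random_dist, alpha=0.01):
    """Number of correlations (out of 28 for eight variables) outside their null intervals.
    systematic: {pair: r}; random_dist: {pair: [r, ...]} at the same missingness level."""
    return sum(is_significant(r, random_dist[pair], alpha) for pair, r in systematic.items())
```

Because the limits are taken directly from the simulated distribution, they are automatically asymmetric when the distribution of correlations is skewed.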
In contrast to the effects of random nonparticipation, the effects of systematic removal of nominators based on popularity and social preference varied greatly. Figure 1 illustrates four examples of the magnitudes of these changes. The top two panels, for removal based on popularity, show some of the largest effects in the study and illustrate the diversity of the effects of systematic missingness. The bottom two panels, for removal based on social preference, show the same correlations as the top panels, so that the reader can compare the magnitude of effects of removal based on popularity versus social preference. Systematic missingness of both the least popular and the least preferred nominators increased the correlation between popularity and social preference, as seen in the left two panels of Figure 1. Systematic removal of the most popular and the most preferred nominators, in contrast, decreased the correlation between friendship and popularity, as seen in the right two panels. In some instances, systematic missingness changed the sign of a correlation, as shown in the top right panel of Figure 1 for the correlation between friendship and popularity when the most popular nominators were removed. There were additional cases of popularity-based removal in which near-zero full-sample correlations became statistically significant nonzero correlations. For example, the correlation between friendship and relational aggression (not shown in figures) was 0.05 in the full sample but reached −0.34 when the most popular nominators were removed.

Figure 1. Four examples of the effects of random and systematic missingness on correlation coefficients.
Removing raters based on popularity generally had stronger effects on the correlational patterns than did removal based on social preference. The two strongest individual effects for popularity-based removal were, coincidentally, correlations that involved popularity. Many of popularity’s other correlations, however, displayed small or nonsignificant differences from random removal. Systematic removal based on a given variable therefore does not guarantee significant changes, beyond those observed under random removal, in all of that variable’s correlations. Most of the large effects outside the top panels of Figure 1 were for correlations that did not involve the variable used for systematic removal. Examining the full set of individual graphs from this study revealed wide variation in how systematic missingness affected the correlations between peer nomination variables.
In order to clarify trends in the effects, Figure 2 provides a summary of the number of systematic missingness correlations (out of the maximum of 28) that were outside of the 99% distribution limits for completely random missingness. Systematic missingness based on popularity had more statistically significant effects than systematic missingness based on social preference. At most levels of popularity-based missingness, over half of the correlations were significantly different from the null condition of random missingness. Removal based on social preference significantly affected fewer correlations, though there were select levels of nonparticipation where over half of the correlations were significantly different from random missingness.

Figure 2. Number of correlations under systematic missingness (out of 28) that fell outside the 99% interval of the missing-completely-at-random distribution.
Figure 2 also shows how different the curves are for popularity-based removal compared with preference-based removal. The shapes of the most and least preferred removal lines indicate that removing the least preferred individuals has a greater impact at lower levels of removal than removing the most preferred individuals. The number of significant differences when removing the least preferred individuals spikes early and then generally decreases. The number of significant differences when removing the most preferred individuals, however, generally increases as the removal algorithm removes more and more of the most preferred people. In contrast, removing the most popular and the least popular individuals had comparable impacts in terms of significant differences from random removal.
Magnitude of Effects
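In addition to the number of significant differences between systematic and random removal, it is also important to examine the sizes of those differences. Figures 3 and 4 display statistics for the absolute differences between random and systematic removal based on popularity and social preference, respectively. The maximum and minimum statistics in these figures are the largest and smallest absolute differences, across the 28 correlations, between each systematic removal correlation and the mean of the corresponding random removal distribution. For a given proportion of missing nominators $p$, the maximum statistic can be expressed as:

$$
D_{\max}(p) = \max_{k = 1, \dots, 28} \left| r_k^{\mathrm{sys}}(p) - \frac{1}{5000} \sum_{b=1}^{5000} r_{k,b}^{\mathrm{rand}}(p) \right|
$$

where $r_k^{\mathrm{sys}}(p)$ is the $k$th correlation under systematic removal and $r_{k,b}^{\mathrm{rand}}(p)$ is the $k$th correlation in the $b$th of the 5,000 random removal replications at the same proportion. The minimum statistic is defined analogously, with $\min$ in place of $\max$.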
Figure 3. Absolute correlation difference statistics between the mean of the 5,000 random removal replications and systematic removal based on popularity.

Figure 4. Absolute correlation difference statistics between the mean of the random removal replications and systematic removal based on social preference.
Social preference removal did not yield differences as large as those produced by removal based on popularity; the maximum difference from random missingness did not exceed 0.1 until more than 40% of participants were missing. Taken together with Figures 1 and 2, these results show that the effects of systematic missingness vary widely. While systematic missingness may have small effects in certain situations, it can have large effects in others, large enough to change a sizable positive correlation into a sizable negative one, as seen in the upper-right panel of Figure 1.
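The difference statistics summarized in Figures 3 and 4 can be computed as in the sketch below: the maximum and minimum absolute differences, plus the mean absolute difference referred to in the Discussion. The data layout (`{pair: r}` for one systematic condition at one missingness level, `{pair: [r, ...]}` for the random replications at the same level) and the function name are hypothetical.

```python
from statistics import mean


def difference_stats(systematic, random_dist):
    """Max, mean, and min over correlation pairs of |r_sys - mean(r_random)|.

    systematic:  {pair: r} for one systematic-removal condition at one level.
    random_dist: {pair: [r, ...]} from the random-removal replications at that level.
    """
    diffs = [abs(systematic[pair] - mean(random_dist[pair])) for pair in systematic]
    return max(diffs), mean(diffs), min(diffs)
```

Calling this once per missingness level and condition yields the curves plotted against the proportion of nominators removed.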
Discussion
The goal of this study was to investigate the effects of systematic nonparticipation on correlations derived from peer nomination measures. First, we compared systematic nonparticipation with the null condition of random nonparticipation to determine how many correlations systematic nonparticipation significantly affected. Second, we calculated the absolute deviations of correlations under systematic nonparticipation from the mean correlations under random nonparticipation. Results showed that different types of nonparticipation had distinct effects on correlations. Certain types of systematic nonparticipation significantly affected over half of all correlations, to a nontrivial degree, even at relatively low levels of nonparticipation.
The correlations of interest in this study were among eight commonly assessed peer nomination variables, but we limited the scope of analyses to four types of missingness: removal of the most popular, least popular, most preferred, and least preferred nominators. The different types of missingness had unique effects on correlations. For preference-based missingness, removing the most preferred nominators significantly affected more correlations than removing the least preferred nominators. For popularity-based missingness, however, removal of both the most popular and the least popular nominators significantly affected numerous correlation coefficients. This variation makes the effects of systematic missingness difficult to predict, which is why researchers should be diligent in searching for systematic missingness.
Even at a rate of only 5% nonparticipation, removing either the most or least popular nominators resulted in a statistically significant change in the coefficients of just under half of the intercorrelations between peer nomination variables. Those coefficients differed by as much as 0.05 at that level of missingness. That size of deviation of the correlation coefficients was the exception rather than the rule (the mean effect on coefficients at 5% missingness was 0.01), but the fact that nontrivial deviations can occur at low levels of nonparticipation is concerning. These differences become more problematic as participant missingness increases; at 20% nonparticipation, for example, the mean effect of removing the most and least popular nominators was about 0.05, and the maximum effect was 0.25.
It is worth noting that the differences in the magnitudes of correlations under systematic nonparticipation were not the full correlation changes for systematic missingness, but rather were computed in comparison to random nonparticipation at the same participation rates. Random nonparticipation decreased the absolute value of all correlations curvilinearly. When compared to an “ideal” correlation coefficient obtained from a full sample, effects of systematic nonparticipation will be additive with effects from random nonparticipation, making the overall effects of systematic nonparticipation even more difficult to predict.
A question left open by this study is how well its findings generalize to other peer nomination research. This study deliberately simulated very focused and extreme types of missingness; it is unlikely that only the most popular or only the least popular adolescents would be missing from a sample. Previous studies have shown that participants and nonparticipants in social science research differ on multiple characteristics (e.g., Detty, 2013; Noll et al., 1997), each of which might be associated with differences in nominations. Thus, even if a sample does not include extreme systematic nonparticipation based on a single variable, real levels of systematic nonparticipation across multiple variables may interact to be just as problematic. In addition, there is little reason to believe that popularity is the only or the most impactful variable for missing nominations. Future research should explore these issues.
One limitation of the current study was that it examined the effects of systematic nonparticipation in a sample that was itself affected by systematic nonparticipation. Although 93% of students in the sampled classrooms provided nominations, those who did not (mostly due to school absences during data collection) were significantly less socially preferred and more relationally aggressive than those who did. These effects were small in magnitude, but they may have contributed to the fact that missingness based on social preference had less impact than missingness based on popularity. This limitation also highlights the fact that systematic nonparticipation can occur even in studies with very high participation rates.
Another limitation of this research is that it examined missingness based on only two variables. The intent of the study was to demonstrate the potential effects of systematic missingness, not the magnitude of the actual effects in any one study. The mechanisms underlying systematic missingness in peer nomination studies are surely more complex than this study simulated. Future research should study these mechanisms more thoroughly; such work would be a valuable addition to the findings of studies such as Fournier (2009) and Noll et al. (1997), and future studies could use it to simulate the effects of systematic missingness with more complex models.
A final important limitation to this study is that it used simulation to artificially introduce systematic missingness. The best way to study missingness would be to gather data, find those who are missing, gather data from the missing individuals with additional solicitations, and directly study the differences from the initial participants. While ideal, this method has the disadvantages of both requiring a great deal of resources and potentially never being able to gather data from certain individuals, thus limiting the power to fully detect and study differences.
This study focused on the general trends in the effects of nonparticipation on the validity of peer nominations across a large set of intercorrelations. We did not analyze which specific correlations were affected by nonparticipation. Such analyses were beyond the scope of the current study and would have involved qualitatively interpreting 112 graphs of the kind presented in Figure 1. This is, however, a potential topic for future research and highlights the fact that participants may vary systematically in terms of the configurations of their peer nominations across variables.
One possible solution to the missing nominator problem is multiple imputation (Schafer & Graham, 2002). To use this technique, the researcher would analyze the data available for participants whose nominations are missing and construct probabilistic models of how each missing nominator would nominate the various classmates. The researcher would fill in these gaps a large number of times, creating numerous imputed, complete datasets, and would then conduct the analyses on each of them. The mean of the resulting estimates would serve as the final estimate, with the variability across imputed datasets contributing to its standard error. Future research should test the efficacy of this technique with peer nomination data.
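The two ends of such a procedure can be sketched as follows, under loudly stated assumptions. The imputation model is a hypothetical, deliberately naive placeholder: each missing nominator nominates each classmate independently, with probability equal to the share of observed nominators who chose that classmate. Because it ignores who the missing nominator is, it would not correct systematic missingness; a realistic model would need to condition on nominator characteristics. The pooling step applies Rubin’s rules to a correlation on the Fisher z scale. All function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def impute_nominations(observed, missing_nominators, nominees, seed):
    """Hypothetical naive model: each missing nominator picks classmate j with probability
    equal to the share of observed nominators who chose j.
    observed: {nominator: set(nominees)} for one item in one classroom."""
    rng = random.Random(seed)
    n_obs = len(observed)
    rate = {j: sum(j in chosen for chosen in observed.values()) / n_obs for j in nominees}
    imputed = dict(observed)
    for i in missing_nominators:
        imputed[i] = {j for j in nominees if j != i and rng.random() < rate[j]}
    return imputed


def pool_correlations(rs, n):
    """Rubin's rules on the Fisher z scale for m imputed-data correlations, each based on
    n cases: average the z estimates, add within- and between-imputation variance, and
    back-transform the pooled estimate. Returns (pooled r, total variance of z)."""
    m = len(rs)
    zs = [math.atanh(r) for r in rs]
    within = 1 / (n - 3)                    # sampling variance of Fisher z
    between = variance(zs) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return math.tanh(mean(zs)), total
```

In use, one would call `impute_nominations` with different seeds for each item and classroom, rescore each completed dataset, compute the correlation of interest, and pass the resulting list to `pool_correlations`.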
Beyond the need for future research on systematic missingness, the results of the current study lead to two concrete recommendations for researchers who use peer nominations. First, this study joins previous research (Marks et al., 2013; Marks, Babcock, & Cillessen, 2015) in showing that low participation rates can affect the results of peer nomination studies, and in encouraging researchers to collect data from the maximum number of participants possible within a given context. Second, it is important that researchers assess whether nominators and non-nominators differ significantly in measurable demographic, social, or intellectual ways, and that they report the results of such assessments (even if there are no significant differences). Only by assembling a corpus of knowledge on systematic nonparticipation can researchers respond properly to its effects by reporting, gathering more data, or using advanced methodology, such as multiple imputation, to minimize its potential effects.
Footnotes
Author note
The views and discussions presented in this research are not necessarily the official views of the American Registry of Radiologic Technologists.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
