Abstract
Online purposive samples have unknown biases and may not strictly be used to make inferences about wider populations, yet such inferences continue to occur. We compared the demographic and drug use characteristics of Australian ecstasy users from a probability sample (National Drug Strategy Household Survey, n = 726) and a purposive sample (online survey conducted as part of a mixed-methods study of online drug discussion, n = 753) using nonparametric (bootstrap) and meta-analysis techniques. We found significant differences in demographics and drug use prevalence. Ideally, online purposive samples of hidden populations should be interpreted in conjunction with probability samples and ethnographic fieldwork.
Introduction
Due to the illegal and stigmatized nature of drug use, researching drug use and harm requires accessing hidden populations. Some scholars assert that stigmatized practices are best addressed using ethnographic approaches, where the research team participates in the cultural contexts of the target group over time, developing trust and rapport and understanding emic cultural logics (Moore 2005; Peterson et al. 2008; Sifaneck and Neaigus 2001). While ethnographic approaches are vital, they do not provide externally valid large-scale quantitative data. For this purpose, the gold standard is the probability sample. By employing inferential statistics, probability sampling allows estimates of population prevalence to a given degree of confidence, provided particular assumptions are met (Dillman 2007; Kakinami and Conner 2010; Lohr 2010). However, probability sampling methods are of limited use in investigating hidden populations, for three reasons.
First, response rates for general population surveys are decreasing (Groves 2006), more so when hidden populations are the target. A response rate of less than 50% seriously decreases confidence in relying on inferential statistics to adjust for nonresponse bias. Voluntary bias distorts the resultant weighted data. For example, the data from young males who complete a household survey will be weighted more heavily to make up for the lower participation rates of their cohort, yet the young males that respond may have very different characteristics to the young males who do not respond. This kind of systematic bias cannot adequately be adjusted for by demographic weightings (Caetano 2001; Zhao et al. 2009).
Second, subsamples of a hidden population may be systematically excluded from probability survey sampling frames used in household surveys (i.e., coverage error). Lower-income, transient, and young populations are systematically excluded (Zhao et al. 2009).
Third, probability sampling methods are expensive (Fricker 2008; Kakinami and Conner 2010), especially when less-prevalent behaviors are targeted (Gile and Handcock 2010): If only 1% of the population are regular ecstasy users, then it would likely require a probability sample of 20,000 people to produce a sample of 100 regular ecstasy users, assuming a 50% response rate.
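The arithmetic behind this cost estimate can be sketched in a few lines. This is an illustration only: the function name and its parameters are hypothetical, and the calculation assumes that prevalence and response rate are independent, which is unlikely to hold exactly for a hidden population.

```python
import math


def required_gross_sample(target_n, prevalence, response_rate):
    """Number of people to approach to expect `target_n` subgroup members.

    Illustrative assumption: subgroup members respond at the same rate
    as everyone else.
    """
    approached = target_n / (prevalence * response_rate)
    # Round first to guard against floating-point noise, then round up.
    return math.ceil(round(approached, 9))


# Figures from the text: 100 regular users, 1% prevalence, 50% response rate.
print(required_gross_sample(100, 0.01, 0.50))  # 20000
```

If regular ecstasy users respond at a lower rate than the general population, the required gross sample would be larger still.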
An example of the kinds of problems occurring in gold standard probability surveys can be found in the Australian National Drug Strategy Household Survey (NDSHS), used here as comparative data. The NDSHS has a low response rate: 49%, 46%, and 50% for the years 2007, 2004, and 2001, respectively (Australian Institute of Health and Welfare [AIHW] 2002, 2005, and 2008). Although the NDSHS was weighted, this weighting cannot account for specific patterns of drug use that were not accessed using household probability sampling; that is, weighting is based only on sociodemographics, not drug use patterns (Caetano 2001; Zhao et al. 2009). It is not known whether the response rate indicates the systematic underreporting of drug use in the NDSHS. Furthermore, the young and transient, and mobile phone–only groups, who are not easily accessed through the household sampling frame, are also likely to report different drug use patterns (Livingston et al. 2013).
Online Purposive Sampling
The changing context of conducting social research has combined with the limitations of probability sampling methods to necessitate new approaches to gathering externally valid data on hidden populations (Groves 2011). Purposive sampling is a method of understanding hidden populations with a long history in the drugs field (e.g., Braunstein 1993; Peterson et al. 2008; Sifaneck and Neaigus 2001). Purposive sampling relies on the researchers’ situated knowledge of the field and rapport with members of targeted networks. Purposive sampling methods of hidden populations traditionally produced small samples and were associated only with qualitative research. However, in the last decade, purposive sampling of hidden populations through Internet recruitment and survey methods has become increasingly popular among researchers who, at a relatively low cost, use it to engage large samples of people who are otherwise difficult to access, such as illicit drug users (Fricker 2008; Miller and Sønderlund 2010). For example, a sample of 100 regular ecstasy users could be reached online for a mere fraction of the costs of employing probability sampling (e.g., Barratt and Lenton 2010; Miller et al. 2007).
Although the large samples achievable through online research methods lend themselves to quantitative analyses, the problem remains that the external validity of results arising from (online) purposive samples is unknown (Couper 2000; Lohr 2010). In much of this research, inferential statistics are applied to purposive sampling, with the caveat that the results cannot be inferred or generalized beyond the sample itself (see Williamson [2003] for more detailed discussion). Despite the caveats, these data are indeed often used to infer information about the wider population of drug users in the absence of any other corroborating information.
Balancing Rigor and Practicality
We have argued that inferring to a wider population from a purposive sample is problematic. Yet this practice continues, given the various limitations of probability sampling for hidden populations. Rather than considering all research based on purposive sampling as fully ungeneralizable, we explore the potential to use population sampling as a complement to purposive sampling in order to improve generalizability. We formalize what is already happening in the field and explore how it might be done more effectively, while being mindful of the real-world constraints within which researchers operate. This is an exercise in attempting to balance rigor with the practicalities of doing research, especially with hard-to-reach populations (see Crosby et al. 2010).
It is also important to explain what we are not trying to do. We do not suggest that a purposive sample can be used to estimate the prevalence of behaviors in a wider population. As explained by Lucas (2014), the social world is inherently “lumpy,” and some findings will apply differentially to some subsamples and not to others. We concur with Crosby and colleagues (2010) that the external validity of results gleaned from purposive samples of hard-to-reach populations is relevant, not simply with regard to prevalence estimates but also in relation to describing the relationships between variables among samples. Although researchers using purposive samples may continue to qualify their results to the “sample-at-hand” and studiously avoid making generalizations beyond this, we accept that at some point, researchers, or research consumers, often extrapolate findings beyond the sample studied. For that reason, we believe it is important to develop methods that “address the threat” of a “lumpy social world” (see Lucas 2014) by using representative data to support that collected by purposive methods.
This article tests how a purposive sample of ecstasy users recruited online and a probability sample of ecstasy users recruited from households differ by demographic and drug use characteristics. Unlike in previous studies (reviewed subsequently), our samples were matched by frequency of use and location of residence. We also use a more appropriate statistical approach to compare these samples by utilizing the nonparametric bootstrap method to estimate confidence intervals around the purposive samples (also reviewed below). Furthermore, the purposive sample was part of a larger mixed-methods project informed by virtual and multisited ethnography. In the discussion, we reflect on the value of situating surveys of purposive samples into a framework of emic knowledge to better understand the sample’s representativeness (see Guarte and Barrios 2006).
Representativeness of Ecstasy User Samples
Two comparisons between probability and purposive samples with regular ecstasy users have been undertaken (Miller et al. 2010; Topp et al. 2004), and we see our article as extending this work.
Topp et al. (2004) compared a purposive sample of regular ecstasy users interviewed face-to-face with two probability samples of regular and recent ecstasy users derived from the NDSHS. When demographic and drug use variables were compared, substantial concordance was found across samples.
In an extension of Topp et al.’s (2004) research, Miller et al. (2010) compared data from the first published online survey of Australian ecstasy users with both a purposive face-to-face sample and a probability sample (pencil-and-paper and telephone modes), also derived from the NDSHS. The web sample was significantly younger and less likely to report the recent use of other drugs (although more likely to report recent use of Gamma-Hydroxybutyric acid [GHB]) compared with the probability sample. While Miller et al. (2010) concluded that “the data provided by these three samples of ecstasy users converged” (p. 444), differences in how the samples were constructed limited interpretation of the results. The web sample was not matched with the other samples by frequency of ecstasy use, and the probability sample was not matched with the other samples by locale. For example, the differences in GHB use likely reflect elevated GHB use in Melbourne compared with other parts of Australia (see Stafford et al. 2005), rather than any differences in sampling type or survey mode. (See Table A, online supplementary file, for a tabulated summary of the methods and results of these two studies.)
Statistical Approaches with Purposive Samples
Purposive, nonprobabilistic sampling methodologies are common in market research (e.g., Bock and Sargeant 2002; Smith and Albaum 2005) and in studies of hidden populations (e.g., Braunstein 1993; Hathaway 2004). Alongside this growing trend, statistical approaches that attempt to account for bias have been improving. For example, recent publications have begun exploring the utility of respondent-driven sampling (RDS; see Gile and Handcock 2010), a purposive sampling methodology, to recruit samples of MDMA/ecstasy users (Wang et al. 2005).
When samples of a study population are recruited through purposive or convenience designs, such as RDS or the purposive sample used in this study, it is considered inappropriate to draw inferences from the sample to a wider population. Notably, recent research (Heckathorn 2011; Salganik 2006; Salganik and Heckathorn 2004) has demonstrated how RDS can produce unbiased prevalence estimates from the sample and then use bootstrap methods to produce confidence intervals of these estimates. Guarte and Barrios (2006) adopt a similar approach to Salganik (2006; Salganik and Heckathorn 2004) to illustrate the benefits of bootstrapping confidence intervals from prevalence estimates drawn from purposive sampling data to estimate population parameters.
Bootstrapping is a statistical procedure that provides a way of estimating standard errors and other statistical parameters of interest drawn from the sample data available. Bootstrap confidence intervals treat the purposive sample as the population of interest by taking random samples from that population to estimate confidence limits (Rodgers 1999). According to Adèr et al. (2008), bootstrap procedures are recommended when the theoretical distribution of a statistic is unknown or complicated. An advisory panel on online public opinion surveys convened for the Canadian government (Public Works and Government Services Canada 2008) concluded that bootstrap procedures should be implemented to assess the statistical properties of nonprobability data, and that the results should be compared to any available data drawn from probabilistic samples. In essence, bootstrapped results produce unbiased estimates from the available sample; but this does not guarantee that the sample of interest is not inherently biased, given the nature of the sampling.
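The resampling idea described above can be illustrated with a minimal sketch of a percentile bootstrap confidence interval for a proportion. The data are hypothetical (300 of 425 respondents reporting a behavior), the function names are our own, and the percentile method is one of several bootstrap interval types; it is not necessarily the variant used in any of the studies cited.

```python
import random


def proportion(values):
    """Share of 1s in a list of 0/1 indicators."""
    return sum(values) / len(values)


def bootstrap_ci(values, stat, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample with replacement, recompute `stat`."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower_idx = max(0, int(n_resamples * alpha / 2))
    upper_idx = min(n_resamples - 1, int(n_resamples * (1 - alpha / 2)) - 1)
    return estimates[lower_idx], estimates[upper_idx]


# Hypothetical purposive sample: 300 "yes" responses out of 425.
sample = [1] * 300 + [0] * 125
low, high = bootstrap_ci(sample, proportion)
print(f"estimate = {proportion(sample):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval describes sampling variability within the purposive sample only. As the paragraph above notes, it says nothing about whether that sample is biased relative to the wider population.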
Aim of Current Study
This article extends the work of Topp et al. (2004) and Miller et al. (2010) by comparing samples of both regular and occasional ecstasy users recruited online with randomly selected probability samples from the NDSHS. We are interested in testing the external validity of online purposive samples, given that their use is increasingly common among hard-to-reach groups.
Our analysis differs from these previous articles in the following ways:
Unlike Miller et al. (2010), our samples are otherwise matched by frequency of use and location of residence.
Appropriate nonparametric statistical tests are used with the purposive sample, including bootstrapping (Miller et al. (2010) used parametric statistical analysis with a purposive sample, whereas Topp et al. (2004) used descriptive statistics).
Our online purposive sample is grounded in multisited and virtual ethnography.
Methods
Data Sources
Online purposive sample
A web survey, conducted in 2007 as part of the first author’s PhD, explored the use of online forums by people who use psychostimulants and hallucinogens (party drugs). The survey was part of a mixed-methods project guided by virtual and multisited ethnographic methods, involving participation and observation of Internet forums where drugs were discussed (Barratt and Lenton 2010). Multisited ethnography is spatially decentered, tracing networks of people rather than constructing place-based boundaries around fieldwork sites (Falzon 2009). Virtual ethnography occurs in online spaces and is, by the nature of online spaces and networks, also multisited (Hine 2008). Most of the 837 respondents (74%) reported finding out about the study through a “thread in online forum”; 19% reported being “referred via e-mail/through Internet”; 6% “saw the link on a social networking site”; and 2% were “referred by word-of-mouth (offline).”
No reimbursements or other incentives were offered to survey participants. Offering reimbursement or prizes provides motivation for multiple responding (Bowen et al. 2008) and involves taking identifying information from participants. Instead, we emphasized the anonymous nature of participation in our study, which did not collect Internet protocol (IP) addresses or e-mail addresses, which could potentially be used to identify individuals. Although we cannot identify respondents who may have taken the survey more than once from the same computer, the lack of financial incentive and the 15-minute time commitment reduced the motivation for respondents to deliberately respond more than once. The data were also inspected for duplicate entries (same data submitted at a similar point in time), and none were found. Although we did not collect IP addresses within the data set, we did use Google Analytics to conduct traffic analysis on the uniform resource locator (URL) used to promote the survey. These data indicate that 95% of the website visits over the study period were unique visitors, suggesting that multiple responding from the same computer could not have grossly affected the data.
The final sample consisted of 837 Australian residents aged 16 and over who reported recent (last 12 month) use of party drugs and recent (last six month) participation in online drug discussion. For this article, two subsamples were drawn: occasional ecstasy users (n = 328) who reported ecstasy use from one to five times in the last six months, and regular ecstasy users (n = 425) who reported ecstasy use monthly or more often in the last six months.
Probability sample
The 2007 NDSHS was the ninth iteration of a series of surveys asking Australians about their knowledge of and attitudes toward drugs and their history of drug consumption and related behaviors (AIHW 2008). It used a multistage, stratified area random sample design, with households selected from within individual census collection districts within each geographic region of Australia. Two survey modes were used: Drop and Collect and Computer Assisted Telephone Interviewing (CATI). A total of 23,356 people successfully completed the survey, mostly via the Drop and Collect mode (n = 19,818; 85%). The overall response rate was 49%. Design weights provided by the AIHW were used to adjust for imbalances arising in the design and execution of the sampling. For comparative analysis with the online sample data, two subsamples were drawn from those between 16 and 60 years: occasional ecstasy users (n = 546), who reported ecstasy use “every few months” or “once or twice a year” in the last 12 months, and regular ecstasy users (n = 180), who reported ecstasy use “about once a month,” “once a week or more,” or “every day” in the last 12 months.
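The grouping of NDSHS respondents into occasional and regular users can be expressed as a simple lookup. The sketch below uses the response labels quoted above; the function name is hypothetical, and treating the 16 to 60 age range as inclusive at both ends is our assumption.

```python
# Response labels as quoted in the text, mapped to the comparison groups.
NDSHS_FREQUENCY_GROUP = {
    "every few months": "occasional",
    "once or twice a year": "occasional",
    "about once a month": "regular",
    "once a week or more": "regular",
    "every day": "regular",
}


def classify_ndshs_respondent(frequency_response, age):
    """Return 'occasional', 'regular', or None if the respondent is excluded.

    Assumption: the age bounds (16 and 60) are inclusive.
    """
    if not 16 <= age <= 60:
        return None
    return NDSHS_FREQUENCY_GROUP.get(frequency_response.strip().lower())


print(classify_ndshs_respondent("About once a month", 24))  # regular
print(classify_ndshs_respondent("Every few months", 65))    # None
```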
Measures
Most survey items were comparable between the two samples; however, there were some differences. Recent drug use in the NDSHS referred to the last 12 months, whereas the online survey measured use in the last six months. We acknowledge that this difference biases our comparisons toward underreporting drug use prevalence in the online sample due to the shorter duration defined as recent use. However, this bias runs counter to the likely direction of difference between the two samples, so it is less of a concern than if the bias were reversed.
There were also differences in the way drug types were described: The online survey asked specifically about three different types of hallucinogens, a category that was aggregated in the NDSHS, and frequency of drinking alcohol had additional response categories in the NDSHS that were collapsed into “weekly or more often” to match the online survey item. The items used to derive recent and regular illicit drug use were sufficiently similar to be considered comparable (see Appendix), and any differences have been taken into consideration in our interpretation of the results. Readers can view the NDSHS questionnaire in the First Results report (AIHW 2008) and the online survey is available by contacting the first author.
Analysis
Means for continuous variables or percentages for categorical variables are presented for each sample. Stata 11.1 was used to estimate confidence intervals. Design weights and weighted numbers and percentages were reported for the NDSHS data. Nonparametric bootstrap confidence intervals were estimated around comparable estimates and proportions from the purposive sample. Resampling was repeated 250 times to generate each estimate. While bootstrap confidence intervals are better suited to estimating confidence intervals using the convenience sample as the population, standard linearized confidence intervals (not shown) also produced very similar results. To compare the results from the two samples directly, we undertook an analysis using techniques typically applied in meta-analysis studies (Sterne 2009) using an α level of .05. This study was approved by the Curtin University and Turning Point Alcohol and Drug Centre ethics committees.
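One common way to compare two independent estimates in a meta-analytic framework is to treat each sample as a "study" with its own estimate and standard error, then test the difference with a normal approximation. The sketch below illustrates that general idea under stated assumptions. It is not a reproduction of the Stata routines used in the analysis, and the input values are hypothetical.

```python
import math


def compare_estimates(est_a, se_a, est_b, se_b, z_crit=1.959964):
    """Difference between two independent estimates with a 95% CI and p value.

    Assumes the estimates are independent and approximately normal.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return diff, ci, p_value


# Hypothetical inputs: 60% (SE 3%) in one sample versus 50% (SE 4%) in the other.
diff, ci, p = compare_estimates(0.60, 0.03, 0.50, 0.04)
print(f"difference = {diff:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), p = {p:.3f}")
```

In this setup, the standard error for the purposive sample would come from the bootstrap and the standard error for the probability sample from the design-weighted analysis.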
Results
There were statistically significant differences in age, sex, rurality, and education between occasional ecstasy users recruited online and those who participated in the NDSHS (see Table 1). Occasional ecstasy users recruited online were younger (mean difference [M diff] = 3.9 years, 95% CI 2.8–4.9, p < .001) and a greater proportion were male (M diff = 7.5%, CI 0.3–14.5, p = .040), living in capital cities (M diff = 9.6%, CI 3.3–16.0, p = .003) and had completed secondary education (M diff = 16.9%, CI 10.7–23.2, p < .001). Statistically significant differences between the regular-using samples were less pronounced but in the same direction, with the regular-using online sample being younger (M diff = 1.9 years, CI 0.6–3.2, p = .003), more likely to report living in a capital city (M diff = 12.7%, CI 1.6–23.8, p = .025), and more likely to have completed secondary education (M diff = 11.7%, CI 1.1–22.3, p = .030) than the matched NDSHS sample. The proportion reporting completion of a postsecondary school qualification did not differ between the samples in either comparison.
Table 1. Demographic Characteristics of Occasional and Regular Ecstasy Users for the Online Purposive Sample Matched with the 2007 NDSHS Probability Samples.
Note: NDSHS = National Drug Strategy Household Survey; SD = standard deviation.
a. All household survey estimates are weighted.
b. Excludes those “still at school.”
c. Includes trade or technical certificate or diploma, undergraduate, and postgraduate qualifications.
*p < .05; **p < .01; ***p < .001.
With respect to “other” drugs, there were a number of differences between the occasional ecstasy users in the online purposive sample and the equivalent probability sample (see supplementary Table B). Occasional ecstasy users in the online purposive sample were more likely to report recent use of meth/amphetamine (M diff = 20.4%, CI 13.2–27.6, p < .001), hallucinogens (M diff = 24.8%, CI 18.8–30.7, p < .001), ketamine (M diff = 11.8%, CI 7.7–15.9, p < .001), GHB (M diff = 5.7%, CI 2.9–8.5, p < .001), and benzodiazepines for nonmedical use (M diff = 17.7%, CI 12.1–23.3, p < .001) than the matched NDSHS sample. Occasional ecstasy users from the online sample were also more likely to report regular use of hallucinogens (M diff = 6.4%, CI 3.5–9.2, p < .001) and benzodiazepines (M diff = 9.2%, CI 5.0–13.4, p < .001).
In the regular-using samples, recent use of heroin (M diff = 4.5%, CI 0.4–8.7, p = .032) and cocaine (M diff = 11.7%, CI 0.4–23.0, p = .043), and having ever injected a drug (M diff = 7.2%, CI 0.2–14.1, p = .043), were significantly greater in the NDSHS sample, whereas recent use of hallucinogens was significantly greater in the online sample (M diff = 11.0%, CI 0.9–21.2, p = .033). Occasional ecstasy users in the NDSHS sample were more likely to report drinking alcohol monthly or more often (M diff = 6.8%, CI 2.1–11.4, p = .004) and weekly or more often (M diff = 14.5%, CI 7.4–21.5, p < .001) than the online purposive samples. Similarly, regular ecstasy users in the NDSHS sample were more likely to report drinking alcohol monthly or more often (M diff = 8.5%, CI 4.0–13.1, p < .001) and weekly or more often (M diff = 18.0%, CI 10.6–25.4, p < .001) than the online purposive samples.
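Readers who wish to sanity-check a reported difference against its confidence interval can back-calculate an approximate standard error and p value. The sketch below does this for the heroin comparison reported above (M diff = 4.5%, CI 0.4–8.7). It assumes a symmetric, normal-approximation 95% interval, which may not match how the published values were computed, so agreement with the reported p value should only be approximate.

```python
import math


def p_from_ci(diff, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p value implied by a difference and its 95% CI.

    Assumes the interval is symmetric and based on a normal approximation.
    """
    se = (ci_high - ci_low) / (2 * z_crit)
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2))


# Heroin comparison from the text: M diff = 4.5%, 95% CI 0.4 to 8.7.
print(round(p_from_ci(4.5, 0.4, 8.7), 3))
```

Small discrepancies from the published p values are expected because the inputs are rounded to one decimal place.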
Discussion
Purposive sampling of drug users through online methods is increasingly common. Even though we make caveats that the findings should not be generalized, findings are often still generalized, perhaps due to a lack of alternative “better” evidence. One way to test the representativeness of purposive samples is to compare their characteristics with carefully matched probability samples, where such samples exist. We have performed this comparison here. When matched by time, locale, and frequency of ecstasy use, online purposive and NDSHS probability samples differed considerably, with the online samples generally younger, better educated, and more likely to report concurrent use of a range of stimulant and hallucinogenic drugs (i.e., “party drugs”). Coverage error and voluntary bias operate across both types of sampling. When the two types of sample are used together, it may be possible to increase overall coverage of hard-to-access populations like ecstasy users, who are more likely to be the young and mobile groups least reached by household survey methods.
Our results regarding drug use estimates deviate from those reported by Miller et al. (2010): Their online-purposive sample reported less recent drug use than their probability sample. Our results challenge Miller and colleagues’ suggestion that the Internet “may be an effective way to recruit more dedicated, less polydrug-using ERD [ecstasy] users than other methods” (2010:444). As previously mentioned, Miller’s results may be explained by the two samples being mismatched on frequency of ecstasy use and locale. In the present study, online recruitment, specifically through online forums where drugs are discussed, provided access to samples of occasional ecstasy users who were more likely to report the use of other party drugs and regular ecstasy users who were just as likely to report other party drug use.
Value of Ethnographic Grounding
The online purposive sample reported in this article formed part of a larger project that also included online interviews and online participant observation, informed by virtual and multisited ethnography (see Barratt 2012; Barratt and Lenton 2010). We identify three reasons why an ethnographic grounding is useful for online purposive surveys.
First, developing an understanding of the sites and networks within which participants interact and interacting with them to gain their trust and respect is a vital part of the online recruitment process (Barratt and Lenton 2010; Potter and Chatwin 2011). Although this step is critical for traditional face-to-face ethnography, it is particularly important in virtual ethnographies where trust can only be built through online interaction.
Second, participant observation of online sites generates additional research sites. In this case, 40% of 40 forums were found through observing already-known forums. Thus, participant observation enabled the inclusion of a wider variety of online communities in the study.
Third, we are better able to interpret the survey findings. For example, we know that the forums that were most successful in recruiting participants to the online survey hosted online discussions about the nonmedical use of prescription drugs and the use of novel hallucinogens. Thus, it is not surprising that our purposive sample reported significantly more use of these substances than the matched household survey sample.
Limitations
This study has some limitations. As in Miller et al. (2010) and Topp et al. (2004), survey items were not identical between samples. Specifically, the alcohol question differences were likely to have biased the results toward more recent and regular alcohol use in the household survey given the more acute time frame, and the composite item measuring three different types of hallucinogens was likely to have biased the results toward more recent and regular hallucinogen use in the online survey. We believe that the differences in question wording cannot explain the other estimates that differed significantly, which were assessed using single comparable items in both surveys.
Wider windows of recall are likely to increase measurement error (Dillman 2007), and this fact informed the use of a six-month window in the online survey; however, the difference in time frames between samples is a limitation of this study. Identical questions and increased sample sizes (more specifically, more observations in population studies) would strengthen future studies of this type.
Future Directions
A general limitation of these kinds of studies is confounding of mode of data collection with sampling technique. Future research would benefit by implementing online versions of household drug use prevalence surveys aimed at improving access to younger subgroups of recreational drug users drawn from a probability sampling frame. Using identical items, order of presentation, and context would allow for comparisons between Internet purposive, Internet probability, and standard probability sampling modes. Teasing out how these samples differ will further our understanding of how best to interpret both bias in purposive samples and prevalence estimates of probability samples. Such work is also important because it informs interpretations of results arising from Internet research methods that are increasingly utilized for drug research.
Conclusion
Hidden populations cannot be effectively and meaningfully analyzed in generalized population-based surveys, as they are often out of scope or hard to reach. Obtaining generalizable results from such methods would be financially unviable. Consequently, convenience samples are often employed. But if these are increasingly rejected by many in the research community, researchers of hard-to-reach populations are stuck between a rock and a hard place. Yet, if purposive sample studies can be combined with comparable probability samples and ethnographic fieldwork, we can have more confidence in understanding and evaluating the external validity of purposive samples, and interpret the representativeness of the resultant findings. Probability samples can be enhanced by purposive online samples gathered and informed through participant observation of online discussions. We hope this article will prompt readers to take off the blindfold, admit that (online) purposive sampling of hidden populations happens and will continue to happen, and focus on developing better ways of harnessing these data to make more accurate inferences about hidden and stigmatized practices in the wider population.
Appendix
Acknowledgments
We thank the survey respondents for their participation and the online communities that assisted with survey recruitment. We also acknowledge the Australian Institute of Health and Welfare, Department of Health and Ageing and the Australian Social Science Data Archive for providing access to the National Drug Strategy Household Survey data set. We declare that those who carried out the original analysis and collection of the data bear no responsibility for the further analysis or interpretation of it. We also thank Turning Point Alcohol and Drug Centre who supported this research in its initial stages.
Authors’ Note
An initial version of this article was first presented in 2010 at the thirtieth Australasian Professional Society on Alcohol and Other Drugs Conference, Canberra, Australia.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The National Drug Research Institute at Curtin University is supported by funding from the Australian Government under the Substance Misuse Prevention and Service Improvement Grants Fund. J.F. is supported by the ARC Centre of Excellence in Policy and Security.
