Ostinato rigore: establishing methodological rigour in quantitative research

Abstract

The last 20 years in South Africa and abroad have seen huge changes in the ways in which research is accessed and produced. These changes were facilitated by rapid developments in technology. Collaborating with researchers across the globe and accessing articles and research can be done at the push of a button, and response times are almost instantaneous. Conducting and communicating one’s own research are also much easier. This has led to a veritable explosion of publishers and journals, some of which are legitimate and others predatory. In this climate, the adage ‘publish or perish’ has become a lived reality, placing increasing pressure on scholars to publish. An unintended consequence of this pressure is an increasing lack of methodological rigour in studies. This article advocates for increased attention to methodological rigour in quantitative research. In so doing, it provides guidelines and suggestions on the elements to be considered within each of the broad aspects of a study, namely, sampling, instrumentation, methods, design, and data analysis. These are drawn from the literature as well as from the author’s own experiences in teaching quantitative research methods, supervising postgraduate student research, reviewing articles for local and international journals, and reviewing articles located within the quantitative paradigm as Associate Editor for the South African Journal of Psychology. Ultimately, this article seeks to create awareness among researchers of the necessity for methodologically rigorous research to enhance the quality of outputs. Such research is more likely to have impact and to inform policy, practice, and training within the discipline with confidence.

Keywords

External validity, internal validity, quantitative, reliability, replicability, research methods

There can be no doubt that technology has changed the face of research. It has never been easier to access information than it is now. In this knowledge economy, publication has gained even more currency, and researchers the world over experience pressure to publish and maintain a competitive edge in their respective fields. While this pressure benefits society in terms of the volume of research produced, it has also been detrimental to the scientific community. Leonardo da Vinci is credited with the phrase ‘ostinato rigore’, or ‘relentless rigour’, which he used frequently to remind himself of the need for precision in his work. Researchers, similarly, need to keep rigour in mind. In the race to publish, they often gloss over the need for conceptual and methodological rigour.

In this article, rigour and methodological rigour are defined. This is followed by a discussion of the aspects to consider when ensuring methodological rigour in quantitative studies. The article focuses specifically on quantitative methods; qualitative research is the subject of an accompanying paper in this issue (Saville-Young, in press). A search of Google and Google Scholar for ‘methodological rigour’ revealed that much has been written about rigour in qualitative and mixed-methods studies, but fairly little about quantitative studies. Perhaps there is an assumption that all researchers are familiar with the classical work of Campbell and colleagues from the 1960s and 1970s (Coryn, 2007), but it is increasingly evident from the nature of published research that researchers, particularly in the quantitative tradition, are overlooking the need to conduct rigorous research.

In making suggestions for the criteria to consider in establishing methodological rigour, the author draws on her experiences of teaching and supervising research at undergraduate and postgraduate levels as well as her experiences of reviewing articles for a wide variety of journals in psychology and related disciplines locally and internationally. The author also draws on her experience of reviewing manuscripts (particularly those in the quantitative paradigm) as Associate Editor of the South African Journal of Psychology.

What is rigour?

Rigour generally refers to processes followed to ensure the quality of the final research product. In essence, it can be thought of as a quality control mechanism for research. According to Tobin and Begley (2004), rigour in research is normally conceived of as the means by which integrity and competency are confirmed. It is a way of demonstrating the legitimacy or soundness of the research process (Coryn, 2007, p. 26).

A search online for information on rigour in research largely yields texts written about rigour in qualitative research. Guba (1981) argued that this is probably because the nonexperimental nature of qualitative research meant that it needed to prove itself against a more objective scientific paradigm. Although this pressure has lessened considerably since Guba’s assertion, there is still a greater focus on establishing rigour within qualitative traditions. It is almost as if the rigour of quantitative research is assumed. Rigour within the context of quantitative research refers to how well the research idea or project has been developed, how concise and objective the design and analytic techniques are, and how scrupulously the rules have been adhered to and applied to all decisions (Krefting, 1991). When rigour in quantitative research is addressed, it appears to be largely from a medical perspective, with the focus on experimental research and randomised controlled trials (RCTs; see Claydon, 2015).

Why the need for rigour in quantitative studies?

Rigour should be a core concern of researchers. Among the most common reasons for manuscripts being rejected, a significant proportion pertain to conceptualisation and methods. Fonseca (2013) identifies six core reasons for manuscript rejection: (1) issues unrelated to the manuscript, (2) mismatch with the journal, (3) inadequate preparation, (4) design flaws, (5) poor writing and organisation, and (6) lack of originality. Of these, Reasons 4 and 6 pertain most directly to rigour. Issues with regard to inadequate preparation and poor writing and organisation are easily fixed; design flaws and lack of originality, however, speak directly to rigour. Fonseca (2013) further identifies design flaws as involving poorly formulated research questions; poor conceptualisation of the approach to answering the research question; choice of an obsolete, weak, or unreliable method; choice of an incorrect method or model that is not suitable for the problem to be studied; inappropriate or suboptimal instrumentation; a small or inappropriately chosen sample; inappropriate statistical analysis; and/or unreliable or incomplete data. Fonseca (2013) also identifies the following aspects of a lack of originality that lead to manuscripts being rejected: results that are not generalisable; secondary analyses that extend or replicate published findings without adding substantial knowledge; studies that report already known findings but position them as novel by extending them to a new geography, population, or cultural setting; results that are unoriginal, predictable, or trivial; and results that have no clinical, theoretical, or practical implications.

What is methodological rigour within the context of quantitative research?

From the description of the design flaws and originality requirements, it is evident that the traditional criteria of internal validity, external validity, and replicability are core to a rigorous quantitative study.

Internal validity is generally concerned with inferring causality between variables and therefore seeks to account for or eliminate the effects of extraneous variables (Babbie & Mouton, 2004; Cook & Campbell, 1979). Cook and colleagues have, in their many publications, identified several threats to internal validity and advised researchers to take cognisance of them to ensure more methodologically rigorous research. Among these are history, maturation, testing, attrition, selection bias, and diffusion (Babbie & Mouton, 2004; Cook & Campbell, 1979). External validity evaluates the generalisability of results across persons (population validity), settings (ecological validity), and times (temporal validity; Shadish, Cook, & Campbell, 2002), while replicability refers to the extent to which the same or similar results would be obtained if the same study were conducted elsewhere (Shadish et al., 2002). Some texts refer to replicability as repeatability or reliability (Coryn, 2007); in this article, however, reliability is used only in its psychometric sense. Definitions and descriptions of each of the internal and external validity threats can be found in any quantitative research methods textbook. In accordance with Shadish et al. (2002), the author holds that threats are better identified through insider knowledge than from an abstract, nonlocal list of threats (p. 474). Thus, rather than adopting a mechanistic, checkbox approach that examines each threat proposed by Cook and colleagues in turn, this article argues for a more contextual approach to methodological rigour.

As such, the next sections of this article discuss each aspect of a manuscript, from sampling through instrumentation and data collection to analysis. They provide criteria that should be checked before and while conducting the research, as well as during the analysis and write-up phases, without mapping them onto the traditional threats. As indicated earlier, the approach is partially autoethnographic in that the author draws on over 15 years of personal experience to provide guidelines for improving methodological rigour in quantitative research.

Variety is the spice of life: choosing a representative sample

Sampling can be a complex issue. One needs to consider three broad aspects, namely, the composition, representativeness, and size of the sample. These elements have implications for internal validity, particularly in experimental studies where samples are further split into groups for comparison purposes. They also have important consequences for external validity in that one cannot generalise from small, nonrepresentative samples; yet a number of manuscripts conclude with broad generalisations based on very small, specific samples (see Laher & Botha, 2012).

With regard to sample composition, among the most common omissions in manuscripts is a brief description and/or justification of the sampling method used. Research very commonly relies on nonprobability, convenience sampling. This is often not specified in the methods section and is rarely acknowledged as a limitation in the paper. Probability samples are not easy to obtain: the process is complex and time-consuming, and it is sometimes not possible. Hence, researchers often use nonprobability sampling without reflecting on its limitations. Because this method is commonly taught and used in graduate student research, it has become commonplace for researchers to continue using it in their later work. Nonprobability sampling, even when justified, is never wholly representative and has implications for generalisability (Babbie & Mouton, 2004; Rosenthal & Rosnow, 2007).

Related to this is the reliance on student samples, particularly Psychology student samples, and the mistake of generalising from student samples to the general population. This problem has been highlighted across various subdisciplines of psychology. Studies by Sears (1986) and Henry (2008) are often cited as demonstrating the limitations of relying on student samples. More recently, Henrich, Heine, and Norenzayan (2010) demonstrated empirically, across a range of fields in psychology, the unrepresentativeness of undergraduate student samples. They provide evidence that results obtained with these samples are often outliers compared with similar studies on other samples. This led them to coin the term ‘WEIRD’ to describe the overreliance on Western, Educated, Industrialised, Rich, and Democratic individuals. They conclude by advocating rigour in sample composition and selection. They recommend that journal editors become far stricter about the studies accepted for publication, so that WEIRD samples are interrogated for their relevance to the study and to the population they are supposed to represent.

In further exploring the composition and selection of samples, attrition and rates of attrition (participants dropping out of the study), missing data, and nonresponse need to be examined. Reviewers often query these issues. Apart from their bearing on methodological rigour, these aspects may reveal some of the more interesting patterns in a study and should be explored where possible. Studies often rely on volunteers, and volunteer bias is often not considered. Lönnqvist et al. (2007) demonstrated significant personality differences between volunteers and nonvolunteers. Similarly, Marcus and Schütz (2005) found personality differences between those who chose to respond and those who did not.

It is interesting to note that Marcus and Schütz (2005) used an online survey to collect data. In an effort to diversify samples and obtain larger samples, researchers are increasingly using online platforms. Payne and Barnfather (2012) reported on a South African study that asked undergraduate students to complete a survey either online or in the traditional paper-and-pencil format. Their results demonstrated clear differences between those who chose to complete the online version and those who completed the paper-and-pencil version. However, Payne and Barnfather (2012) had access to a very specific undergraduate population through lectures or via the university’s online communication system.

This differs from the situation of researchers who use online platforms such as Facebook or LinkedIn to access large numbers of people. Typically, a message on Facebook or LinkedIn with a link to an online survey can attract many respondents from across the globe. These more open-ended convenience approaches yield large numbers of respondents, but the data are often of poor quality. Furthermore, there is no way of ensuring the integrity of such data (Casler, Bickel, & Hackett, 2013; Skitka & Sargis, 2006). More recently, studies have acknowledged the need for a more systematic approach to online sampling than a message on a website or similar. Crowdsourcing is fast becoming a popular approach to accessing samples.

Crowdsourcing, a solution?

Crowdsourcing is defined as ‘the paid recruitment of an online, independent global workforce for the objective of working on a specifically defined task or set of tasks’ (Behrend, Sharek, Meade, & Wiebe, 2011, p. 801). One of the most popular crowdsourcing platforms is Amazon’s Mechanical Turk (MTurk). Requesters (in this case researchers) can outsource small tasks (surveys, etc.), referred to as human intelligence tasks (HITs), to a global workforce (potential respondents) in exchange for monetary compensation. Where money cannot be deposited into accounts, respondents can be rewarded with Amazon.com gift certificates. If a worker on MTurk produces substandard work, a requester may reject it, and the worker’s rating on MTurk is decreased. Given this, there is an assumption that respondent biases will be reduced, but there is also the possibility that social desirability may increase (Behrend et al., 2011). Behrend et al. (2011) provide a balanced argument with regard to the pros and cons of sampling in this way, while Buhrmester, Kwang, and Gosling (2011) conclude that the benefits of MTurk outweigh the limitations. Paolacci and Chandler (2014) argue that while MTurk samples are more representative than student samples, they are still not representative of the general population. Casler et al. (2013), for their part, provide evidence for the superiority of MTurk in accessing samples as compared with face-to-face data collection.

Much remains unknown about samples obtained from MTurk and other crowdsourcing facilities. Examples include how prior experience, community norms, and other factors influence responding by these individuals, and how sampling decisions (e.g. when the task is posted and how it is described) influence the characteristics of MTurk samples (Paolacci & Chandler, 2014, p. 187). It is essential that researchers using such platforms explore their benefits and limitations for particular samples, especially as they pertain to South Africa and the rest of Africa. Issues of literacy, language, quality of education, class, and culture have often been identified as affecting research findings and often require quite deliberate attempts to obtain appropriate samples (Laher & Botha, 2012). These issues would be exacerbated in the context of online research, given the current digital divide on the continent (see Fuchs & Horak, 2008; Russell & Steele, 2013; Selwyn, 2004).

Bigger is better: sample size in quantitative studies

The adage that ‘bigger is better’ applies most to the quantitative tradition because this type of research seeks to generalise and predict, and it can only do so if samples are large enough and drawn from representative groups. Unfortunately, there is no simple rule as to how large a sample should be. Texts propose rules such as including three people for every item in a questionnaire (Kline, 1994), but these usually apply to studies in the field of test development and validation. For multiple regression, Field (2013) provides several guidelines on sample size based on power, effect size, and the number of variables to be included in the study. Field (2013) also discusses the use of sample size calculators, which generally require one to specify the expected effect size, the required power, and the significance level. Some of the more reliable calculators recommended by Field (2013, p. 70) are G*Power (freely available), the pwr package for the open-source statistics environment R, and, commercially, nQuery Adviser, Power and Precision, and PASS (Power Analysis and Sample Size). Using these estimators gives a more accurate estimate of the sample size required and reduces the risk of failing to detect a true effect (a Type II error). A related dilemma is that, in very large samples, even trivially small differences can reach statistical significance. A significant result may therefore reflect sample size more than a difference of any practical importance (Field, 2013). Hence, there is increasing emphasis on calculating effect sizes and confidence intervals in research (see Ferguson, 2009; Marszalek, Barber, Kohlhart, & Holmes, 2011).
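
The sketch below gives a rough idea of what such calculators compute. It estimates the number of participants per group needed to compare two independent group means with a two-tailed test. It assumes the large-sample normal approximation n = 2((z for 1 - α/2 + z for the desired power) / d)², where d is Cohen’s standardised mean difference. The function name `n_per_group` is hypothetical. This is not the exact procedure used by G*Power or pwr, which work with the t distribution.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-tailed comparison of two
    independent means, using the normal approximation
    n = 2 * ((z_(1 - alpha/2) + z_(power)) / d) ** 2 (a hypothetical helper)."""
    if d <= 0:
        raise ValueError("effect size d must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie between 0 and 1")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for a two-tailed test
    z_power = z.inv_cdf(power)            # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)  # round up to whole people


# Cohen's conventional small, medium, and large effects
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under these assumptions, the sketch gives 393, 63, and 25 participants per group for d = 0.2, 0.5, and 0.8. Exact t-based calculators return slightly larger figures; G*Power, for example, gives 64 per group for d = 0.5. The approximation is therefore best read as a lower bound.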

Both the composition of the sample and sample size ultimately affect the representativeness of a sample. Researchers need to consider these aspects from the planning stages of a project through implementation, data collection and analysis, and write-up. Where possible, extraneous variables arising from sampling should be anticipated and controlled for, or at the very least examined in addition to the research questions particular to the study (Marszalek et al., 2011). It is further recommended that researchers consult statisticians in the planning stages of a project for advice on sampling and sample size, so as to obtain better data.

Randomised controlled trials: the gold standard?

In experimental research, papers also often make no mention of how the sample was assigned to groups. Random assignment is an important consideration for internal validity. Without it, extraneous variables that differ between groups could confound the results (Babbie & Mouton, 2004). This is particularly salient within the context of experimental research in psychology, and RCTs in particular. This article has not yet addressed issues pertaining specifically to RCTs. However, a number of articles in the literature on rigour in quantitative methods approach the topic largely from the perspective of RCTs (see Claydon, 2015). There are widely accepted guidelines on how to conduct RCTs, how to select appropriate samples and decide on sample size, and how to analyse such data. The Consolidated Standards of Reporting Trials (CONSORT) statement was developed in 1996 and subsequently revised in 2001 and again in 2010 to address issues pertaining to rigour in RCTs (Schulz, Altman, & Moher, 2010). Researchers using this type of design are referred to the CONSORT document, which has user-friendly tables and checklists that allow researchers to evaluate their studies.
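
Reporting the method of assignment is easier when the procedure is explicit and reproducible. The sketch below performs simple random assignment with balanced group sizes: participants are shuffled with a seeded random number generator and then allocated to groups in turn. The function name, group labels, participant IDs, and seed are all hypothetical. Restricted schemes such as block or stratified randomisation may be preferable when groups must also be balanced on key covariates.

```python
import random


def randomly_assign(participant_ids, groups=("intervention", "control"), seed=None):
    """Randomly allocate participants to groups whose sizes differ by at most one.

    The list is shuffled with a seeded generator, so that the allocation can be
    reproduced and reported, and is then dealt out to the groups in turn.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for position, pid in enumerate(shuffled):
        allocation[groups[position % len(groups)]].append(pid)
    return allocation


allocation = randomly_assign(range(1, 21), seed=2016)  # 20 hypothetical IDs
for group, members in allocation.items():
    print(group, sorted(members))
```

Recording the seed means that the exact allocation can be regenerated later, for example when a reviewer queries it.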

In South African psychology, RCTs are not commonplace, but experimental designs are often used. In the combined experience of the editorial team at the South African Journal of Psychology, it is becoming more common to see manuscripts in which authors claim to test intervention effects without using RCTs or even including control groups. Most often, randomisation is also not addressed. Within this context, it becomes essential for researchers to consider the internal validity threats espoused by Cook and colleagues (Babbie & Mouton, 2004; Cook & Campbell, 1979). Random assignment becomes vital to the design, and in its absence, researchers need to explain the techniques used to eliminate internal validity threats. However, Kazdin (2008, 2014) raises important questions with regard to RCTs and experimental designs within the context of evidence-based treatment (EBT) and evidence-based practice (EBP). He acknowledges that psychology in particular needs to document the outcomes of its therapeutic interventions more systematically and to demonstrate the success of its therapeutic techniques empirically. In so doing, however, experimental designs create artificial, lab-like settings in which participants often differ from clients who present for psychotherapy in the presentation and intensity of their symptoms. While fully acknowledging the importance of this work for EBT and EBP, he argues that other research methods might be necessary to complement these findings. Kazdin (2008) makes a case for qualitative research to be used alongside quantitative research to provide the evidence needed to demonstrate the efficacy of therapeutic interventions. He also raises the important distinction between statistical and clinical significance and cautions researchers not to privilege one over the other. In some contexts, one may be more important to consider than the other, and in others, both carry equal weight (Kazdin, 2008).

Hence, RCTs are not necessarily the gold standard in research, particularly in psychology. However, whatever form intervention research takes, it is necessary to include some sort of control condition and/or pre-measure against which to benchmark the efficacy of the intervention.

The tools of the trade: instrumentation

Largely because of its emphasis on internal validity and replicability, quantitative research uses very specific instruments with set questions and set response formats that are standardised and have followed rigorous development procedures (see Foxcroft & Roodt, 2013; Nunnally & Bernstein, 1994). Researchers need to:

- describe the instrument and what it measures;
- describe its subscales (if any) and what they measure;
- describe the response format and scoring procedures;
- summarise evidence on the reliability and validity of the instrument, ideally from the local context.

Often the instrument has not been used in the local context. On examination, it may be usable as is, or it may require adaptation. In either case, it is recommended that the instrument be piloted before use. Furthermore, in all cases, instruments should be subjected to psychometric scrutiny: at the very least, the reliability (usually internal consistency) of the instrument should be examined and reported. In practice, however, this aspect is commonly neglected in methodology descriptions.

Without examining the reliability, validity, and lack of bias of instruments, it is not possible to draw any conclusions from the scores they yield. It is beyond the scope of this article to outline the types of reliability, validity, and bias and how to assess them; a number of good South African textbooks already do this (see Foxcroft & Roodt, 2013; Moerdyk, 2014). Furthermore, the International Test Commission (ITC) makes a number of documents available on its website (www.intestcom.org). Of particular interest are the ‘ITC Statement on the use of tests and other assessment instruments for research purposes’ (ITC, 2014) and the ‘ITC Guidelines for Translating and Adapting Tests’ (ITC, 2005).

For this article, it suffices to say that to establish rigour in quantitative research, a researcher needs to examine the psychometric credibility of the instrument. In so doing, the researcher needs to be careful not to confuse the psychometric concepts of reliability, validity, and bias with the research concepts of internal and external validity. While they are inextricably linked, they are different concepts referring to different aspects of a quantitative study. Furthermore, the uncritical acceptance of instruments created in contexts very different from those of the researcher has strong implications for the external validity, or generalisability, of the results. The space limitations of a paper often do not allow psychometric analyses to be reported in detail. It is still necessary to conduct them, however, and to report the findings, even if only in a few sentences in the relevant methods section.

Professional precision: clarifying the data collection methods

Journals do not necessarily require a section on research method or research design, and authors therefore often fail to include this information in manuscripts. Instruments are often incorporated into a survey, interview, or other method of collecting data. Sometimes (although this is among the rarest of omissions), the paper fails to mention the method(s) of data collection at all.

More common in manuscripts is the lack of any discussion of the research design. If not explicitly requested, information on the design can be inserted briefly into the Procedures section. Naturalistic or nonexperimental designs will probably not require much description beyond a brief mention and discussion of the aspects of the design. For example, the author should state whether the design is cross-sectional or longitudinal and what the implications and limitations of the chosen design are. More detail is required for quasi-experimental and experimental research. The aspects to be addressed include whether there were groups, the method of assignment, whether there were repeated measures, and the kinds of controls in place to ensure equivalence. These matter particularly for internal validity and replicability, as emphasised in the CONSORT document (see Schulz et al., 2010).

A common omission in this section is any mention of how ethical issues were addressed in the study. These usually relate to, but are not necessarily limited to, informed consent, confidentiality, anonymity, invasion of privacy, feedback, and debriefing. Some journals will not require this unless there are specific issues pertaining to an ethical dilemma, for example, a vulnerable sample or invasion of privacy. However, the South African Journal of Psychology requires that researchers state whether institutional ethics approval for the study was obtained and, if so, from whom. Some journals request a section on ethical considerations, and others want the information in a procedures or data collection section of a paper. Wassenaar and Slack (2016) provide further discussion on aspects of research ethics in an accompanying paper in this issue.

Nuked by numbers: data analysis

Quantitative analyses for any paper proceed quite systematically, but it is often in this section that errors are found or clarifications requested. Papers often describe only the analytic techniques used to answer the research questions. They do not report what was done before the inferential techniques were applied to ensure the integrity of the data. It is essential that datasets be ‘cleaned’ before embarking on any analyses. This means exploring the dataset for errors and omissions. Where there are omissions, it is always useful to conduct missing value analyses. Whether the data are missing completely at random, missing at random, or missing not at random has important implications for the results of the study. There are many ways of statistically handling missing data so that the dataset maintains its integrity (see Field, 2013; Howell, 2013).
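
A first cleaning pass can be as simple as counting missing and impossible values per variable. The sketch below assumes that each participant’s responses are stored as a dictionary and that missing values are coded as `None`. The function name, variable names, permissible ranges, and records are hypothetical. The sketch only flags problems. Deciding whether data are missing completely at random, at random, or not at random still requires a dedicated missing value analysis in a statistical package.

```python
def screen(records, valid_ranges):
    """Summarise missing (None) and out-of-range values for each variable.

    records: list of dicts, one per participant.
    valid_ranges: {variable: (lowest permissible, highest permissible)}.
    Returns the per-variable report and the number of complete cases.
    """
    n = len(records)
    report = {}
    for var, (low, high) in valid_ranges.items():
        values = [row.get(var) for row in records]
        missing = sum(v is None for v in values)
        out_of_range = sum(v is not None and not (low <= v <= high) for v in values)
        report[var] = {"missing": missing,
                       "pct_missing": round(100 * missing / n, 1),
                       "out_of_range": out_of_range}
    complete_cases = sum(all(row.get(var) is not None for var in valid_ranges)
                         for row in records)
    return report, complete_cases


records = [
    {"id": 1, "age": 21, "item1": 4, "item2": 5},
    {"id": 2, "age": 19, "item1": None, "item2": 3},
    {"id": 3, "age": 210, "item1": 2, "item2": None},  # 210: likely a typing error
    {"id": 4, "age": 23, "item1": 5, "item2": 4},
]
ranges = {"age": (16, 99), "item1": (1, 5), "item2": (1, 5)}  # e.g. 5-point items
report, complete = screen(records, ranges)
print(report)
print("complete cases:", complete)
```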

Once the data have been cleaned, they need to be subjected to psychometric analyses if scales are employed in the study. The reliability, validity, and lack of bias discussed in previous sections of this article need to be examined, and item analyses can be conducted on scales if necessary (Foxcroft & Roodt, 2013).
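
Internal consistency is the reliability estimate most often reported. The sketch below computes Cronbach’s alpha, α = k/(k - 1) × (1 - Σ item variances / variance of total scores), together with corrected item-total correlations (each item correlated with the sum of the remaining items) as a simple item analysis. The function names and the five-respondent dataset are hypothetical and for illustration only. The sketch assumes complete data and no reverse-scored items.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items matrix (list of rows)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))                 # one tuple per item
    totals = [sum(row) for row in responses]      # each respondent's scale score
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(responses):
    """Correlate each item with the total of the remaining items."""
    items = list(zip(*responses))
    totals = [sum(row) for row in responses]
    return [pearson(item, [t - x for t, x in zip(totals, item)]) for item in items]


# Five hypothetical respondents answering three 5-point items
data = [[4, 5, 4], [2, 3, 3], [3, 3, 2], [5, 4, 5], [1, 2, 1]]
print(round(cronbach_alpha(data), 4))   # 0.9375
print([round(r, 2) for r in corrected_item_total(data)])
```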

The dataset should only be deemed ready for statistical analyses once the cleaning and psychometric analyses are complete. Broadly, statistical analyses fall into two categories, namely, descriptive and inferential. Descriptive statistics condense data into a small set of numbers that capture the main trends in the data. Inferential statistics use the numbers generated in descriptive analyses as estimates to test inferences or hypotheses about the phenomena being studied (Howell, 2013). Hence, it is necessary to examine descriptive statistics before undertaking inferential analyses.

Furthermore, each inferential technique makes certain assumptions about the data that need to be met before the technique is applied. Descriptive statistics assist in deciding which techniques may be used. Often the decision is between parametric and nonparametric univariate techniques. In making this decision, one has to consider whether the data are normally distributed, where applicable. Hence, skewness and kurtosis coefficients are examined and tests of normality conducted. Graphs may also be examined (Howell, 2013; Huck, 2009), but these do not necessarily have to be included in the manuscript. Sometimes authors include basic graphs to describe the demographic composition of the sample, or a histogram to demonstrate normality. Such material is often not core to the study and can be reported in a single sentence.
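
Skewness and kurtosis can be checked with the sketch below. It computes the bias-adjusted sample skewness and excess kurtosis coefficients and the standard errors that statistical packages commonly report, and it divides each coefficient by its standard error to give a z-score. The function name and the scores are hypothetical. A common rule of thumb (see Field, 2013) treats |z| > 1.96 as significant at p < .05 in small samples. These z-tests become very sensitive in large samples, so they should be read alongside graphs rather than in isolation.

```python
from math import sqrt
from statistics import mean


def skew_kurtosis_z(data):
    """Sample skewness (G1) and excess kurtosis (G2), with z = coefficient / SE."""
    n = len(data)
    if n < 4:
        raise ValueError("at least four observations are needed")
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n      # second central moment
    if m2 == 0:
        raise ValueError("data have no variability")
    m3 = sum((x - m) ** 3 for x in data) / n      # third central moment
    m4 = sum((x - m) ** 4 for x in data) / n      # fourth central moment
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = sqrt(n * (n - 1)) / (n - 2) * g1                      # bias-adjusted
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)  # excess kurtosis
    se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return {"skewness": skew, "z_skewness": skew / se_skew,
            "kurtosis": kurt, "z_kurtosis": kurt / se_kurt}


scores = [12, 15, 14, 10, 18, 30, 13, 16, 11, 14]   # hypothetical scores
for name, value in skew_kurtosis_z(scores).items():
    print(f"{name}: {value:.2f}")
```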

In the collective experience of the editorial team at the South African Journal of Psychology, manuscripts often use sample size as the justification for using parametric statistical techniques. While this is not wholly incorrect, it is incorrect to assume that parametric techniques require sample sizes of, for example, over 100. Parametric techniques test hypotheses about means, and in skewed distributions the mean is not the best estimate for making inferences about the population from the sample. However, the Central Limit Theorem (CLT) allows one to assume that the sampling distribution of the mean is approximately normal in samples of 30 or more individuals, even when the variable itself is not normally distributed (Field, 2013; Howell, 2013). Hence, researchers often use the CLT as justification for meeting the assumption of normality. It is better, particularly when sample sizes are small, to use objective empirical evidence such as skewness and kurtosis coefficients and tests of normality. If distributions are not normal, researchers should consider trimming the data (excluding outliers), winsorising (replacing outliers with the highest value that is not an outlier), bootstrapping, and/or transformations (Field, 2013).
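
Two of these remedies are sketched below. The first winsorises the data, replacing outliers with the most extreme value that is not an outlier. Here an outlier is assumed to be any value beyond Tukey’s fences (1.5 interquartile ranges outside the quartiles); this is one common rule among several, and the fence multiplier is a choice the researcher must justify. The second computes a percentile bootstrap confidence interval by resampling the data with replacement. Function names, parameters, and data are hypothetical.

```python
import random
from statistics import mean, quantiles


def winsorise(data, k=1.5):
    """Replace values beyond Tukey's fences (k * IQR outside the quartiles)
    with the most extreme value that is not an outlier."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inliers = [x for x in data if low_fence <= x <= high_fence]
    lowest, highest = min(inliers), max(inliers)
    return [min(max(x, lowest), highest) for x in data]


def bootstrap_ci(data, stat=mean, n_resamples=2000, level=0.95, seed=None):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    estimates = sorted(stat(rng.choices(data, k=len(data)))
                       for _ in range(n_resamples))
    lower = estimates[round((1 - level) / 2 * n_resamples)]
    upper = estimates[round((1 + level) / 2 * n_resamples) - 1]
    return lower, upper


scores = [2, 3, 3, 4, 4, 5, 5, 6, 40]    # hypothetical; 40 is an extreme value
print(winsorise(scores))                  # [2, 3, 3, 4, 4, 5, 5, 6, 6]
print(bootstrap_ci(scores, seed=1))
```

Whichever remedy is used, it should be reported in the manuscript together with the rule used to define outliers.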

Aside from assessing whether the variables are normally distributed, it is necessary to know each variable's level of measurement before conducting inferential analyses. Certain analyses work with nominal data and others with ordinal or interval data (Howell, 2013). Homogeneity of variance, homoscedasticity, or sphericity (as appropriate to the inferential technique) also needs to be examined (Field, 2013; Howell, 2013). Multiple regression carries further assumptions, such as the absence of multicollinearity and the independence of residuals, and multivariate techniques have assumptions of their own (Field, 2013; Hair, Anderson, Tatham, & Black, 1998; Tabachnick & Fidell, 2013). Submitted papers often show no evidence that any of these assumptions were explored before the analyses were run.
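
Homogeneity of variance is commonly checked with Levene's test. The sketch below, which assumes only the Python standard library, computes Levene's W statistic for any number of groups; passing center="median" gives the Brown-Forsythe variant, which is more robust to non-normal data. The standard library has no F distribution, so the sketch stops at W and its degrees of freedom; in practice, scipy.stats.levene reports the same statistic together with a p-value. The function name and example groups are illustrative.

```python
import statistics


def levene_statistic(*groups, center="mean"):
    """Levene's W for equality of variances across two or more groups.

    Returns (W, df1, df2). Under the null hypothesis of equal variances,
    W approximately follows an F distribution with (df1, df2) degrees of freedom.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    centre_of = statistics.median if center == "median" else statistics.fmean
    # Absolute deviation of every score from the centre of its own group
    deviations = []
    for group in groups:
        c = centre_of(group)
        deviations.append([abs(x - c) for x in group])
    n_total = sum(len(d) for d in deviations)
    k = len(deviations)
    group_means = [statistics.fmean(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means))
    within = sum((v - m) ** 2 for d, m in zip(deviations, group_means) for v in d)
    if within == 0:
        raise ValueError("deviations do not vary within groups; W is undefined")
    df1, df2 = k - 1, n_total - k
    return (df2 / df1) * between / within, df1, df2


# Illustrative (made-up) groups: the same centre, very different spreads
narrow = [10, 11, 12, 13, 14]
wide = [2, 7, 12, 17, 22]
w, df1, df2 = levene_statistic(narrow, wide)
print(f"Levene's W({df1}, {df2}) = {w:.2f}")
```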

Manuscripts also tend to rely on p-values alone when reporting and interpreting results. This is not enough. Effect sizes and confidence intervals are useful to report, as they lend further strength and credibility to findings (see Ferguson, 2009; Field, 2013). Ferguson (2009) provides an excellent background to effect sizes and their use, together with useful guidelines for calculating and interpreting them across the various statistical techniques; he covers group comparison indices, strength of association indices, and risk estimates.
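
As a minimal sketch of reporting an effect size alongside its confidence interval, the functions below compute Cohen's d for two independent groups from the pooled standard deviation, and an approximate 95% confidence interval from the usual large-sample standard error of d. Cohen's d is one of the group comparison indices Ferguson (2009) discusses; the normal approximation used for the interval is an assumption that becomes rough in very small samples, and the function names and data are illustrative.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Cohen's d for two independent groups, standardised by the pooled SD."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.fmean(group1) - statistics.fmean(group2)) / pooled_sd


def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate CI for d from its large-sample standard error (z = 1.96 for 95%)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Illustrative (made-up) scores for a treatment and a control group
treatment = [24, 27, 29, 30, 31, 33, 35, 36]
control = [20, 22, 25, 26, 27, 28, 30, 31]
d = cohens_d(treatment, control)
low, high = d_confidence_interval(d, len(treatment), len(control))
print(f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

A wide interval that spans zero, as small samples often produce, is itself informative: it signals that the size of the effect is poorly estimated even when p falls below .05.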

A more practical concern is the reporting and presentation of results. Journals have guidelines as to what should be reported in the text and in tables, but these are often not followed. Most psychology journals subscribe to the sixth edition of the ‘Publication Manual of the American Psychological Association’ (APA, 2010), which provides clear and explicit guidelines on what must be included and how analyses should be presented and reported.
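
One small, mechanical part of following these conventions can be automated. The helpers below are a hypothetical sketch that assumes the common reading of the sixth-edition rules: exact p-values to three decimals without a leading zero (because p cannot exceed 1), ‘p < .001’ for smaller values, and a leading zero retained for statistics such as d that can exceed 1. They do not replace checking the manual itself, which also governs italics, table layout, and which statistics to report.

```python
def format_p(p):
    """Format a p-value APA-style: 'p = .032' or 'p < .001' (no leading zero)."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def format_t_test(t, df, p, d):
    """Report a t test together with its effect size in a single string."""
    return f"t({df}) = {t:.2f}, {format_p(p)}, d = {d:.2f}"


# Illustrative (made-up) values
print(format_t_test(2.35, 28, 0.026, 0.86))
print(format_p(0.0004))
```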

Concluding comments

This article explored the concept of rigour, distinguishing between conceptual and methodological rigour in research. It focussed on methodological rigour, making a strong case for considering this aspect when planning research and, eventually, when publishing it. Sampling, instruments, research methods, research design, and data analysis are core to quantitative methods; hence, the article focussed on those areas, providing guidelines and suggestions to researchers working in a quantitative paradigm. The objective nature of this type of research is evident from the systematic and structured manner in which these guidelines can be provided. This is in contrast to Saville-Young (2016), who discusses rigour in qualitative methods.

In quantitative research, the focus is on replication, prediction, and, where possible, causal relationships between variables. As such, there is a strong emphasis on eliminating extraneous variables (internal validity) and on the generalisability of results (external validity). The article did not describe these validity concepts as espoused by Cook and colleagues (Cook & Campbell, 1979; Shadish et al., 2002). Instead, it adopted a more pragmatic approach, alerting the researcher to common methodological pitfalls and to ways of overcoming or controlling for them, regardless of whether they are internal or external validity issues.

While these approaches have merit, the arguments of Kazdin (2008, 2014) regarding statistical versus clinical significance, and the complementary nature of quantitative and qualitative research, also need to be borne in mind. Hence, researchers will often find the boundaries between paradigms becoming blurred, leading to mixed-methods research. Much of what is discussed in the article regarding sampling, instrumentation, and method can also be applied to qualitative and mixed-methods research.

This article may appear to adopt a somewhat didactic, textbook approach. This must be viewed against the backdrop of common errors noted in quantitative research manuscripts received by the South African Journal of Psychology, as well as in others evaluated over the years for other publications. The article sought to highlight issues of methodological rigour in the hope that more reflexive researchers will emerge who take cognisance of these issues and ultimately produce work of better quality, the findings of which can be incorporated with greater confidence into policy, practice, and further research.

Footnotes

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.

Babbie, E. R., & Mouton (2004). The practice of social research. Cape Town: Oxford University Press Southern Africa.

Behrend, T. S., Sharek, D. J., Meade, A. W., & Wiebe, E. N. (2011). The viability of crowdsourcing for survey research. Behavior Research Methods, 43, 800–813. doi:10.3758/s13428-011-0081-0

Buhrmester, Kwang, & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3–5. doi:10.1177/1745691610393980

Casler, Bickel, & Hackett (2013). Separate but equal? A comparison of participants and data gathered via Amazon’s MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior, 29, 2156–2160.

Claydon, L. S. (2015). Rigour in quantitative research. Nursing Standard, 29, 43–48.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston, MA: Houghton Mifflin.

Coryn, C. L. S. (2007). The holy trinity of methodological rigor. Journal of Multidisciplinary Evaluation, 4(7), 26–31.

Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40, 532–538.

Field (2013). Discovering statistics using SPSS (4th ed.). Thousand Oaks, CA: SAGE.

Fonseca (2013). Most common reasons for journal rejections. Retrieved from http://www.editage.com/insights/most-common-reasons-for-journal-rejections

Foxcroft & Roodt (2013). An introduction to psychological assessment in the South African context (4th ed.). Oxford, UK: Oxford University Press.

Fuchs & Horak (2008). Africa and the digital divide. Telematics and Informatics, 25, 99–116. doi:10.1016/j.tele.2006.06.004

Guba, E. G. (1981). Criteria for assessing the trustworthiness of naturalistic inquiries. Educational Resources Information Center Annual Review Paper, 29, 75–91.

Hair, J. F., Jr., Anderson, R. E., Tatham, & Black, W. C. (1998). Multivariate data analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.

Henrich, Heine, & Norenzayan (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61–83. doi:10.1017/S0140525X0999152X

Henry, P. J. (2008). College sophomores in the laboratory redux: Influences of a narrow data base on social psychology’s view of the nature of prejudice. Psychological Inquiry, 19(2), 49–71. doi:10.1080/10478400802049936

Howell, D. C. (2013). Statistical methods for psychology (8th ed.). Belmont, CA: Wadsworth.

Huck, S. W. (2009). Reading statistics and research (5th ed.). Boston, MA: Pearson.

International Test Commission. (2005). International guidelines on test adaptation. Retrieved from https://www.intestcom.org/files/guideline_test_adaptation.pdf

International Test Commission. (2014). ITC statement on the use of tests and other assessment instruments for research purposes. Retrieved from https://www.intestcom.org/files/statement_using_tests_for_research.pdf

Kazdin (2008). Evidence-based treatment and practice: New opportunities to bridge clinical research and practice, enhance the knowledge base, and improve patient care. American Psychologist, 63, 146–159. doi:10.1037/0003-066X.63.3.146

Kazdin (2014). Evidence-based psychotherapies I: Qualifiers and limitations in what we know. South African Journal of Psychology, 44, 381–403.

Kline (1994). An easy guide to factor analysis. London, UK: Routledge.

Krefting (1991). Rigor in qualitative research: The assessment of trustworthiness. The American Journal of Occupational Therapy, 45, 214–222. Retrieved from http://apus.libguides.com/research_methods_guide/scientific_method_rigor

Laher & Botha (2012). Methods of sampling. In Wagner, Kawulich, & Garner (Eds.), Doing social research: A global context (pp. 86–100). London, UK: McGraw-Hill.

Lönnqvist, J.-E., Paunonen, Verkasalo, Leikas, Tuulio-Henriksson, & Lönnqvist (2007). Personality characteristics of research volunteers. European Journal of Personality, 21, 1017–1030. doi:10.1002/per.655

Marcus & Schütz (2005). Who are the people reluctant to participate in research? Personality correlates of four different types of nonresponse as inferred from self- and observer ratings. Journal of Personality, 73, 959–984. doi:10.1111/j.1467-6494.2005.00335.x

Marszalek, J. M., Barber, Kohlhart, & Holmes, C. B. (2011). Sample size in psychological research over the past 30 years. Perceptual and Motor Skills, 112, 331–348.

Moerdyk (2014). The principles and practice of psychological assessment (2nd ed.). Pretoria, South Africa: Van Schaik Publishers.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York, NY: McGraw-Hill.

Paolacci & Chandler (2014). Inside the Turk: Understanding Mechanical Turk as a participant pool. Current Directions in Psychological Science, 23, 184. doi:10.1177/0963721414531598

Payne & Barnfather (2012). Online data collection in developing nations: An investigation into sample bias in a sample of South African University students. Social Science Computer Review, 30, 389–397. doi:10.1177/0894439311407419

Rosenthal & Rosnow, R. L. (2007). Essentials of behavioral research: Methods and data analysis (3rd ed.). New York, NY: McGraw-Hill.

Russell, S. E., & Steele (2013). Information and communication technologies and the digital divide in Africa: A review of the periodical literature, 2000–2012. Electronic Journal of Africana Bibliography, 14. Retrieved from http://ir.uiowa.edu/cgi/viewcontent.cgi?article=1015&context=ejab

Saville-Young (2016). Key concepts for quality as foundational in qualitative research: Milkshakes, mirrors and maps in 3D. South African Journal of Psychology, 46, 328–337.

Schulz, K. F., Altman, D. G., & Moher (2010). CONSORT 2010 statement: Updated guidelines for reporting parallel group randomised trials. BMC Medicine, 8, 18. doi:10.1186/1741-7015-8-18

Sears (1986). College sophomores in the lab: Influences of a narrow data base on social psychology’s view of human nature. Journal of Personality and Social Psychology, 51, 515–530. doi:10.1037/0022-3514.51.3.515

Selwyn (2004). Reconsidering political and popular understandings of the digital divide. New Media and Society, 6, 341–362.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Skitka, L. J., & Sargis, E. G. (2006). The internet as psychological laboratory. Annual Review of Psychology, 57, 529–555.

Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Boston, MA: Pearson.

Tobin, G. A., & Begley, C. M. (2004). Methodological rigour within a qualitative framework. Journal of Advanced Nursing, 48, 388–396.

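Wassenaar, D. R., & Slack (2016). How to learn to love your research ethics committee: Recommendations for psychologists. South African Journal of Psychology, 46, 306–315.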
(2016). How to learn to love your research ethics committee: recommendations for psychologists. South African Journal of Psychology, 46, 306–315.