Effect sizes in quantitative and qualitative research

Abstract

There has been much discussion in the social sciences and education in general, and in second language research in particular, around appropriate research methods, rigour, and high standards for data collection, analysis, and interpretation. Many studies use a quantitative research design, in which the use of inferential statistics, including statistical significance tests, is widespread. However, issues have been raised regarding the adequacy of significance tests and the limitations of p values (see, for example, Nassaji, 2012). For example, while significance tests may tell us whether an effect exists, they do not tell us about the importance of the effect. In this respect, one suggestion has been to report measures of effect size (which refer to the magnitude of an effect or relationship) when presenting the results of significance tests. This recommendation has been made in the fields of psychology and education, and more recently in applied linguistics, with several applied linguistics journals now requiring or emphasizing the reporting of effect sizes (e.g., Language Learning, The Modern Language Journal, TESOL Quarterly, Studies in Second Language Acquisition).

However, such a recommendation has often been made for quantitative research and not for qualitative research, because effect sizes are quantitative in nature. Nevertheless, it is also possible to calculate and report effect sizes for qualitative and interpretive research. One way of doing so would be by supplementing qualitative data analysis with a quantitative component when possible. For example, when coding qualitative data into themes and categories, those themes and categories can be quantified to ensure the precision of the analysis and determine quantitatively the degree to which different themes and categories are supported by the words, texts, or statements within the data.

Onwuegbuzie (2003) discussed two main types of effect size for qualitative research. One is what he called manifest effect sizes, which can be used for observable data. This type of effect size can be computed either by calculating the number of different themes or categories within and across participants or by calculating the number of statements or words that contribute to those themes. These counts, which can then be converted to percentages, can be taken to measure the degree to which the data support the themes. The higher the percentage or the frequency of a certain theme or the higher the number of statements contributing to that theme, the stronger the evidence for that theme. Manifest effect sizes can also be adjusted by taking into account the length of the unit of analysis used to code the themes. This can be done by dividing the number of themes by the total number of words, sentences, paragraphs or even the pages that have been analyzed. Effect sizes can also be calculated via an interval method by counting the number of themes or the statements/words supporting those themes within a section of the data, for instance, within the first 10 minutes of an interview.
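The counting procedures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the participants, theme labels, and transcript lengths are hypothetical, and the function name manifest_effect_sizes is mine rather than Onwuegbuzie's. For each theme, the sketch reports the percentage of participants who mentioned it, its share of all coded statements, and a length-adjusted rate per 1,000 transcript words.

```python
from collections import Counter

# Hypothetical coded data: each participant maps to the theme codes
# assigned to their statements (one code per coded statement).
coded_statements = {
    "P1": ["anxiety", "motivation", "anxiety"],
    "P2": ["motivation", "feedback"],
    "P3": ["anxiety", "feedback", "feedback", "motivation"],
    "P4": ["motivation"],
}
# Hypothetical transcript lengths in words, used for the length adjustment.
transcript_words = {"P1": 1200, "P2": 800, "P3": 1500, "P4": 500}


def manifest_effect_sizes(coded, words):
    """Return frequency-based (manifest) effect sizes for each theme."""
    n_participants = len(coded)
    total_statements = sum(len(codes) for codes in coded.values())
    total_words = sum(words.values())

    # Number of statements contributing to each theme, across participants.
    statement_counts = Counter(
        code for codes in coded.values() for code in codes
    )
    # Number of participants who mention each theme at least once.
    participant_counts = Counter(
        code for codes in coded.values() for code in set(codes)
    )

    results = {}
    for theme, n_statements in statement_counts.items():
        results[theme] = {
            "pct_participants": 100 * participant_counts[theme] / n_participants,
            "pct_statements": 100 * n_statements / total_statements,
            "per_1000_words": 1000 * n_statements / total_words,
        }
    return results


effect_sizes = manifest_effect_sizes(coded_statements, transcript_words)
for theme, stats in sorted(effect_sizes.items()):
    print(theme, {name: round(value, 2) for name, value in stats.items()})
```

An interval version of the same idea would simply restrict the coded statements to a segment of the data (for instance, the first 10 minutes of an interview) before counting.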

The second type of effect size is what Onwuegbuzie called latent effect sizes. These effect sizes concern the non-observable aspect of the phenomenon under investigation and can be calculated by performing analyses that produce an underlying meta-theme. For example, when calculating the number of different themes, or the statements/words supporting those themes, exploratory factor analysis can be conducted to determine the meta-theme(s) on which the emerging themes load. When qualitative data are quantified, other types of inferential statistics can also be performed on the data. Traditionally, inferential statistics have been used in quantitative research with the aim of generalizing from the sample to the population. However, inferential statistics can also be used on the number of words, statements, themes or categories in qualitative research. Here, the aim is not to generalize from the sample of participants to the population but from themes or words to the voice representing those themes or words. When doing so, effect sizes can also be calculated.

In this issue of Language Teaching Research (LTR), there are seven articles. Two are quantitative, three are qualitative, one has used a mixed-methods research design and one is a review article. In what follows, I will first summarize each and then briefly discuss them with reference to the use of effect sizes.

Rahimi examined the effects of revision in focused and comprehensive written corrective feedback. The aim was to investigate whether revisions in focused versus unfocused feedback had any effects on improving students’ writing accuracy and quality. Seventy-eight intermediate French ESL learners were randomly assigned to four groups: two focused groups (one with and one without revision) and two comprehensive groups (one with and one without revision). The findings showed that focused feedback was more effective in enhancing writing accuracy while comprehensive feedback was more effective in enhancing the quality of writing, measured in terms of meaning clarity and comprehensibility. These effects, however, varied across writing times. An effect was found for revision on overall writing accuracy but this was more evident for comprehensive feedback. As for effect sizes, this study used ANOVAs and t-tests and reported eta-squared (η²) and Cohen’s d as measures of effect sizes.
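For readers less familiar with eta-squared, the sketch below shows how it is obtained in the simplest case, a one-way between-groups ANOVA, as the ratio of the between-groups sum of squares to the total sum of squares. The four groups and their scores are hypothetical and are not taken from Rahimi's study, whose design may call for a different variant (for example, partial eta-squared in a factorial or repeated-measures analysis).

```python
def eta_squared(groups):
    """Eta-squared for a one-way between-groups design: SS_between / SS_total."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)

    # Total variability of all scores around the grand mean.
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)
    # Variability of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


# Hypothetical accuracy scores for four groups (illustrative only).
hypothetical_groups = [
    [72, 75, 78, 80],  # focused feedback, with revision
    [68, 70, 74, 71],  # focused feedback, without revision
    [65, 69, 72, 70],  # comprehensive feedback, with revision
    [60, 64, 66, 63],  # comprehensive feedback, without revision
]
print(round(eta_squared(hypothetical_groups), 3))
```

The result is read as the proportion of the total variance in the scores that is associated with group membership.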

Lenkaitis and Loranc-Paszylk examined the extent to which lingua franca virtual exchanges involving the discussion of societal topics assisted the development of learners’ intercultural citizenship. Fifty-five graduate and undergraduate L2 learners in different institutions from four countries (Mexico, Poland, Spain, and the USA) met virtually to discuss macro-level societal topics. In weekly meetings, they discussed topics such as sports, patriotism, advertising, crime, and natural disasters in English or Spanish in small groups using Zoom video conferencing. The study showed that discussing such topics facilitated the development of global citizenship by raising learners’ awareness and enabling them to notice cultural similarities and differences and reflect on their beliefs. This study was qualitative and analyzed the data by coding them into categories using NVivo. However, it also used quantitative analyses by calculating the frequencies and percentages of the categories. These frequencies and percentages added extra support to the qualitative data and, in Onwuegbuzie’s terms, could be taken to provide an index of the manifest effect size.

Chan’s study examined how changes and developments in language teaching and learning were reflected in Hong Kong English language teaching textbooks and curricula over the last four decades. The findings showed a shift away from structure-based methods towards more task-based and communicative approaches in curricula and textbooks. They also showed that both curricula and textbooks became more student-centered, paying more attention to students’ active roles in their own learning. Such effects, however, were less apparent in textbooks. As for the analysis, the study conducted qualitative content analysis. However, in addition to coding and categorizing the data, the study also quantified them. For example, when analyzing the language activities or types of instruction, it also ranked them or calculated their proportions across different textbooks and language skills. These quantifications enhanced the accuracy of the analysis and also provided evidence for the strength of the emerging themes.

Faez, Karas, and Uchihara reported a meta-analysis to examine the relationship between English teachers’ language proficiency and their perceived teaching ability. The meta-analysis also examined the role of a number of moderator variables such as type of research report, level of teaching experience, teaching degree, measures of self-efficacy and language proficiency, and the language of the self-efficacy measure (L1 versus L2). The findings showed a moderate association between teachers’ level of language proficiency and their perception of their teaching skills. Among the moderator variables, only the type and language of the self-efficacy measure explained variability in the relationship. The findings were based primarily on effect sizes, since these are what meta-analyses work with. The study extracted the effect sizes from the studies included and reported them to summarize the findings.

Using a mixed-methods research design, Rao and Yu’s study investigated the effect of co-teaching (involving a native and a non-native speaker teacher) on English as a foreign language (EFL) students’ English proficiency in China. It also examined the students’ perception of such co-teaching. The study included four university classes, two of which served as experimental classes co-taught by a native speaker and a non-native speaker teacher, while the other two served as control classes taught by only one teacher. The study used a language proficiency test as a pretest and post-test and also interviewed and surveyed twenty students in the experimental classes using an open-ended questionnaire. The findings showed a positive effect of co-teaching on both learners’ language proficiency and their attitude towards co-teaching. As for statistical analysis, the study used t-tests. However, no effect sizes were reported. Cohen’s d, which is an appropriate measure when comparing two means, could have been reported. The students’ perceptions were analyzed qualitatively but these analyses were followed by some quantification, showing the strength of the emerging themes.
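To illustrate how little extra work such reporting involves, the following sketch computes Cohen's d for two independent groups as the difference between the two means divided by the pooled standard deviation. The scores are hypothetical and are not drawn from Rao and Yu's data; the function name cohens_d is also an assumption made for this illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 in the denominator).
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical post-test scores (illustrative only).
experimental = [78, 82, 85, 90, 75]
control = [70, 74, 80, 72, 69]
print(round(cohens_d(experimental, control), 2))
```

A positive value indicates that the first group scored higher on average. By Cohen's widely cited benchmarks, values around 0.2, 0.5, and 0.8 are conventionally described as small, medium, and large, although such labels are best interpreted relative to the research domain.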

Alhassan conducted an exploratory study to examine EAP (English for academic purposes) students’ needs, focusing on business as an area of study. Using a longitudinal qualitative methodology, the study examined students’ language and skill needs in an English-medium master of business administration (MBA) programme in Sudan from both the students’ and teachers’ perspectives. The data were collected through semi-structured interviews with 31 participants, including 10 MBA teachers and 21 MBA students. The participants highlighted a range of language skills believed to be needed for success in the programme, including the importance of learning discipline-specific vocabulary. They also stressed the importance of developing communication skills, both oral and written. The data were coded and categorized, and sample quotations from the interviews illustrated the findings. No quantified measures were reported.

The last article, by Pawlak, concerns the role of language learning strategies. He discussed three key issues that, according to him, should be considered to move the field forward: the foci of future research, methodological choices, and the implications of research for pedagogy. It is argued that instead of focusing on the generalized use of learning strategies, research should focus on how learning strategies can assist learners in specific domains such as language skills, culture, affect, and communication. It is also argued that large-scale research with adequate sample sizes, a well-defined strategy taxonomy, and appropriate statistical analyses is needed. Finally, an important criterion for evaluating the utility of learning strategy research is the degree to which the findings can inform classroom instruction. Overall, Pawlak’s article opens up new avenues for research on learning strategies, making useful suggestions about how future research in this area can be more effectively conducted. Since this paper is a critical review article, no qualitative or quantitative data collection or analysis is reported.

References

Nassaji (2012). Significance tests and their relation to generalizability of research results: A case for replication. In Porte (Ed.), Replication research in applied linguistics and second language research: A practical guide. Cambridge: Cambridge University Press.

Onwuegbuzie (2003). Effect sizes in qualitative research: A prolegomenon. Quality & Quantity, 37, 393–409.