Abstract
While scholars in the field of writing studies have examined scientific writing from multiple perspectives, interest in its thematic structure has been modest. Recent studies suggest that the themes in scientific writing tend to be anchored on one or a few points of departure. There has also been an attempt at quantification using the thematic-density index (TDI), although this has only been tested on abstracts. In this study, we investigated the thematic structure and TDIs of 30 research articles in biology. The results revealed a progressive thematic pattern in the introduction section, followed by an anchored development in the subsequent sections. The anchoring was realized by the pervasive use of the first-person pronoun “we.” The mean TDI was lowest in the introduction section (2.593) and highest in the results section (7.095). The results were consistent across the articles in the corpus, underscoring the uniform way in which the articles were thematically structured, and in turn suggesting a core thematic pattern for scientific research writing in general. Based on these findings, the authors suggest that future studies compare the thematic structure of the introduction section vis-à-vis the other sections, and investigate the possible factors resulting in such a structure.
Keywords
In his study of the evolution of scientific writing in English, Banks (2008) usefully related social and historical developments surrounding science to the rhetorical conventions that followed. Investigating the discourse of key writers from the Middle Ages to the 18th century—e.g., Newton and the Royal Society, along with new forms of empiricism—Banks showed how the passive voice, the first-person pronoun, nominalization, hedging, the classic introduction-methodology-results-discussion structure, and theme came to characterize scientific writing over the years. This characterization is supported by a wealth of scholarship in this area. Studies on scientific research writing have focused on numerous areas, ranging from the textual—chiefly its rhetorical organization (Bazerman, 1997; Gross, Harmon, & Reidy, 2002; Swales, 1990), linguistic features, and style (Atkinson, 1996, 1999)—to the social, such as the decisions that writers make in response to the comments of editors and reviewers (Myers, 1985).
While this might convey the impression that scientific writing has solidified into an established genre, differences have been found not only across the sciences, but also within the same disciplines. Cargill and O’Connor (2009) noted at least two major variations of the basic AIMRaD (abstract, introduction, methodology, results, and discussion) structure: AIRDaM (abstract, introduction, results, discussion, and methodology), usually found in molecular biology articles, and AIM(RaD)C, prevalent in shorter scientific articles, in which each result and its relevant discussion are presented consecutively in a section and then summarized in a separate conclusion. Cargill and O’Connor (2009) also observed that prestigious journals, such as Nature and Science, adapted conventional article sections to suit their own unique publishing aims of enabling even nonspecialist audiences to access scientific discoveries.
These variations may seem daunting to beginning researchers. Indeed, the scientific research publication process has grown into a business so sophisticated and demanding that novice writers could easily become inundated. Since their inception in 1665, with the Journal des Sçavans in France and the Philosophical Transactions of the Royal Society of London in England, scientific journals have become the most viable platform for scholarly communication (Larivière, Haustein, & Mongeon, 2015). In fact, since the 20th century, journals have increasingly strengthened their position as the main medium, not only for disseminating research but also for career opportunities and advancement in academia. As observed by the Guardian, “professional success is especially determined by getting work into the most prestigious journals” (Buranyi, 2017). For these reasons, it is imperative for scientists and new researchers to understand the linguistic and structural norms for producing high-quality papers.
However, as we have seen so far, the rhetorical structure of scientific writing can differ among disciplines and journals. Moving away from such concerns and investigating scientific writing from a different angle, Hallidayan linguists have proposed an intuitive macro structure to shed new light on a neglected aspect of scientific writing. These research efforts, still modest in number, rest on the Hallidayan notion of theme, taken to be the starting point of the message (Halliday & Matthiessen, 2014). An understanding of what scientists select as themes can offer an alternative look at how scientific texts are structured, not in terms of the broader rhetorical moves as in past studies, but in terms of what writers select as the points of departure of their message. Tracing these individual starting points through the text reveals its core, or skeletal, structure, since it is from these starting points that the message in the text is developed. This raises the intriguing question as to whether the core structures of scientific research articles are similar, even though their rhetorical structures may differ. A preliminary study by Leong (2015), for instance, has shown that the way these starting points are developed in the text often produces, at the whole-text level, a vertical development of themes, what he refers to as an anchored development.
As we will show in the subsection on related studies, depicting such core structures requires the use of diagrams. A drawback of using diagrams, however, is that comparisons of such diagrams can at times be difficult and subjective. To avert this problem, Leong (2016) most recently proposed a thematic-density index (TDI)—a simple division of the number of themes by the number of starting points they represent—as a measure to quantify the thematic structure of texts. The TDI, however, was tested only on abstracts from the sciences and humanities, but not full research articles. We will review these and other related studies in greater detail in a later section.
As these efforts are preliminary in nature, and the findings tentative, more work is clearly needed to investigate whether the anchored development observed in Leong (2015) holds true in a specific discipline. Since the TDI has only been applied to abstracts thus far, we also do not yet have a gauge of the TDI for full scientific articles. This article sought to address these concerns by using biology-related research articles from a single journal. The approach was both diagrammatic and quantitative—the former was needed to display the shape of thematic patterns that numbers alone could not fully capture, and the latter was critical for making meaningful statistical comparisons among the diagrams. Specifically, the article addressed the following research questions:
What is the general thematic structure of the articles in the corpus? Do the diagrammatic representations reveal a discernible thematic pattern?
What is the TDI for the articles at the whole-text level?
What are the TDIs for the major sections of the articles?
By examining full-length research articles, we hope to contribute to the growing literature concerning the structure of scientific research writing. With this knowledge, scholars will be guided in organizing and “anchoring” information in a way that adheres to the norms in the major sections of research articles. More crucially, perhaps, we hope to show how the thematic structure of scientific research articles is broadly reflective of the basic function of such articles, that is, to inform the reader of the focus of the research, and what the researchers did and found.
The Hallidayan Framework and Related Studies
Theme and Rheme
The Hallidayan framework lays focus on the functions of language, employing the term metafunction “to suggest that function [is] an integral component within the overall theory” (Halliday & Matthiessen, 2014, p. 31). The framework recognizes the ideational metafunction (the construing of human experience), the interpersonal metafunction (the enacting of personal and social relationships), and the textual metafunction (the organizing of discourse). The textual metafunction brings together the ideational and interpersonal metafunctions, packaging them as a cohesive and coherent whole. As Halliday and Matthiessen (2014, p. 31) note, this metafunction “build[s] up sequences of discourse, organizing the discursive flow and creating cohesion and continuity as it moves along.” Theme belongs to the textual metafunction of language.
The notion of theme can be traced to the influential work of the French linguist Henri Weil (1818–1909). Referring to the flow of ideas, Weil (1844) made the following observation about an important structural division within the clause: There is then a point of departure, an initial notion which is equally present to him who speaks and to him who hears, which forms, as it were, the ground upon which the two intelligences meet; and another part of discourse which forms the statement (l’énonciation), properly so called. This division is found in almost all we say. (p. 29)
This two-part division underscored the clause as being not merely a grammatical unit, but a message-bearing unit. It came to be extensively developed first in the work of the Prague Circle linguists and later in that of Michael Halliday. In the Hallidayan framework, the theme is regarded as a clause-internal element and, in English, it is always positioned first in the clause. The theme is glossed as the “point of departure of the message; it is that which locates and orients the clause within its context” (Halliday & Matthiessen, 2014, p. 89).
Different types of themes are recognized in the framework since clause-initial elements can orient the clause in different ways. There are three theme types—textual, interpersonal, and topical themes—reflecting the three metafunctions of language. Of the three, the topical theme is obligatory, and is therefore the most important. In the clause, it is realized by the first participant, circumstantial adjunct, or main verb; it is the first clausal element that carries experiential content. Indeed, the role of the topical theme is so crucial that it is likened to that of an anchor. As Halliday and Matthiessen (2014, pp. 111–112) note, unless the topical theme appears, “the clause lacks an anchorage in the realm of experience.” The other themes (i.e., interpersonal and textual themes) connect textual units or express opinions and degrees of subjectivity, but they are nonobligatory. The thematic portion of the clause thus ends with the topical theme. The elaboration of that topical theme in the remainder of the clause is known as the rheme.
The clause-initial elements realizing the different types of themes are summarized in Table 1.
Table 1. Textual, Interpersonal, and Topical Themes.
Source: Adapted from Halliday and Matthiessen (2014, pp. 105–114).
In the clause, the topical theme may appear alone, or be accompanied by textual and/or interpersonal themes. This study focused on only topical themes since they provide the anchorage in texts; neither textual nor interpersonal themes are capable of capturing the major points of departure in the text because they lack experiential content. The thematic analysis of clauses from the corpus is exemplified below. Each example, like the others that follow in the rest of this article, is accompanied by the article’s code number, prefixed by T. The topical themes are underlined and in boldface.
(1)
(2)
(3)
The thematic analysis in this study adhered largely to the Hallidayan framework, as exemplified in (1–3) above, but made an exception for clauses with empty grammatical subjects (e.g., existential clauses). In such clauses, provided they were not introduced by circumstantial adjuncts, the complement of the verb was regarded as the topical theme rather than the empty subject. This approach differs from the mainstream Hallidayan framework, which regards the empty subject as the topical theme (Halliday & Matthiessen, 2014, p. 100). However, as Leong (2004) points out, labeling the empty subject as a topical theme runs counter to the framework on definitional grounds. This is because the framework regards the topical theme as a carrier of experiential content, but this would not be theoretically consistent if the topical theme were semantically empty. Examples from the corpus illustrating the thematic analysis of empty-subject clauses are provided below.
(4) . . . and there was
(5) Nevertheless, there was
Thematic Progression
Our understanding of how topical themes are developed through the text owes much to the work of Daneš (1970, 1974) on thematic progression (TP). By tracing the connections between themes and preceding themes/rhemes, he identified various thematic patterns, such as the simple linear TP and the constant TP, both of which are illustrated in Figures 1 and 2 with examples from the corpus (clauses are separated by double parallel lines):
(6)
(7) For example, there are

Figure 1. Simple linear TP, based on the extract in (6).

Figure 2. Constant TP, based on the extract in (7).
Text-based studies including such TP diagrams, however, are few and far between. Most studies offer textual examples and frequency counts of TP patterns (e.g., Hawes, 2015; Williams, 2009); the minority that do contain diagrams show only examples involving short segments rather than the entire article (e.g., Rodríguez-Vergara, 2017). This is likely due to the immense effort required to construct these diagrams, as there is currently no automated system to track each topical theme and link it to elements in the other clauses using lines. In an extended text, the manual positioning of each topical theme in the diagram can very quickly become laborious. Also, without some quantifiable measure, comparing TP diagrams among different texts can often be subjective. In the following section, we review related studies on TP, and the refinements made to shed greater light on scientific research writing in particular.
Related Studies
TP studies in general have involved a range of corpora, including Spanish translations of medical texts (Williams, 2009), linguistics articles (Jalilifar, 2010), and journalistic essays (Hawes, 2015). One of the earliest attempts to include English scientific articles is Dubois’s (1987) work on biomedical texts, in which other TP patterns were observed, such as the gapped pattern, where an existing TP is interrupted by another. Nwogu and Bloor (1991) compared medical texts and journalistic reports, and found that the constant TP occurred more often in the former than the latter, and that medical abstracts displayed an even use of both simple linear and constant TPs.
While such studies have been insightful in providing a tentative look at the thematic structure of scientific texts, a drawback is that they do not contain any TP diagrams of their respective corpora. This omission may cause further insights about the general thematic pattern of the text to be lost. This is because the familiar simple linear and constant TP patterns—and other variations observed by Dubois (1987)—apply to only individual segments of the text. Article-length TP diagrams, on the other hand, capture the thematic shape of scientific research writing, a picture of what the text looks like in terms of its message starting points. It may well turn out that scientific research articles share a particular macro pattern, but this may not be evident without the help of diagrams.
Leong (2015) made some effort to correct this in a study comparing 20 articles in the fields of DNA research and database development. Simplified TP diagrams were used; these relied on only topical themes, rather than topical themes and rhemes (as in Daneš’s original diagrams). The simplification was needed to prevent any cluttering in the diagrams and to show only the development of the topical themes through the text. With the aid of these simplified diagrams at the whole-text level, Leong found a general simple linear progression in the initial part of the article, followed by a major vertical line of themes striding through the rest of the text (see Figure 3). As the vertical line resembled an anchor rode, he termed such a pattern an anchored development. The anchored development, in other words, is the text-level version of the constant TP, and highlights the inclination of writers to rely on certain points of departure to frame their writing.

Figure 3. TP diagram of a scientific article.
Notwithstanding the usefulness of pictorial representations in illustrating the macro thematic structure of research articles, Leong (2016) felt that a quantitative measure would provide greater rigor to the analysis and enable such diagrams to be formally compared. In his study involving 200 abstracts in the sciences and humanities, Leong (2016) proposed a thematic-density index (TDI), which is arrived at by dividing the number of clauses by the number of semantic labels (representing the starting points), as shown in (8).
(8) $\mathrm{TDI} = \dfrac{\text{number of clauses}}{\text{number of semantic labels}}$
The TDI for a text therefore ranges from 1 to C, where C is the number of clauses (and thus, topical themes). At the lower end of the index, the TDI is 1, where the number of topical themes equals the number of semantic labels. At the upper end, the TDI is C, where all the topical themes are represented by one semantic label. The lower and upper limits represent the two extreme thematic structures that a text can adopt at the macro level: a simple linear TP (where TDI = 1) and a constant TP (where TDI = C). The thematic density is thus lowest in the simple linear TP and highest in the constant TP. In his study, Leong found low TDIs for the abstracts (2.08 for the sciences, and 1.94 for the humanities), suggesting that abstracts in general follow a simple linear TP.
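The calculation and its two limiting cases can be illustrated with a minimal Python sketch. The function name, the input format (one semantic label per independent clause), and the example labels are assumptions made for illustration only; they are not taken from Leong (2016) or from the corpus.

```python
from typing import Sequence


def thematic_density_index(labels: Sequence[str]) -> float:
    """Return the number of clauses divided by the number of distinct labels.

    `labels` holds one semantic label per independent clause, i.e., the label
    assigned to that clause's topical theme (hypothetical input format).
    """
    if not labels:
        raise ValueError("at least one clause is required")
    return len(labels) / len(set(labels))


# Simple linear extreme: every clause opens with a new starting point.
linear = ["gene", "protein", "pathway", "cell"]
# Constant extreme: every clause opens with the same starting point.
constant = ["we", "we", "we", "we"]
# Mixed case: five clauses anchored on two starting points.
mixed = ["we", "we", "mice", "we", "mice"]

print(thematic_density_index(linear))    # 1.0 (lower limit)
print(thematic_density_index(constant))  # 4.0 (upper limit, C = 4)
print(thematic_density_index(mixed))     # 2.5
```

The sketch presupposes that each topical theme has already been assigned a semantic label by the analyst; that assignment is a matter of linguistic judgment and is not automated here.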
Converting TP diagrams into TDIs this way allows the thematic structures of texts to be statistically compared. As Leong’s (2016) work was restricted to only abstracts, however, there is a pressing need to extend the TDI measure to full research articles. Whether similarities in thematic structures show up in a defined corpus remains now to be seen in this study through both diagrammatic representations of texts and TDIs. The following section provides a fuller account of the corpus used in this study and the analytical approach.
Method
Corpus
In line with the objectives of this study, care was taken to select research articles that were similar in word length and rhetorical structure. This was to allow for fair comparisons to be made among the different articles. We eventually selected articles from the journal Cell for the analysis. This choice was motivated by two reasons. First, the journal had strict word-length and formatting guidelines. The main text of research articles, for instance, had three major sections—i.e., introduction, results, and discussion—and was limited to 55,000 characters, including spaces (Cell, 2017). Second, Cell is a highly prestigious journal; it was ranked first in the fields of biochemistry, genetics, and molecular biology in the latest journal ranking provided by Scimago (2016).
The corpus, comprising 30 articles, totaled 149,229 words. The articles were the most recently available at the time of writing. The analysis covered the entire main text of each article, but excluded the title, abstract, tables/diagrams, and references. The methodology section was also excluded as it was positioned outside the main text. Further, the information in the methodology section was grouped according to the materials used and procedures employed, and did not lend itself easily to thematic analysis.
Unit of Analysis
The unit of analysis was the independent clause, including any dependent clause(s) associated with it. Focusing on only the independent clause allows the analyst to “discern the method of development and thematic progression of a text” more easily, since “the structure of beta [dependent] clauses, including their thematic structure, tends to be constrained by the alpha [independent] clauses” (Fries & Francis, 1992, p. 47). In numerous studies involving long texts (e.g., Jalilifar, 2010; McCabe, 1999; Williams, 2009), such an approach is favored since the inclusion of dependent clauses, some of which are embedded in independent clauses, may complicate the analysis unnecessarily. The articles in the corpus yielded 6,141 independent clauses.
Only the topical themes of independent clauses were identified. Interpersonal and textual themes were omitted as these are optional and contribute little to the main ideas in the text. The corpus was analyzed by all members of the team, with one member double-checking the analysis of the entire team.
Simplified TP Diagrams and Thematic Density
A simplified version of Daneš’s TP diagrams was used in this study. The simplified diagrams displayed only the topical themes to clearly trace their development through the text. Including rhemes would have complicated the diagrams and obscured the broad developmental patterns of the topical themes. The simplified TP diagrams in this study followed the technique employed by Leong (2015, 2016). The topical theme of each independent clause was plotted using the Microsoft Excel program, as follows:
(a) Each row in the spreadsheet corresponded to an independent clause. All the rows were numbered.
(b) The width of each column was narrowed to 1.89 character units. This was done to create cells resembling small squares.
(c) The semantic content of each topical theme was specified on the header row of the spreadsheet, and the cell representing the topical theme was colored black. This procedure was repeated for all the other topical themes in the research article. New semantic labels were added as necessary. Topical themes were grouped under an existing semantic label if they met the following criteria proposed by Martin (1992):
Identity chains are based on co-referentiality, which is realized through pronominal cohesion, instantial equivalence, the definite article and demonstratives (or lexical repetition if the reference is generic). . . . (p. 419)
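The spreadsheet procedure in (a) to (c) can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the study itself used Microsoft Excel, the function name `tp_grid` and the example labels are hypothetical, and the grouping of topical themes under a semantic label is assumed to have been done manually beforehand.

```python
from typing import List, Sequence, Tuple


def tp_grid(labels: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (header labels, grid rows) for a simplified TP diagram.

    Each row stands for one independent clause; each column stands for one
    semantic label, in order of first appearance. A filled cell is "#".
    """
    header: List[str] = []
    for label in labels:
        if label not in header:  # new semantic labels are added as necessary
            header.append(label)

    rows: List[str] = []
    for label in labels:  # one row per independent clause
        filled = header.index(label)
        rows.append("".join("#" if col == filled else "." for col in range(len(header))))
    return header, rows


# Hypothetical labels for five clauses (not the actual analysis of any article).
header, rows = tp_grid(["we", "mice", "we", "we", "SLC7A5"])
print("columns:", ", ".join(header))
for number, row in enumerate(rows, start=1):
    print(f"[{number}] {row}")
```

In this toy output, the repeated label occupies the same column in several rows, which is how a vertical line of themes becomes visible in the full diagrams.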
A partial example of a TP diagram is given in Figure 4. This is based on the first five independent clauses of T07, as listed in (9). The numbers in square brackets refer to the numbers assigned to the independent clauses.
(9) [1]
[2]
[3]
[4]
[5]

Figure 4. TP diagram, based on the extract in (9).
To measure the thematic density of the research articles, we computed multiple TDIs based on the formula first introduced in (8), reproduced below as (10):
(10) $\mathrm{TDI} = \dfrac{\text{number of clauses}}{\text{number of semantic labels}}$
Each research article had four TDIs—one for the entire article, and one for each of the three major sections. Using Figure 4 as an example, if the five clauses constituted the introduction section of T07, the TDI for the introduction section would then be
Statistical Analysis
IBM SPSS Statistics Version 23 was used for analysis of variance (ANOVA). The Games-Howell post hoc test was used for significant ANOVA results. The significance level for all tests was α = .05.
Results and Discussion
Text-Level Thematic Patterns
The mean TDI at the whole-text level was 9.008 (SD = 1.531), ranging from a low of 7.095 to a high of 11.333. The coefficient of variation, CV = 0.170, suggests low variability. Taken together, these numbers provide a glimpse of the thematic density of the research articles in the corpus.
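As a quick check on the reported variability, the coefficient of variation is the standard deviation divided by the mean, and the two whole-text figures above reproduce it:

```latex
\[
\mathrm{CV} = \frac{SD}{M} = \frac{1.531}{9.008} \approx 0.170
\]
```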
Insofar as TP is concerned, the corpus exhibited a general pattern in which topical themes appear to be clustered around a restricted number of semantic labels. These reflect, and lend support to, the anchored-development pattern of scientific research writing as observed in Leong (2015). Three representative TP diagrams are presented in Figure 5 to illustrate this pattern. In the diagrams, horizontal lines are used to separate the introduction (I), results (R), and discussion (D) sections of each article.

Figure 5. TP diagrams of three representative articles: T06, T12, and T23.
Vertical lines, resembling anchor rodes, are marked with an asterisk. Although they are interrupted at various points, there is enough continuity throughout the text for them to be clearly discernible. These vertical lines appear to be characteristic of the results and discussion sections, rather than the introduction section. As seen in Figure 6, the thematic pattern of the introduction section is closer to the simple linear type.

Figure 6. TP diagrams of the introduction sections of T02, T20, and T28.
This is unsurprising since introduction sections typically serve to establish key concepts and the research problem and objectives, which are then further elaborated in the rest of the article (Swales, 1981, 1990). These naturally lead to a simple linear development of ideas. An introduction section that flouts this general structure is likely to compromise the comprehensibility of this section.
The diagrams presented so far, then, illustrate the basic shape of the articles in the corpus. Broadly, the data suggest that articles begin with a simple linear thematic development in the introduction section before settling into an anchored development in the rest of the text. This in turn suggests that the various sections of the article should have different TDIs; specifically, we expected the TDIs for the results and discussion sections to each be higher than the TDI for the introduction section. Also, as the discussion section serves chiefly to elaborate on the key findings of the research, we did not anticipate that there would be any significant difference between the TDIs for the results and discussion sections. To investigate this further, we turn next to the thematic density of each major section.
TDIs for the Introduction, Results, and Discussion Sections
The mean TDIs for the major sections are summarized in Table 2.
Table 2. TDIs for the Introduction, Results, and Discussion Sections (N = 30).
Levene’s test showed that equal variances across the groups could not be assumed, F(2, 87) = 4.962, p = .009. In consequence, a one-way Welch ANOVA test was conducted to compare the mean TDIs for the introduction, results, and discussion sections. Differences in the mean TDIs were found to be highly statistically significant, F(2, 54.972) = 230.219, p = 1.9195E–27. The Games-Howell post hoc test further showed that the mean TDI for each section was significantly different from the others: introduction versus results (p = 5.1014E–9), introduction versus discussion (p = 1.8568E–7), and results versus discussion (p = 5.101E–9). The findings also indicated less variability in the TDIs for the results section (CV = 0.139) as compared to the same for the introduction and discussion sections (CV = 0.223).
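For readers unfamiliar with the Welch correction, the statistic can be sketched in plain Python. This is an illustrative reimplementation, not the SPSS procedure used in the study; the function name `welch_anova` and the sample values are hypothetical, and the p value is omitted because it requires an F distribution that the standard library does not provide.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (F, df1, df2) for Welch's one-way ANOVA with unequal variances.

    Every group needs at least two values and a nonzero sample variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n / s^2, so noisier groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))

    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)  # usually fractional
    return f_stat, k - 1, df2


# Hypothetical TDIs for three sections (made-up values, not the corpus data).
introduction = [2.1, 2.8, 2.5, 3.0]
results = [6.5, 7.4, 7.9, 6.8]
discussion = [3.2, 4.1, 3.6, 4.4]

f_stat, df1, df2 = welch_anova([introduction, results, discussion])
print(f"F({df1}, {df2:.3f}) = {f_stat:.3f}")
```

The fractional second degree of freedom produced by this adjustment is why the test above is reported with a noninteger denominator, as in F(2, 54.972).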
While the higher TDI for the results section was expected, the considerably lower figure for the discussion section (3.770) caught us by surprise. We had anticipated that the results and discussion sections would not differ by much, but the difference between the two means was highly significant.
There are at least two possible reasons for this difference. The first is the shorter length of the discussion section relative to the results section. The discussion section averaged 45.867 independent clauses, as compared to the 133.333 average for the results section. A smaller number of clauses has the effect of reducing the TDI unless there is a corresponding reduction in the number of semantic labels, which was not evident in the corpus.
The second reason relates to the content of the discussion section. In the absence of a separate conclusion section, the components that one often finds in such a section (e.g., implications of the study, suggestions for further research) must be included in the discussion section instead. This can sometimes have the effect of introducing semantic labels not previously mentioned in the preceding text, thus lowering the TDI.
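Both effects follow directly from the definition of the TDI as the number of clauses divided by the number of semantic labels. The figures below are purely illustrative and are not taken from the corpus: they assume a section of 120 clauses anchored on 15 semantic labels, which is then shortened to 45 clauses and finally gains 4 new labels.

```latex
\[
\frac{120}{15} = 8.00
\quad\longrightarrow\quad
\frac{45}{15} = 3.00
\quad\longrightarrow\quad
\frac{45}{15 + 4} = \frac{45}{19} \approx 2.37
\]
```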
A case in point is T09, an article on gene editing. In the discussion section, the authors highlighted two different aspects—on precision/specificity and nuclease activity—as parting advice to the reader. Toward the end of the discussion section, they broadened their writing to include an evolutionary perspective and cited the “Red Queen theory,” which is more accurately a hypothesis about the need for organisms to reproduce, adapt, and evolve in order to survive. These inclusions constitute four new semantic labels, and collectively contribute to the lowering of the TDI. The independent clauses exemplifying the above are given in (11); the TP diagram representing the entire discussion section of T09 is shown in Figure 7.

Figure 7. TP diagram of the discussion section of T09.
(11) [118]
[119] and
[135]
[137]
With the lower TDIs for the discussion section, one might expect the thematic development here to be more fragmented as compared to the results section. We see some traces of this fragmentation in the TP portion for the discussion section in Figure 5c. Notwithstanding this, the anchored pattern is still evident in this section. For instance, despite the insertion of four new semantic labels in the discussion section, Figure 7 still displays a discernible vertical line emanating from the preceding two sections.
A further point of interest to note is that while the TDIs for the introduction and discussion sections do not appear to differ by much (2.593 vs. 3.770), the difference is nevertheless statistically significant (p = 8.4205E–7). The primary reason for this is that the TDIs for all but one of the discussion sections in the corpus were higher than those for the introduction sections. The spread of TDIs for each major section is given in Figure 8.

Figure 8. Spread of TDIs across the research articles.
The data thus far show that writers tend to retain one or a small number of topical themes beyond the introduction section. This naturally raises questions about the semantic content of these message points of departure and what the norms detected in this study imply for scientific research writing. We address these issues in the following section.
First-Person Pronoun as the Anchor
In all but one of the articles in the corpus, the anchored development (marked by the asterisks in Figure 5) corresponded to the first-person pronoun “we.” The use of “we” was most pervasive in the results section. A close examination of the occurrences of “we” in the corpus suggests that its use is most likely attributable to the unique function of each section. As noted in Swales (1990), each section of a typical research article has a specific function and set of rhetorical moves. For instance, the introduction section, which typically moves from general to specific issues, first contextualizes the study before announcing the authors’ research topic and goal(s). In consequence, the use of “we” increases as the writing moves toward the end of the section (Carciu, 2009; Martínez, 2005; Swales, 1990). The example in (12) illustrates authorial presence in four topical themes in the tail end of the introduction section (the section itself contained 20 topical themes).
(12) [15] Here,
[16]
[17] To probe whether SLC7A5 is essential also in humans,
[18]
[19] and (
It is in the results section that the “we” pronoun was most pervasively used, contributing to the vertical lines we observed in Figure 5. The results section is often the longest section of a research paper; in this section, authors not only present their own findings but also give reasons for them, frequently referring to particular choices in the methodology (Martínez, 2005; Nwogu, 1997). We see this in (13), where the pronoun “we” appears repeatedly to signal not merely the authors’ actions, but also their evaluation, observation, and conclusion.
(13) [55]
[56]
[57]
[58] To assess the impact of leucine and isoleucine deficiency on mRNA translation in the brain of Tie2Cre;Slc7a5fl/fl mice,
[59] Specifically,
[60]
[61] In contrast,
[62]
[63] To test whether abnormal regulation of 4EBP1 and eIF2α in mice lacking Slc7a5 leads to changes in translation efficiency in the brain,
[64] Importantly,
[65]
Indeed, the pronoun “we” in the corpus was almost always coupled with verbs associated with tangible outcomes (e.g., “we examined,” “we investigated,” “we used,” “we denote”), conveying a sense of the authors’ intervention, empiricism, and authority.
The fewer occurrences of “we” in the discussion section (and hence the more dispersed TP pattern in this section) are attributable to the numerous ways in which research outcomes tend to be explained and discussed (Nwogu, 1997). In this section, a variety of themes can be introduced to highlight the significance of the findings in the context of related studies. In the present corpus, the pronoun “we” was frequently used to contrast the authors’ work with that of previous studies, as seen below in (14). This lends support to a similar observation made by Martínez (2005, p. 186) on the authors’ “need to restate their own results for interpretation or comparison with the work of others.”
(14) [148] It is generally accepted
[149] yet
[150]
[151]
The inclination to include the first-person pronoun in research writing may thus be seen as an assertion of authorial presence, and indicate the willingness of authors to profile or position themselves within the scientific community (Hyland, 2002; Kuo, 1999; Tarone, Dwyer, Gillette, & Icke, 1998). Through the pronoun “we,” the authors claim responsibility for both methodological decisions and the expression of opinions or arguments in the rest of the article. Martínez (2005) suggests that this tendency to articulate in the first person could well represent a trend in scientific research writing toward “authorial intervention, argumentation, and personalization” (p. 182). In thematic terms, then, the authors themselves become the points of departure for the message in the text.
Implications of the Study
The dominance of the topical theme “we” in the results section seems to reflect the general move toward a greater use of the active voice in scientific research writing in recent years (Banks, 2017; Leong, 2014; Millar, Budgell, & Fuller, 2013). Some scientific journals, in particular, explicitly state their preference for the active voice:
Nature journals prefer authors to write in the active voice (“we performed the experiment . . .”) as experience has shown that readers find concepts and results to be conveyed more clearly if written directly. (Nature, 2017)
Use active voice when suitable, particularly when necessary for correct syntax (e.g., “To address this possibility, we constructed a λZap library . . .,” not “To address this possibility, a λZap library was constructed . . .”). (Science, 2017)
Even though Cell does not have such specific instructions, the official blog of Cell Press, CrossTalk, contains posts that advocate the use of the active voice. For instance:
Passive sentences are often needlessly long, and writing concisely improves the flow and readability of a text. For example, the active version of the following sentence cuts out a whopping eight words.
Passive: The free energy released from this series of oxidation-reduction reactions is combined with production of an electrochemical gradient that can be used to drive ATP synthesis. (26 words)
Active: This series of oxidation-reduction reactions releases free energy that combines with an electrochemical gradient to drive ATP synthesis. (18 words)
(Evans, 2015)
The argument is that the active style of writing would enhance the effectiveness and readability of academic texts (Seoane, 2006; Weiss & Newman, 2011). The passive voice, in contrast, can sometimes result in writing that is unwieldy, wordy, and unclear (Amdur, Kirwan, & Morris, 2010; Conrad, 2018; Leong, 2014), as illustrated below using two examples from the corpus (the passive verb phrase is italicized):
(15) [11]
[12] but
(16)
In (15), it is unclear whether the mechanism is uncharacterized by scientists who have yet to describe it, or by a lack of chemical reaction. In (16), the agent of the antagonization is not disclosed.
Leong (2014, p. 12), however, cautions that “[t]here are situations where the passive voice may in fact be more appropriate.” In the present corpus, the use of the passive voice appears to serve a strategic function in the introduction section to highlight the writers’ research focus as the topical theme. There is a gradual buildup toward this focus before the switch to the use of the thematic “we” in the results section. This helpfully illustrates the core, stripped-down approach to scientific research writing, i.e., to inform the reader (a) what the research focus was and (b) what the researchers did and found. These passive-voice clauses typically occur in the early to middle parts of the introduction and, by the end, are overtaken by clauses expressing the study’s methodology, results, and conclusions with the first-person pronoun “we” in thematic position. The active-voice clauses, coming after the passive-voice ones, profile the authors’ roles in the current work, which either develops previous findings or differs from them. We see this progression in (17) below, extracted from the tail end of an article’s introduction section:
(17) [15] To enrich for intermediates,
[16] and
[17] As applied to the SSU (Sashital et al., 2014; Sykes and Williamson, 2009),
[18] Compared to the SSU,
[24] There are
[25] and to better understand the pathways in vivo,
[26]
[27] Here,
From the above excerpt, we see that while passive verbs are generally used to establish the extent of the research community’s knowledge of ribosome assembly ([15] & [18]), active verbs articulate the authors’ comments on that knowledge ([16] & [17]). In particular, the personal pronoun + active verb forms (“we developed” in [25], and “we show” in [27]) assert the authors’ autonomy and action as researchers in moving the knowledge forward.
Our present study, then, draws attention not only to the basic thematic shape of scientific research writing, but also to a well-known aspect of grammar, the voice of a verb, and its importance in facilitating clear writing. In academia, where the currency, innovation, and significance of a study’s procedures, results, and implications often determine its publication in a top-tier journal, researchers and academics could be taught the strategic use of active and passive verbs in their research papers to enhance their writing. University writing courses that incorporate sessions on the use of the active and passive voices could also go some way toward familiarizing students and junior researchers with the expectations and norms of scientific research writing. Finally, studies by Ding (2002) and Hyland (2001), which examined the way scientists and research writers conceive their scientific roles and textual self, could provide the relevant framework within which novices of scientific research writing, as well as scientists of different nationalities participating in scientific dialogues in the English language, could grapple with the technicalities of active and passive verbs.
Conclusion
This study sought to investigate the thematic structure of biology-related articles and the TDIs for these articles at both the whole-text and section levels. The findings are summarized as follows:
(a) Research Question 1: What is the general thematic structure of the articles in the corpus? Do the diagrammatic representations reveal a discernible thematic pattern?
The articles in the corpus displayed a simple linear pattern in the introduction section, followed by an anchored development in the following sections. This basic thematic shape was observed in all the articles of the corpus. In terms of semantic content, the anchoring in the articles involved the first-person pronoun “we.” The pervasive use of “we” was likely due to a combination of factors, including the specific functions of each section, the assertion of authorial presence, and the increased use of the active voice in scientific research articles.
(b) Research Questions 2–3: What is the TDI for the articles at the whole-text level? What are the TDIs for the major sections of the articles?
At the whole-text level, the mean TDI was 9.008. The mean TDIs for the introduction (2.593), results (7.095), and discussion (3.770) sections were significantly different from each other. As a simple measure, the TDI was found to be useful in quantifying and highlighting the difference in the thematic density among the different sections, allowing for such differences to be objectively compared using statistical tests. This cannot be accomplished with TP diagrams alone.
The lower mean TDI for the discussion section, as compared to the results section, was most likely due to the former’s shorter length and additional issues raised as concluding remarks. Nonetheless, the anchored development was retained in the discussion section.
The results lend further support to the observations of Leong (2015) regarding thematic structure, and the work of numerous scholars regarding authorial presence in scientific research writing, offering some tantalizing indications of the thematic norms in such texts. The consistency in the results, as evidenced by the low coefficients of variation in the TDIs, underscores the uniform way in which the articles are thematically structured. It brings to the fore a pattern that appears to be robust across the articles, independent of their research foci. This poses an interesting contrast to the variability noticed in the rhetorical structure of scientific research writing. As Bazerman (1988, p. 319) aptly notes, “the [textual] features we may associate with genre are hardly contained in their formal appearances on the page.” The findings of this study, however, suggest that at a more basic level, that of the message starting points, scientific research articles may actually be quite similar to each other. In reference to academic writing in general, Murray (2009) made the following point:
Perhaps we can agree that academic writing is not infinitely various; there are recurring patterns and dominant norms and forms within and across disciplines. (p. 61)
The thematic shape observed in the corpus of this study is perhaps one such recurring pattern in the case of scientific research writing.
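To make the notion of consistency referred to above more concrete, the following is a minimal sketch of how a coefficient of variation might be computed for a set of TDIs. It assumes the conventional definition (sample standard deviation divided by the mean, expressed as a percentage); the function name and the TDI values are hypothetical illustrations and are not taken from the corpus of this study.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100


# Hypothetical TDIs for one section across five articles (NOT data from this study).
hypothetical_tdis = [7.0, 7.2, 6.9, 7.1, 7.3]
cv = coefficient_of_variation(hypothetical_tdis)
print(f"CV = {cv:.2f}%")
```

Because the measure is scaled by the mean, it allows the dispersion of TDIs to be compared across sections whose mean values differ; a smaller percentage indicates that the articles cluster more tightly around the section mean.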
The tentative conclusions reached in this study are limited by the modest number of articles analyzed. Future studies on scientific research writing will need to include a larger database of articles from a range of disciplines. Another major limitation of this study is its use of articles sharing an identical rhetorical structure (in order to compare the TDIs for each major section). Given the findings presented here, and since it is unlikely for research articles in general to be so uniformly structured, further studies will need to view the article in terms of only two parts: the introduction section, and all subsequent sections as a single major unit. This will go some way toward clarifying the unique function of the introduction section vis-à-vis the other sections, and toward establishing whether the anchored development noted here is truly common in scientific research writing, regardless of the discipline or subject matter. A further area for investigation is the impact of author decisions on thematic structure. Scientific research writing is never an isolated activity; published articles are the outcome of negotiation among writers, editors, and reviewers. It would therefore be of interest to examine how this social aspect affects the thematic shape of the article as it moves from the initial to the final version. In so doing, we would enrich our understanding of not merely the thematic structure of scientific research writing, but also the possible factors resulting in such a structure.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
