Abstract
Using the British Academic Written English corpus, this study focuses on the use of grammatical complexity features in university level texts written by first language (L1) English writers to demonstrate knowledge and perform other specialized tasks required of advanced academic writers. While the primary focus of the analysis is on writing development from first-year undergraduate to graduate students, we also consider interactions with discipline and genre. The study goes beyond most previous work on grammatical complexity in writing by investigating the use of phrasal as well as clausal features. The results show that as academic level increases, the use of phrasal complexity features in writing also increases. On the other hand, the use of clausal complexity features in student writing, particularly finite dependent clauses, decreases as academic level increases. Results further indicate that the extent of the differences across level is mediated by discipline and genre, reflecting patterns observed in research on disciplinary variation in professional academic writing.
Keywords
The development of academic writing skills is widely recognized as a major educational concern, particularly at the university level (e.g., Zhu, 2004). As students progress through their college years, they are asked to move from more general academic writing tasks to more specialized, discipline-specific writing (Nesi & Gardner, 2012). Advanced academic writing is widely recognized as an elaborated form of discourse that is grammatically complex. For example, Wright (2008, p. 292) describes student-written chemistry lab reports as “elaborated forms of discourse” in which information is arranged “into more complex and explicit representations.” Indeed, most readers would agree that the language in the biology research article below would be perceived as “complex”:
Traditionally, elaboration and grammatical complexity have been associated with embedded clauses (e.g., Carter & McCarthy, 2006; Huddleston, 1984). Multiple researchers have argued that academic writing is characterized by such complexity, particularly a greater use of subordinate clauses (e.g., Brown & Yule, 1983; Chafe, 1982; Hughes, 2005). However, when we look at the grammatical structure of Text Excerpt 1 (main verbs are
Previous research on the grammar of different contexts of language use (known as registers or genres) 1 involving large, authentic collections of spoken and written language has shown that academic writing typically contains structures such as those found in Text Excerpt 1 (Biber, 2006; Biber & Gray, 2016; Biber, Johansson, Leech, Conrad, & Finegan, 1999).
Biber et al. (1999) provide extensive evidence that most types of finite dependent clauses (those traditionally associated with grammatical complexity) occur more frequently in spoken than in written language, and are especially less common in academic writing than in other types of writing. A typical example below from conversation illustrates this difference (verbs are
Yeah, I’
Text Excerpt 2 is in direct contrast to Text Excerpt 1, which illustrated the extensive use of phrases embedded into noun phrases.
Building on this body of research, recent corpus-based studies have challenged traditional notions of complexity, arguing that there are two fundamentally different types: phrasal and clausal (Biber, 1992; Biber & Gray, 2010, 2011, 2016; Biber, Gray, & Poonpon, 2011). And while face-to-face conversation frequently uses many clausal grammatical devices that are traditionally considered to be “complex” and “elaborated,” those same features are not particularly characteristic of academic writing. This research has argued that the use of phrasal structures relates to the context of writing as compared to speech. Academic writing in particular is produced in circumstances where language is carefully planned and edited, detailed and specific, and produced in a concise format.
Researchers have also found important differences within academic writing, showing that the use of complexity features varies across parameters like academic discipline and specific registers/genres of academic writing. For example, science research articles written by specialists for other specialist readers tend to use phrasal complexity features more extensively than those in the social sciences and humanities (see Biber, 2006; Biber & Gray, 2013b, 2016; Egbert, 2015; Gray, 2015).
These differences across discipline and register/genre can be discussed relative to the communicative characteristics of these varieties. For example, Hyland (2008, p. 16) positions humanities as a field in which “persuasion is more explicitly interpretative and less empiricist,” thus explaining the relatively less frequent use of phrasal structures. The implication is that any discussions of complexity in academic language production have to consider disciplinary and genre differences.
Much of the research discussed so far has focused on expert use, thus documenting the characteristics of advanced academic writing. However, by studying student texts, we can investigate how novice writers develop over time to produce such complex language. Since 1965, a large body of research has investigated the relationship between grammatical structure and writing development. Following Hunt (1965), much of this research has focused on the analysis of T-units (an independent clause and all associated dependent clauses), and has sought to link increases in T-unit length to grade level (e.g., Beers & Nagy, 2011). This methodology, however, places emphasis on clausal complexity and is largely unable to capture phrasal complexity (for a fuller discussion of T-units in writing development research, see Biber, Gray, & Poonpon, 2011, 2013; Biber, Gray, & Staples, 2014).
Given the well-documented patterns of complexity variation described above, and mixed results from T-unit based research, there is a growing call for the inclusion of phrasal features along with the more traditional clausal features in research on academic writing development for both first (L1) and second (L2) language writers (e.g., Biber et al., 2011, 2013; Crossley, Weston, McLain Sullivan, & McNamara, 2011; Norris & Ortega, 2009; Ravid & Berman, 2010). To date, a few studies have included a limited number of phrasal features in their analyses of student writing and have found them to be an important measure of academic writing development (Crossley et al., 2011; Haswell, 2000; Lu, 2011). Even fewer studies, however, have considered the influence of discipline or genre in relation to developmental patterns of grammatical complexity in student writing (Lu, 2011; Beers & Nagy, 2009). The results of this research indicate that student writers vary in their use of phrasal complexity features across level, but that those differences cannot be fully understood without accounting for the influence of register/genre.
Building on previous research on register variation and development in L1 and L2 writing, Biber et al. (2011) hypothesized a developmental sequence of grammatical complexity features intended to enable researchers to fully characterize the development of advanced academic writing. In contrast with conversational language, which is acquired naturally and at an early age, writing is acquired much later through explicit instruction. Thus, it is apparent that even L1 writers must develop the use of the phrasal complexity styles found in specialist academic writing.
The developmental sequence proposed by Biber et al. (2011) is based on the premise that novice academic writers begin with the clausal complexity features most common in speech, and then gradually develop proficiency in the dense use of the phrasal complexity features associated with specialist academic writing. Accordingly, a developmental sequence can be summarized as a progression through five stages that involve development along two grammatical parameters:
2
Grammatical form: finite dependent clause → nonfinite dependent clause → dependent phrase
Syntactic function: clause constituents (e.g., direct object or adverbial) → noun phrase modifiers
As the discourse style relies more on dependent phrases (rather than dependent clauses) and noun phrase modifiers (rather than clause constituents), the writing becomes more compressed (rather than more elaborated). Specific structural features can be placed along these continuums, depending on their grammatical and syntactic characteristics. Recall Text Excerpt 2 above:
This excerpt includes three finite dependent clauses (Clauses 1, 2, and 3) and one nonfinite clause (Clause 4). While Clauses 1, 2, and 4 function as clause constituents, the relative clause in (3) functions as a noun phrase modifier.
This can be contrasted with Text Excerpt 1 above, from academic writing, which includes only dependent phrases, all of which function as noun modifiers:
Studies like Lu (2011), Parkinson and Musgrave (2014), and Biber et al. (2014) provide empirical support for the claim that phrasal features increase as L2 writers develop their academic writing skills in preparation for college and graduate-level work. In particular, Parkinson and Musgrave (2014) tested the developmental progressions hypothesized in Biber et al. (2011). Their findings provide evidence that advanced L2 writers develop in their use of phrasal features as hypothesized.
We hypothesize that, for L1 writers as well, phrasal complexity develops most noticeably during university years, much later than researchers have normally considered. This late language development occurs during college because students need to use increasingly complex and sophisticated language in order to convey precise and specialized meanings within disciplinary writing.
No previous research has explicitly investigated this developmental sequence in university level L1 writers. However, Ravid and colleagues have examined some of these specific phrasal features used in later academic writing development, lending support to the claim that writing development continues at the college level, even for native English speakers (e.g., Ravid & Berman, 2010).
The implication of the research summarized above is that writing development during the college years is not restricted to learning new genres or specific disciplinary expectations. Rather, development also occurs in the underlying grammatical structures that writers use as they move toward the discourse styles of successful professional academic writers. Importantly, this research has shown that both L1 and L2 writers experience such development over the course of university education. While there is a growing body of research focusing on such grammatical development in L2 writers (e.g., Biber et al., 2014; Parkinson & Musgrave, 2014; Taguchi, Crawford, & Wetzel, 2013), very little research has focused on L1 English writers.
The goal of the present study is to comprehensively investigate phrasal and clausal patterns of language development for L1 writers at different levels of academic study. Because previous research has demonstrated the importance of genre and discipline in the extent to which phrasal and clausal complexity features are used, the study also considers these two factors. Specifically, we address the following research questions:
Is the general hypothesis that L1 writers develop in the use of grammatical complexity across the university years supported by corpus evidence?
How are the patterns of development mediated by genre and discipline differences?
To what extent do the patterns of development observed in the corpus conform to Biber, Gray, and Poonpon’s (2011) hypothesized stages of development for grammatical complexity features?
To meet these aims, we examine the development of phrasal and clausal complexity features in L1 writing at four levels of study (first-year undergraduate, second-year undergraduate, final-year undergraduate, and graduate level). We base our analysis on the British Academic Written English corpus (BAWE; Nesi, Gardner, Thompson, & Wickens, 2008-2010), a collection of student academic writing with representation across genres and disciplines.
Method
In order to investigate development in grammatical complexity across academic level, discipline, and genre of university student writing, we conducted three separate analyses. The first focused on differences across level and discipline, the second on differences across level and genre, and the third across level, discipline, and genre. In the following sections, we first describe the corpus and subsets of the corpus used for the three analyses, then the grammatical complexity features used to measure academic writing development, and finally the statistical procedures we undertook to analyze this development.
BAWE Corpus
This study is based on an analysis of a subset of BAWE. BAWE (Nesi et al., 2008-2010) was developed from 2004 to 2007 to represent the breadth of writing produced by university students at four levels: first-year undergraduate, second-year undergraduate, final-year undergraduate, and graduate. It should be noted that the sample is cross-sectional rather than longitudinal. The corpus was collected from four universities in England, and the texts are equally distributed across disciplines (Arts and Humanities, Life Sciences, Physical Sciences, and Social Sciences). They were all positively assessed by subject area tutors, with the equivalent of “distinction” or “merit” in the British system, which, according to Nesi and Gardner (2012), correspond to “A” and “B” grades, respectively, in the U.S. system. For this study, we targeted the L1 English writers only, since we were interested in seeing whether hypotheses about the variation in grammatical complexity would be borne out for L1 writers. L1 English writers account for approximately two-thirds of the papers in BAWE, for a total of 1,948 papers and 4,480,371 words.
To examine the impact of discipline, we use the disciplinary groups identified by Nesi and Gardner (2012) for the BAWE corpus (Arts and Humanities, Social Sciences, Life Sciences, and Physical Sciences). These disciplinary groups were identified by the creators for the over 30 disciplines represented in the corpus (Gardner & Nesi, 2013; Nesi & Gardner, 2012). For efficiency and for the sake of comparison, we grouped the texts from Life and Physical Sciences together to represent science writing as has been done in other recent studies (e.g., Biber & Gray, 2016). We refer to disciplinary groups as “disciplines” in this article. Table 1 shows the distribution of papers across disciplines and levels for our first analysis.
Table 1. Initial Corpus for Analysis.
We also use the genre families identified by the compilers of the BAWE corpus. A genre family represents a group of related genres (e.g., informative and persuasive essays both belong to the “Essay” genre family). The genre families included in the BAWE corpus were identified using a system that draws on principles of Systemic Functional Linguistics. Nesi and Gardner (2012) used both top-down and bottom-up approaches to identify the genres that were then grouped into thirteen genre families for the corpus: Case Studies, Critiques, Design Specifications, Empathy Writing, Essays, Exercises, Explanations, Literature Surveys, Methodology Recounts, Narrative Recounts, Problem Questions, Proposals, and Research Reports. It bears noting that these genre families are not equally well defined. For example, Literature Surveys seems to be a much clearer category than Essays. We thus acknowledge that these categories also contain variation within them. The reader is referred to Nesi and Gardner (2012) for a complete description of the methods they used for classifying texts into genre categories.
In this article we refer to genre families as “genres,” 3 and we focus our analysis on the following four genres: Essays, Critiques, Case Studies, and Explanations (see Nesi & Gardner, 2012, pp. 15-16). These four genres were chosen because they are relatively well represented across levels and disciplines (see Table 2). Both Explanations and Critiques require students “to demonstrate/develop understanding of the object of study,” while Critiques also focus on “the ability to evaluate and/or assess [its] significance” (Nesi & Gardner, 2012, p. 37). Case studies ask students “to demonstrate/develop an understanding of professional practice through the analysis of a single exemplar,” while Essays focus on “the ability to construct a coherent argument and employ critical thinking skills” (Nesi & Gardner, 2012, pp. 38-40).
Table 2. Four Genres Across Academic Level.
There were only 8 Explanations at Level 4, so these were excluded from the analysis.
Finally, to investigate the interaction between level, discipline, and genre, we chose two genres that were relatively well represented across both level and discipline. While some of the genres are very frequent in the corpus (e.g., Essays), others have a fairly limited representation (e.g., Literature Surveys). In addition, some genres are very frequent in certain disciplines but are highly infrequent in other disciplines. For example, there were 89 Design Specifications in Life and Physical Sciences but only 1 in the Arts and Humanities and 3 in the Social Sciences. Other distributions were less extreme, but still need to be taken into consideration when analyzing the results. Explanations and to some extent Critiques and Case Studies were more frequent in the Life and Physical Sciences. Essays were more frequently found in Arts and Humanities and to a lesser extent Social Sciences when compared to the Life and Physical Sciences. As these examples illustrate, the imbalance arises because certain genres are more frequently used within university level writing in general and because certain genres are used more frequently within particular disciplines. Table 3 shows the distribution of texts across level, discipline, and genre used for the third analysis.
Table 3. Essays and Critiques Across Genre, Discipline, and Level of Study.
Table 4. Linguistic Features Included in the Analysis.
Grammatical Complexity Features
The grammatical complexity analysis was conducted based on a wide range of grammatical features that have been identified in previous empirical research on register variation (see discussion in the Introduction) and, specifically, features included in the study of developing grammatical complexity (Biber et al., 2014). These include clausal structures (i.e., finite adverbial clauses, that and WH complement clauses controlled by verbs, and clausal coordinating conjunctions) and phrasal structures used for nominal modification and elaboration (i.e., nouns, attributive adjectives, nouns as nominal premodifiers, nominalizations, of genitives modifying nouns, and other prepositional phrases). Finally, we also include “intermediate” features from Biber et al. (2011). These structures are mixed or intermediate on the two parameters:
finite clause types functioning as a constituent in noun or adjective phrases (adjective complement clauses, noun complement clauses, relative clauses)
phrases (nonclausal) functioning as a constituent in a clause (adverbs, linking adverbials)
nonfinite clause types (to clauses controlled by verbs, adjectives, and nouns; -ing clauses controlled by verbs)
In addition, we consider passive voice as an intermediate feature. Although passive voice, found in the verb phrase, is clearly a clausal feature, it serves a number of the same functions as phrasal features, particularly allowing writers to follow the informational focus of the discourse and compress detailed information rather than elaborate on the performers of an action.
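The three-way distinction can be illustrated with a minimal sketch that assigns a feature to a category from its position on the two parameters. The enum names, the `classify` helper, and the small `FEATURES` sample are illustrative assumptions rather than the annotation scheme actually used; passive voice is omitted because it is treated as intermediate on functional grounds, not by this rule.

```python
from enum import Enum


class Form(Enum):
    FINITE_CLAUSE = "finite dependent clause"
    NONFINITE_CLAUSE = "nonfinite dependent clause"
    PHRASE = "dependent phrase"


class Function(Enum):
    CLAUSE_CONSTITUENT = "clause constituent"
    NP_MODIFIER = "noun phrase modifier"


def classify(form: Form, function: Function) -> str:
    """Return 'clausal', 'phrasal', or 'intermediate' for one feature."""
    # Clausal: finite dependent clauses functioning as clause constituents.
    if form is Form.FINITE_CLAUSE and function is Function.CLAUSE_CONSTITUENT:
        return "clausal"
    # Phrasal: dependent phrases functioning as noun phrase modifiers.
    if form is Form.PHRASE and function is Function.NP_MODIFIER:
        return "phrasal"
    # Everything else is mixed on the two parameters.
    return "intermediate"


# Illustrative sample of features (not the full set of 23).
FEATURES = {
    "finite adverbial clause": (Form.FINITE_CLAUSE, Function.CLAUSE_CONSTITUENT),
    "verb + that-clause": (Form.FINITE_CLAUSE, Function.CLAUSE_CONSTITUENT),
    "attributive adjective": (Form.PHRASE, Function.NP_MODIFIER),
    "premodifying noun": (Form.PHRASE, Function.NP_MODIFIER),
    "relative clause": (Form.FINITE_CLAUSE, Function.NP_MODIFIER),
    "linking adverbial": (Form.PHRASE, Function.CLAUSE_CONSTITUENT),
    "verb + to-clause": (Form.NONFINITE_CLAUSE, Function.CLAUSE_CONSTITUENT),
}

labels = {name: classify(form, function) for name, (form, function) in FEATURES.items()}
```

Under this rule, only features at the two ends of both parameters count as clausal or phrasal, which is why relative clauses and linking adverbials fall into the intermediate group.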
This three-way distinction among clausal, phrasal, and intermediate structures is based on the system of grammatical types laid out in Biber et al. (2011, pp. 19-21). In total, 23 linguistic features were included in our analysis, based on the research conducted in Biber et al. (2011), Biber and Gray (2013a), and Biber et al. (2014). The entire corpus was automatically annotated for lexicogrammatical features using a tagger and two additional programs. 4 The tagger has a very high accuracy rate for most features (ranging from 95% for formal writing to 90% for L2 student writing). We also ran a number of post-tag programs and calculated reliability on selected features. 5 The final recall and precision rates all approached 90% or better, so we did not undertake manual fixtagging.
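For readers less familiar with these reliability measures, the following sketch shows how precision and recall are conventionally computed for a single tagged feature. The counts are hypothetical and serve only to illustrate the arithmetic; they are not taken from the reliability checks reported here.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of automatically tagged instances that are correct."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of actual instances that the tagger found."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts from a hand-checked sample of one feature.
correct_tags = 90   # tagged and truly an instance of the feature
wrong_tags = 10     # tagged but not an instance of the feature
missed = 5          # true instances that the tagger did not tag

p = precision(correct_tags, wrong_tags)
r = recall(correct_tags, missed)
```

With these illustrative counts, precision is 90 / 100 and recall is 90 / 95, both at or above the 90% range described above.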
Statistical Analysis
To investigate the differences in grammatical complexity measures across level of study, discipline, and genre, we computed three factorial models with the corpus divided in different ways. As mentioned above, this was done to account for the imbalanced nature of the corpus. We used the General Linear Model function in SPSS for the statistical analyses, setting a Bonferroni-adjusted experiment-wise alpha criterion of p < .002 (that is, .05 / 23 = .002). For those models that were significant, we also considered interaction effects. First, we computed factorial ANOVAs with two independent variables: Discipline with three levels (humanities, social sciences, and hard sciences) and Level of Study with four levels (first-year undergraduate, second-year undergraduate, final-year undergraduate, and postgraduate). To investigate the impact of genre as a mediating factor, we computed factorial ANOVAs with two independent variables: Genre with four levels (Essays, Critiques, Case Studies, and Explanations) and Level of Study with four levels (first-year undergraduate, second-year undergraduate, final-year undergraduate, and postgraduate). Finally, to examine the impact of discipline, genre, and level together, we computed factorial ANOVAs with three independent variables: Genre with two levels (Essays and Critiques), Level of Study with four levels and Discipline with two levels (1—Arts, Humanities, and Social Sciences; 2—Life and Physical Sciences).
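As a rough illustration of this setup, the sketch below reproduces the Bonferroni arithmetic and enumerates the nominal cells of the three factorial designs. It is not the SPSS procedure itself; the variable names and design labels are assumptions made for the example.

```python
from itertools import product

# Bonferroni adjustment: family-wise alpha divided by the number of features tested.
FAMILY_ALPHA = 0.05
N_FEATURES = 23
adjusted_alpha = FAMILY_ALPHA / N_FEATURES  # approximately .002

LEVELS = [
    "first-year undergraduate",
    "second-year undergraduate",
    "final-year undergraduate",
    "postgraduate",
]

# Factors for each of the three analyses, as described in the text.
designs = {
    "level x discipline": [
        LEVELS,
        ["Arts and Humanities", "Social Sciences", "Life and Physical Sciences"],
    ],
    "level x genre": [
        LEVELS,
        ["Essays", "Critiques", "Case Studies", "Explanations"],
    ],
    "level x discipline x genre": [
        LEVELS,
        ["Arts, Humanities, and Social Sciences", "Life and Physical Sciences"],
        ["Essays", "Critiques"],
    ],
}

# Nominal number of cells in each fully crossed design. Cells excluded for
# sparse data (e.g., graduate-level Explanations) are not subtracted here.
cell_counts = {name: len(list(product(*factors))) for name, factors in designs.items()}
```

The nominal crossings yield 12, 16, and 16 cells respectively, which makes clear why the uneven distribution of texts across cells motivated three separate models rather than one.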
Results
Development across Level of Study
The main focus of the analysis in this study was to determine whether clausal, phrasal, and intermediate features differed systematically across level of study. As can be seen from Table 5, most of the grammatical features were significant overall for the model, and many of the features were significant across level. It should, however, be noted that the R² values, a measure of effect size, are fairly low for many of the features. As indicated in the framework discussed above, the overall hypothesized trend was for writers to show movement away from finite dependent clauses toward nonfinite dependent clauses and then to dependent phrases.
Table 5. Summary of the Factorial Models for Four Levels of Study and Three Disciplines for 23 Grammatical Features Associated With Complexity.
The first aspect of this hypothesis—that phrasal features would increase across levels—was supported by the analysis. The general trend was that all phrasal features increased by level (nouns, nominalizations, of genitives, premodifying nouns, attributive adjectives, and prepositional phrases). However, from Figure 1, we can see that the increases took place at different rates. For premodifying nouns, there was a steady increase from Level 1 to Level 4. For attributive adjectives, the use increased at all levels, but more dramatically from Level 3 (last year of undergraduate) to Level 4 (first year of graduate). Nominalizations saw a rise from Level 1 to 2 and then another from Level 3 to 4. Finally, of phrases actually saw a decrease from Level 1 to 2, and then a small but steady increase to Level 4. These findings clearly support the claim that phrasal features are an important aspect of academic writing development. They also show that different phrasal features may exhibit development at different levels of study.

Figure 1. Phrasal features across Level of Study.
Premodifying nouns are the most robust measure of development, with the largest effect size (R² = .346) of any feature. Premodifying nouns are more frequent at higher levels, becoming approximately 40% more frequent from Level 1 to Level 4 (compared to only about an 18% difference for attributive adjectives and nominalizations, which also increase in frequency at higher levels). This finding is particularly interesting, as these noun-noun sequences represent one of the most distinctive features of the grammar of academic writing today: Premodifying nouns are nearly two times more frequent in academic writing than in other informational writing (newspaper prose) and about three times more common than in conversation (Biber & Gray, 2010, 2016). While other phrasal features are also more frequent at higher levels, premodifying nouns represent the most distinctive difference between writing at Level 1 and Level 4, and demonstrate that more advanced writers are moving toward norms associated with advanced academic writing.
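The percentage comparisons in this paragraph follow the usual relative-change calculation, sketched below. The two rates are hypothetical values chosen only to show how a difference of about 40% arises; they are not the observed frequencies, and the assumption that rates are normalized to a common text length is ours.

```python
def percent_change(start_rate: float, end_rate: float) -> float:
    """Relative change from start_rate to end_rate, as a percentage."""
    return (end_rate - start_rate) * 100 / start_rate


# Hypothetical normalized rates of premodifying nouns (illustration only).
level_1_rate = 50.0
level_4_rate = 70.0

change = percent_change(level_1_rate, level_4_rate)
```

Because the change is expressed relative to the Level 1 rate, features with very different absolute frequencies can be compared on the same scale.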
The second hypothesis was that clausal features, particularly finite dependent clauses, would decrease across levels; this hypothesis was also supported by the analysis. The clearest trends for decrease across clausal features were seen for finite dependent clauses, particularly finite adverbial clauses. Figure 2 shows the steady decline of finite adverbial clauses across level. It also displays the decline of other finite dependent clauses (that and WH complement clauses). However, as can be seen, these last two features displayed a more dramatic decrease from Level 3 to Level 4, again indicating that development may occur at particular points during university study, or may accelerate at specific times.

Figure 2. Clausal features across Level of Study.
Figure 3 shows the “intermediate” features identified by Biber et al. (2011). Within these we can separate out (a) phrasal constituents of clauses (e.g., linking adverbials), (b) finite clause types functioning as a constituent in a noun phrase, and (c) nonfinite clause types. Most of the phrasal constituents of clauses and finite clause types functioning as a constituent in a noun phrase (i.e., noun + that-clauses, WH relative clauses, and that relative clauses) decreased by level. Importantly, though, these features show a more marked decrease between Level 3 and Level 4 (last year of undergraduate and first year of graduate). In fact, there is a slight increase in the use of WH relative clauses between Level 1 and Level 2. Of the nonfinite clause types, only verb + to-clauses decreased across level. Thus, notably, the nonfinite clause types that modified nouns or adjectives did not show differences across level.

Figure 3. Intermediate features across Level of Study.
Taken together, it seems that development does take place at the undergraduate level of writing, and more notably from undergraduate to graduate-level writing. There is a general trend of a decrease in finite clauses and an increase in phrasal features. However, there was not much evidence to suggest an increase in the use of nonfinite clauses.
Development across Disciplines
The development across levels was mediated by discipline, as was hypothesized. As can be seen from Table 5, many of the features that were significant for level were also significantly different across disciplines. In addition, some features showed not only main effects but interaction effects for
The phrasal features increased by level regardless of discipline (the exception being noun + of phrases in the Social Sciences). While premodifying nouns were used more frequently in the Life and Physical Sciences, nouns, nominalizations, and attributive adjectives were used most frequently in Social Sciences, and Arts and Humanities used more of genitives and prepositional phrases. Figure 4 displays these trends for the greater use of individual phrasal features depending on the discipline.
Premodifying nouns showed a significant interaction effect for
To some extent, the same patterns of decrease in use of clausal features can be seen across disciplinary groups as well as level, with finite clauses used most frequently in Arts and Humanities, followed by Social Sciences, and used least in Life and Physical Sciences. Figure 5 shows that verb + that-complement clauses increase during the undergraduate years in Arts and Humanities, followed by a steep decline between the last year of undergraduate and the first year of graduate school. Figure 5 also plots the trends for two intermediate features (noun + that-complement clauses and WH relatives), showing that these features are also more frequent in Arts and Humanities, but decline across levels for all disciplines.

Figure 4. Phrasal features across Level of Study and Discipline.

Figure 5. Finite clauses across Level of Study and Discipline.
Passive voice verbs (finite and nonfinite) and linking adverbials follow different patterns of use from the dependent clause features: Life and Physical Sciences use notably more passive structures than Arts and Humanities, and Social Sciences use the most linking adverbials (see Figure 6). Passive voice verbs displayed the most striking difference across disciplines. Writers did not use passive voice significantly differently across levels, but that is likely related to the wide variation in its use across the disciplines. Only Arts and Humanities writers showed a slow but steady increase in the use of passive voice. Passive voice increased in the Life and Physical Sciences for the second and third years of university, but then declined slightly in graduate-level writing. In the Social Sciences, the use decreased from Level 1 to Level 2, then increased in Level 3, and then decreased from the last year of undergraduate to the first year of graduate school. Nonfinite passives showed little movement during undergraduate years but declined for all three disciplines in Level 4. Linking adverbials followed the trends seen across level in each of the disciplines, but with Social Sciences using more of this intermediate feature.

Figure 6. Intermediate features across Level of Study and Discipline.
Taken together, these findings for
These findings are in line with results from studies like Biber and Gray (2016) and Gray (2015), which demonstrate that while all academic writing has a high reliance on phrasal complexity features compared to nonacademic registers, the extent of that reliance varies across disciplines: science writing relies on phrasal complexity features to the greatest extent, followed by social sciences, and then humanities texts. The fact that these patterns are also reflected in student writing across these disciplines demonstrates the students’ movement toward disciplinary norms.
Development across Level of Study and Genre
Table 6 shows the findings across level and genre, as well as the interaction between these two factors. As mentioned in the method section, when interpreting these results, we must bear in mind that to a large extent the genres are not equally distributed across disciplines, and thus the disciplinary differences seen in the last section parallel the genre differences reported in this section.
Table 6. Summary of the Factorial Models for Four Levels of Study and Four Genres (Case Studies, Critiques, Essays, Explanations) for 23 Grammatical Features Associated With Complexity.
This trend is generally found for the phrasal features, most of which followed disciplinary patterns. For example, Explanations and Case Studies, which were found most in the Life and Physical Sciences, used more premodifying nouns than Essays and Critiques (see Figure 7). Of genitives and nominalizations were used the most in Essays, which were found primarily in Arts and Humanities and Social Sciences. However, attributive adjectives were used most frequently in Case Studies, which were found most commonly in Life and Physical Sciences.

Figure 7. Phrasal features across Genre and Level of Study.
Essays, which were found most frequently in the Arts and Humanities, and to a slightly lesser extent the Social Sciences, were generally characterized by greater use of clausal features (adverbs, linking adverbials, finite adverbial clauses, verb + that-clauses, noun + that-clauses, noun + to-clauses, and WH relative clauses). The use of passive voice also lined up with disciplinary preferences, with the most frequent use in Explanations, a genre that was found most commonly in Life and Physical Sciences. These trends can be seen in Figures 8 and 9.

Figure 8. Passives, verb + that-complement clauses, WH relative clauses, and noun + that-complement clauses across Genre and Level of Study.

Figure 9. Finite adverbials, desire + to-clauses, and that relative clauses across Genre and Level of Study.
Notably, passive voice verbs show dramatically different trends across the genres, particularly in Case Studies, where we see a decrease from Level 1 to Level 2, a sharp increase in the final year of undergraduate study, and then a decline among first-year graduate students to the same level as first-year undergraduates. Critiques and Essays show mixed patterns, but there is a general increase in passive voice between the final year of undergraduate study and the first year of graduate school. Only Explanations show a steady increase in the use of passive voice. It should be noted that while Explanations and, to a lesser extent, Critiques are more common in Life and Physical Sciences than in Social Sciences, Case Studies are distributed more evenly across these two disciplines at the undergraduate level but are overwhelmingly more common in Life and Physical Sciences at the graduate level.
There are additional features that did not simply follow disciplinary patterns. For example, the genre of Case Studies showed the greatest use of finite adverbial clauses in the undergraduate years, more than either Essays or Critiques. This is surprising given that finite adverbial clauses were overall more frequent in the Arts and Humanities, and Essays and Critiques were much more common in this discipline than Case Studies, which were most frequent in the Life and Physical Sciences. Case Studies also showed the greatest use of desire verb + to-clauses, another feature found most frequently in Arts and Humanities writing. The more frequent use of these clausal structures seems tied to the particular functions of this genre, rather than larger disciplinary preferences.
Development Across Level of Study, Discipline, and Genre
Finally, an examination of
Summary of the Factorial Models for Four Levels of Study, Two Disciplinary Macro-Groups (Arts, Humanities, and Social Sciences and Life and Physical Sciences), and Two Genres (Critiques and Essays) for 23 Grammatical Features Associated With Complexity.

Premodifying nouns across Level, Discipline, and two Genres (Critiques and Essays).
Discussion
The results of this study show that L1 English university writers follow the hypothesized developmental progression presented in Biber et al. (2011). As level increased, writers used fewer finite dependent clauses and more dependent phrases; they used fewer clause constituents and more noun modifiers. Intermediate features (e.g., nonfinite clauses, linking adverbials) remained relatively more stable across levels. In the discussion, we provide examples to illustrate these trends as well as the ways in which they are mediated by discipline and genre.
Text Excerpts 3 and 4 illustrate development from Level 1 to Level 4. In the following excerpts, the bolded text indicates finite clauses while the underlining highlights phrasal features. Intermediate features are marked with italics. Nominalizations are marked with capital letters. In the Level 1 text (Text Excerpt 3), we see six finite clauses and one nonfinite clause.
Liberal historians had argued
More specifically, the types of finite clauses are verb complement clauses (e.g., that the number of . . .), adverbial clauses (e.g., although the number of participants . . .), and one noun complement clause (that the Bolsheviks had very little support). While phrasal features are still an important characteristic of this passage, we see only one premodifying noun (revisionist) and few attributive adjectives. Cause and effect relations as well as contrastive statements help the author to refine the argument in relation to the previous literature. The predominance of clausal features allows for a relatively more elaborated style of writing, focusing on describing relationships and comparisons rather than presenting a great deal of abstract information.
Text Excerpt 4 from Level 4 illustrates how the concise packaging of information increases across academic level. In Text Excerpt 4, we see longer noun phrases (e.g., triumphant consequence of transnational capitalism), premodifying nouns (e.g., military expenditure), and nominalizations (e.g., capitalism, bureaucratization, corruption). In addition, there is only one finite clause (a relative clause), but still quite a few nonfinite clauses. The argumentation becomes much more informational and, for the most part, distanced from the author and even from source material.
Firstly,
The findings for academic level are consistent with previous investigations of development in academic writing, for both L2 writing (e.g., Biber et al., 2014; Parkinson & Musgrave, 2014) and L1 writing, for features such as number of modifiers per noun phrase (Crossley et al., 2011), and number of attributive adjectives and noun phrase length (Ravid & Berman, 2010). The present study adds to these findings by more systematically showing this development across a wide constellation of specific linguistic features. These findings also align with more general investigations of advanced academic texts that indicate the importance of prepositional phrases and noun phrases when compared with conversation (e.g., Biber, 1988, 2006): Even Level 1 texts rely on phrasal complexity along with clausal complexity. However, this reliance increases as the level increases.
Text Excerpts 5 and 6 show the development of grammatical complexity in a very different discipline and genre: Engineering Critiques. In the Level 1 Engineering Critique (Text Excerpt 5), we see fewer finite clauses than in the Level 1 History Essay. We also see more phrasal complexity when compared to the Level 1 History Essay (Text Excerpt 3). The writer is using both attributive adjectives (e.g., mathematical model, good level, cantilevered shaft, long shaft) and nouns as premodifiers (e.g., carbon steel bounds, metal behavior). The use of passive voice is slightly greater than that found in the History texts, but it is limited to the last sentence, as a means to distance the writer from the stance expressed in this critique.
The mathematical model for the simply support provides data to a good level of
In Text Excerpt 6 from Level 4, a variety of phrasal features are used, including attributive adjectives (e.g., atomic, accurate, unique, efficient), premodifying nouns (e.g., force, deformation, material, nanometer, stress, strain), and postmodifying prepositional phrases (e.g., of an atomic force microscope . . . , of a wide range of materials . . .). There is only one finite dependent clause (as they are incrementally strained).
With regard to its properties it is quite similar to the machine designed by Marsh and can be structurally compared to it in many aspects. However in this device the
The information provided in this passage is, relative to the previous passage, much more compressed. As a result, the writer is able to provide detailed descriptions of the machine as well as an evaluation. Passive voice is used in both finite and nonfinite clauses not only to distance the writer from the stance expressed in the critique but also to place emphasis on processes and the physical objects discussed (machine designed by Marsh, materials as they are incrementally strained).
The use of phrasal features by writers at higher academic levels in part reflects increased information packaging demands as writers gain disciplinary content knowledge and are expected to convey this knowledge concisely for their audiences. In both History and Engineering, this disciplinary content is packaged within dense noun phrases (e.g., real cold war position of powerlessness; macroscopic stress strain behavior). Clausal features, on the other hand, make relationships between ideas more explicit, and also allow writers to more overtly express their stance or the stance of others. In both Level 1 excerpts above, the greater use of clausal features allows the writers to more explicitly review others’ arguments (in the case of the History Essay) and display a step-by-step analysis of strengths and weaknesses (in the case of the Engineering Critique). At higher levels, students increasingly use writing as a vehicle to make their own arguments and report on their own research, which is likely one reason for the shift away from the use of extended clausal structures. Even within a given genre and discipline, communicative expectations shift as writers advance in their degrees.
The findings for the “intermediate” features identified by Biber et al. (2011) are also important to note. While finite clauses (even those modifying or complementing noun phrases) declined across levels, as did phrasal features that were constituents of clauses (e.g., adverbs), nonfinite clauses (except for –ed passive postnoun modifiers) remained steady across levels. Thus, although these clauses play an important role in distinguishing between extremely disparate registers such as conversation and academic writing, they may not play much of a role in the overall writing development of L1 university level students.
As can be seen from the comparisons across the History Essays and Engineering Critiques, the factors influencing the use of phrasal and clausal features are related not only to level but also to the genre and discipline of these texts. This study shows that the developmental trend from clausal elaboration to phrasal complexity is mediated by both discipline and genre. Arts and Humanities texts used more clausal complexity and less phrasal complexity even at higher academic levels when compared with Life and Physical Sciences texts. In addition, Essays used more of some of the clausal features (e.g., verb complement clauses) and fewer of some of the phrasal features (e.g., premodifying nouns) when compared with Critiques.
In part, these differences reflect the need for writers outside of the hard sciences to provide more extended explanations of and justifications for arguments and conclusions. In addition, the phrasal features used most frequently by Life and Physical Sciences writers (premodifying nouns) allow students to communicate a great deal of technical information more concisely. Phrasal features that are more common in the Arts and Humanities (prepositional phrases), along with relatively greater use of clausal features, allow writers to provide a more elaborated discussion of historical, social, and cultural events and ideas and the relationships among them.
Disciplinary variation for both clausal and phrasal features has been found in advanced professional academic texts. Biber and Gray (2016) and Gray (2015) show that while all professional academic writing relies on phrasal complexity, science writing relies on phrasal complexity to a greater extent than humanities texts. This study shows that these trends are followed by developing academic writers, and that as level increases, the writers are moving toward these discipline-specific academic norms.
However, it is also important to take into account the fact that particular genres tend to be produced in certain disciplines. Genres vary in the degree to which a writer needs to display detailed, technical knowledge about a procedure or problem. Thus, faculty in Life and Physical Sciences, particularly at lower levels, assign Explanations, which are designed to help students display knowledge about, for example, biological, chemical, and physical concepts and processes. They also ask students to write Methodology Recounts to report on established procedures for empirical studies in order to eventually apply that knowledge (at higher academic levels) to their own research. Such communicative tasks require students to package information much more densely (at least at first) than writers in the Humanities and Social Sciences. On the other hand, Essays are assigned much more frequently by faculty in the Arts, Humanities, and Social Sciences. As discussed above, Essays focus much more on the ability to construct a coherent argument than the genres found in the Life and Physical Sciences. Evidence for variation across genres has been found in previous studies (e.g., Beers & Nagy, 2009), but this study shows these trends across academic genres and for a wide constellation of specific linguistic features.
Conclusion
Taken together, the results of this study show that there are clear developmental trends in the academic writing of L1 university level writers. Clausal features (particularly finite clauses) are used more predominantly in lower level texts while phrasal features are increasingly used as academic level increases. These changes run counter to previous notions of complexity by showing that (a) phrasal complexity is increasingly important as writers develop throughout their university education and (b) clausal complexity is less important and in fact declines during the undergraduate years and into the graduate years. Instead of academic writing becoming more elaborated (e.g., Wright, 2008), student writers actually use more compressed phrasal structures and simpler clausal structures.
Notably, the trends across academic level are mediated by genre and discipline. First, writers in Arts and Humanities disciplines and, to a lesser extent, the Social Sciences use more clausal features than writers in the Life and Physical Sciences. The degree to which writers in particular genres and disciplines employ phrasal features varies, even at lower levels. Thus, even from their first assignments in a given discipline, developing writers recognize and/or are constrained by disciplinary conventions for displaying knowledge. These constraints may be more overtly signaled to writers through prompts or pedagogical instruction, or more implicitly gathered through exposure to reading in their discipline. That being said, it is also clear that writers are becoming more socialized to their own discipline, and using more discipline-specific characteristics as they go through their college years and into graduate-level study.
From a writing studies perspective, the results of this study have important implications for the conceptualization of academic writing and writing development. Our findings offer additional evidence to support recent claims that linguistic features of academic writing development are best measured using phrasal structures, such as premodifying nouns, rather than clausal structures, such as T-units. In order to accurately measure development and complexity in academic prose, it is crucial that we focus our attention on the key grammatical characteristics of this domain. In this regard, our study has provided clear evidence that phrasal compression is an important predictor of writing development in the university. Phrasal expressions are more economical and allow writers to package information more densely. However, they also cause challenges for academic readers and writers, as the relationship between a noun and its modifiers is less explicit than that expressed in a clause (e.g., an orientation which is theoretical and which focuses on the analysis of systems vs. a systems, theoretical orientation). Thus, while phrasal compression is a key feature of academic writing development, we also need to determine the relationship between the use of these linguistic features and other aspects of writing development, such as how these features relate (more explicitly) to rhetorical and communicative success.
A final implication of this study relates to curriculum designers. For those working within a writing across the curriculum (WAC) model, it can be seen that while academic writing follows similar broad trends across genre and discipline, important differences exist in the features employed within disciplines, and even within genres within the same discipline. As WAC scholars know, generalizations about academic writing need to be mediated by the details of more specific contexts. By examining the linguistic means through which writers in different disciplines and genres communicate, we can make better recommendations for ways to help writers acclimate to the writing of a particular disciplinary community.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
