Abstract
In the past decades, there has been a surge of interest in the study of language complexity in second language (L2) research. In this article we provide an overview of current theoretical and methodological practices in L2 complexity research, while simultaneously framing these within the broader scientific interest in the notion of complexity. In addition to focusing on the role of complexity in L2 research, we trace how language complexity has figured in formal theoretical and typological linguistics. It is argued that L2 complexity research has often adopted a reductionist approach to the construct, both in terms of its definition and its operationalization. As such, previous L2 research has often confused related but conceptually distinct and operationally separable notions, such as relative and absolute complexity, and it has overemphasized syntactic and lexical forms of complexity at the expense of complexity related to morphology or linguistic interface phenomena. We then discuss a collection of five empirical studies which react to several of these issues by highlighting hitherto underexplored forms of complexity, by adopting an explicitly cross-linguistic perspective, or by proposing novel forms of L2 complexity measurement.
I Introduction
In the past decades, the notion of complexity has raised significant interest in the language sciences, including formal theoretical linguistics, historical linguistics, language evolution studies, comparative linguistics and language typology, computational linguistics, psycholinguistics and neurolinguistics (for an illustration of the variety of approaches to language complexity, see for instance the contributions to Kortmann and Szmrecsanyi, 2012; Newmeyer and Preston, 2014). More recently, second language (L2) researchers from both theoretical and applied orientations have been eager to test the relevance of this notion to L2 acquisition, development, processing and structure.
Theoretical approaches to language complexity are typically motivated by the attempt to evaluate and refine theories of language structure, language evolution, or of the human language processor, or to describe and compare linguistic constructions and systems across languages and language varieties. Applied approaches to language complexity (e.g. in clinical linguistics, speech pathology and language disorders, language testing, language education and pedagogy, speech technology) have been primarily motivated by the attempt to develop accurate, reliable and objective metrics of language performance, proficiency and development; metrics that can be used for evaluating the language skills of individual language users or groups of language users, and the changes therein over time, across contexts or across conditions.
Although the notion is used abundantly across the sciences in general and in linguistics in particular, there is no single, generally accepted definition or metric of complexity (Mitchell, 2009). In its most general sense, complexity has often been interpreted as a quantitative notion concerning the number and variety of parts or elements in an entity or system, and the relationships and interactions between the constituent parts (Rescher, 1998: 1; Simon, 1996: 183–84). Yet linguists have translated this general definition into a plethora of meanings and operationalizations, encompassing such seemingly disparate notions as recursion, ellipsis, movement and dependency distance, but also factors such as redundancy, markedness, and cognitively or developmentally related aspects, such as a language item’s processing costs or acquisitional timing (Dahl, 2004; Ellis, 2009; Housen and Simoens, 2016; Miestamo, 2008; Szmrecsanyi, 2004; Szmrecsanyi and Kortmann, 2012). At the core of many definitions of language complexity lies a tension between relative complexity (also called difficulty, subjective or user-related complexity) and absolute complexity (also called inherent, objective or structural complexity) (Dahl, 2004; Han and Lew, 2012; Housen and Simoens, 2016; Karlsson et al., 2008; Kusters, 2008; Szmrecsanyi and Kortmann, 2012). Relative complexity is generally understood as a property of language phenomena which are acquired late or which are cognitively taxing, with simple language phenomena being acquired early in development and/or taking up few cognitive resources in language processing and use (see also DeKeyser, 2005; Goldschneider and DeKeyser, 2001; Housen and Simoens, 2016). As such, relative complexity is frequently equated with cognitive complexity or difficulty (e.g. Bulté and Housen, 2012; Miestamo, 2009; Szmrecsanyi and Kortmann, 2012).
In contrast, absolute complexity refers to an inherent property of language constructions, systems or samples of language production and more closely approximates the general definition given above. It has to do with the internal formal structuring of linguistic units or systems, in terms of the number and variety of their constituent components and the elaborateness of their interrelational structure (see Miestamo, 2009; Pallotti, 2015; Rescher, 1998).
In this introductory article, we will first outline the principal ways in which the construct of complexity has figured in general linguistics by focusing on its role in both formal and functional approaches to language, and how this has contributed, or can contribute, to investigations of L2 complexity. We will then discuss a number of outstanding issues in L2 complexity research that provide the rationale of this special issue, alongside an overview of the contributions.
II General linguistic approaches to complexity
1 Complexity in formal theoretical linguistics
The first place to look for a general theory of language complexity may be formal theoretical linguistics, in particular generative theories of grammar, which have enquired into the nature and sources of language complexity since the 1950s. The multiplicity of complexity-related notions found within such theories – from representational complexity (Roberts and Roussou, 2003; Van Gelderen, 2009, 2011) and computational complexity (Chomsky, 1957, 1991, 1995; Chomsky and Miller, 1963) to derivational complexity (Jakubowicz, 2003, 2011; Mobbs, 2008; Trotzke and Zwart, 2014) and processing complexity (Hawkins, 1994, 2004) – reflects how interpretations of complexity vary from theory to theory (see, for instance, the contributions to di Domenico, 2017; Newmeyer and Preston, 2014). Moreover, formal approaches to grammar tend not to have a general theory of language complexity, only accounts of how specific linguistic phenomena relate to each other in terms of complexity. Nor is there a generally accepted metric in formal linguistics for quantifying and comparing the overall complexity of different grammars. Instead, a wide array of syntactic complexity measures is currently available to capture the complexity of different components of grammars and their respective outputs, such as the Terminal-to-Nonterminal-Nodes Ratio (Chomsky and Miller, 1963; Frazier, 1985), the Immediate Constituent-to-Word Ratio (Hawkins, 2004), the Dependency Distance Metric (Gibson, 2000), and the Derivational Complexity Metric (Jakubowicz, 2003, 2011; Jakubowicz and Nash, 2001).
Notwithstanding the absence of a central theory of complexity or agreed-upon measures to evaluate complexity, formal theoretical linguistics has still provided detailed accounts of particular grammatical phenomena in terms of complexity (e.g. accounts of movement constraints such as subjacency restrictions), which often assume a close link between absolute complexity and relative complexity (see di Domenico, 2017: 1; Hawkins, 2007: 8). But despite the theoretical precision they provide (Hawkins, 2014), these accounts of complexity have largely remained outside the purview of mainstream L2 complexity research, though some have been applied to second language processing and acquisition (e.g. Callies, 2008; Filipovic and Hawkins, 2013; Slabakova, 2014; Slavkov, 2011, 2014).
2 Complexity in functional and typological linguistics
In formal theoretical linguistics (especially in the generative paradigm), the question of complexity differences among languages does not arise because the complexity of individual languages is seen as determined by the invariant universal mechanisms underlying human language in general (Chomsky, 1957; Trotzke and Zwart, 2014). In contrast, cross-linguistic complexity differences have long been at the heart of functionalist and usage-based linguistics, particularly in relation to language typology, where complexity has served as a yardstick for comparing languages and for examining diachronic and synchronic variation (Arnold et al., 2000; Baechler and Seiler, 2016; Baerman et al., 2015; Dahl, 2004; Givón, 2009; Kusters, 2003, 2008; McWhorter, 2007, 2011; Miestamo, 2008, 2017; Nichols, 1992; Rohdenburg, 1996; Sampson, Gil and Trudgill, 2009; Szmrecsanyi and Kortmann, 2012). Current typological interest in language complexity, which often adopts an absolute perspective on the notion, can be contextualized within the frame of the ‘equi-complexity hypothesis’, a long-standing theory which claims that all languages share an equal degree of total absolute (or structural) complexity since they need to satisfy similar communicative needs of their respective speech communities. Complexity differences between languages in one domain of the language (e.g. in morphology) are hypothesized to be compensated for elsewhere (e.g. in syntax) through complexity trade-offs (e.g. Fromkin et al., 2010; Hockett, 1958; Hudson, 1981).
While a strict interpretation of the equi-complexity hypothesis has recently been refuted on both empirical and methodological grounds (e.g. Ehret and Szmrecsanyi, 2016; Fenk-Oczlon and Fenk, 2008), typological linguists have still expressed an interest in charting the intricate relations between various parts of the language system, both from a synchronic and a diachronic perspective. Empirical studies have highlighted the variable and evolving nature of language complexity (see Sampson et al., 2009), demonstrating how language contact and L2 acquisition shape certain cross-linguistic differences in overall complexity (e.g. Bentz and Berdicevskis, 2016; Lupyan and Dale, 2010; McWhorter, 2001, 2011; Trudgill, 2011). McWhorter (2011), for example, argues that all languages accrue complexity throughout their history, but that in high-contact languages (e.g. English, Spanish, German) widespread adult L2 acquisition disrupts this process and functions as a source of grammatical simplification. In contrast, languages spoken in isolated communities are more likely to maintain their accumulated complexity or even develop further complexity (McWhorter, 2011; for an alternative view see, for example, de Groot, 2008).
As the above discussion indicates, typological linguistics has been explicitly interested in measuring the complexity of entire grammars of languages or parts thereof, resulting in a range of global and local complexity measures. For instance, in an attempt to assess the overall complexity of grammars, McWhorter (2001, 2007) focuses on overspecification (number of communicatively ‘redundant’ grammatically expressed semantic or pragmatic distinctions), structural elaboration (number of rules mediating underlying forms and surface forms), and irregularity. Other typologists, however, doubt whether it is possible to compare the global complexity of entire languages, since there are no objective, nonarbitrary criteria for measuring complexity across different language dimensions, let alone across different languages (Croft, 2003; Deutscher, 2009, 2010; Miestamo, 2008; Sinnemäki, 2014). This position is aptly illustrated by Jackendoff and Wittenberg (2012) when they write: ‘To take about the simplest possible case: What is more complex, a language with forty phonemes and two morphological cases, or a language with twenty phonemes and six cases? How would one decide?’ (p. 71).
It is, however, both possible and relevant to measure and compare the local complexity of languages at the level of their sub-systems, such as the case system, tense system, numeral system, or the verb’s argument structure (see Sinnemäki, 2011), for example by counting the occurrences of ‘complexity indicators’ (Shosted, 2006: 3) of a given linguistic system (e.g. the size of a language’s phoneme and syllable inventory, the number of phonological alternations or the number of inflectional categories). Phonology and particularly morphology are the domains where typologists have traditionally made such complexity measurements and comparisons (e.g. Baerman et al., 2015; Bentz et al., 2016; Maddieson, 2006, 2009; McWhorter, 2001, 2007; Nichols, 2007, 2009).
Critics of this approach point to the fact that the units of measurement are descriptive and intuitive and that there is no principled way of selecting, out of the myriad linguistic properties that could be measured, the relevant complexity indicators (Dahl, 2004; DeGraff, 2001). A more recent, alternative approach has therefore turned to independently motivated notions of complexity from information theory. Two types of information-theoretic complexity have been applied to measure the complexity of linguistic systems: Kolmogorov complexity (Kolmogorov, 1965; see Bane, 2008; Ehret, 2014, 2017; Ehret and Szmrecsanyi, 2016), which relies on text deformation and compression, and effective or Gell-Mann complexity (Gell-Mann, 1995; see Moscoso del Prado Martín et al., 2004; Mufwene et al., 2017; Sinnemäki, 2014), focusing on the notion of entropy (for a critical discussion of the application of both types of complexity to language, including their limitations, see Dahl, 2004; Pellegrino et al., 2007).
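The two information-theoretic notions can be illustrated with a minimal sketch. Kolmogorov complexity itself is not computable, so applied work approximates it through file compression; here the general-purpose zlib compressor stands in for whatever compressor a given study uses. The entropy function computes plain Shannon entropy over word forms, which is only one ingredient of entropy-based measures and should not be read as an implementation of effective complexity. The function names and the toy texts are assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size in bytes.

    A lower ratio means the text is more compressible, i.e. more
    redundant, which is read as lower (approximated) Kolmogorov complexity.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return len(zlib.compress(raw, 9)) / len(raw)


def unigram_entropy(text: str) -> float:
    """Shannon entropy, in bits per token, of the word-form distribution."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy texts (hypothetical): one highly repetitive, one with no repeated word form.
repetitive = "the dog sees the dog " * 40
varied = " ".join(f"word{i}" for i in range(200))

for label, sample in (("repetitive", repetitive), ("varied", varied)):
    print(label, round(compression_ratio(sample), 3), round(unigram_entropy(sample), 3))
```

Both values depend on text length and on tokenization, so comparisons are only meaningful between samples of similar size that have been prepared in the same way.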
Current typological studies of language complexity are firmly usage-based and often rely on computational and natural language processing (NLP) tools for the analysis of both massive and smaller, parallel and non-parallel corpora from different languages (see Bane, 2008; Bentz and Berdicevskis, 2016; Dahl, 2007; Juola, 1998; Li and Vitanyi, 2004; Mayer and Cysouw, 2014; Moscoso del Prado Martín, 2011). In addition to ‘empirical complexity measures’ (i.e. measures of complexity calculated using data from actual text samples produced in different languages), typological research has also used ‘descriptive’ or ‘typological’ complexity measures, that is, measures obtained from descriptive grammars of the sample languages. An example of the latter are the measures derived from the descriptions in The world atlas of language structures (WALS; Dryer and Haspelmath, 2013; see below).
As the following sections will bear out, L2 research has relied on the notion of language complexity for some time, but it has mostly done so independently from typological or cross-linguistic considerations or from developments in general linguistics. One of the goals of the current special issue is to present a number of studies which are situated at the intersection of these different approaches, drawing on the insights and techniques developed in typological approaches to language complexity to further the field of second language acquisition.
3 Complexity in L2 research
Complexity has a long history in the field of L2 research. Already in the 1970s and 1980s, the notions of ‘complexification’ and its antonym, ‘simplification’, figured prominently in early theoretical models of L2 acquisition (e.g. Andersen, 1983; Meisel et al., 1981; Schumann, 1978). Skehan (1989) proposed a model of L2 acquisition which included complexity as one of three basic dimensions (with accuracy and fluency), in terms of which learners’ L2 performance, proficiency, and development could be investigated. At that time, L2 complexity was also given its working definition, still widely used today, as the range of forms (i.e. items, structures, patterns, rules) available to a learner and as the degree of sophistication of these forms, reflecting the above-made distinction between, respectively, absolute and relative complexity (see Ortega, 2003: 492, 2012; Wolfe-Quintero et al., 1998: 69, 101).
As Ortega (2012) remarked, until recently L2 complexity was not studied for its own sake. Rather, complexity has typically served in L2 research as a dependent or secondary variable (often along with accuracy and fluency) to demonstrate the effects of other independent or primary variables on second language acquisition (SLA) (e.g. the effects of different types of instruction, task features and task conditions, or learning contexts). As an independent or primary research variable, complexity has been the focus of research in studies on the effects of L2 instruction (see Spada and Tomita, 2010) and in studies that adopted a Dynamic Systems or Usage-based perspective (e.g. Spoelman and Verspoor, 2010; Vyatkina, 2012; Vyatkina et al., 2015). One strand of research has been concerned with finding objective, quantitative ways to capture the multifaceted and often non-linear nature of the language learning process by charting the development of selected complexity phenomena (e.g. Bulté and Housen, 2012; Larsen-Freeman, 2006; Pallotti, 2009, 2015; Verspoor et al., 2012; Vyatkina, 2012). Another related line of studies has extended these aims by explicitly searching for measures representing proficiency and focuses on language features which are expected to be linearly or closely related to proficiency (e.g. Han and Lew, 2012; Lambert and Kormos, 2014; Ortega, 2012).
Complexity in L2 research has been measured either subjectively, through rating scales, or, more commonly, objectively, through quantitative measures, echoing the trend in first language (L1) acquisition research (Unsworth, 2008). Particularly in studies in which complexity figures as a dependent variable, these quantitative complexity measures have often served as indicators or proxies for other constructs, such as L2 proficiency, L2 development and L2 quality. Early studies used a handful of global syntactic and lexical complexity measures to compare samples of (mostly written) language collected from learners at different proficiency levels or stages of development. Higher values of these measures, such as the mean length of T-unit or lexical type-token ratios, were associated with higher proficiency or later stages of development (for reviews, see Ortega, 2003; Wolfe-Quintero et al., 1998). As the field evolved and with the advance of automated complexity measures and NLP tools (e.g. TAASSC, T-Scan, Coh-Metrix), 1 the repertoire of complexity measures has grown considerably to include a wide variety of not only global measures but also more fine-grained measures targeting more specific and linguistically sophisticated or developmentally advanced features (e.g. ratios of rare words, of relative clauses or of noun premodifiers). More sophisticated analyses, including those inspired by methods from Dynamic Systems Theory (e.g. Spoelman and Verspoor, 2010), have also yielded a more nuanced picture of L2 complexity and its development, indicating important moderating effects of learner-related, task-related and contextual factors (see Kuiken et al., forthcoming). These recent studies further show that L2 use can be complex in different ways at different developmental stages or proficiency levels, and that growth in one domain or measure is likely to be accompanied by a plateau or decrease in another. Furthermore, studies that have explored both global and specific L2 complexity phenomena have found predominantly linear trends in the former but more complex developmental patterns in the latter.
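The two global measures named above, the lexical type-token ratio and the mean length of T-unit, can be sketched as follows. The sketch assumes that the learner sample has already been tokenized on whitespace and segmented into T-units by hand or by a parser; neither step is shown, and the function names are ours rather than those of any existing tool.

```python
def type_token_ratio(tokens: list[str]) -> float:
    """Number of distinct word forms (types) divided by number of tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


def mean_length_of_unit(units: list[str]) -> float:
    """Mean number of whitespace-separated words per unit (e.g. per T-unit)."""
    if not units:
        raise ValueError("units must not be empty")
    return sum(len(unit.split()) for unit in units) / len(units)


# Hypothetical learner sample, pre-segmented into two T-units.
t_units = ["the cat sat on the mat", "she left"]
tokens = " ".join(t_units).lower().split()

print("TTR:", type_token_ratio(tokens))
print("Mean length of T-unit:", mean_length_of_unit(t_units))
```

The raw type-token ratio falls as texts get longer, which is one reason why samples of unequal length are hard to compare on this measure.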
III Core issues and recent developments in L2 complexity research
Recent advances notwithstanding, empirical L2 research on language complexity has produced many inconsistent and inconclusive findings (see reviews in Bulté and Housen, 2012; Norris and Ortega, 2009; Ortega, 2012). This fact, in combination with the growing attention to L2 complexity as a primary variable, has prompted researchers to take a more critical look at L2 complexity as a construct and as a research variable (Bulté and Housen, 2012; Norris and Ortega, 2009; Ortega, 2003, 2012; Pallotti, 2009, 2015), concluding that L2 complexity is still ill-defined as a construct and many of its operationalizations and measures have been ‘applied with little or no reflection about their theoretical underpinnings and issues of construct validity’ (Pallotti, 2015: 118).
1 Deconstructing/reconstructing L2 complexity
The problematic status of the L2 complexity construct in L2 research is arguably due to its complex (i.e. multidimensional, multifaceted and multilayered) nature and its resultant polysemy (Bulté and Housen, 2012, 2014; Ortega, 2012; Pallotti, 2009, 2015). The term complex(ity) has been equated with a seemingly disparate variety of terms such as late(r) acquired, (more) advanced, (more) developed, (more) proficient, (more) sophisticated, (more) difficult, (more) elaborate, (more) embedded, (more) frequent, long(er), rare(r), (more) marked, (more) diverse, rich(er), better, and (more) mature (Bulté and Housen, 2014). The question is whether all these terms can be used interchangeably because they refer to the same underlying construct, or whether they refer to conceptually distinct and analytically separable constructs that may or may not be empirically (cor)related (Bulté and Housen, 2012, 2014; DeKeyser, 1998, 2005; Pallotti, 2015).
Researchers such as Pallotti (2015) have therefore called for a simpler approach to defining and operationalizing L2 complexity to avoid the risk of terminological confusion and circular reasoning (see also Bulté and Housen, 2012, 2014; Ortega, 2012). Such an approach first entails that the notion of L2 complexity must be clearly distinguished from possibly related but conceptually and analytically separable constructs such as L2 proficiency (‘more proficient’), L2 development (‘later acquired’) and L2 quality (‘better’). Second, it is crucial that the distinction between relative complexity (or difficulty) and absolute (or structural) complexity used in other disciplines is also maintained in SLA research (DeKeyser, 2005). In L2 research, relative and absolute complexity are often conflated, leading to tautological statements (e.g. ‘an (absolute) complex structure is a structure which is (relatively) complex (or: difficult)’) and to circular argumentation (e.g. when ‘complex L2 features’ are defined as ‘features that are acquired late’ while at the same time the late acquisition of a particular L2 feature is explained by its complexity).
The importance of complexity as difficulty is well acknowledged in psycholinguistic approaches to language acquisition (DeKeyser, 2005; Housen and Simoens, 2016). When cast in cognitive terms, the study of relative complexity is closely related to the fundamental issue of learnability (see Gregg, 2001). It aims to answer key questions such as what makes L2 learning in general difficult, why some L2 features are more difficult to learn than others (either in general or for individual learners), and why some language features develop before others in L2 learning. As such, the study of L2 relative complexity can contribute to a transition theory of SLA (Gregg, 2003).
A review of the literature suggests that two types of factors determine whether a given L2 feature, and by extension L2 learning as a whole, is more or less cognitively taxing (Bulté and Housen, 2012; Housen and Simoens, 2016). Objective determinants of L2 difficulty are learner-independent properties of the L2 features themselves, and potentially make a given feature more or less difficult for all learners. Objective factors include most notably an L2 feature’s saliency, itself a function of, amongst others, a feature’s input frequency, its redundancy, or its L1–L2 similarity, as well as its absolute linguistic complexity. Subjective or learner-dependent determinants of L2 difficulty, on the other hand, are individual difference variables that affect whether an individual learner experiences the L2 learning task as more or less difficult, such as aptitude and memory capacity, L1 background and level of L2 proficiency. Ultimately, it is the interaction between these objective, feature-related factors and the subjective, learner-related variables that determines L2 difficulty and, hence, L2 learning outcomes.
In comparison to relative complexity, there has been little reflection on what constitutes and determines absolute complexity in an L2. In L2 research, absolute complexity has been associated with the length or size of linguistic units (words, phrases, clauses, sentences, T-units), with the range, variety, richness or diversity of items in a linguistic system or domain, and with properties that refer to the composition and hierarchic organization of linguistic units (e.g. embedding, subordination). More generally, the study of absolute complexity relates to the what of SLA and can contribute to a property theory of SLA (Gregg, 2003) in the sense that it can provide an account of the significant properties of the underlying linguistic systems that L2 learners develop at various stages of the L2 acquisition process. The question, and challenge, is how this general notion of absolute L2 complexity can be further specified in an objective, non-intuitive and theory-neutral way, without any preconceived ideas or theoretical assumptions about when, how and why absolute complexity increases or remains constant in the process of acquisition, in order to allow for operationalization in empirical research designs for the description and comparison of different L2 systems and varieties.
To reiterate, a structurally more complex feature or system (absolute complexity) is not necessarily more difficult to learn or process (relative complexity), nor are difficult features necessarily more structurally complex. 2 It is a core objective of empirical L2 research to demonstrate whether, to what extent and under what circumstances absolute complexity and difficulty are related in practice. Indeed, these questions contribute to the theoretical relevance of the study of complexity in L2 research.
2 Measuring L2 complexity
As many SLA studies forgo explicitly defining the construct of L2 complexity, the meaning of the construct is essentially provided by the operationalizations and measurements used in the literature. As mentioned, most L2 studies rely on objective, quantitative metrics to measure L2 complexity, of which there are many (see Bulté and Housen, 2012; Ellis and Barkhuizen, 2005; Ortega, 2003; Wolfe-Quintero et al., 1998). The recent advance of automated tools has considerably enlarged the arsenal of available complexity measures (e.g. Coh-Metrix provides more than 100 measures tapping into constructs related to language complexity). However, reviews of measurement practices in L2 research (Bulté and Housen, 2012, 2014; Norris and Ortega, 2009; Pallotti, 2009, 2015) have identified several important gaps and imbalances and highlighted the often reductionist approach to complexity measurement in L2 research. Studies often rely on only one or two measures targeting lexical diversity or syntactic elaboration at the level of clause-linking or the sentence, while ignoring other lexical phenomena (e.g. collocations) and other syntactic levels (e.g. the phrasal and clausal level; for an overview of current research, see Kuiken et al., forthcoming) and completely eschewing the domains of morphological and phonological complexity. This practice is in striking contrast to functionalist and typological studies of complexity, where phonological and particularly morphological complexity have been key foci of investigation. Also, complexity at the interfaces between linguistic domains, such as lexical-grammatical phenomena, is underinvestigated in L2 research. This disregard for morphology, phonology and linguistic interface phenomena in L2 complexity research is surprising. These aspects are not only crucial components of a comprehensive account of second language structure, use and development, but they also represent a major learning challenge and therefore potentially represent important indices of L2 development and indicators of proficiency level.
Furthermore, and again in contrast to typological research, previous L2 research has also paid scarce attention to the potential impact of cross-linguistic factors on the manifestation and development of complexity in a second language. Hence, little is known about whether measures tapping into complexity at various domains of language structure relate to linguistic development and proficiency in similar or different ways across languages.
IV Overview of the special issue
The brief review of the conceptual and methodological practices in linguistics and in L2 complexity research presented in this introductory article highlights the need for further conceptual clarification and operational refinement in L2 complexity research. The five contributions to this special issue showcase multiple approaches as to how this can be achieved. They examine how language complexity in L2 acquisition is influenced by factors that go beyond more traditional variables such as learner proficiency or task complexity. For instance, a number of contributions take inspiration from existing comparative and typological studies by adopting a cross-linguistic perspective on complexity and examining the impact of complexity configurations of the L1 and L2 (Brezina and Pallotti, 2019; De Clercq and Housen, 2019; Van der Slik et al., 2019). They also extend traditional conceptualizations of complexity as syntactic or lexical complexity by focusing on other, hitherto neglected forms and manifestations of complexity, in particular morphological complexity (Brezina and Pallotti, 2019; De Clercq and Housen, 2019; Van der Slik et al., 2019) and instances of complexity that arise at the interface between syntax and lexis (Paquot, 2019). More generally, the various contributions all endeavor to explicitly define and operationalize complexity following the absolute approach discussed above. The adoption of a definition of complexity that is independent of subjective factors allows these studies to shed new light on the relation between complexity and the learners’ level of proficiency, its development and learnability.
This special issue also addresses several of the methodological challenges of L2 complexity research outlined above. Several contributions propose new measures of complexity that rely on automation and natural language processing techniques in order to represent the recently highlighted forms of complexity or to reflect a more nuanced understanding of the construct. For example, Van der Slik et al. (2019), building on earlier work (Schepens et al., 2013, 2016), use information from The world atlas of language structures (WALS; Dryer and Haspelmath, 2013) to evaluate from a typological perspective how the presence or absence of (morphologically) complex features in languages influences their cognitive learning complexity. Further typologically inspired work is undertaken by Ehret and Szmrecsanyi (2019), who adopt an information-theoretic approach to L2 complexity by employing the compression technique and text-deformation method of measuring language complexity based on the formalism of Kolmogorov complexity developed by Juola (1998, 2008).
The studies reported on in this special issue further reflect a shift from the more traditional focus on small convenience learner corpora representing individual learner development to the measurement of complexity in larger parallel learner corpora, opening the path to research methodologies used in corpus linguistics, such as the use of pointwise mutual information (MI) scores to assess collocational complexity (Paquot, 2019). We will now present a short overview of the topics and questions that are raised in each of the five central articles in this special issue.
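To make the mutual information scores mentioned above concrete: pointwise mutual information compares the observed probability of a word pair with the probability expected if the two words occurred independently, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The sketch below is a simplified illustration that scores adjacent word pairs within a single token list. Both choices are assumptions made here for brevity and do not reproduce the procedure of any of the studies cited; in practice the probabilities would normally be estimated from a large reference corpus.

```python
import math
from collections import Counter


def bigram_pmi(tokens: list[str]) -> dict[tuple[str, str], float]:
    """Return the PMI (in bits) of every adjacent word pair in `tokens`."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (first, second), count in bigrams.items():
        p_pair = count / n_bigrams
        p_first = unigrams[first] / n_unigrams
        p_second = unigrams[second] / n_unigrams
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Hypothetical toy sample; real estimates need far more data.
sample = "strong tea strong tea weak tea".split()
for pair, score in sorted(bigram_pmi(sample).items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Pairs that co-occur more often than chance receive positive scores. PMI is known to inflate scores for rare pairs, so frequency thresholds are commonly applied alongside it.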
1 Ehret and Szmrecsanyi
The contribution by Katharina Ehret and Benedikt Szmrecsanyi, entitled ‘Compressing learner language: An information-theoretic measure of complexity in SLA production data’, touches upon the themes of complexity measurement and cross-linguistic influences on complexity. Their study builds on previous information-theoretic approaches to complexity by measuring the compressibility of texts both in their original version and after distorting information at the level of the word and of the sentence. They argue that this compressibility represents, respectively, overall complexity, morphological complexity and syntactic complexity. Through an investigation of learner texts from the International Corpus of Learner English (ICLE), they show that more advanced learners produce considerably more complex texts than beginner learners, although this tendency is not always reflected in a clear, linear relationship between proficiency and complexity. While this increase is observed for overall complexity and morphological complexity, more advanced learners also use less rigid word order, which is interpreted as a decrease in syntactic complexity. In addition, the two authors tentatively conclude that L1 background can also contribute to L2 complexity, with the German learners in their corpus producing more complex texts than the learners with a Romance L1.
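The general logic of a compression-based measure can be illustrated with a minimal sketch. It is not Ehret and Szmrecsanyi's actual procedure: the choice of compressor (`zlib`), the 10% deletion rate, the two distortion functions and all function names are assumptions made purely for illustration. The idea is that a text which compresses poorly carries more information, and that comparing the compressed size of a distorted text with that of the original gives an indication of how much structure resided at the distorted level.

```python
import random
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes needed to store the text after DEFLATE compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def distort_words(text: str, rate: float, rng: random.Random) -> str:
    """Word-level distortion (assumed): delete a share of non-space characters."""
    return "".join(ch for ch in text if ch.isspace() or rng.random() >= rate)


def distort_sentences(text: str, rate: float, rng: random.Random) -> str:
    """Sentence-level distortion (assumed): delete a share of word tokens."""
    return " ".join(tok for tok in text.split() if rng.random() >= rate)


def complexity_profile(text: str, rate: float = 0.10, seed: int = 0) -> dict:
    """Compression-based indicators for one non-empty text (illustrative only)."""
    rng = random.Random(seed)
    base = compressed_size(text)
    return {
        # Compressed size relative to raw size: higher means less compressible.
        "overall": base / len(text.encode("utf-8")),
        # Compressed size of each distorted text relative to the original.
        "word_level_ratio": compressed_size(distort_words(text, rate, rng)) / base,
        "sentence_level_ratio": compressed_size(distort_sentences(text, rate, rng)) / base,
    }
```

In practice, such ratios would be averaged over many random distortions and interpreted comparatively across texts rather than as absolute values for a single text.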
2 Van der Slik, van Hout and Schepens
In ‘The role of morphological complexity in predicting the learnability of an additional language: The case of La (additional language) Dutch’, Frans van der Slik, Roeland van Hout and Job Schepens approach the relation between complexity and L2 development by focusing on the influence of the morphological configuration of L1s on L2 learning achievement. As such, the authors adopt an explicitly cross-linguistic perspective on the study of complexity while simultaneously highlighting the underexplored role of morphological complexity. Their contribution presents a novel approach to answering questions regarding the link between absolute structural complexity and the relative ease and success of acquiring the morphological system of Dutch by learners from various L1 backgrounds.
The study uses data from the State Examination of Dutch as a Second Language (STEX) to sample proficiency data from 8,754 L2 learners of Dutch from various L1 backgrounds. By coding each L1, as well as any other languages the learners knew, according to typological complexity information gleaned from the World Atlas of Language Structures (WALS), the authors are able to replicate and extend findings from a previous study (Schepens et al., 2013). They demonstrate that learners with an L1 that is less complex than Dutch with regard to eight language features achieved lower L2 Dutch proficiency scores than learners with an equally or more complex L1. Furthermore, the authors find that length of residence in the host country also has a beneficial effect on levels of Dutch oral proficiency obtained, but only if a learner’s L1 is equally or more complex than the L2.
3 De Clercq and Housen
Continuing the theme of cross-linguistic influences on L2 complexity and further exploring the notion of morphological complexity, Bastien De Clercq and Alex Housen investigate how traditional measures of language complexity can be supplemented with measures tapping into morphological complexity. Their article, ‘The development of morphological complexity: a cross-linguistic study of L2 French and English’, examines how three measures of complexity, Malvern et al.’s (2004) Inflectional Diversity (ID), Brezina and Pallotti’s (2019) Morphological Complexity Index (MCI) and Horst and Collins’ (2006) Type/Family Ratio (T/F), represent inflectional, derivational and overall morphology in oral L2 French and L2 English at four levels of proficiency. Their results indicate considerable increases in morphological complexity in the L2 French learners’ oral narratives when complexity is measured by MCI and ID, whereas such differences were restricted to the first two proficiency levels in the L2 English corpus. These empirical findings emphasize the usefulness of including morphological complexity measures when assessing L2 learner development in languages with a rich morphological system, such as French, although they also indicate that important conceptual and methodological differences lie at the basis of various available morphological complexity measures.
4 Brezina and Pallotti
In their contribution, entitled ‘Morphological complexity in written L2 texts’, Vaclav Brezina and Gabriele Pallotti further build on the Morphological Complexity Index (MCI) first introduced by Pallotti (2015). Like De Clercq and Housen (2019), the authors further develop the themes of cross-linguistic complexity measurement and the development of morphological complexity. After a review of previous approaches to the measurement of morphological complexity, the authors discuss an approach that is based on the notion of exponence and on the diversity of inflectional verbal forms. They subsequently demonstrate how their conceptualization of the construct can be applied to a morphologically simple language (English) and a morphologically more complex one (Italian) by introducing an automatized computer analysis tool that will undoubtedly prove to be of great value to other researchers working on language complexity. They apply the MCI to data from the CALC corpus (Kuiken and Vedder, 2014), representing native and non-native Italian texts, and to data from the ICLE (Granger et al., 2002) and LOCNESS corpora, which respectively represent L2 and L1 English written argumentative essays.
Their study indicates that the overall higher morphological complexity of Italian is reflected by MCI in both native and non-native (written) productions. In addition, their study finds a link between higher morphological complexity and higher proficiency in Italian but not in English, thus echoing the findings from De Clercq and Housen (2019). Yet, the two authors emphasize the importance of considering the link between complexity and proficiency as an empirical finding, instead of as a validation of the Morphological Complexity Index.
5 Paquot
Magali Paquot bridges the gap between her previous work on phraseology and traditional approaches to complexity in the fifth article entitled ‘The phraseological dimension in interlanguage complexity research’. By focusing on the interface between lexis and grammar, Paquot contributes to the investigation of hitherto neglected forms of complexity. To this end, she introduces a conceptualization of complexity that centers around the variety and sophistication of collocations, which are identified via pointwise mutual information scores in a POS-tagged corpus. The notion of phraseological variety or diversity is operationalized as a measure based on type-token ratios, while sophisticated collocations are identified using a reference list (Academic Collocation List; see Ackermann and Chen, 2013).
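To make the two ingredients of this approach concrete, the following minimal sketch computes pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), for word pairs, together with a type-token ratio over those pairs. It is an illustration under simplifying assumptions and not Paquot's procedure: pairs are taken here to be adjacent tokens in a single text, without POS tagging or reference-corpus frequencies, and the function names are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """PMI for each adjacent word pair: log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    return {
        (x, y): math.log2(
            (count / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni))
        )
        for (x, y), count in bigrams.items()
    }


def pair_type_token_ratio(pairs):
    """Variety of word pairs: distinct pairs divided by total pair occurrences."""
    return len(set(pairs)) / len(pairs) if pairs else 0.0


tokens = "strong tea strong tea weak tea".split()
scores = pmi_scores(tokens)
variety = pair_type_token_ratio(list(zip(tokens, tokens[1:])))
```

A higher PMI indicates that two words co-occur more often than their individual frequencies would predict, which is why such scores can serve as a proxy for collocational strength. With realistic data, the probabilities would be estimated from a large reference corpus rather than from the learner text itself.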
The empirical part of her contribution centers around a comparison of these phraseological complexity measures to more traditional lexical complexity measures calculated by Lu’s (2012) Lexical Complexity Analyzer. For her analysis, Paquot uses data from the VESPA corpus, from which she analyses 98 written academic texts that represent three L2 English proficiency levels, from upper-intermediate to near-native. While Paquot acknowledges that her proposed methodology still requires further validation, her findings nonetheless corroborate those of previous studies (see Paquot and Granger, 2012) that highlight phraseology and collocations as important components of the advanced development of L2 learners.
V Conclusions
The way the notion of complexity has been approached in L2 research illustrates the problematic nature of construct definition, validation and operationalization in our field (Norris and Ortega, 2003). These are crucial steps in all scientific disciplines, but ones that are often ignored or treated only cursorily in SLA research.
As the five contributions in this special issue explore many of the conceptual and methodological practices in L2 complexity research presented in this article, they also present an attempt to conceptually clarify and operationally refine the construct of language complexity. But while the recent developments in the study of language complexity that lie at the core of these articles undoubtedly enrich the field of SLA, it remains necessary to stay critical of how they contribute to a coherent understanding of the notion of complexity, to the extent that such a notion has ever existed.
As discussed above, a number of authors have voiced concerns about the wide range of definitions and measures of complexity, with Pallotti (2015) in particular arguing in this journal for a more streamlined, theoretically coherent interpretation and assessment of complexity in language acquisition research. With this in mind, the studies in this special issue also represent research aims that go beyond those of more traditional complexity research. Nevertheless, an awareness of the theoretical underpinnings of language complexity remains as important as ever in order to accurately characterize how a study contributes to our knowledge of second language structure and acquisition.
Footnotes
Declaration of Conflicting Interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
