Abstract
This paper presents a quantitative corpus-based variationist analysis of the English insertions used by Belgian Dutch and Netherlandic Dutch participants in the reality TV show ‘Expeditie Robinson’. The data consist of manual transcriptions of 35 hours of recordings for 46 speakers from 3 seasons of the show. Focusing on the expressive utterances in the corpus, we present a mixed-effects logistic regression analysis to determine which of a variety of speaker-related and context-related features help explain the occurrence of English insertions in Dutch. The results show a strong impact of typical variationist variables such as gender, age and location; but features that are more situational, such as emotional charge and topic of the conversation, also prove relevant. Overall, in its combined focus on (a) oral corpora of spontaneous language use, (b) social patterns in the use of English and (c) inferential statistical modeling, this paper presents new perspectives on the study of Anglicisms in weak contact settings.
Introduction
This paper aims to reveal the social context of the use of English in weak contact settings by patterning the characteristics of the speakers using English insertions and of the situations in which they occur. Specifically, we present a quantitative variationist analysis of the use of English loanwords and phrases by Belgian Dutch and Netherlandic Dutch participants in the reality TV show ‘Expeditie Robinson’ (known as ‘Survivor’ in the Anglo-Saxon world).
In its focus on the rise of English in the Low Countries, this paper can be framed against the background of Anglicism research in weak contact settings. Similar to Germans (Onysko, 2007) or Danes (Gottlieb, 2005; Sandoy, 2013), most speakers of Dutch are at least weakly bilingual: in the Eurobarometer data (European Commission, 2012), 90% of the Dutch claim to be able to have a basic conversation in English (see, for example, Gerritsen et al., 2007). (For Belgian respondents to the survey, this level drops to 38%. However, because the Belgian respondents are sampled across the three linguistic communities of Belgium (French-speaking in the South, Dutch-speaking in the North, and a German-speaking minority in the East), these data are not necessarily representative of Belgian Dutch speakers.) Despite this high percentage, the English language has no official status in the Low Countries, and contact with English is indirect: bilingual communication between Dutch and native English speakers is rare, because the prime source for contact is mass media (i.e. Internet, Hollywood, pop music and subtitled English-origin broadcasts on television) (Booij, 2001). Given this asymmetrical, indirect nature of the contact situation, the linguistic outcome is different from that in more intense settings, such as migration or colonization. Specifically, the impact of English is largely restricted to the receptor language lexicon, to which loanwords and loan phrases are added (see, for example, Furiassi, Pulcini & Rodríguez González, 2012 and Zenner, Speelman & Geeraerts, 2013 on borrowed phraseology).
Over the past decades, inventorying this lexical influence of English has grown into a research tradition of its own in Western Europe. With this paper, we aspire to complement this tradition of corpus-based Anglicism research on three levels. First, with regard to data, existing studies have for practical reasons mainly relied on print media corpora (e.g. Yang, 1990). Although these provide convenient resources for drafting inventories of established loanwords, they do not provide any information on the use of English in spontaneous conversation (see, for example, Backus, 2013). Second, the loanwords retrieved from these print media corpora have so far predominantly been subject to structuralist analyses that focus on patterns of nativization and adaptation (e.g. Onysko, 2009): variationist analyses studying the social features of the users of English insertions and the contexts in which they occur have largely fallen outside the scope of Anglicism research. Third, although some attention to social patterns has recently emerged in more qualitative approaches (e.g. Androutsopoulos, 2012; Pennycook, 2003), quantitative analyses relying on inferential statistical modeling are virtually absent. However, as will be discussed below, inferential statistical modeling is an invaluable tool in disentangling the complex interplay of speaker-related and context-related features.
In this study, we aim to address these three points by conducting a variationist analysis of the use of English insertions in three seasons of the reality TV show ‘Expeditie Robinson’. In the next section, the design of the study is described in more detail. First, we briefly present the corpus, with attention to the benefits and drawbacks of reality TV as a source for linguistic description. Then, we discuss which features are considered English insertions. Third, the speaker-related and context-related features that are included in the statistical analyses are presented. In Section 3, the results of the statistical model, i.e. a mixed-effects logistic regression model, are discussed. Finally, the findings of the study are summarized in a conclusion.
Corpus and design
‘Expeditie Robinson’
The existing focus of Anglicism research on print media in large part results from the availability of material. Where sizeable newspaper corpora are accessible for most Western European languages, representative corpora of spontaneous spoken language are rare and – when available – small or highly diversified, due to the labor-intensive process of recording and transcribing oral data (Backus, 2013). For example, with ten million words, the Dutch CGN (Corpus Gesproken Nederlands) is quite large for a spoken corpus, but only a minority of the corpus consists of private informal discourse (alongside speeches, business negotiations, secondary school courses etc.). Hence, when aiming to acquire a comprehensive view on the social context of lexical borrowing in the spontaneous language use of a number of individuals, recourse has to be taken to other types of data (see, for example, Leppänen & Nikula, 2007).
For this study, we specifically explore to what extent reality TV presents new options, focusing on the game doc ‘Expeditie Robinson’. This show takes the form of a social game in which different participants compete in physical, intellectual and social challenges. In the course of approximately fifty days, sixteen contestants, who are initially divided into two ‘tribes’, try to survive on a (supposedly) desert island and strive to be awarded the title of Robinson of the Year. Elimination is progressive: at regular intervals, participants gather in the so-called Tribal Council, where they have to vote one participant home. The final participant to survive these councils wins a sizeable amount of money and the title of Robinson of the Year. Based on the experiences of the participants during the game, three types of fragments are shown on television: informal dialogues on the island, dialogues at the more formal Tribal Councils, and utterances from video diary fragments, which are ‘like secret correspondence with the viewer, providing information about the game that the other castaways may not have’ (Haralovich & Trosset, 2004, p. 88).
Our corpus consists of annotated transcriptions of seasons 4, 5 and 6 of the show (broadcast in 2003, 2004 and 2005 respectively). Each of the 10,000 utterances of the 46 participants was transcribed manually, relying on the CHAT-conventions of the CHILDES project (MacWhinney, 2000). The analyses presented in this paper rely on a subset of these data (see below for more information).
Before proceeding to a discussion of the English insertions in the corpus, it is important to note the main benefits of working with these data. First and foremost, reality TV offers high quality recordings that are relatively easy to gather: it is feasible to acquire enough data to quantitatively study lexical patterns such as the use of loanwords. Second, the participants in ‘Expeditie Robinson’ come from a wide variety of social backgrounds, in contrast to those in other reality TV shows such as ‘The Bachelor’ or ‘Temptation Island’, and to the type of interview data usually collected in contact linguistic studies. Despite these benefits, one notable drawback must be acknowledged. Because ‘Expeditie Robinson’ is a TV show, the amount of editing, cutting and pasting that has been conducted prior to broadcasting is unclear. Moreover, some parts of the show could be scripted and thus could be less spontaneous than we consider them to be. However, when taking heed of this issue in creating the design of the study and interpreting its results, reality TV data can prove a useful tool in analyzing the social context of the use of loanwords and phrases.
English in expressive utterances: Loanwords and phrases
A first crucial task in tracing the social context of English loanwords and phrases in Dutch is to define what counts as an English insertion. In contrast to highly inclusive approaches that incorporate all transfer types (Poplack, Sankoff & Miller, 1988; and see, for example, Furiassi, Pulcini & Rodríguez González, 2012 for a full overview of all types of Anglicisms), we only consider those Anglicisms that are recognizable as English to native speakers of Dutch. The main reason for this is that ‘the non-Dutch character of a word can only exert influence on the language user’s behavior when the expression at issue is identifiable as a non-Dutch word’ (Geeraerts & Grondelaers, 2000, p. 56). Ideally, the decision on the status of an inclusion as an English loanword would thus be based on those language users’ judgments. However, retrieving judgments for all inclusions in the dataset is a very time-consuming task, if not a research project in its own right. Moreover, given the presumed social variation in the use and recognition of English words and phrases, it would be hard to find a group of test subjects that is representative of the community at large. In practice, we hence decided to use an approximation of speaker judgment that relies on the fact that Anglicisms are ‘frequently marked by their unusual interplay of phonological and orthographical form’ (Onysko, 2007, p. 33) (cp. Gerritsen et al., 2007 and Gerritsen et al., 2010 for an alternative approach relying on lexicographical treatment). More specifically, only those insertions whose grapheme-phoneme mapping is not fully concordant with the conventions of Dutch are included in the study. For example, compare film and survivor: both are English loans, but where pronouncing the written form film according to the conventions of Dutch sounds very close to English (/fɪlm/), such a naive Dutch pronunciation of survivor would sound more like /sʏr’viːvɔr/. Hence, survivor is considered an English insertion, film is not.
Furthermore, Anglicisms for which no Dutch alternative exists and which are thus unavoidable in discourse (e.g. cocktail, barbecue) are also excluded from the analysis.
(1) *MAR: ik (h)ad gewoon een BLACKOUT ik ik die die die vraag ik (h)eb die vraag zeker drie vier keer gelezen in in een heel allez in in een paar seconde(n) tijd en ik ik ik kon gewoon de link nie(t) make(n). 2 ‘I simply had a black out, I must have read that question three or four times in a couple of seconds but I couldn’t make the link.’
(2) *MAK: ik (h)ad echt zoiets van # SHIT hij moet wel gelijk dood zijn want anders vin(d) (i)k (h)et ook niet leuk en euh@fp maar hij was uiteindelijk toch dood. ‘I was like: shit, it has to be dead immediately because otherwise it’s just no fun, but eventually it died.’
Examples (1) and (2) provide utterances containing English insertions. The examples point at an interesting difference: whereas blackout is used referentially to indicate a specific psychological condition, shit is used expressively/pragmatically to mark annoyance. Because conflating both types of insertions in one analysis might hide the specific characteristics of each type of Anglicism, our study focuses exclusively on the more expressive items. Moreover, because expressive/pragmatic words typically occur in expressive utterances, we only consider such expressive utterances for the analysis. In this way we ensure that our results reveal features specific to non-referential English, not to non-referential language use in general. If, in contrast, all utterances were included in the analysis and, for example, we were to find more expressive/pragmatic English insertions in the men’s utterances than in the women’s, the result would be ambiguous. It could reveal a gender effect for the use of English, but it could also simply indicate that men and women use an equal amount of English in expressives, but that the proportion of expressive utterances is higher in the men’s dataset than in the women’s. This ambiguity disappears when focusing exclusively on the expressive utterances in the data.
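The ambiguity described above can be made concrete with a small worked example. The sketch below uses invented counts (not taken from the corpus) in which men and women use English equally often within expressive utterances, yet a count over all utterances suggests a gender difference simply because the men produce more expressives.

```python
# Hypothetical counts per group, invented for illustration only:
# (all utterances, expressive utterances, expressive utterances with English)
counts = {"men": (500, 100, 20), "women": (500, 50, 10)}

# Rate of English-containing expressives over ALL utterances (ambiguous measure).
over_all = {g: english / total for g, (total, expressive, english) in counts.items()}

# Rate of English over EXPRESSIVE utterances only (the measure used in this study).
over_expressive = {g: english / expressive for g, (total, expressive, english) in counts.items()}

for group in counts:
    print(group, over_all[group], over_expressive[group])
```

In these invented data the apparent difference over all utterances disappears once the denominator is restricted to expressive utterances.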
Specifically, expressive utterances are identified according to the definition provided for exclamatives by the main Dutch reference grammar (Haeseryn, Romijn, Geerts, de Rooij & van den Toorn, 1997), i.e. all utterances spoken in a louder tone to express any form of emotion, such as surprise, annoyance or fear. The corpus contains 1,190 of these utterances for 46 speakers, 198 of which contain one or more English insertions. 3 Together, these 198 utterances contain 227 English insertions. Table 1 lists all types with a token frequency over two. Both loanwords and loan phrases are found in the corpus, but hardly any actual codeswitching occurs: insertions containing more than one word are typically highly fixed expressions, borrowed as a chunk from English into Dutch (see, for example, Doğruöz & Zenner, 2013).
Table 1. Most frequent English insertions in expressive utterances.
The focus of this paper is on finding common features of the exclamative utterances that contain English insertions (see example (3)) as opposed to those that do not (see example (4)).
(3) *MER: JESUS jonge(n) seg tis [: het is] daarom da(t) (i)k er nog in zit. ‘Jesus man, that’s the reason I’m still here!’
(4) *FLE: weg met de stemme(n). ‘Away with the votes!’
To this end, a mixed-effects logistic regression model is built. This statistical technique allows us to verify the effect of a number of variables on the choice for English. We discuss below the situational and speaker-related features that are included in this model.
Variationist analysis: capturing the social context of English loans
Speaker-related features
Each of the 1,190 expressive utterances in the corpus is tagged for a number of speaker-related and context-related features that could be associated with the use of English. With regard to the speaker, our first aim is to find quantitative proof for the assumption that, through its associations with symbolic values like modernity, fun and trendiness, English in weak contact settings is a typical youth language phenomenon (e.g. Androutsopoulos, 2005; Leppänen & Nikula, 2007). We classified the 46 participants (aged between eighteen and fifty-nine) in the three age groups that are used for the formation of the tribes in season 5: 18 participants are under thirty, 14 are between thirty and forty, and 14 are over forty.
Next, we include the gender of the participant: 23 are male, 23 are female. So far, only a handful of studies have looked for gender effects in the use of Anglicisms. In her study of English loanwords in Swedish, Sharp (2001) finds consistently fewer English insertions in the utterances of women. For intense contact settings, Poplack et al. (1988) also find that women use fewer loanwords overall, but note that this pattern is due to a confounding regional effect. Given these results and considering women’s generally higher sensitivity to linguistic norms, we expect the women on the show to use English insertions less frequently than the men.
As a result of the indirect and asymmetrical nature of English influence described above, the English proficiency of Europeans still largely depends on foreign language teaching at school. Hence, a linear relationship may be expected between the number of English insertions found and the educational level of the participant. Because we do not have any information on the participants’ educational background, we focus on their jobs. Relying on the Standaard Beroepen Classificatie (‘standard professions classification’) of the Dutch Central Bureau of Statistics (CBS, 2010), we make a binary distinction between jobs requiring a higher level of education (e.g. lawyers, marketing consultants) (23 speakers) and jobs requiring a lower level of education (e.g. roofers, bartenders) (23 speakers).
With regard to the regional background of the speakers, we distinguish between three levels of granularity. On the national level, we focus on the pluricentric nature of Dutch (see, for example, Clyne, 1992) by contrasting the 24 participants from Belgium with the 22 participants from the Netherlands. In the nineteenth century, standardization of Belgian Dutch was slowed down significantly, in large part because most of public life was conducted in French. When the standardization process was speeded up in the 1960s, language policy was directed towards assimilation with Netherlandic Dutch. In practice, this resulted in an ardent rejection of French loanwords, which were far more frequent in Belgian Dutch due to the long dominance of French as the language of culture, higher education and government and which hence caused divergence from the Netherlandic Dutch standard. Based on this background, we propose two conflicting hypotheses concerning the use of English in Dutch (cp. Zenner, Speelman & Geeraerts, 2012). First, the rejection of French loanwords could form the basis of a more general Belgian Dutch purism, which also affects the use of English. In this case, more English insertions will be found in the utterances of the Netherlandic Dutch participants than in those of the Belgian Dutch participants (Geerts 1992, p. 85; see also Gerritsen et al., 2010). Second, the use of English by the participants of both regions could be highly comparable, because, unlike French loanwords, English loanwords did not have an impeding effect on standardization. An extra argument for this hypothesis is the similar, strong position English holds in the media in both regions (Booij, 2001).
On a more regional level, we look for an effect of the province the speaker is from, contrasting core with peripheral provinces (24 and 22 participants respectively). For Flanders, the core provinces are Antwerp and Flemish Brabant. For the Netherlands, these are North and South Holland and Utrecht. These provinces have been shown to lead linguistic change in the Low Countries (Geeraerts, Grondelaers & Speelman, 1999). Also, as they form the hubs of economic life, their degree of urbanization is higher than that of the peripheral provinces. Given how the spread of English in Europe is tied to business settings, globalization and a modern lifestyle (Erling, 2007; Piller, 2001), we can hence expect more English in the expressive utterances of speakers from these core, highly urbanized, provinces.
The same reasoning can be applied to the most local level, i.e. the town the speaker is from: people from provincial capitals live in a more urban environment than people from smaller towns. This higher level of urbanity might be reflected in a higher use of English. Of our 46 participants, 15 live in provincial capitals, 31 in smaller towns. 4
Turning to variables more specific to our reality TV data, we first verify whether speakers who use more expressive utterances on the show are also more inclined to use English in these utterances. Specifically, we calculate the proportion of expressive utterances over the total number of utterances for every speaker. Relying on the median of these proportions across the speakers, we divide the speakers into two groups: low users of expressive utterances (i.e. expressives make up less than 12.6% of the speaker’s total number of utterances) and high users of expressive utterances (more than 12.6%). Each group consists of 23 speakers.
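As an illustration of this median split, the following minimal sketch computes each speaker's share of expressive utterances and labels speakers relative to the median share. The speaker codes and counts are hypothetical, and the function names are ours; only the procedure (proportion per speaker, then a split at the median) follows the description above.

```python
from statistics import median

def expressive_share(n_expressive: int, n_total: int) -> float:
    """Proportion of a speaker's utterances that are expressive."""
    return n_expressive / n_total

def median_split(shares: dict) -> dict:
    """Label each speaker 'high' or 'low' relative to the median share."""
    cutoff = median(shares.values())
    return {spk: ("high" if share > cutoff else "low") for spk, share in shares.items()}

# Hypothetical (expressive, total) utterance counts for four invented speakers.
counts = {"AAA": (10, 200), "BBB": (30, 200), "CCC": (20, 100), "DDD": (5, 100)}
shares = {spk: expressive_share(e, t) for spk, (e, t) in counts.items()}
groups = median_split(shares)
print(groups)
```

With an even number of speakers and no ties at the cutoff, a split at the median yields two groups of equal size, as with the 23 low and 23 high users reported above.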
Next, we aim to assess the impact of a participant’s social position on his or her use of English. Because we do not have enough information to conduct social network analyses (e.g. by means of details on who votes for whom at the Tribal Council), we work with a rough approximation of social status based on the way the participant leaves the island. First, a participant can be voted home by other participants in the Tribal Council. Second, participants occasionally go home on their own initiative, due to physical or emotional distress. Third, participants go home after losing a challenge. In this case, there is no social reason why they leave. Finally, three to four participants make it to the finals. Generally, the first two types are indicative of a weaker social position (29 participants), whereas the final two point to a stronger social position (17 participants). Given the associations of the use of English with openness and fun (see above), the participants using more English might have a better social position on the island.
Situational features
When aiming to discern the social context of using Anglicisms, it is important to complement the speaker-related features described above with more situational features. To this end, we first provided each utterance with a code indicating the topic of the conversation. Four different topic groups were identified, which can be ranked according to the degree of emotional involvement of the speaker. First, we find expressive utterances in conversations about food. The importance of food on the island cannot be stressed enough. At the start of the show, participants receive a minimal survival pack of rice and some canned food, but they are supposed to either find food (such as coconuts) on the island or win food in the challenges (see below). Several of the tribes run out of their food supply early on in the show, leading to hunger and weakness and, consequently, to numerous conversations on food (or the lack thereof).
(5) *FAT: a(l)s er iets is wat we nu op dit kamp nodig hebbe(n) voor de jonge(n)s is (h)et ete(n) # (i)k bedoel je kan niet overleve(n) op grassprietjes # dat gaat niet # dat gaat niet. ‘If there is anything we need in the camp right now, for the boys, it’s food. I mean, you can’t survive on blades of grass, you can’t, you can’t.’
Second, in conversations on life on the island, participants express their amazement at the fauna and flora they encounter, or they discuss the logistics of their camp site. The emotional involvement of the speaker is notably higher in the category “social life”, where participants talk about social affiliations, gossip, and strategic voting. Finally, the challenges form the most emotional setting, with participants encouraging each other during the game, or becoming highly annoyed when they lose. Given how English insertions are typically considered to be rather informal, their use might increase as attention paid to speech goes down (i.e. proportional to the speaker’s emotional involvement).
Next, because the utterances we focus on are used to express stance and emotion, we are interested to verify whether English is used more typically to express positive emotions (see example (4)) than negative emotions (see example (3)). Of the 1,190 utterances in the database, 918 (or 77%) are used to express positive emotions.
In the context of the reality show, another possibly significant factor is temporal evolution. At the beginning of the show, the social environment is very new and somewhat artificial: participants are grouped together on a foreign island with a handful of people they have never met before. At this point, tribe members cannot be said to form a community yet. Participants will consequently be more aware of their social position and of their language use than later on in the show (see Gumperz, 1997, p. 15 in Auer & Roberts, 2011, p. 387). To verify to what extent group formation has an impact on the use of English, we contrast four parts of the show: episodes 1 to 3, episodes 4 to 6, episodes 6 to 12, and episodes 12 to 14. 5
Finally, we pay attention to the interactional nature of the expressive utterance, contrasting utterances where the speaker addresses a specific discourse partner (see (3)) with those that are not directed at anyone in particular (see (4)). This parameter might reveal whether English insertions primarily function interactionally or introspectively.
Analyses and results
Mixed-effect regression modeling
In order to disentangle the interplay of these different features (see Table 2 for an overview) in explaining the choice for English in the expressive utterances of the 46 participants, we rely on inferential statistical modeling.
Table 2. Overview of predictors and hypotheses.
One of the most reliable ways of determining the simultaneous impact of several features on a binary response variable (in this case, whether English is used in the exclamative utterance or not) is logistic regression analysis. This technique has four benefits over traditional descriptive analyses. First, regression models produce estimates of the effect of a specific predictor that are more reliable, because individual effects are calculated taking the combined effect of all included features into account. Second, the model presents significance levels, indicating whether and to what extent the patterns found in the defined dataset can be confidently extrapolated to the entire population. Third, researchers can look for meaningful and significant interactions between the predictors included in their study (for example verifying whether gender effects are stable across different age groups). Finally, regression analyses allow for the introduction of one or more random variables, leading to so-called mixed-effects models.
The latter point is of particular interest for variationist studies like the one we present here: given that our dataset contains more than one utterance per speaker, the speaker him-/herself ‘becomes a source of variation that should be brought into the statistical model’ (Tagliamonte & Baayen, 2012; and see Baayen, 2008; Everitt & Hothorn, 2010; Galwey, 2006). In contrast to fixed factors such as gender, which have a limited number of values (‘male’ or ‘female’), the sample for a random factor is selected randomly from a large population: our dataset contains 46 different speakers, but many more could be added when rerunning the analysis (e.g. on other seasons of ‘Expeditie Robinson’). When, as is the case in our dataset, several measuring points are included for the values of a random factor (in this case, several utterances per speaker), we have to take into account that these observations will undoubtedly be correlated. Specifically, we have to make statistical allowance for possible differences between the speakers that are not captured by the features included in our analysis (in an intuitive linguistic sense, this would involve idiolectal differences that cannot be attributed to the effect of the factors that we have listed in the previous section). This can be achieved by introducing the speaker as a random variable to the regression model.
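One common way of writing down such a model, given here as a sketch in our own notation rather than as the exact specification used in the analysis, is a logistic regression with a by-speaker random intercept. Here y_ij equals 1 if expressive utterance i of speaker j contains an English insertion, x_kij is the value of predictor k for that utterance, and u_j is the speaker-specific adjustment that absorbs idiolectal differences not captured by the predictors.

```latex
\operatorname{logit} P(y_{ij} = 1) = \beta_0 + \sum_{k} \beta_k x_{kij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

The fixed coefficients \beta_k describe the predictors discussed in the previous section, while the variance \sigma_u^2 expresses how much speakers still differ from one another once those predictors are taken into account.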
Before presenting the best-fitting model for our data, we briefly describe the steps needed to create this model. We start by building a fixed-effects-only model (i.e. without the random variable ‘speaker’), because the diagnostic tests for these more traditional models are currently more reliable than those for mixed-effects models. To assess which of our variables contribute significantly to explaining the use of English, we run a forward stepwise selection algorithm and cross-verify the results by means of bootstrapping. Both main effects and interactions are included. Next, we verify the degree of fit of the resulting model. The standard diagnostic tests reveal no significant issues. 6 As concerns the explanatory power of the model, tests indicate a mediocre fit. Pseudo R², a value between 0 and 1 indicating how much of the attested variation is explained by the model, is 0.205. The model’s C-measure, also a value between 0 and 1, with C values greater than 0.8 signifying predictability and C values greater than 0.7 indicating reportable models, is 0.759. 7 Although both pseudo R² and C indicate that there is quite some room for improvement of the model, the fit is sufficient to proceed to a mixed-effects model: we now include ‘speaker’ as random variable. The existing diagnostic tests reveal that there are no issues with the fit, and that the model captures the variability found in the dataset very well. 8
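To clarify what the C-measure expresses, the sketch below computes it as the concordance index: the proportion of (English, non-English) utterance pairs in which the utterance that actually contains English received the higher predicted probability, with ties counted as half. This is equivalent to the area under the ROC curve. The observations and predicted probabilities are invented for illustration and are not model output from this study.

```python
from itertools import product

def c_index(y_true, y_prob):
    """Concordance index for a binary outcome (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    score = 0.0
    for p, n in product(pos, neg):
        if p > n:
            score += 1.0
        elif p == n:
            score += 0.5
    return score / (len(pos) * len(neg))

# Hypothetical data: 1 = utterance contains English, 0 = it does not.
observed = [1, 0, 1, 0, 0]
predicted = [0.8, 0.3, 0.4, 0.5, 0.1]
c = c_index(observed, predicted)
print(round(c, 3))
```

A value of 0.5 corresponds to chance-level discrimination and a value of 1 to perfect discrimination, which is why thresholds such as 0.7 and 0.8 are used as rules of thumb.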
Table 3 presents the fixed effects for the mixed model, ranked according to their relative importance in the model.
Table 3. Fixed effects for the mixed model.
Interpreting the results
The second and final columns of Table 3 contain the most important information for interpreting the model. The second column shows the estimates, which capture the behavior of the predictors. Because we are dealing with categorical variables (i.e. variables whose value is one of a fixed number of nominal categories, e.g. ‘male’ or ‘female’ for gender), the behavior of one of the levels is captured in the intercept as reference value (in this case ‘male’). The behavior of the other levels (‘female’) is compared to this intercept. A negative estimate means that there is less chance of finding expressive utterances containing English than in the intercept. A positive estimate means that there is more chance of finding English than in the intercept. 9 The final column indicates the significance of the pattern: the more stars, the more significant the effect (*** for p < 0.001; ** for p < 0.01; * for p < 0.05).
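Because the estimates of a logistic regression are expressed on the log-odds scale, it can help to translate them into odds ratios or probabilities. The sketch below does so for an intercept and one treatment-coded level; both numbers are hypothetical placeholders, not the values reported for this model.

```python
import math

def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values on the log-odds scale (NOT taken from the fixed-effects table).
intercept = -1.0         # reference level, e.g. 'male'
estimate_female = -0.8   # difference from the reference level

odds_ratio = math.exp(estimate_female)              # below 1 for a negative estimate
p_reference = inv_logit(intercept)                  # probability of English, reference level
p_female = inv_logit(intercept + estimate_female)   # probability of English, other level

print(round(odds_ratio, 3), round(p_reference, 3), round(p_female, 3))
```

A negative estimate thus yields an odds ratio below 1 and a lower predicted probability than the reference level, all other predictors being held at their reference values.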
Focusing on the gender pattern, we see a negative estimate for women: women are significantly less inclined to use English in exclamatives than men. This gender effect is the strongest effect we find in our model. Given the high popularity of English swearwords such as shit and fuck in Dutch (see Table 1), this result might illustrate that the women on the show express themselves more politely, without using (English or Dutch) swearwords. However, as explained above, it might also be related to women’s higher sensitivity to linguistic norms, which applies specifically to stable variation, such as in the lexicon (as opposed to sound change) (Eckert, 2011). As such, this effect can be seen as tentative support for the idea that English is still marked for the speakers in our corpus, and cannot be considered a neutral alternative to Dutch (yet). The second most important feature in our model supports this idea. Table 3 reveals that participants living in the core provinces use significantly more English than participants from the more peripheral provinces. Given how (as discussed above) these core provinces typically take the lead in linguistic change, this result provides further support for the idea that the use of English loanwords and phrases is not dispersed equally across the community and hence is not yet fully established. Moreover, given the higher degree of urbanization of the core provinces, this result also supports the idea that English is tied to a more modern, urban lifestyle.
Whereas these first two features deal with the characteristics of the speakers typically using English in the corpus, the next predictor is oriented towards the context in which utterances containing English can be found. Specifically, we find an effect of the topic of the conversation. One of the four topics we defined stands out: in the context of the challenges, participants use significantly more English than when talking about any other topic. Emotional investment is far higher in this context than at any other point in the game: in the so-called reward challenges, participants can win crucial emotional or logistic support (food, a phone call home…) and in the elimination challenges, where immunity at the next Tribal Council can be won, the future of the participants on the island is at stake. The greater use of English in these contexts thus indicates how using English insertions seems to be tied to higher emotional involvement.
The following two features are again related to the speaker. First, participants with a job that requires a higher level of education are more inclined to use English than participants with a job that requires a lower level of education. As discussed above, we can interpret this effect as an indication that a certain level of proficiency is required to use English insertions fluently.
The next parameter shows an effect of age group. To take the ordinality of the predictor into account, we rely on reverse Helmert coding, which means that the information given above on how to interpret estimates for categorical predictors does not hold here. Instead, each level of the variable is compared to the mean of the previous levels. Table 2 reveals a significant effect only for the oldest age group (participants over forty), who use significantly less English than the younger participants (aged under forty). As such, our data mainly provide evidence for an upper limit on the use of English: beyond a certain age, the use of English drops significantly.
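The contrast scheme described here, in which each level is compared to the mean of the previous levels, can be sketched as a contrast matrix. The code below is a minimal illustration, not the coding script used for the analysis: the function name `reverse_helmert` is hypothetical, and the use of three age groups in the example is an assumption made for illustration only.

```python
def reverse_helmert(k):
    """Build a k x (k-1) contrast matrix for an ordered factor.

    Rows are factor levels, columns are contrasts. Contrast j sets the
    level at index j against all levels that precede it.
    """
    matrix = []
    for level in range(k):
        row = []
        for contrast in range(1, k):
            if level < contrast:
                row.append(-1)        # one of the previous levels
            elif level == contrast:
                row.append(contrast)  # the level being compared
            else:
                row.append(0)         # later levels are not involved
        matrix.append(row)
    return matrix


# Assumed example: three ordered age groups, youngest first.
for row in reverse_helmert(3):
    print(row)
```

For three levels this yields the rows (-1, -1), (1, -1) and (0, 2). The first column sets the second level against the first; the second column sets the third level against the first two together. Each column sums to zero, so an estimate for the last column reflects, up to a scaling factor, the difference between the oldest group and the mean of the younger groups.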
The interpretation of the final two parameters in the model is less straightforward. First, we find significantly less English in exclamatives when a specific discourse partner is addressed than in more general exclamatives. This seems to indicate that English is used predominantly introspectively on the island, rather than with an interactional function. However, this is somewhat counterintuitive. In an alternative interpretation, the pattern can be seen as revealing accommodation strategies: perhaps the use of English (swearwords) is often toned down in interaction to accommodate to those discourse participants who are less inclined to use English. To arrive at a precise and reliable interpretation of the predictor, more in-depth (discourse-analytic) analyses are needed, scrutinizing the social characteristics of the individual speakers and listeners. This falls outside the scope of the present analysis (but see Zenner & Van de Mieroop, forthcoming).
Finally, Table 2 shows that participants use significantly more English when expressing negative emotions than when expressing positive emotions. To explain this pattern, it is important to have a look at the most popular English insertions we find in the data. Table 1 reveals that the swearwords shit and fuck are notably more popular than other insertions. When we add damn and variations on the three discourse markers, such as damn it, what the fuck or the creatively coined shiteshit (which might be considered a pseudo-loan; see, for example, Furiassi 2010), the forms take up almost a quarter of the data (51 of 227 insertions). Moreover, the forms are used by almost half of the speakers, whereas the other English insertions are used by only a handful of participants. The popularity of these three types most likely accounts for the significant effect of the emotional polarity of the utterance in the regression model: English is used more frequently when expressing negative emotions because of the high entrenchment of the discourse markers shit, fuck and damn. Before bringing these results together in concluding comments on the social context of English in the Low Countries, two important remarks need to be made regarding the parameters that did not reach significance in the model (see Table 4).
Non-significant parameters.
First, the absence of ‘town size’ from the model can be explained by the fact that ‘province capital’ and ‘town size’ both capture the effect of urbanity. Because the predictors measure the same underlying phenomenon (of the 23 speakers living in the peripheral provinces, only 4 live in province capitals), the stepwise selection algorithm refrained from selecting both for the model. Second, the absence of pluricentric variation is noteworthy, given the differences in the socio-cultural history of foreign influence for Belgian Dutch and Netherlandic Dutch noted above. However, the result is in line with existing studies on the use of English in the two regions (Geeraerts et al., 1999; Zenner et al., 2012), which also showed that the use of English is highly comparable in the two varieties. Finally, the other features listed in Table 4 might not reach significance in our statistical model because their effect is located on a more fine-grained level: qualitative interactional analyses might reveal clearer patterns (see, for example, Zenner & Van de Mieroop, forthcoming).
Conclusions
With this paper we set out to identify the main speaker-related and context-related features contributing to the use of English insertions in informal spoken Dutch. To this end, we presented the results of a mixed-effect logistic regression model, based on the 1190 expressive utterances of 46 Belgian Dutch and Netherlandic Dutch participants in the reality TV show ‘Expeditie Robinson’. The model indicates that the use of English in expressive utterances is typical of younger, more highly educated, male participants and of participants from the core provinces. In addition, English is mainly used in contexts with a high degree of speaker involvement and to express negative emotions. Overall, our results underline that, in weak contact situations, a number of symbolic values are attached to the use of English. Using English and, more notably, using highly expressive/pragmatic English discourse markers such as shit and fuck, helps speakers express their own emotions while underlining their identity as young, modern individuals. This idea is further supported by the fact that English cannot be seen as a neutral alternative to Dutch: it is not equally dispersed across the different regions and women tend to use English less than men.
From a more general perspective, this study aims to open up Anglicism research in weak contact settings, by (1) focusing on spoken data instead of print media corpora; (2) focusing on the social features of English instead of on the morpho-syntactic integration of Anglicisms; and (3) relying on inferential statistical techniques. This socio-variationist approach can also be easily applied to research in intense contact settings. Although statistical modeling is sometimes used in more structuralist approaches (e.g. Cacoullos & Aaron, 2003; Van Hout & Muysken, 1994), the use of inferential statistics for sociolinguistic analyses is – with some exceptions (e.g. Poplack et al., 1988) – quite rare in contact linguistics. In this respect, we hope that this paper has convincingly demonstrated the promise of socio-variationist analyses relying on complex statistical modeling for contact linguistics.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
