Abstract
This study compares the group-specific evaluations of (t) in Greater Manchester, England, with those of (ing) published in a previous study. The comparison is based on a set of perception surveys, in which study participants listened to manipulated audio stimuli and rated them on a series of scales. In contrast to findings for (ing), the social characteristics of listeners are not pertinent to the evaluation of (t): most social meanings associated with (t) are shared across the Greater Manchester population. It is argued that this is due to the pronounced attitude strength of T-glottalling in this particular region.
1. Introduction
Glottal replacement of (t), referred to frequently as glottalling (Wells 1982:261), has spread rapidly across the UK over the last century and is now used widely in England, Scotland, and Wales (e.g., Mees 1987; Britain 2005; Schleef 2013). It involves the complete substitution of /t/ with something that has an acoustically robust glottal quality (e.g., water [wɔːtə] ~ [wɔːʔə], what [wɒt] ~ [wɒʔ]). Previous research has made significant advances in documenting the phonetic details of the realizational variants of (t), in addition to the sociolinguistic constraints on T-glottalling (e.g., Milroy et al. 1994; Docherty & Foulkes 1999; Fabricius 2000; Straw & Patrick 2007; Schleef 2013). However, there is very little recent, empirical, perceptual evidence of the social meanings associated with the variation in (t).
This is surprising as theoretical developments within the field of sociolinguistics have enhanced the need for perceptual work from a variationist perspective. Research in what Eckert (2012) refers to as the third wave of variation has placed the study of social meaning at center stage (e.g., Agha 2005; Campbell-Kibler 2007; Podesva 2007; Eckert 2008; Johnstone & Kiesling 2008). This is because understanding the social meaning of variables sheds light on how they are used and the factors that may drive language change. Investigating the perception of variable features, in particular, allows us to explore the extent to which certain speakers associate a linguistic feature with the same or different social meanings. This paper explores to what degree macro-sociological categories, into which speakers fall, may influence the social meanings to which these speakers gravitate. I will explain briefly why group-specific evaluations of variable features may be expected.
In this paper, I follow Moore and Podesva (2009:448-450) and conceptualize social meanings as stances, personal characteristics, personae, and social types indexed through the use of linguistic features in specific interactions. For example, variation in (ing) has been shown to index specific social meanings, such that singing (rather than singin’) may express articulateness, intelligence, etc. The social meanings of variants are underspecified (Eckert 2012). Based on abstract meanings, which have developed from large-scale patterns in variously stratified, overlapping and complex communities (Eckert 2002:5), relevant social meanings emerge in specific social settings. Thus, a set of abstract meanings is attached to a variant, and they are adapted and specified in local contexts.
Social meanings are not only underspecified, but also dynamic. This dynamism can be captured in interactional data (e.g., Podesva 2007; Moore & Podesva 2009) but also in perception tasks by inspecting the variability in which a feature is evaluated. Preston (2010, 2011) has developed a processual model of where and why such variability may emerge. In a first step, listeners must become aware of a form (“noticing”); otherwise they cannot react to it. In a second step, the noticed form is “classified” according to social, contextual, or linguistic criteria; for example, they may classify speech as Yorkshire English, casual, or Cockney. Once speech has been classified, it is “imbued” with evaluative information. This information is drawn from the listener’s stored cognitive representations of the classification: a Yorkshire accent may be associated with friendliness and similar attitudes. In a final step, there is a reaction.
Preston (2010, 2011), crucially, argues that both noticing and classifying are dynamic processes; in other words, they depend on situations, tasks, and properties of individuals. This contrasts significantly with a fixed view of noticing a form (usually discussed under the heading of salience) as an inherent and stable property of a variant. Similarly, the next step, classifying a linguistic form, which is often discussed under the heading of social salience, has been assumed to be fixed and stable (see Levon and Fox’s [2014:194] discussion of social salience and how it is moderated by contextual, cognitive, and personal factors). However, both noticing and classifying are best seen as dynamic processes. There is variation in how a linguistic form is noticed and classified. Thus, a specific social meaning is not a fixed property of a linguistic form.
The evaluation of a linguistic feature may depend on a variety of factors, including listeners’ expectations, experiences, beliefs, attitudes toward relevant groups and categories, and various contextual conditions (including eliciting conditions); for example, the information available to the listener about the speaker (e.g., Campbell-Kibler 2007; Hay, Drager & Warren 2010; Levon & Fox 2014). Indeed, if it is the case that situations, tasks, and individual attitudes influence social meanings, so it follows that groups of individuals that repeatedly find themselves in specific situations and are subjected to particular attitudes are likely to develop specific social meanings for variable linguistic features. Eckert (2008:467) argues this very strongly: People at different places in the political economy see the world differently, do different things, have different preoccupations, and say different things. [. . .] While the entire population might agree on first-order indexicality—who uses what variant—the evaluation of that differentiation can differ across the population.
Eckert (2008:467) makes a precise suggestion for such meaning variation concerning social class, formality, and displays of education when discussing social meanings of (ing): If I am correct in my assumption that class differences involve ideological differences about formality and displays of education, then one might expect working-class speakers to have the more positive evaluations of this form and middle-class speakers to have the more negative ones.
Eckert (2008) assumes that the frequently documented social stratification of variables reflects different ideological orientations and that the selection of one variant over another is an ideological choice, at least to some extent (linguistic and interactional factors must play a role as well). For example, speakers may use [ɪn] to dissociate themselves from what Eckert (2008:467) calls, “institutions of legitimacy and the power they represent,” while [ɪŋ] may index association.
Social class cannot be the only social category in which we anticipate such differences. Based on the aforementioned arguments, we may expect social meanings to differ somewhat in the age and gender categories (e.g., Sankoff et al. 1989; Eckert 1997:163; Eckert 2000:14-15; Chambers 2009:166, 182; Cameron 2010:311-312), especially when individuals enter a stage in their life that features a large degree of variability and change in their social networks and in their ideas about who they are and who they associate with. This is, of course, the case for adolescents; and certainly, at least regarding production, it has been suggested that T-glottalling, alongside TH-fronting and R-labialization, forms part of a set of youth norms toward which young people orient in many parts of the UK (Williams & Kerswill 1999; Milroy 2007). The question is whether this variation in social circumstances results in particular social groups evaluating variable features differently.
Some evidence has been found of adolescents holding different attitudes from adults. When investigating the social perception of (ing), Labov et al. (2011) explored the effect that the frequency of [ɪn] may have on a speaker’s perceived degree of professionalism. There was a clear and statistically significant trend for speakers with higher proportions of [ɪn] in their speech to be rated as less professional-sounding. Age was a major factor in differentiating respondents. While responses to the evaluation of variation in (ing) conformed to a logarithmic function for respondents above the age of twenty-three, all of whom demonstrated a reaction to variation in (ing), this was not the case for the group of younger respondents. Labov et al. (2011) take a developmental perspective when they conclude that sensitivity to the frequency of (ing) develops through adolescence and is mediated by social class.
In an experimentally based perception study that compared different variants of (ing), Schleef and Flynn (2015) too found variation in (ing) to be associated with different social meanings by two age groups in Manchester, England: a group of adolescents and those in very early adulthood, and an older age group. They argue that this is due in part to life-stage experiences rather than exclusively to a continually developing sensitivity to social meaning through adulthood. Schleef and Flynn (2015) also document sex- and class-specific evaluations of (ing).
Thus, it is reasonable to assume that not only the linguistic form is subject to variation, but the meaning of form and its underlying ideologies also vary (e.g., Johnstone & Kiesling 2008). It seems that people with similar expectations and experiences, groups of individuals who were socialized into similar beliefs and attitudes toward not only formality and displays of education, as Eckert (2008) argues, but also notions of “age-appropriate” masculinity and femininity, evaluate variation in (ing) similarly, at least to an extent.
Although Labov et al. (2011) and Schleef and Flynn (2015) have demonstrated an age-based difference for the stable variable (ing), this does not necessarily entail that adolescents and adults also associate (t) with different social meanings. The variables (ing) and (t) differ in many respects: T-glottalling is a change in progress in Manchester (Baranowski & Turton 2015), and it is frequently the subject of social comment (cf. Levon & Fox 2014 on the much weaker social salience of (ing) in the UK when compared to the US). The assumption that different age, class, or gender groups associate a feature with different social meanings simply may not hold for (t). The social meaning of variables that are subject to much social comment may be acquired faster and more homogeneously, resulting in a lack of group differences. This may be particularly the case for (t) in intervocalic position as the perceptibility of acoustic cues to place of articulation is maximized here. Conversely, Eckert (2008:471) speculates that indexical fields of sound changes in progress “may be less well defined than those of stable variables.” Such an assumption would predict the exact opposite, namely, weak and possibly very heterogeneous social meanings for T-glottalling when compared to the stable (ing).
In this paper, I will test the hypothesis that different large-scale social groups evaluate variable features differently through an experimentally based perception study in Greater Manchester, England. I will first present some background information on the status of T-glottalling in Britain, outline the methods used, and then explore the social meanings of (t) among different age groups, sexes, and social classes. The results reveal that (t) is a highly salient variable in Greater Manchester and that most of its social meanings are shared by all respondents. Thus, the results differ from those obtained for (ing), not only in the number of significant scales but also in the extent to which respondents agree on the social meanings associated with (t).
2. Social Meaning and T-glottalling
This study focuses exclusively on glottal replacement of /t/ (also known as T-glottalling), to the exclusion of glottal reinforcement of /t/, often referred to as preglottalization (Wells 1982:260) or glottalization (Docherty & Foulkes 1999). T-glottalling can occur variably after a preceding sonorant in coda (e.g., what) or non-foot-initial onset position (e.g., water). 1
Within the possible manifestation sites, previous research has revealed variation in the variable (t) to be constrained by a small set of internal and external factors. These include the position of (t) within the word (e.g., Roberts 2006; Schleef 2013), preceding and following context (Milroy et al. 1994; Fabricius 2000; Roberts 2006; Straw & Patrick 2007; Schleef 2013), word frequency and morphological compositionality (Schleef 2013). Several social constraints have also been found to be relevant:
Style (e.g., Romaine & Reid 1976; Holmes 1997; Stuart-Smith, Timmins & Tweedie 2007). T-glottalling is usually less frequent in formal style.
Age (e.g., Macaulay 1977; Fabricius 2000; Roberts 2006; Stuart-Smith, Timmins & Tweedie 2007). T-glottalling is more widespread and possibly less stigmatized (Trudgill 1988:44) among younger speakers.
Gender (e.g., Holmes 1997; Fabricius 2000; Roberts 2006). Findings for gender often differ across studies. In an attempt to account for gender-related findings in previous research, Mees (1987) and Milroy et al. (1994) suggest that women tend to favor glottal replacement when it is associated with supralocal as opposed to local norms.
Socioeconomic class (e.g., Milroy et al. 1994; Williams & Kerswill 1999; Stuart-Smith, Timmins & Tweedie 2007). Generally, more T-glottalling occurs as one travels down the social scale; however, social class often interacts with gender and the prior history of T-glottalling in an area to generate unique patterns in different locales (cf. Mees 1987).
In mid-twentieth-century England, T-glottalling may have been quite limited to a particular geographic area. In the Linguistic Atlas of England, for which fieldwork was conducted mostly during the 1950s and which prioritized rural areas, it appears only in a small area in and around London and East Anglia (Orton, Sanderson & Widdowson 1978: map Ph239, quoted in Wells 1982:261). In London, T-glottalling may have been in use since at least the early 1900s (Andrésen 1968:18, quoted in Fabricius 2000:14).
In the Greater Manchester area, too, T-glottalling may have been in use since the 1900s, at least in word-final position. Lodge (1984:39) conducted a study of a 77-year-old and a 16-year-old living in Stockport (southern Greater Manchester). He found the former using T-glottalling in syllable-final position before non-syllabic consonants, while the latter occasionally also glottalled (t) in intervocalic position. Shorrocks (1998:319-321) confirmed that T-glottalling is more widespread among younger speakers in Bolton, northern Greater Manchester, although he found the phenomenon to occur only rarely. Thus, it appears that in the Greater Manchester area, at least, T-glottalling in intervocalic position is a relatively recent phenomenon that has been more prevalent among younger speakers since at least the 1980s.
The general timeline suggested by Lodge (1984) and Shorrocks’s (1998) findings is roughly in line with results of a recent study conducted by Baranowski and Turton (2015). Based on an auditory analysis of interview data yielded from eighty-six speakers, Baranowski and Turton (2015) demonstrate that T-glottalling continues to be a change in progress in Manchester, in both word-final and intervocalic position. T-glottalling is highly sensitive to stylistic variation in Manchester, as it occurs more frequently in interviews than in wordlist style. Young speakers (11-30 years of age) glottal (t) significantly more than middle-aged speakers (31-54), who, in turn, glottal more than the oldest age group (55+). Across all age groups, T-glottalling is far more prevalent in word-final than intervocalic position. For all speakers, T-glottalling amounts to 86 percent in word-final and 47 percent in intervocalic position. While gender and social class are not significant for word-final position, which is in accordance with it being an advanced change nearing completion, gender and class are significant factors in intervocalic position: female speakers disfavor the non-standard variant, as do middle-class speakers. Yet these social categories interact heavily with speaker age, such that glottalling rates range between 45 percent and 73 percent for the different classes and sexes among young people, compared with 8 percent to 56 percent for the middle-age group and 11 percent to 24 percent for the oldest participants. Intervocalic T-glottalling is, thus, the more socially relevant linguistic context, as significant production differences have been found that may relate to the perception results of the current study.
Minimal work has been conducted on the social perception of T-glottalling. Fabricius (2000:140-141) is a notable exception, and her study confirms the difference in evaluation of (t) in different linguistic contexts. She conducted acceptability tests of word-final (t) among upper middle-class speakers in Cambridge, all of whom she assumed to be speakers of RP. Listeners heard two versions of the same sentence with one word-final occurrence of (t): in one of these sentences, (t) was glottalled, while, in the other, it was not. Different speakers heard these in a different order. Listeners were then asked to judge the acceptability of these pairs of sentences “according to which pronunciation they considered to be ‘standard, good, correct’” (Fabricius 2000:91). They were allowed to choose either, both or neither pronunciation by selecting the “don’t know” option.
Fabricius (2000) found that T-glottalling was judged as most “standard, good, correct” in the pre-consonantal environment with a mean rate of 63.6 percent. T-glottalling was less acceptable in the pre-pausal environment (13.3 percent) and least acceptable in prevocalic position (4.7 percent). This is crucial for the current study, as it suggests that it would be unwise to conflate all occurrences of T-glottalling: one is considered acceptable in what Fabricius calls “modern RP,” while others, especially prevocalic occurrences, are not.
Our assumed knowledge of the social meaning of T-glottalling is derived largely from macro-sociological production data. Micro-analyses of (t) that explore more nuanced social evaluation in production data, as in Kirkham and Moore’s (2016) study of (t) in Ed Miliband’s speeches or Schleef’s (forthcoming) study of immigrant teenagers in London, remain the exception. Based on the summary above, I could expect T-glottalling to be associated with working-class speech as well as that of young people. However, more indirect associations or even their strength are difficult to predict without administering perception tests. Similarly, uncovering whether different social groups evaluate (t) differently requires an attitude survey. I have outlined above that previous research could be read as making contradictory predictions in how (t) may be evaluated. Based on this discussion, this paper has two goals: to determine some of the social meanings of (t), and to test how different large-scale social groups of listeners in the Greater Manchester area evaluate variation in (t), as used by speakers in Manchester, England. 2 I will do this by conducting an experimentally based perception study applied to the sociolinguistic variable (e.g., Campbell-Kibler 2007; Labov et al. 2011).
3. Methods
3.1. Data Collection
Data collection was structured into four phases: (1) interviews; (2) voice tests; (3) focus groups; and (4) perception tests. The final phase is the primary focus of this article.
Interviews were conducted with ten local students in a recording studio at the University of Manchester. These interviews were used to create perceptual stimuli for inclusion in focus groups and perception tests. A selection of short perceptual stimuli of, on average, twenty seconds was generated for every speaker. Speakers wore head-mounted microphones. Audio data were recorded using a Zoom H4 recorder at a sampling rate of 44.1 kHz. Interviews were structured to encourage participants to speak on a specific range of topics to ensure they all produced speech that was comparable in themes and topics. Speakers were all between 18 and 25 years old.
In order to avoid confounding factors in the perception experiments, an initial perception survey (or “voice test”) was conducted with fifty undergraduates. These initial surveys were used to determine how the voices of these ten speakers were rated in terms of the speakers’ perceived attractiveness and social and regional background. This was done on scales ranging from 1 to 6. Two female and two male voices were then selected for the next stages of the study; these four had received comparable age, attractiveness, and social-class scores. For example, only those voices were selected whose attractiveness mean rating was between 2.7 and 2.9, as these were closest to the median of ordered mean ratings. Thus, these voices were heard as belonging to Manchester speakers of average attractiveness and a lower middle-class background (where 1 was working class and 6 was upper class). They were also heard as between 18 and 25 years old, which was their actual age. The remaining speakers were excluded from further study. I use the pseudonyms Marie, Mandy, Morgan, and Max for these four in the remainder of the paper.
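The median-based selection criterion described above can be sketched as follows. This is a simplified illustration only: the ratings are invented (the voice-test data are not reproduced here), the helper `closest_to_median` is hypothetical, and the sketch ranks voices on a single scale, whereas the study also balanced sex and compared age and social-class scores.

```python
from statistics import median

def closest_to_median(mean_ratings, k):
    """Return the k voices whose mean rating lies closest to the
    median of all mean ratings (ties broken by voice label)."""
    mid = median(mean_ratings.values())
    ranked = sorted(mean_ratings, key=lambda v: (abs(mean_ratings[v] - mid), v))
    return sorted(ranked[:k])

# Hypothetical attractiveness means on the 1-6 scale for ten voices;
# these numbers are invented for illustration and are NOT the study's data.
toy_ratings = {
    "v01": 2.1, "v02": 2.7, "v03": 2.8, "v04": 2.9, "v05": 3.6,
    "v06": 2.8, "v07": 3.3, "v08": 2.4, "v09": 3.9, "v10": 1.9,
}

selected = closest_to_median(toy_ratings, 4)
print(selected)
```

With these toy values the median of the mean ratings is 2.8, so the four voices rated between 2.7 and 2.9 are retained.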
In preparation for the next two stages of the investigation, perceptual stimuli were created using the interviews with these four speakers. Based on the occurrence of (t), for every speaker one extract of circa twenty seconds was selected that contained four tokens of prevocalic (t); in particular, the focus was on the phonetically salient environment of intervocalic (t).
In order to create perceptual stimuli, the four speakers attended a second recording session. During these sessions, each speaker was shown the orthographically transcribed extract from their interview on a computer. They also heard each original extract and were asked to re-enact the text, matching speed and prosody of the original, by first substituting highlighted prevocalic occurrences of (t) with [t], then with [ʔ]. All speakers used (t) variably in their natural speech and had no problems producing both variants. Using these two versions of the same extract, the perceptual stimuli were then created in Praat (Boersma 2001).
Realizations of [t] and [ʔ] from the re-enactments were cross-spliced into the original recording by first deleting the original (t)-realization and then replacing it with [t] or [ʔ], ensuring audible perturbation was at a minimum. This resulted in two new versions of the original, each of which contained one of the variants under investigation. To ensure these manipulated versions sounded natural and were comparable, their intensity and intonation contours were also modified, if necessary. Example extracts are provided below:
Morgan: People just have these (.) paint colors around them. They, they, they throw them on people. It’s a day of celebration really. It doesn’t ma
Max: It looks like uhm a shop. Sor
In terms of topic and voice quality, the four excerpts differ quite substantially; however, the statistical analysis allows us to explore these differences between speakers and discover their pertinence to the evaluation of (t). There are also some minor accent differences between speakers. All four speakers are clearly recognizable as speakers of Northern English. They use the typical Northern /ʊ/ and /a/ sounds in words such as but and bath respectively. All four speakers are somewhat nasal. Vernacular features that are used occasionally are h-dropping, [ɪn] for [ɪŋ], and intervocalic T-glottalling. All tokens of intervocalic T-glottalling have been manipulated and either occur as [t] in one version of the excerpt or [ʔ] in the other.
The [ɪn] variant is the majority variant of these speakers in conversational speech. The use of (ing) by the speakers featured in these stimuli was analyzed in Schleef, Flynn, and Ramsammy (2015:202), where [ɪn] was found to account for 81.6 percent of tokens. Both occurrences of (ing) in Morgan’s guise are [ɪn], and Marie uses [ɪn] in reminding. Marie also drops an [h] in the auxiliary has. H-dropping in function words is not uncommon in most varieties of English, and this should not be considered a particularly striking feature in her excerpt. Max is the only speaker who may be using the discourse particle like in his excerpt: “Coz it’s like metal shutters.” Whether or not he does hinges on what precisely “it” refers to. This is not completely clear, and if “it” refers to an object, the occurrence of like may in fact be an adverb, like all the other occurrences of like in his excerpt, and thus not a vernacular but a standard grammatical feature.
Realizations of (t) in final, pre-consonantal position were not manipulated; rather, they were left as in the original. Since all four speakers glottalled (t) habitually in word-final, pre-consonantal position, both versions of the experimental stimuli appeared natural. This is because T-glottalling is more likely to occur in word-final, pre-consonantal than intervocalic position (Baranowski & Turton 2015). Glottalling rates of all four speakers are in line with this finding; however, considering Fabricius’s (2000) argument that pre-consonantal T-glottalling is acceptable in modern RP, it would be unwise to categorize this phenomenon with vernacular features such as h-dropping in nouns. Both Max and Morgan appear to glottal or delete seven tokens of pre-consonantal (t), while Marie has five such tokens and Mandy has four. These amount to about half the pre-consonantal (t) tokens being realized as [t] and the other half as deleted, glottalled, or unclear. It was essential to maintain this mix of [t] and non-[t] realizations in these excerpts as this is a crucial feature of Manchester speech.
Again, it is important to keep in mind that these non-standard and variable features occur in exactly the same way in the [t] as well as the [ʔ] guise. While they may influence the evaluation of guises, it is unclear whether they will do so in similar manners with each [t] and [ʔ] guise or whether the [t] or [ʔ] in different guises may interact with other features in these four base guises in various different ways. More detail on how we can separate out voice and variable will be provided in section 4.
In making a decision on how many tokens to manipulate, I took my cue from Labov et al. (2011:438-439). When manipulating the frequency of [ɪn] in an excerpt, they found that significant differences in evaluation between token frequencies disappear as the token frequency increases. While ratings ranged from 1.65 to 4.13 on their Likert scale when 1 versus 3 [ɪn] tokens were used, the range of 3 versus 5 tokens was only between 4.13 and 4.70. Moreover, the difference loses significance completely after 5 tokens; that is, evaluation changes only minimally. As the number of stimuli that could reasonably be used was limited, frequency manipulation was not an option. This is why I decided to probe that attitudinal spectrum where evaluation stabilizes and an increase of vernacular tokens makes only small differences in evaluation. This was crucial as pre-consonantal tokens were not manipulated and left as in the original—in both versions of each speaker guise.
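The diminishing effect of additional vernacular tokens can be made concrete with a small worked calculation. This is only a sketch: the three ratings are those cited above from Labov et al. (2011), while the pairing of each value with a token count is this sketch's reading of that passage, and the names `rating` and `step` are illustrative.

```python
# Mean ratings cited in the text for 1, 3, and 5 [ɪn] tokens.
# Assumption: each value is paired with the token count as read from the passage.
rating = {1: 1.65, 3: 4.13, 5: 4.70}

def step(fewer, more):
    """Change in mean rating between two token frequencies, rounded to 2 places."""
    return round(rating[more] - rating[fewer], 2)

first_step = step(1, 3)    # going from 1 to 3 tokens
second_step = step(3, 5)   # going from 3 to 5 tokens
ratio = round(second_step / first_step, 2)

print(first_step, second_step, ratio)
```

The first step amounts to 2.48 scale points and the second to 0.57, so the second pair of added tokens shifts evaluation by less than a quarter of the first. This is the region of the attitudinal spectrum that four manipulated tokens are meant to probe.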
Pairs of stimuli were first played to small focus groups of undergraduate students at the University of Manchester, and impressions of the excerpts were invited based on a structured set of questions. This was done in order to collect relevant terms used by listeners to express their evaluations, to collect qualitative information on links of a particular feature to certain ideologies, social meanings or identities, and to test stimuli reliability. During these discussions, participants appeared to remain unaware of the manipulated nature of the stimuli, although this was not tested directly. Nonetheless, this suggests that the stimuli were suitable for the next stage of the investigation: the large-scale online perception study.
Five online surveys were generated using Fluidsurveys with the goal of eliciting information on the evaluation of the stimuli pairs. Four doublets (one from each speaker) were distributed across these five surveys to keep the length manageable. These surveys also included stimuli for the variable (ing), which is not part of the current study. Table 1 presents the stimuli distribution across these five subsurveys. Those considered in this paper are marked in bold. Note that Marie, Morgan, Max, and Mandy talk about different things; therefore, a participant who took subsurvey 1 heard not only four different speakers, but also heard them speak on four different topics and using four different variants of interest. 3
Table 1. Stimuli Overview
Since (t) is the focus of this paper, I compare only the four stimuli with [t] realizations (one per speaker) with their exact counterparts with [ʔ] realizations. Apart from this one variable, everything else in each pair was identical. The (ing) stimuli do not concern us here. The intended side effect of including stimuli for two studies in these subsurveys was to eliminate the need for further distractor stimuli. This was convenient because each subsurvey could include a maximum of four stimuli, one from each speaker; otherwise, the survey would have become too lengthy. Thus, stimuli were structured in such a way that each participant heard at least one (t)-stimulus and two (ing)-stimuli. This ensured no participant heard the same extract twice, and it made it impossible for listeners to identify the target segment, as each listener heard only one member of a stimulus pair. This between-subjects design contrasts with the within-subjects design (where every respondent hears both members of a stimulus pair). Studies that compared both methods (e.g., Greenwald 1976; Keren & Raaijmakers 1988; Charness, Gneezy & Kuhn 2012) found the within-subjects design more suited for the elicitation of direct subjective opinions and other experimental types; however, in experimental setups where the same participant repeats the experiment, it creates undesirable practice and demand effects, sensitization, and carry-over (Greenwald 1976:314). The between-subjects design appeared more suited for the current study as it reduced survey time, carry-over, practice, and boredom effects, while also circumventing the problem of the listener recognizing the speaker when hearing a stimulus twice with only slight modifications.
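The central constraint of this between-subjects design, namely that no subsurvey contains more than one stimulus from the same speaker, can be expressed as a simple check. The sketch below is illustrative only: the function `is_between_subjects` and both toy subsurveys are hypothetical and do not reproduce the actual distribution in Table 1.

```python
from collections import Counter

def is_between_subjects(subsurvey, max_stimuli=4):
    """Check one subsurvey, given as (speaker, variable, variant) tuples:
    at most `max_stimuli` stimuli, and each speaker appears only once, so
    no listener can hear both guises of the same extract."""
    speakers = Counter(speaker for speaker, _variable, _variant in subsurvey)
    return len(subsurvey) <= max_stimuli and all(n == 1 for n in speakers.values())

# Hypothetical layouts, invented for illustration (NOT the layout in Table 1).
toy_valid = [
    ("Marie", "t", "[t]"),
    ("Morgan", "t", "[ʔ]"),
    ("Max", "ing", "[ɪŋ]"),
    ("Mandy", "ing", "[ɪn]"),
]
toy_invalid = [
    ("Marie", "t", "[t]"),
    ("Marie", "t", "[ʔ]"),   # both guises of one extract: a within-subjects pairing
]

print(is_between_subjects(toy_valid), is_between_subjects(toy_invalid))
```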
Respondents were randomly assigned to one of the five surveys in order to balance uptake. To participate in the survey, respondents were required to have lived in Greater Manchester for at least fifteen years; therefore, the sample consists of people from Greater Manchester hearing Manchester stimuli. Consequently, I am not testing how evaluations may be different if respondents hear (t) in a variety that differs radically from their own.
Respondents were recruited in two ways: flyers and emails were sent to schools and universities for distribution to pupils and students; and the survey was advertised on a social networking site. Once a respondent completed the survey, they were sent a thank you email asking them to pass on the survey to their family and friends. Respondents were reimbursed for their efforts with a five pound gift certificate.
A set of social attributes, social types, and local personae was generated based on focus-group responses and previous research (Zahn & Hopper 1985; Campbell-Kibler 2007, 2011). Responses from focus groups enabled the selection of relevant terms actually used by listeners when evaluating speech excerpts, for example, “thick” or “uptight.” Simultaneously, we were interested in using other terms that turned out to be relevant in similar studies, for instance, “articulate.” Scales were randomized in each individual survey based on a procedure within Fluidsurveys in order to prevent item order affecting the results. A new item order was created for each individual participant accessing a survey. After hearing each stimulus, listeners were invited to rate it on a selection of features. A list of words was provided in the form of seven-point semantic differential scales, checkboxes, or rating scales, and informants were asked to indicate the degree to which these words applied to the voice they heard (see Appendix B for a sample survey and the social attributes included in the survey).
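Per-participant randomization of scale order can be sketched as below. This is not the Fluidsurveys procedure, whose internals are not described here. It is a generic illustration in which the function `item_order`, the use of a participant identifier as a random seed, and the shortened scale list are all assumptions.

```python
import random

# A shortened, illustrative list using terms mentioned in the text.
SCALES = ["thick", "uptight", "articulate"]

def item_order(scales, participant_id):
    """Return a fresh ordering of the scales for one participant.
    Seeding with the participant identifier (an assumption of this sketch)
    makes the order reproducible for that participant."""
    rng = random.Random(participant_id)
    order = list(scales)   # copy, so the master list stays untouched
    rng.shuffle(order)
    return order

for pid in (1, 2, 3):
    print(pid, item_order(SCALES, pid))
```

Because each participant receives an independent ordering, any effect of item position is spread across scales rather than attaching systematically to one of them.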
Furthermore, a map task was included with the purpose of eliciting data pertaining to the perceived origin of the speaker. Participants were asked to guess from which area the speaker hailed. We were interested in finding out whether one of the two variants may be more strongly associated with Manchester than the other when embedded in the same Manchester guise. Since it was impossible to ask participants to draw maps online, we provided a map on which circles were drawn around six areas within England, one of which was Greater Manchester. For the purposes of this study, a circle around Greater Manchester and all of England would have sufficed, but would have made the task too transparent. Therefore, further distractor options were added: the Northwest, northern England, London, southern England and all of England. In the statistical analysis, we contrasted only Greater Manchester with other areas.
Finally, a small selection of locally relevant social types and personae were included in a tick-box format: school teacher, show-off, ring-leader, conformist, geek, student, snob, Mancunian (an inhabitant of Manchester), West Didsbury, Moss Side, Chorlton, Stockport, and Salford. UK deprivation indices show that West Didsbury is fairly affluent, Chorlton and Stockport take a middle position, and Salford and Moss Side are ranked at the lower end of the socioeconomic spectrum (Manchester City Council 2011:32; Department for Communities and Local Government 2010). There was a limit to how many scales and checkboxes could be included, as long surveys tend to reduce response rate and data accuracy. Therefore, when selecting scales, social types, and places, priority was given to those mentioned during focus groups, as they were more likely to be relevant to the features of interest. At the very end, the survey included questions about the respondents, such as self-reported sex, age, and social class.
3.2. Data Analysis
A total of 732 responses were received for (t) across the five surveys. Incomplete surveys and respondents with non-UK IP addresses were removed. Surveys completed in less than five minutes were also discarded, as it takes at least five minutes to complete the survey. This left a total of 527 responses: 175 respondents were male and 352 were female; 198 considered themselves middle class and 329 working class. The age distribution of respondents was as follows: 15-19 (187), 20-25 (164), 26 and above (176). The latter was broken down into smaller groups of 26-35 (36), 36-45 (79), and 45+ (61). I must note that age was not entered into the statistical analysis in these categories, but as a continuous variable. The data collection focus, however, was on contrasting teenagers with those in their early twenties, as this is the age when developments should be most pronounced (see section 1).
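The exclusion criteria and the reported breakdowns can be expressed compactly. The following is a minimal sketch, not the procedure actually used in the study: the `Response` record layout and the toy records are hypothetical, as the survey export format is not described. Only the five-minute threshold and the reported counts are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical record layout; the real survey export is not described.
    complete: bool
    uk_ip: bool
    minutes: float


MIN_MINUTES = 5.0  # minimum plausible completion time stated in the text


def keep(response: Response) -> bool:
    """Apply the three exclusion criteria: completeness, UK IP, duration."""
    return response.complete and response.uk_ip and response.minutes >= MIN_MINUTES


# Toy records, invented purely for illustration.
toy = [
    Response(True, True, 12.0),   # kept
    Response(False, True, 9.0),   # dropped: incomplete
    Response(True, False, 15.0),  # dropped: non-UK IP address
    Response(True, True, 3.5),    # dropped: faster than five minutes
]
kept = [r for r in toy if keep(r)]

# Consistency check on the breakdowns reported for the retained responses.
TOTAL = 527
by_sex = {"male": 175, "female": 352}
by_class = {"middle": 198, "working": 329}
by_age = {"15-19": 187, "20-25": 164, "26+": 176}
older = {"26-35": 36, "36-45": 79, "45+": 61}

assert sum(by_sex.values()) == TOTAL
assert sum(by_class.values()) == TOTAL
assert sum(by_age.values()) == TOTAL
assert sum(older.values()) == by_age["26+"]
```

Each reported breakdown sums to the 527 retained responses, and the three older subgroups sum to the 176 respondents aged 26 and above.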
Respondent ratings were subjected to statistical testing using R (R Core Team 2014). Data from matched guise studies are usually first subjected to a factor analysis, a technique that uncovers whether response patterns on a number of scales can be explained by a smaller number of underlying factors (Streiner 1994:135). Factor analysis is unsuitable for dichotomous data (Streiner 1994:140); therefore, it was not applied to checkbox responses.
In the first step of a factor analysis, the number of factors that best matches the data set is determined. I conducted a parallel analysis using a function available in the psych library in the statistics program R. This reveals the ideal number of factors for a particular data set; in other words, the data set is searched for patterns and similarities in evaluations and broken down into a number of factors. The number of factors is then entered in the instructions to run the factor analysis, which is the second step. This yields an output similar to Table 2. In the final step, it must be decided which scales truly load together, that is, most strongly. Scales that load only weakly on a factor or that load on several factors should not be conflated with other scales. I considered a scale to load on a factor if it had a primary loading of at least .55, a middle value between those recommended by Zahn and Hopper (1985:117) and McCroskey and Young (1979:380). Moreover, I added the criterion that, if a scale loads on more than one factor, its secondary loading must not exceed .30 for the scale to count as loading at all. Scales that met these criteria were conflated into a smaller number of factors in all further statistical investigation. For example, Table 2 indicates that the scales
Results of Factor Analysis for the Greater Manchester (t) Data Set
Note: Test of the hypothesis that 6 factors are sufficient: the chi square statistic is 657.22 on 247 degrees of freedom. The p value is 5.52e-39. Six factors explain 45 percent of the variation.
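The two loading criteria described in the methods (a primary loading of at least .55 and no secondary loading above .30) can be illustrated with a short sketch. It assumes that loadings are compared by absolute value, so that strong negative loadings also count. The function name and the example loadings are hypothetical and are not values from Table 2.

```python
from typing import Optional, Sequence

PRIMARY_MIN = 0.55    # minimum primary loading, as stated in the text
SECONDARY_MAX = 0.30  # maximum permitted secondary loading, as stated in the text


def assign_factor(loadings: Sequence[float]) -> Optional[int]:
    """Return the index of the factor a scale loads on, or None.

    Assumption: loadings are compared by magnitude, ignoring sign.
    """
    magnitudes = [abs(value) for value in loadings]
    primary = max(range(len(magnitudes)), key=lambda i: magnitudes[i])
    others = [m for i, m in enumerate(magnitudes) if i != primary]
    secondary = max(others, default=0.0)
    if magnitudes[primary] >= PRIMARY_MIN and secondary <= SECONDARY_MAX:
        return primary
    return None


# Invented loadings on three factors, for illustration only.
clean = assign_factor([0.72, 0.10, -0.05])        # strong, unambiguous loading
cross_loaded = assign_factor([0.60, 0.35, 0.00])  # secondary loading too high
weak = assign_factor([0.40, 0.10, 0.00])          # primary loading too low
negative = assign_factor([-0.61, 0.20, 0.10])     # strong negative loading
```

Under these assumptions, only the first and last example scales would be conflated with other scales on their factor; the cross-loaded and the weakly loading scale would be analyzed on their own.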
This scale was intended to access information on the perceived “correctness” of an accent, but many participants may have understood it as a character trait. Although an expected statistical result emerged, it should be regarded with caution.
All voices were provided by university students. The pilot survey revealed that participants heard voices as quite young, which is why narrow age categories were selected up to thirty-five, with wider categories above this age (see Appendix B).
Evaluations for each scale or group of scales (= factors) were subjected to statistical testing using mixed-effects linear regression in R for the rating responses. As random effects, I had intercepts for respondent, as well as by-respondent random slopes for the effect of variant. Logistic regression was used for checkbox answers (social types and personae) and perceived voice origin. The ratings for social attributes, social types and perceived origin were treated as the response variable with the following contrast-coded, fixed-effect predictors:
Variant present in the stimulus: [t], [ʔ]
Respondent sex: male, female
Respondent class: working class, middle class
Respondent age, with a minimum age of 15: continuous
Speaker: speaker 1, speaker 2, speaker 3, speaker 4.
I chose to include speaker as a fixed rather than a random effect in order to check in a transparent, reader-friendly manner whether any patterns detected hold across all speakers. I also checked for interactions between variant and all other predictors. A step-down method was used to construct the most efficient model. All factors and interactions were included initially in the baseline model, after which the non-significant factors were removed one by one. ANOVAs were used to test the improvement of successive models.
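The logic of the step-down procedure can be sketched as follows. This is an illustration only and not the R code used for the analysis: instead of refitting mixed-effects models and comparing them with ANOVAs, it relies on a hypothetical callback, `p_values_for`, that returns one p value per term. The term names mirror the predictors listed above, but the p values are invented.

```python
from typing import Callable, Dict, List


def step_down(terms: List[str],
              p_values_for: Callable[[List[str]], Dict[str, float]],
              alpha: float = 0.05) -> List[str]:
    """Remove the least significant term, refit, and repeat.

    Stops as soon as every remaining term has p < alpha.
    """
    current = list(terms)
    while current:
        p_values = p_values_for(current)  # stands in for refitting the model
        worst = max(current, key=lambda term: p_values[term])
        if p_values[worst] < alpha:
            break
        current.remove(worst)
    return current


# Invented p values; a real analysis would recompute them after each refit.
TOY_P = {
    "variant": 0.001,
    "speaker": 0.004,
    "sex": 0.40,
    "class": 0.20,
    "age": 0.03,
    "variant:sex": 0.75,
}


def toy_fit(terms: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a model fit."""
    return {term: TOY_P[term] for term in terms}


final_terms = step_down(list(TOY_P), toy_fit, alpha=0.05)
```

In this toy run the interaction term is removed first, followed by sex and class. The sketch does not enforce any rule about keeping main effects that are part of a retained interaction; whether such a rule was applied in the study is not stated.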
4. Results
4.1. Factor Analysis
Following the procedure outlined in the methods section, I first ran a factor analysis on the whole data set, followed by an analysis of small data sets of single speakers’ ratings. Factor analyses based on individual speakers are roughly in line with the results for the whole data set: all reveal that six factors are sufficient to describe the data. Table 2 provides an overview of factors with a loading above .30 or below −.30 for the whole data set. All six factors include more than one scale with a primary loading of at least .55 and no secondary loading that exceeds .30. These scales were conflated and the factors were named as follows:
Regression models were then calculated for the results of perception tests for all conflated and unconflated social attributes and social types. These models consider the main as well as the interaction effects. They reveal that the variable (t) is a highly salient and socially relevant variable in Greater Manchester, as the evaluations for the two variants differ with regard to a total of twenty-three social attributes and social types at a statistically significant level.
In this study, the risk of cumulative error is somewhat increased since many of the scales access similar evaluative domains. In addition, I felt it was important to conduct multiple comparisons with different speaker reference levels in order to check to what extent the speaker influences results. All data provided here are based on Max as the reference level. However, in order to check all possible speaker comparisons, two additional models, with Mandy and Marie as reference levels, were constructed. I used the Bonferroni correction (Baayen 2008:106) to address the risk of cumulative error by dividing the alpha value of .05 by the number of comparisons, three, which results in a significance level of .017, below which predictors were considered significant and included in the statistical model. Additional speaker comparisons led to a different result only once: regarding the scale of perceived age.
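The arithmetic behind the corrected threshold is simple and can be checked in a few lines. This is a minimal sketch; the function names are illustrative, and only the alpha value of .05 and the three comparisons are taken from the text.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Divide the overall alpha by the number of comparisons."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def is_significant(p_value: float, threshold: float) -> bool:
    """A predictor counts as significant if its p value is below the threshold."""
    return p_value < threshold


# Three speaker reference levels (Max, Mandy, Marie) give three comparisons.
threshold = bonferroni_alpha(0.05, 3)
print(round(threshold, 3))  # 0.017
```

A p value of .016 would therefore pass the corrected threshold, whereas a p value of .02, significant at the uncorrected .05 level, would not.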
Despite this low significance threshold, there is a very high number of significant results. Because of this, I am unable to discuss all social attributes and social types in detail; nor am I able to present all models. Instead, I will use the results of the factor analysis to structure data presentation and provide summaries and graphs of significant results. Full models are available from the author.
Tables 3 to 6 list relevant scales/categories (column 1), intercept and significant factor levels (column 2), the estimates for [ʔ], and the p values. For example, the third attribute in Table 3 reveals how respondents in Manchester rated the two guises of the four speakers on their perceived level of articulateness. The numbers indicate that
Summaries of Best Mixed-Effects Models for Social Prestige Factor Group and Similar Scales (N = 527)
Note: Reference levels are [t], middle class, female, and Max (GS = glottal stop).
Summaries of Best Mixed-Effects Models for Formality, Dynamism, Urbanity and Class Scales (N = 527)
Note: Reference levels are [t], middle class, female, and Max (GS = glottal stop).
Summary of Best Mixed-Effects Models for Perceived Age Scale (N = 527)
Note: Reference levels are [t], middle class, female, and Max (GS = glottal stop).
Summaries of Best Mixed-Effects Models for Social Types and Places (N = 527)
Note: Reference levels are [t], middle class, female, and Max (GS = glottal stop). variant is not part of the final model for the following scales: snob, show-off, conformist, geek, ring-leader, student, mancunian, stockport, west-didsbury, and chorlton.
There is a significant main effect for
It is possible for the evaluation of a voice to influence how [t] or [ʔ] is evaluated, i.e., voice and variant may interact, and I am seeking to identify such cases. Such effects have been documented (e.g., Campbell-Kibler 2009), and I would expect interactions between
I will now turn to the results for the remaining scales. As already intimated, I observe the following three types of effects:
(1) Main effects for
(2) Interaction effects between
(3) Main effects for respondent
This study focuses on (1) and (2), as a social group main effect does not indicate a difference in evaluation of the two variants.
4.2. Social Prestige
Table 3 lists all significant factors for the social prestige factor group and related scales. These “related” scales are connected insofar as the factor analysis assigned them to the social prestige factor group; however, the loadings for these scales do not rise above .55. In other words, their association with this factor is weak. The scales

Shared Social Meanings I
4.3. Informality, Dynamism, Urbanity, Gender, Age, and Gregariousness
As I move to the remaining simple and conflated scales, only about half achieve significance. These are listed in Table 4 and Figure 2. Respondents rated speakers in [ʔ] guises as significantly more informal (casual, down-to-earth, laidback) and gregarious (confident, outgoing). Speakers in [ʔ] guises also appeared more working-class. Respondents also appear to associate [ʔ]-guises with a certain northern urbanity. Note that, so far, there is agreement among social groups in terms of the relationship between [t] and [ʔ]: not a single interaction effect has emerged. Table 7 in Appendix A also illustrates that these results hold for all four voices.

Shared Social Meanings II
When we reach
The situation is clearly more complex and requires some additional analysis. Figure 3 visualizes the mean age results for the four speakers. It highlights that evaluations follow a clear trend in that the glottal variant is always heard as coming from a younger speaker than [t]; however, this trend is much more pronounced for Mandy than for Max, the reference value. In fact, Max stands out, as the difference in evaluation between his guises is minimal. Generally, he is heard as quite young, as indicated by his values in Figure 3, which appear on the “younger” side of the zero line. The key to how this youth is indexed must lie in his excerpt or voice quality. While what he says does not sound particularly youthful (see above), the fact that he is the only person who may be using a token of the discourse marker like may make him appear younger (see Tagliamonte and D’Arcy 2009 on the high use of discourse marker like by adolescents). Alternatively, this result may simply be due to his higher-pitched voice when compared to the other male. This too may index adolescent youth. Whatever the reasons, the appropriate generalization is that if a voice sounds very young, youth effects for [ʔ] may be blocked. This blocking is limited to this particular scale, as no other speaker-related interaction effects emerge.

Evaluations of Perceived Age
However, extreme speaker-specific evaluation may obscure other effects. When Mandy and Marie are entered as reference values,
Thus, there is evidence of a continued, yet conditional, link of T-glottalling to a young, urban, laidback, and outgoing youth culture. Other scales and concepts that might have been expected to be significant with regard to youth culture are not, for example,
4.4. Social Types and Places
Finally, thirteen social types and places, in addition to the map task (contrasting only Greater Manchester with all other areas), were included in the analysis. Respondents rated speakers in [ʔ] guises as significantly more likely to come from Moss Side, Salford, or Greater Manchester than those in [t] guises. They further rated speakers in [t] guises as significantly more teacher-like (see Table 6 and Figure 4). All these concepts were binary; thus, logistic regressions were performed.

Shared Social Meanings III
Results for location provide evidence of an interesting relation between class and place. I have demonstrated above that the places considered could be ordered as follows in relation to their social prestige, with Moss Side the least prestigious: West Didsbury > Chorlton > Stockport > Salford > Moss Side. This is reflected in significant results for (t), in that glottal stop guises are associated with Salford and Moss Side, while [t] guises are connected to no particular place. The results also reveal that participants (all of whom came from Greater Manchester) associate their home region with T-glottalling—far more so than with other parts of the English North and South. T-glottalling seems to authenticate the accent; it makes it more Greater-Manchester-sounding, in combination with the other features exhibited by the four speakers. Alternatively, or additionally, this could be the result of an ideological extension of the idea of “Manchester” to the working classes.
The fact that working-class patterns with the
Finally, I would like to comment briefly on the few scales in Tables 4 to 6 where respondent
5. Discussion and Conclusion
Motivated by documented group differences in the evaluation of the variable (ing), this study set out to explore whether social group differences can also be found in the evaluation of (t); that is, I was particularly interested in interaction effects between the variants and any of the social groups. However, not a single interaction effect between social group and variant was uncovered. This contrasts sharply with findings for (ing).
Schleef and Flynn (2015) found variation in (ing) to be associated with different social meanings by two age groups in Manchester, England: a group consisting of adolescents and early adults, and an older age group. In Manchester, a third variant of (ing) exists alongside the usual [ɪŋ] and [ɪn]: [ɪŋɡ]. Social meanings differ between age groups on three scales: articulateness, poshness, and reliability. When compared with the youths, those in the older age group consider [ɪŋ] to sound substantially more articulate than [ɪn] (p = .016), as well as posher (p < .01) and more reliable (p < .01) than [ɪŋɡ]. For the latter two scales, evaluations are reversed for the younger respondents, who consider [ɪŋɡ] more reliable and posher-sounding than the older respondents do.
In the same study, sex and class-specific evaluations of (ing) are documented for the
Of course, one must exercise caution when interpreting null results and when trying to make generalizations. Nonetheless, considering that data for (ing) were collected in the same study and social group differences were detected, I believe some tentative generalizations are certainly possible. The appropriate conclusion seems to be that, when it comes to variation between [ʔ] and [t], different social groups appear to share attitudes toward both variants, in that one variant is ranked higher or lower than the other by all social groups. Thus, regarding (t) in Greater Manchester, the meaning of form varies only minimally, and attitudinal differences overlap only to a limited degree with large-scale social groups. Nonetheless, some valuable and important results have emerged in this study. I have established a shared set of social meanings for the variable (t) in Greater Manchester, and uncovered some minor differences in the evaluation of specific speakers, which may point toward a blocking effect when a voice is heard as very young. I will also be able to explore what results can tell us about the nature of perception, as social categories do seem to matter for some variables, notably (ing), but not for others.
This is the point upon which the remainder of this discussion will focus. Variation in (t) is associated with a large number of potential social meanings, and it appears to be very socially salient. Levon and Fox (2014) define social salience as “the relative ability of a linguistic variant to evoke social meaning.” This ability of (t) is very pronounced, particularly when compared with the results of the (ing) data. Not only are there many more significant scales for (t) than for (ing) (23 versus 14), significant social attributes for (t) are also more socially relevant. More social types and places (e.g., teacher, Salford) are significant for (t) than for (ing). Class and several of its place stand-ins are significant for (t) but not for (ing). Eckert (2008:471) speculates that indexical fields of sound changes in progress “may be less well defined than those of stable variables.” Baranowski and Turton (2015) have demonstrated that (t) is indeed a change in progress in Manchester. Thus, if one assumes that “less well defined” means (1) fewer significant scales and (2) more heterogeneous evaluation across social groups, it seems that these are traits we have uncovered for the stable (ing) rather than the changing (t). However, as one of the reviewers of this article pointed out, one could argue that the diversity of significant scales associated with (t) is in fact an indication of a low degree of indexical definition. This is plausible but unlikely. Considering that these different scales tap into similar concepts (see section 4.1), the indexical field of (t) is less diverse than it looks. In addition, less definition and more diffusion are likely to result in a reduced degree of statistical significance rather than the kind of significant results observed for (t).
Why is it then that evaluation of (t) is so homogeneous and does not differ between social groups? The strong reaction of respondents to (t)-stimuli suggests that attitude strength is very pronounced for intervocalic (t), while attitude strength toward variation in (ing) is somewhat weaker. I use the term “attitude strength” in the psychological tradition (e.g., Fazio 1986, 2007), as this line of research represents promising interconnections to attitude research in the sociolinguistic tradition. The notion of attitude strength is based on the idea that situations in which listeners have encountered a specific form impact the relative attitude strength that listeners maintain toward this form. Here, attitudes are viewed as associative relationships between an object and a summary evaluation of the object as it is stored in a listener’s memory (Fazio 2007; Bohner & Dickel 2011). When we encounter a relevant object, for example, T-glottalling in intervocalic position, these summary evaluations become activated. In this framework, different types of attitudes are differentiated.
There are attitudes that become activated automatically whenever the object is encountered. However, there are other attitudes that are processed not in an automatic but rather in a more deliberate fashion (Fazio 2007). Attitude strength refers precisely to this variability in attitude activation. Levon and Fox (2014) have previously applied the concept to variation in (ing). They argue that listener attitudes toward (ing) are more automatically accessible in the US than they are in Britain; in other words, they are stronger. Based on data from a London-based experimental study that focused on the scale of professionalism, they argue that accessing the link between (ing) and professionalism requires a lot more deliberate processing from their British respondents than was possible in their experimental setup. British listeners, Levon and Fox (2014) argue, are less certain of an association between (ing) and levels of professionalism, and they back up their contention with indirect data comparing the social distribution of (ing) in Britain and the US. This move is based on the notion that “it is the social distribution of a variable that ultimately determines the strength of listeners’ attitudes towards a given linguistic form” (Levon & Fox 2014:200). With regard to (t) in Manchester, I have shown in the literature review that the variable is stratified quite heavily in intervocalic position, demonstrating significant differences for age, social class, and speaker sex. There are no comparable (ing) data for Manchester, but Levon and Fox (2014) have already made the point that stratification of (ing) is not very pronounced in Britain. For example, for the Northern city of York, only social class emerges as a social factor, and even here, the difference between working and middle classes amounts to only 15 percentage points (Tagliamonte 2004:397).
If stratification in production influences attitude strength, one would expect the attitude strength to be more pronounced for (t) than that for (ing).
The current findings support the view that (ing) in England, Manchester in particular, is a variable with somewhat weaker attitude strength than that of intervocalic (t); that is, variation in (ing) may not be processed in an automatic fashion for most listeners. On the other hand, T-glottalling in intervocalic position may in fact be the kind of form that is processed in an automatic fashion for the majority of listeners. This would account for the large number of significant scales. Participants are more certain about their attitudes (this is often referred to as “attitude commitment”), but T-glottalling may also be viewed as important to the sense of self of Mancunians (this is often referred to as “attitude centrality”), as results for the map task reveal that listeners associate this form with Manchester. Attitude commitment and centrality are two important factors that influence the automaticity of the response (e.g., Holland, Verplanken & van Knippenberg 2003). However, I would like to take these thoughts one step further.
While this argumentation simply matches the current findings to a theoretical model post hoc, I believe that it is still useful as it allows me to elaborate on these findings in a theoretical framework and make further predictions. It was the main goal of this paper to come to a better assessment of what kind of variables may be linked to group-specific evaluations. Based on a comparison of (ing) and intervocalic (t) in Greater Manchester, it would seem that it is for variables with high attitude strength that group-specific evaluations might not occur. For variables with weaker attitude strength, more heterogeneous evaluation may occur because attitude commitment is lower: listeners are less certain about what precisely a variant may mean, so different social groups may come to slightly different conclusions about the same variable. Based on a similar background, members of a social group may gravitate toward similar meanings. This hypothesis must be tested against additional variables. If it turns out to be correct, we must conclude that it is not primarily the fact that a variable is changing or not that determines how well-defined and homogeneous its social meanings are; rather, it is the degree of automaticity with which the link between a variable and attitudes is processed, which may of course interact with the status of a variable with respect to change.
In conclusion, it would make sense to assume the following meaning-making pathway:
1. The social distribution of (ing) and intervocalic (t) influences attitude strength, that is, the automaticity of attitude activation, and particularly attitude commitment: how certain listeners are about their attitude. The social distribution appears more “blurred” for (ing) (following Levon & Fox 2014:209) and more pronounced for intervocalic (t), which then results in different degrees of attitude strength.
2. As a result of listeners’ lower or higher attitude strength, listeners’ evaluations of guises are less or more pronounced, which is reflected in my statistical results.
3. As a further result of listeners’ attitude strength, group-specific evaluations of a variable may be more or less likely: high attitude strength may result in fewer group-specific evaluations, whereas lower attitude strength may result in more.
The last point, in particular, demonstrates the usefulness of a model that can distinguish varying degrees of automaticity, in addition to the importance of keeping production and perception distinct. Of course, production and perception are related, as indicated clearly in point 1 above; however, the precise nature of that relation may be rather complex. Both (ing) and (t) are stratified in production; nevertheless, perception is stratified only for (ing), not for (t). Campbell-Kibler (2012) and Levon and Fox (2014) have made similar points on the distinctness between production and perception. The current study adds to this discussion by providing evidence for a potential relation of group-specific evaluations to the concept of attitude strength. There is no doubt that significant further research is required to elaborate on and refine these tentative claims, which are based exclusively on macro-sociological data. However, I believe that the current study has been successful in specifying in greater depth how social background and social meanings may interact and, additionally, what theoretical challenges lie ahead.
Appendix A
Mean Evaluation by Speaker
| Speaker | Variant | Teacher | Moss Side | Salford | Greater Manchester |
|---|---|---|---|---|---|
| Marie | GS | −0.87 | −0.49 | −0.31 | 0.44 |
| | T | −0.82 | −0.82 | −0.74 | 0.12 |
| Mandy | GS | −0.87 | −0.78 | −0.59 | 0.05 |
| | T | −0.50 | −0.96 | −0.73 | −0.38 |
| Morgan | GS | −0.74 | −1.00 | −0.87 | −0.19 |
| | T | −0.63 | −1.00 | −0.87 | −0.20 |
| Max | GS | −0.93 | −0.90 | −0.70 | 0.30 |
| | T | −0.90 | −0.97 | −0.81 | 0.08 |
Appendix B
Representation of sample survey excluding practice question and map task. Note that scales were randomized for each individual participant and that layout, font, etc. differed somewhat as this was automatically adjusted in Fluidsurveys for different computers and internet platforms.
Acknowledgements
I am grateful to Miriam Meyerhoff and Danielle Turton for their expert advice and patience in answering my questions, and I would like to acknowledge the work of Michael Ramsammy, who set up the survey for this study and composed an initial report of data collection procedures.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the UK Economic and Social Research Council (ESRC, grant RES-000-22-4490).
