Abstract
Archaeological and molecular data suggest that horses were domesticated comparatively recently, the genetic evidence indicating that this was from several maternal haplotypes but only a single paternal one. However, although central to our understanding of how humans and environmental conditions shaped animals during domestication, the phenotypic changes associated with this idiosyncratic domestication process remain unclear. Using geometric morphometrics on a sample of horse teeth including Pleistocene wild horses, modern Icelandic and Thoroughbred domestic horses, Przewalski’s wild horses of recent age and domestic horses of different ages through the Holocene, we show that, despite variations in size likely related to allometry (changes to bone size in proportion to body size), natural and artificial selective pressures and geographic and temporal heterogeneity, the shape of horse teeth has changed surprisingly little over thousands of years across Eurasia: the shapes of the premolars of prehistoric and historic domestic horses largely resemble those of Pleistocene and recent wild horses. However, this changed dramatically after the end of the Iron Age with an explosive increase in the pace and scale of variation in the past two millennia, ultimately resulting in a twofold jump in the magnitude of shape divergence in modern breeds. Our findings indicate that the pace of change during domestication may vary even within the same structure with shape, but not size, suggesting a ‘long-fuse’ model of phenotypic modification, where an initial lengthy period of invariance is followed by an explosive increase in the phenotypic change. These observations support a testable model that is applicable to other traits and species and add a new layer of complexity to the study of interactions between humans and the organisms they domesticated.
Introduction
Horse domestication had a profound effect on humans (Anthony, 2007; Clutton-Brock, 1999; Olsen, 2006). The timing of horse domestication around 3500 BCE (Outram et al., 2009), and the subsequent extirpation of wild progenitors, potentially correlates with important environmental shifts, that is, expansion and contraction of forest cover (Warmuth et al., 2011). However, reconstructing the history of this event has proved challenging (Lippold et al., 2011; Steiner et al., 2013; Wade et al., 2009); a deeper knowledge of the processes that underpinned horse domestication has important implications for our understanding of both the human groups that first undertook this particular domestication event (or events) and the ecological setting within which domestication was initiated. Critically, with the expansion of domestic populations, our ability to distinguish between discrete groups of wild and domestic horses diminishes to nil (Bendrey, 2012).
Maternally inherited mtDNA of modern horse breeds demonstrates a high degree of variability, with an exponential population expansion starting 6000–8000 years ago (Lippold et al., 2011). Genomic analyses confirm that horses did not undergo strong genetic bottlenecks and indicate close relationships among breeds and even among domestic horses and their closest living relative, the Przewalski’s wild horse, Equus przewalskii (Wade et al., 2009). The domestication process is strongly sex-biased (Wade et al., 2009): differences in paternal-inherited DNA are small and all modern breeds share the same Y chromosome haplotype (Lindgren et al., 2004). Overall, the genetic evidence suggests that mares from multiple populations contributed to the gene pool but that only a few stallions were domesticated.
Archaeological data supporting western central Eurasia as a likely centre of early horse domestication (the presence of mares’ milk and carcass residues in ceramic vessels, alongside pathological markers and bit-wear indicative of horse riding c. 3500 BCE: Outram et al., 2009) are now corroborated by autosomal genotype data (Warmuth et al., 2011, 2012). These same archaeological lines of evidence point to an established and well-developed economy that included the exploitation of a range of secondary products (Bendrey, 2011; Outram et al., 2009), not just the horse’s ability to transport people and goods. Thus, despite strong consensus that the horse was domesticated for riding, we still need a better assessment of the overall process of domestication and how both the socio-economic and environmental setting influenced this event over the course of time and across space.
Investigating morphological responses
As selection acts on phenotypes, the manner in which behaviour and morphology have been shaped by human–animal interactions, and with the environment, is fundamental to our understanding of the process of domestication. Although genetic approaches have elucidated phenotypic features such as coat colour (Ludwig et al., 2009) and gait variability (Andersson et al., 2012), we have yet to make strong connections between the behavioural and anatomical correlates of genetic changes. Furthermore, at least some aspects of caballine equid morphology potentially reflect adaptation: larger phalanges in glacial horses might be an adaptation to heavy grounds (Bignon and Eisenmann, 2002); shorter muzzles in cold environments follow the prediction of Allen’s rule (Eisenmann and Baylac, 2000).
Teeth offer an excellent starting point to address adaptive phenotypic responses. They are one of the most frequently recovered faunal elements within an archaeological setting, they are shaped by natural selection to meet functional demands in relation to diet and they often mirror important features of a species’ environmental setting (Evans, 2013; Kaiser and Schulz, 2006). Indeed, the term ‘dental ecology’ has been used to illustrate the power of teeth in informing responses to the environment (Cuozzo and Sauther, 2012). Using geometric morphometrics (GMM; Adams et al., 2013), we explored premolar variation in a unique data set, in terms of spatio-temporal variation. The samples span Eurasia, with the majority of the archaeological specimens dating to c. 4500–2000 years ago (Figure 1a; sample details in Supplementary Information, available online). This period is well after initial domestication but during a time when populations of wild horses were still present (Warmuth et al., 2011). GMM provides a robust framework within which to study both size and shape data, resulting in crucial phenotypic evidence that can serve as an important complement to molecular techniques such as DNA and isotope analysis. The method offers other advantages: for example, it is non-destructive, and large samples can be assessed with minimal expense, as the technique itself is low-cost. GMM offers a significant, and relatively new, addition to the arsenal of techniques that capitalizes on an abundant primary archaeological resource, faunal remains, to address key issues about human–animal–environmental interactions. Thus, GMM offers a powerful toolkit to assess how much a biometric marker (Houle et al., 2010), such as the premolar, has been shaped by interactions with humans and the environment during domestication.

(a) Geographic distribution and age of samples, (b) premolar landmark configuration and (c) number (N) of individuals per sample; box plots of premolar centroid size and icons (not to scale) emphasizing premolar size clusters: small (light grey), medium (grey) and large (black).
Materials and methods
Sample summary
Our comparative framework for the archaeological samples consisted of wild horses and two modern breeds, Icelandic (ICE: n = 50) and Thoroughbred (THB: n = 18). The wild specimens belong to an extinct Late Pleistocene population from Šandalja, Croatia (CRO: n = 19), dated to 40,000–8200 yr uncal. BP and potentially Equus ferus – although there is no consensus on the number of wild horse species from this time period – and a small sample of modern Przewalski’s horses (PRZ: n = 4). The latter is the only living wild horse and likely retains primitive traits (Groves and Ryder, 2000). The remaining archaeological samples were predominantly of domestic individuals. These include the following: Bronze Age specimens from Berel, in the Katonkaragaray Eastern Kazakh Oblast, excavated from Kurgan burial mounds (KAZ: n = 22); Bell-Beaker period materials from Szigetszentmiklós, Hungary (HUN: n = 10); samples from China dated to the East Zhou Dynasty (alongside two Pleistocene specimens from this region; CHI: n = 16); and Iron Age materials from the site of Slepushka (and one Bronze Age specimen from Ust’e), Russia (RUS: n = 17). We use the term ‘archaic’ to refer to the archaeological specimens, the Pleistocene wild horses and the Przewalski’s horses. Anatomical landmarks are shown in Figure 1b. Overall, 156 individual horses were studied, with an average sample size of 20 specimens per sample (Figure 1c and Supplementary Information, available online). Sex differences were negligible (Seetah et al., 2014). All animals were adult with a similar degree of tooth wear. Because of variation in the enamel folding between the apex of the crown and the cervical margin, an individual tooth will show an apparent change in shape of the folding as the tooth wears. Tooth wear also affects the proportions of the teeth themselves (Gidley, 1901). For the latter, this situation is compounded by the fact that tooth wear affects the individual teeth differently; thus, tooth wear affects the proportions of a P4 differently to an M3. As we only used P4 teeth, modifications to the proportions of individual teeth, consequential to wear, should be uniform across our assemblage. Furthermore, to the best of our ability, we minimized variations in enamel folding across the assemblage by selecting teeth with a similar crown height (as per Gidley, 1901: 97) and rejecting those that showed excess wear.
Methods
Size and shape variables were computed using a Procrustes superimposition (Rohlf and Slice, 1990) on the raw landmark data (Seetah et al., 2014) digitized on high-resolution digital images in TPSDig (Rohlf, 2015). Groups were tested using 10,000 permutations for mean differences in size (absolute difference – that is, the sign is not considered and the test probability is two-tailed) and shape (Procrustes distance between mean shapes; two-tailed test). Results were double-checked using t-tests for samples with unequal variance (size) or their multivariate equivalent (shape) using James’ statistics (Dryden, 2013). Tests were performed in MorphoJ (Klingenberg, 2011), PAST (Hammer et al., 2001), NTSYSpc (Rohlf, 2013) and R (R Development Core Team, 2005). Allometry was tested in MorphoJ using a multivariate regression of shape onto the natural logarithm of centroid size and significance assessed with 10,000 permutations for the percentage of variance explained by size. Variation within and among samples was summarized in PAST with box plots for size and between-group principal component analysis (BG-PCA) scatterplots for shape (Seetah et al., 2012). A BG-PCA projects the specimen data onto the eigenvectors of the mean shape variance–covariance matrix.
Focusing exclusively on mean shapes, variation was summarized in R using principal component analysis (PCA) with 95% confidence envelopes computed using 1000 bootstraps in NTSYSpc. For clarity, as two types of PCA were used (BG-PCA and ‘standard’ PCA), we stress here that every time we refer to the between-group analysis, abbreviations are preceded by the acronym BG- (i.e. BG-PCA or BG-PCs); if this acronym is missing, however, PCA and PCs simply refer to a PCA summarizing total variance in a sample regardless of groups. Size and shape similarity relationships among sample means were also summarized in NTSYSpc with distance-based trees (neighbour joining using Euclidean distances for size and Procrustes distances for shape). To take into account uncertainties in estimates of means, node repeatability was assessed by bootstrapping each sample, computing the corresponding pseudo-means and the resulting tree and finally computing a 50% majority rule consensus tree from all 1000 bootstrapped trees. Group separation was also assessed by estimating cross-validated classification accuracy using the first eight PCs of premolar shape (91.7% of variance) in a discriminant analysis (DA). The general methodological framework used in this study is described in Cardini (2013) and exemplified in Viscosi and Cardini (2011). The bootstrap procedures used to compute the confidence envelopes and the percentages of node repeatability are detailed in Cardini and Elton (2008). The protocol used to perform the DA, including the estimate of random chance thresholds and the sensitivity analysis using a balanced design, is described in Evin et al. (2013).
Disparity (morphological variance in modern vs archaic horses) was tested using methods modified from the protocol described by Drake and Klingenberg (2010; for an extensive review on disparity, see also Foote, 1997). For univariate data, this implied a simple test for the similarity of variances, which is the permutational version of Levene’s test used by Nagorsen and Cardini (2009). The same test can be adapted to test multivariate shape variances estimated by the mean squared Procrustes distance of each individual to its group mean (D1) or equivalently the sum of variances of all shape coordinates (Drake and Klingenberg, 2010; Nagorsen and Cardini, 2009). Informally, this is akin to measuring the sum of the squared sides of the multivariate ‘box’ which contains the data. An alternative test statistics (D2) is based on estimates of the squared volume of the ‘box’ occupied by the data. The product of the eigenvalues of the within-group shape variance–covariance matrix, which is the same as its determinant if all eigenvalues are used, provides one way to do this estimate. However, similar to Drake and Klingenberg (2010), who used an alternative method based on convex hulls, we estimated the volume occupied by either modern or archaic individuals within the subspace of the first three PCs of the total shape data set. These PCs account for two-thirds of the total variance, and by using them to estimate the volumes of the shape space occupied by each group, we reduced issues with numerical precision using very small numbers, such as those generated by the product of eigenvalues from Procrustes shape data (Drake and Klingenberg, 2010). D2 is, therefore, computed in a subspace of the total shape space, whereas D1 uses all available shape information (i.e. there is no dimensionality reduction). Thus, to summarize D2, its computation meant performing two eigen-decompositions on the shape data: (1) the first one used all individuals regardless of group and was performed on shape coordinates (from a common superimposition), as it simply aimed at building a subspace of the total shape space in which to compute disparity as a ‘volume’; (2) the second one took the shape variables corresponding to the subspace built in (1) (i.e. PC1, PC2 and PC3 scores); it was performed within group and used the product of the three eigenvalues of one group to estimate its disparity in that subspace.
Results
Moderate premolar size differences were present despite large overlaps across samples (Figure 1c; Table S1, available online). Using means, the resulting tree was well supported with three main size subdivisions suggested by the longest branches of the dendrogram (Figure 2): small (ICE, HUN and RUS), medium (THB and KAZ) and large (CHI, CRO and PRZ). Bonferroni-corrected tests were significant (with c. 37% of variance explained on average) only when they involved groups of small-toothed versus large-toothed animals, with the largest premolars being about 15% bigger than the smallest ones. Size groups were made of a mix of samples from different periods and localities. There was no clear pattern in terms of time or geography: for instance, modern ICE were similar in size to Iron Age RUS and Bronze Age HUN. With the exception of the large-toothed CHI sample, however, the largest premolars belong to the two groups of wild horses (Pleistocene CRO and modern PRZ). For CHI, the large mean size was not due to a bias in relation to the two Pleistocene individuals in the sample, as their size was actually very close to but slightly (c. 1 mm) less than the mean size of Iron Age CHI.

Summary of mean size variation: neighbour joining tree of mean sizes with percentages of support for branches. Three main size subdivisions are emphasized using different grey tones, as in Figure 1. Arrows help visualizing pairwise comparisons significant after a Bonferroni correction.
Shape had little covariation with size (Table S2, available online). Thus, allometry seems to have had minor and generally negligible effect both within and across samples. In contrast, in the scatterplots capturing most of the shape variance (Figure 3, 82.7% of between-group variance, corresponding to 50.7% of total sample variance; Figure 4a, 91.0% of variance in means), variation was highly structured and the pattern looked unexpectedly simple: modern breeds showed very clear differences between each other when compared with all archaic horses; archaic horses largely overlapped with the four archaeological samples having similar premolar shape and also resembling extinct (CRO) and living (PRZ) wild horses. Shape components other than those shown in the scatterplots did not suggest any group separation. In terms of the shape features that characterized the three main clusters (Figure 3), ICE had a relatively short and thick premolar, THB was comparatively long and narrow and archaic samples had a somewhat intermediate shape.

Between-group principal component analysis (BG-PCA) of shape. Scatterplots of the first two components (BG-PC1, 48.7%, and BG-PC2, 34.0%, of between-group variance) with groups emphasized using convex hulls. Mean shapes of ICE, THB and archaic samples are shown using rendering of contours and deformation grids with expansion factors, as in Viscosi and Cardini (2011) and Seetah et al. (2012).

Summary of mean shape variation: (a) 3D scatterplot of the first three principal components (91.0% of total variance) with 95% confidence ellipsoids around means and (b) neighbour joining tree of mean shapes (using Procrustes distances and rooted at PRZ) with percentages of support for branches and arrows to emphasize pairwise comparisons significant after a Bonferroni correction.
Tests of mean shape differences (Table S3, available online; Figure 4b) provided results in agreement with the pattern suggested by the scatterplots. Virtually, all Bonferroni-corrected comparisons involving ICE and THB were significant. In contrast, out of 15 possible pairwise comparisons between pairs of archaic samples, only 5 were significant. The variance explained by differences with modern breeds (on average 27.7%) was almost twice as much as that found in archaic horses (on average 15.6%). The average shape distance of either ICE or THB to any of the archaic horses was more than double that which was noted among pairs of archaic samples (⩾0.0859 vs 0.0425 units of Procrustes shape distance, respectively).
Because sample size was heterogeneous and the PRZ sample was particularly small, results may have been affected by unbalanced sampling and differences in statistical power. To explore the sensitivity to sampling, we excluded PRZ and repeated all tests on shape using a perfectly balanced design with subsamples of 10 random individuals from each original sample. (N = 10 was chosen as this was the sample size of HUN, the second smallest sample in the study.) Results from the balanced design are briefly summarized here, as they show no appreciable difference compared with the analysis including all specimens: even with a Bonferroni correction, all 11 tests involving modern breeds (ICE and THB) were significant; however, with the same correction, only 4 of the 10 comparisons between pairs of archaic samples showed significant differences.
Figure 4b shows a neighbour joining tree rooted using PRZ as an ‘outgroup’. Phylogenetic inference using Procrustes shape data is problematic (Adams et al., 2013) and was not our aim. The tree is better seen as another type of graphical summary of mean shape similarity relationships, a summary with a very high cophenetic correlation (0.971, an index of high accuracy in reproducing the pattern of full multivariate shape distances) and the potential of rooting the tree to explore directional change. When this was done, the tree topology suggested a progressive trend in change from wild horses to modern breeds with archaeological samples in between and modern breeds at the tips of very long branches. However, only two clusters were well supported when inaccuracies in estimates of mean shape were taken into account. The two clusters suggest the same sharp contrast between modern and archaic animals as in the scatterplots and pairwise tests.
The magnitude and direction of group shape differences were further explored using cross-validated discriminant analyses (Table S4, available online). The baselines to assess classification accuracy were the estimated average and the 95th percentile of correct classification by random chance, which was, respectively, 26.3% and 30.1%. Modern breeds with the highest accuracies (>83% of correctly classified individuals) were thus about three times better than chance. Classification accuracy varied in archaic samples, but it was lower and averaged 43%, with a small increase to 56% if the two smallest samples (HUN and PRZ) were not considered. Thus, although on average better than chance, about half of archaic individuals were misclassified into other archaic samples; however, none were misclassified as modern. With a perfectly balanced design (not shown) to control for sample size heterogeneity (the same design as in the balanced permutation tests), modern breeds would have a slightly lower average classification accuracy (75%) and archaic samples a slightly higher one (58%), but the general outcome was similar and no archaic individuals were misclassified as modern breeds. To put these percentages into context, if they were interpreted using an arbitrary but common criterion employed by taxonomists for wild populations, modern breeds would be consistently above the arbitrary 75% threshold in classification accuracy for a valid subspecies (Patten and Unitt, 2002) and archaic samples would be mostly below it.
The overwhelming signal from shape data analysed and summarized with a multiplicity of approaches is that modern breeds are highly distinctive with long and narrow premolars (THB) or short and thick ones (ICE), whereas archaic populations tend to have an intermediate, probably more primitive, shape that has varied little over time and across localities. These observations lead us to predict an increase in disparity before and after modern breeds were selected.
With only two modern breeds in the data set, our samples likely underestimate their disparity, and any test has to be considered preliminary and largely exploratory. Bearing this caveat in mind, we performed a disparity analysis. First, we pooled modern breeds and tested disparity differences between them, together and the pooled sample of all archaic individuals. Then, we compared disparity between archaic horses (pooled) and each of the two modern breeds (one at a time). In this second series of tests, we expected similar amounts of disparity if variation in archaic horses, regardless of geographic origin and period, was comparable with that found within a single modern breed. For comparative purposes, disparity was also tested, using the same design, on premolar size.
We found that disparity in modern breeds (pooled), estimated by the sum of all eigenvalues (D1, Table S5, available online), is 1.2 times larger than in archaic individuals. This is not significant or only marginally so (p = 0.09). Disparity estimated using the product of the first three eigenvalues (D2, Table S4, available online) is, however, highly significantly different (p = 0.0016) and 4.7 times larger in modern horses. As expected, when archaic horses are compared with one or the other modern breed (one sample at a time), disparity is generally larger in the archaic group but never reaches significance (p > 0.05). Results from shape data were at odds with those on size, which indicated that archaic individuals vary in size three times or more than modern breeds even when pooled (p < 0.0001; D1, Table S4, available online). Thus, disparity analysis strongly suggests a discordant pattern of change in size and shape, with size varying in the archaic horses and shape being constrained within a relatively small region of the morphospace until the appearance of highly derived modern breeds.
Potential sources of error
Taxonomic and especially palaeontological analyses can be strongly affected by sampling error because of the limited availability of individuals in one or more samples. In our study, sample size is very heterogeneous, with ICE (N = 50) and PRZ (N = 4) falling at opposite extremes in terms of number of specimens. We assessed the impact of sampling error in a previous study using rarefaction analyses and resampling methods in this same horse teeth data set (Cardini et al., 2015). In that work, we concluded that ‘likely, centroid size and shape variance require no less than 15–20 specimens to achieve a reasonable degree of accuracy’ (p. 149). In the same study, largely in agreement with a previous one on monkeys (Cardini and Elton, 2008), we also showed that estimates of mean shapes may require 20 or more specimens, while those of mean size can be fairly accurate with just 10 individuals. Thus, although most of our samples have sizes above or close to this minimum putative requirement, a degree of caution must be exercised in the interpretation of results, and this is especially true for the smallest samples. Crucially, however, in terms of what the populations we sampled allow us to say (see also below), the main conclusion of our study seems robust. This is suggested by Figure 7a of Cardini et al. (2015) that indicates how, despite a remarkable inflation of differences in the smallest random samples from the rarefaction analysis of ICE, there is no overlap between ICE means and those of all other groups. This is because differences are so large that small samples do not change the general pattern, and this most likely holds also for THB, whose sample is relatively big and whose shape is as distinctive as that of ICE. As for the analysis of mean shapes, results from the disparity analysis are robust because archaic samples are pooled, thus strongly mitigating the problems with small samples.
Another source of inaccuracy is related to the inclusion of three allochronic individuals (compared with the others from the same region) in two archaeological samples: two Pleistocene specimens in CHI and one Bronze Age individual in RUS. This occurred because, at the time analyses were performed, we had no accurate information to age all individuals and assumed that they were of similar age as the others from the same geographical region. After we found out that CHI and RUS included one to two allochronic individuals each, we investigated whether their inclusion in the analyses could have altered results. To this aim, as detailed in the Supplementary Information (available online), we showed that in terms of both size and shape, the allochronic individuals were well within the range of the corresponding samples and they did not appreciably alter the pattern of population differences in means and variances (whose correlations before and after excluding those three specimens are virtually 1). Thus, although the best choice would have been to leave those three specimens out, we acknowledged the problem and demonstrated that it has no practical consequences and kept them in the study. This avoided redoing all analyses, which is a requirement when data are analysed in a common Procrustes shape space, which is specific to the sample (and configuration) being analysed.
A third source of uncertainty is the limited number of samples in the analysis. Our samples are representative of the whole Palearctic range, include the only living population of true wild horse, the Przewalski’s horse, as well as an extinct one from the Pleistocene, and also include two modern breeds and archaeological material from four different localities, over a time span that corresponds to early and later stages of horse domestication. However, clearly, our results will have to be corroborated by future studies on other modern and extinct populations. Our hypothesis will be corroborated if the majority of extinct populations still have small differences among them compared as well as with Przewalski’s horse, and modern species have more distinctive shapes, when samples are added from more regions, time periods and modern breeds. If not, it will be rejected.
Discussion
With the provisos discussed above, our data indicate that three complementary lines of evidence (mean differences, classification accuracy and morphological disparity) suggest that the size of horse premolars has changed to a variable degree over time and space in both archaic and modern horses (Figures 1c (box plots) and 2 (size phenogram) and Supplementary Information, available online), but changes to shape have been modest, with variation mostly overlapping among the archaic samples (Figures 3 and 4) until the development of modern breeds in recent centuries, when shape has become hugely distinctive (Figures 3 and 4). With an estimated time since the divergence of PRZ and the lineage leading to domestic horses of >100,000 years (Goto et al., 2011; Steiner et al., 2013; but see Der Sarkissian et al., 2015, for an alternative perspective), the data suggest that conservativeness may have characterized premolar shape for almost 99% of the history of Equus caballus, whereas in the last 1000–2000 years, under conditions of strong selective breeding, shape differences have more than doubled compared with those observed in archaic horses (Figure 3). Thus, we propose, for premolar shape, a hypothesis of a ‘long-fuse’ model of phenotypic change in domestication, whereby a long initial period of small variation was followed by an explosive acceleration in the magnitude of shape change.
We anticipated large phenotypic changes in modern breeds consequential to artificial selection, although not specifically targeting tooth morphology. Differences in premolars are in fact small in size but remarkably large in shape when ICE and THB are compared (Figure 3). Although of smaller magnitude, a degree of variation in premolar size across all groups was also predictable across all samples based simply on variability in body mass (e.g. the small PRZ and ICE vs the medium-large THB), geographical distances, time heterogeneity and likely genetic differences. These factors should, however, also affect shape. In contrast, we found a clear disconnection between patterns of size and shape variation: while size varies over time and space, showing no clear trend except a moderate degree of reduction in most domesticated samples compared with wild horses, the amount of shape change of archaic horses in almost 100,000 years is overall modest and especially small when compared with the massive differences shown by modern breeds. This relative phenotypic conservativeness in shape, despite size changes, is unexpected and potentially might provide insight into both ecological and anthropological factors.
Dramatic ecological changes occurred after the end of the last glaciation c.11,500 years ago. Over the next 2000–3000 years, during the early-Holocene, temperature rose, forests expanded and open habitats favoured by horses shrank (Warmuth et al., 2011). Humans hunted horses and, with the expansion of agriculture in the mid-Holocene, c. 8000–5000 years ago, contributed to modifying the environment. Regardless of when and how domestication occurred, horses had to cope with a variety of selective pressures, whose combined effect was profound for the wild counterpart, resulting in the extinction of those populations. Furthermore, mtDNA diversity was greater before c. 2800 years ago rather than after (Lippold et al., 2011). Thus, as we anticipated, both environmental and genetic factors suggest that change would be expected, an expectation that is in agreement with our findings regarding size but at odds with those for shape. In fact, as premolar size varies in several archaic samples, allometry on its own, with its typically pervasive effect on bone shape (Klingenberg, 1998), would predict concomitant shape differences, which we have not found.
Large shape changes seem to appear suddenly (in geological terms) in populations dated to after the first millennium BCE. This might be consistent with a refining phase during domestication that profoundly affected breeding practices because of important socio-economic developments in horse husbandry and greater management of individual populations. The almost complete extinction of wild horses in much of Eurasia may have been approximately concomitant with this stage: a reduction in wild populations, resulting from increased habitat encroachment by humans and a warming climate, potentially catalysed a reassessment of the symbolic and economic value of the domesticated form of the horse. Technological developments may also have played a role in promoting change, for example, with the development of the metal bit (Bendrey, 2011). In this complex scenario of cultural innovation and environmental change, highly distinct modern morphological types emerged with a major contribution, according to genetic evidence, likely due to an independent second wave of domestication in Western Europe (Achilli et al., 2012; Cieslak et al., 2010; Lindgren et al., 2004).
Conclusion
Domestic animals offer some of the best examples of how strong directional selection can change morphology (Drake and Klingenberg, 2010). Our results support the notion that domestication is a highly dynamic process (Larson et al., 2014; Marshall et al., 2014), with morphological changes occurring over an extended period of time (Dobney and Larson, 2006). However, the amount of change can vary remarkably over time and more importantly may vary sharply depending on the trait or even just the aspect of the character being studied, as shown by our tooth size and shape data. An ‘explosion’ of morphological disparity is expected as artificial selection became more intense in the last few centuries. Interestingly, however, we also show that if horses were first domesticated around 6000 years ago (Outram et al., 2009), the shape of horse teeth then likely remained unchanged for over half of the subsequent history of the domestic horse, until transformations that took place over the past 2000–3000 years. To our knowledge, this has never been reported and suggests a new layer of complexity, in which shape variation is decoupled from size, in the study of human–animal interactions. These outcomes reinforce the singular nature of horse domestication per se and support the notion that the special relationship between horses and humans has existed for some considerable time. Within a well-established and powerful statistical framework (Adams et al., 2013), our work provides a testable hypothesis, which we term the ‘long-fuse’ model of phenotypic change in domestication and which can be verified in new samples and on other anatomical structures and different species.
Footnotes
Acknowledgements
We are extremely grateful to the following for facilitating access to materials and providing essential input for specific samples: ICE and PRZ: F Mayer (Museum für Naturkunde, Berlin), THB and PRZ: R Sabin and L Tomsett (Natural History Museum, London), CHI: T Yaqi and H Songmei (Shaanxi Archaeological Research Institute, Xi’an), KAZ: Z Samashev and K Kashkanbajev (Institute of Oriental Studies, Almaty), RUS: P Kotsinsev (Institute of Ecology of Plants and Animals, Yekaterinburg) and B Hanks (Department of Archaeology, Pittsburgh), CRO: D Brajkovic (Institute for Quaternary Palaeontology and Geology, Zagreb) and P Miracle (Department of Archaeology, Cambridge) and HUN: A Choyke (Department of Medieval Studies, Budapest) and A Endrődi (Budapest History Museum, Budapest). We thank N Vibla and J Li for assistance in recovering samples and translation, A Evin for providing a set of duplicate samples from Berlin and T Cucchi and K Dobney for essential analytical input. We are deeply grateful to I Dryden (University of Nottingham), P Gunz (Max Planck Institute for Evolutionary Anthropology, Leipzig), L Monteiro (Universidade Estadual do Norte Fluminense), S Schlager (University of Freiburg) and other MORPHMET subscribers for help and suggestions with R scripts and packages; to A Drake for discussing with us methodological and conceptual issues about morphological disparity in dogs; and to P Mitteroecker (University of Vienna) for his important feedback on how to estimate multivariate variance; to PD Polly (Indiana University, Bloomington) for discussions on teeth evolution and shape analysis which greatly improved our interpretations. Finally, we are sincerely grateful to PD Polly and an anonymous referee for insightful comments on an early draft of this manuscript.
Author contributions
KS and AC designed the study. GB oversaw the study. KS coordinated and collected samples and performed initial landmarking. AC performed statistical analysis and validation. KS and AC wrote the article with essential input from GB. All authors gave final approval for publication.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
Funding was provided to GB from the Leverhulme Trust project grant scheme (F/09 757/B) and to KS and AC from the Lang Fund for Human-Environmental Anthropology, Department of Anthropology, Stanford.
