Abstract
The usual practice of presenting and considering age estimates derived from pipe stem bore diameters as individual point estimates is incorrect, misleading, and ignores the true nature of the data set. Considering the age estimates as ranges of values described by the mean and standard deviation more accurately reflects the nature of the data set and the probabilistic characteristics of the calculations, and allows explicit evaluation of the similarities and differences among age estimates using inferential statistics.
Introduction and research context
The purpose of this paper is to describe the appropriate use of pipe stem bore measurements to derive age estimates using the standard regression method described by Binford (1962; 1971). This paper is not a discussion of the overall validity of the pipe stem bore dating method (for exceedingly thorough discussions of the method's basic assumptions, problems, and relative credibility see Shea 1991; Shott 2012; Walker 1967; Wesler 2014). It is assumed here that the pipe stem bore diameter measurement method can produce useful age estimates if applied in an appropriate manner.
Shott (2012:18) and Wesler (2014) have noted that the initial studies of pipe stem bore diameter measurement dating methods extensively discussed the role of the standard deviation measure of dispersion in producing meaningful estimates of age ranges (see for example Binford 1971; Hanson 1971, 1972; Heighton and Deagan 1971). However, almost no archaeologists using the method calculate and report standard deviation statistics. The overwhelming majority of historical archaeologists erroneously present and discuss age estimates based on pipe stem bore diameter measurements as if they were single points in time, not as ranges of dates. This standard procedure in historical archaeology is as incorrect and misleading as treating a radiocarbon date as a point estimate rather than as a probability estimate of a range of dates, and just as likely to lead to what Robert Stuckenrath (1977:182), an expert in radiocarbon dating, called “banana peel” interpretations by prehistoric archaeologists. Comments by J. Gordon Ogden (1977:173), another radiocarbon dating specialist, are also applicable to pipe stem dating: “I find myself increasingly distressed that users of radiocarbon dates fail to understand or appreciate what the quoted figures really mean.”
Theory and methods of pipe stem date calculations
Working from the basic level of descriptive statistical methods, if you can calculate a mean, you can calculate a standard deviation, even if the data are measured on an interval scale and subject to calculation on a grouped data basis, as is the case for pipe stem bore diameter data (see Parsons 1974:81–90; Thomas 1976:76–82 for the statistical theoretical basis and computational procedures; Hanson 1971:3; Shott 2012:19 for specific discussions related to pipe stem bore diameter measurements and calculations). There are calculation methods for grouped data that produce truly unbiased estimators of various measures of central tendency and dispersion, including a distribution's variance and standard deviation (for example see discussions in Bierman, et al. 1961:101–143; Blalock 1960:70–73; Parsons 1974:86–90).
Age estimates from pipe stem bore diameter data are similar to radiocarbon dates at an abstract level. It is critical to realize that neither is really a “date” at all. Rather, they are “only a statistical probability of an object's age and not a simple fact” (Haynes 2002:13, also see discussion in Stuckenrath 1977). Time is not directly measured in either case. Instead, “proxy data” are used. Proxy data are defined as “an entity or variable used to model or generate data assumed to resemble the data associated with another entity or variable that is typically more difficult to measure; or an indirect measurement inferred from another direct measurement” (
The standard radiocarbon date format of “X years BP ± Y” reflects “the statistics of nuclear disintegration” (Ogden 1977:173; see also Parsons 1974:241–247; Stuiver and Polach 1977; Thomas 1976:246; Walanus 2006:5–6; Weninger, et al. 2011). The key point is that the standard deviation is first derived for the radioactivity measures, and then converted to age estimates. The absolute value of the difference in years between the date based on the mean (X) and the dates at either end of the one standard deviation range comprises the Y value (Taylor 1987:13–14). Thus, the Y value in the standard radiocarbon date format is an approximation of the standard deviation of the measurements of radioactivity converted to calendar years. This epistemological and methodological gap between the proxy data and the “radiocarbon date” has always been noted, with radiocarbon date values often identified as counts of “radiocarbon years” rather than “calendar years” (Taylor 1987:5, 20, 133–134).
Calculating pipe stem date standard deviations
The logic and process of deriving standard deviations for radiocarbon dates should be used to generate age ranges based on pipe stem bore diameter measurements. First, the mean bore diameter is calculated using standard methods (Binford 1962). Then, the standard deviation of the bore measurements is calculated as follows, using exactly the same grouped data that were used to calculate the mean:

$$ s = \sqrt{\frac{\sum f\,(M - X)^{2}}{N - 1}} $$

where:
s = the standard deviation of the pipe bore diameter measurements
f = the frequency of each size category (the count of pipes of each size category in 1/64-inch increments)
M = the midpoint of each size category (diameter of the pipe bore measurements in 1/64-inch increments)
X = the mean of the pipe bore diameter measurements
N = the total sample size (the total number of pipe stem fragments measured)
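The grouped-data calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's own worksheet: the function name `grouped_mean_sd` and the bore counts in `example_counts` are hypothetical and are not data from 18QU30.

```python
import math

def grouped_mean_sd(counts):
    """Return (mean, standard deviation, N) for grouped bore diameters.

    counts maps each size category M (bore diameter in 64ths of an inch)
    to its frequency f (number of stems measured at that size).
    """
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two measurements are needed")
    # Grouped mean: sum of f * M divided by N
    mean = sum(f * m for m, f in counts.items()) / n
    # Grouped sum of squares: sum of f * (M - X)^2
    sum_sq = sum(f * (m - mean) ** 2 for m, f in counts.items())
    # Sample standard deviation uses N - 1 in the denominator
    sd = math.sqrt(sum_sq / (n - 1))
    return mean, sd, n

# Hypothetical counts for illustration only
example_counts = {5: 10, 6: 30, 7: 40, 8: 20}
mean, sd, n = grouped_mean_sd(example_counts)
print(f"N = {n}, mean = {mean:.2f}/64 inch, sd = {sd:.2f}/64 inch")
```

Because the same frequency table feeds both the mean and the standard deviation, no measurements beyond those already recorded for a conventional pipe stem date are required.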
This method of calculating standard deviations for pipe stem date estimates is seen as preferable to the method described by Hanson (1971) because Binford (1971) convincingly showed that the Hanson method was based on a series of incorrect assumptions about the original data set used to derive the original regression equation (Binford 1962) and on incorrect methods of calculation. The method used here is also preferable to Binford's (1971:237–244) own method of calculating standard deviations because it is far less cumbersome and does not need to rely on interpolation from graphs. Most importantly, the method presented here is preferable because it follows the approach used for radiocarbon dating age range calculations, with which it shares the same basic epistemological and theoretical bases, and fully accounts for the fact that the underlying data are in a grouped form. Another, but more cumbersome, method of calculating means and standard deviations of the sets of pipe stem bore diameter measurements is to enter each individual bore diameter measurement into an Excel worksheet and use its built-in calculation functions to determine the mean and standard deviation values.
After the mean and standard deviation of the bore diameter measurements are calculated, the Binford conversion formula is applied to the mean, the mean plus one standard deviation, and the mean minus one standard deviation. The pipe stem “date” can then be expressed as A
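The conversion step can be illustrated with a short sketch. The text above does not state the regression coefficients; the values used here (Y = 1931.85 - 38.26X, with X in 64ths of an inch) are the commonly cited form of the Binford (1962) equation and should be checked against that source. The function names and the input mean and standard deviation are hypothetical.

```python
# Commonly cited Binford (1962) coefficients; an assumption here,
# since the coefficients are not given in the text above.
BINFORD_INTERCEPT = 1931.85
BINFORD_SLOPE = 38.26

def binford_date(diameter_64ths):
    """Convert a bore diameter (in 64ths of an inch) to a calendar year."""
    return BINFORD_INTERCEPT - BINFORD_SLOPE * diameter_64ths

def date_range(mean_diameter, sd_diameter, k=1):
    """Return (earliest, mean, latest) dates for a range of k standard deviations.

    Larger bores convert to earlier dates, so the mean plus k standard
    deviations gives the early end of the range.
    """
    earliest = binford_date(mean_diameter + k * sd_diameter)
    mean_date = binford_date(mean_diameter)
    latest = binford_date(mean_diameter - k * sd_diameter)
    return earliest, mean_date, latest

# Hypothetical mean and standard deviation, for illustration only
early, mid, late = date_range(6.7, 0.9)
print(f"mean date {mid:.0f}, one-sd range {early:.0f} to {late:.0f}")
```

Since the conversion is linear, the width of the date range is simply twice the slope times the standard deviation of the bore diameters, multiplied by k.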
Issues of sample size
Pipe stem date calculations should only be undertaken if the number of measured pipe stems from the relevant archaeological spatial or depositional context is greater than or equal to 50, because, in general, a minimum sample of 50 items is needed for useful calculations of measures of central tendency and dispersion for normal distributions. Optimal sample size is on the order of 350 stem measurements, with samples larger than 350 showing minimal gains in accuracy and precision (see discussions in Blalock 1960:165–167; Parsons 1974:338–341, 437–439, which show that these sample size parameters are not some kind of “statistical cookbook” conventions). Also, note that these sample size conventions are a result of the mathematics of the calculations and the parameters of the data distributions, not some exclusive aspect of archaeological data sets, as is incorrectly indicated by the statistically irrelevant and very misleading discussions of sample size requirements for pipe stem analyses presented by Audrey Noel Hume (1963) and Ivor Noel Hume (1978:300–301), which are nevertheless still cited in some discussions of pipe stem age estimates (e.g., Plumley 2002:90). These discussions of the effects of sample size variation on pipe stem bore diameter dates ignored the probabilistic nature of the date estimate calculations and relied on “arbitrary samplings” of the same large collection of nearly 12,000 measured pipe stems in a “quasi-simulation” analysis of various sized samples. Therefore, these studies violate nearly every convention of sample simulation studies that had been well established in other disciplines at the time of these articles’ publication (see Custer 1979, 1983, 1992 for review and application of appropriate sample simulation methods in archaeological contexts).
Use of pipe stem samples that are too small is probably a more significant source of erroneous estimates and interpretations of pipe stem dates than the use of the calculated dates as point estimates. The relevant literature abounds with dates derived from inadequate samples, sometimes even including dates derived from a single pipe's bore measurement (Kent 1984:272). Furthermore, the fact that a date calculated from a small sample is consistent with age estimates derived from other sources does not obviate the effects of sample size on the standard error of the mean, as incorrectly implied by Mallios (2001:117–19). A mean date calculated from a small sample is always unreliable, whether you happen to like the date or not.
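The effect of sample size on the standard error of the mean can be shown with a brief sketch. It uses the textbook relation (standard error = standard deviation divided by the square root of N); the standard deviation of 0.9 (in 64ths of an inch) and the list of sample sizes are hypothetical values chosen only for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

hypothetical_sd = 0.9  # in 64ths of an inch; illustrative value only
for n in (10, 50, 350, 1000):
    se = standard_error(hypothetical_sd, n)
    print(f"N = {n:4d}: standard error of the mean = {se:.3f}/64 inch")
```

Because the standard error shrinks with the square root of N, gains in precision diminish as samples grow, while very small samples leave the mean poorly constrained whatever value it happens to take.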
Example pipe stem dates and standard deviation statistics – “My Lord's Gift”
Table 1 shows a series of pipe stem date estimate calculation results which use the standard deviations of the bore diameter measurements to specify the statistically relevant date ranges for pipe assemblages recovered from a variety of contexts from “My Lord's Gift” (18QU30), a large late 17th to early 18th century plantation site on the Chester River in Queen Anne's County, Maryland (Custer 2019; Custer, et al. 2019). All samples included more than 50 pipe fragments. No doubt most readers will find the date ranges shown in the last two rows of Table 1 to be quite horrifying, especially if they are used to considering only the mean dates generated from pipe stem bore measurements as the chronological “fact” derived from the pipe stem analysis. Comforting point estimates of ages are lost forever in date ranges that span 57 to 89 years for a single standard deviation on either side of the mean and 114 to 178 years for two-standard-deviation ranges, even though some of the samples are quite large. And if we only consider the single standard deviation range, as large as it is, there is approximately one chance in three that the actual estimate falls outside that large range, as opposed to the two-standard-deviation ranges, where there is only one chance in twenty that the actual estimate falls outside the even larger (twice as large) range. These same kinds of date-estimate-range issues also plague unrealistic considerations of radiocarbon dates where single standard deviation ranges are considered rather than using larger, but more reliable and realistic, two-standard-deviation ranges, despite rather pointed admonitions on the issue by radiocarbon dating specialists which have been available in the literature for some time (Ogden 1977:171–173; Stuckenrath 1977:181–183; Taylor 1987:123–126).
Point estimates and single standard deviation ranges may be comforting in the cases of both pipe stem bore measurements and radiocarbon age estimates, but their use will not lead to accurate and responsible insights concerning the chronology of the archaeological record. Furthermore, if we are tempted to calculate a more comforting median age estimate value for the data, we are still ignoring the inherent dispersion of the age estimates based on the probabilistic nature of the process of calculation and inference of dates. We are only fooling ourselves in such analyses.
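The “one chance in three” and “one chance in twenty” figures for one- and two-standard-deviation ranges can be checked with a short calculation. The sketch assumes a normal distribution, and the function name `coverage_within` is hypothetical.

```python
import math

def coverage_within(k):
    """Probability that a normally distributed value lies within
    k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    inside = coverage_within(k)
    print(f"{k} sd: {inside:.1%} inside the range, {1 - inside:.1%} outside")
```

Under the normality assumption, roughly 68% of the probability lies within one standard deviation and roughly 95% within two, which is why the wider range is the more defensible one to report.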
Table 1. Example pipe stem date estimates with standard deviation calculation results from 18QU30, early occupation area.
Note: All dates rounded to the nearest year.
Source: Custer (2019:44).
Although the comfort of point estimates, misleading as they may be, is lost when standard-deviation-based date ranges are considered, calculation of the standard deviation statistics allows explicit comparisons of mean pipe stem dates using a standard difference-of-mean test (Parsons 1974:441–445). In the case of pipe stem age estimates, this procedure is especially appropriate because the sample sizes of the two populations whose means are being compared (the numbers of pipe stems measured) are known. This parameter for radiocarbon dates (the number of counts of radiation measurements) is not generally reported and considered. Therefore, the process of comparing the means of two radiocarbon dates is quite complicated unless potentially unwarranted assumptions about sample sizes are made (see discussions in Shott 1992; Spaulding 1958; Thomas 1976:244–251; Walanus 2006; Ward and Wilson 1978; Weninger, et al. 2011).
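A difference-of-mean test of this kind can be sketched as follows. The sketch uses the common large-sample z form with unpooled variances, which is an assumption; the procedure in Parsons (1974) may differ in detail (for example, a t-based version). The function names and all input values are hypothetical and are not the 18QU30 results.

```python
import math

def difference_of_means_z(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample z statistic for the difference between two means."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff

def two_tailed_p(z):
    """Two-tailed probability of a z value at least this extreme
    under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical bore diameter summaries (64ths of an inch) for two contexts
z = difference_of_means_z(6.7, 0.9, 100, 6.4, 0.9, 100)
p = two_tailed_p(z)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```

Because the conversion from bore diameter to date is linear, running the test on the converted dates rather than on the diameters changes only the sign of z, not its magnitude or the resulting probability.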
There were two distinct occupation areas at 18QU30. Based on a variety of data, including temporally diagnostic European ceramics, dated window leads, archival data, comparative patterns of building construction, pipe bowl shapes and maker's marks, and relative frequency of locally made terra cotta smoking pipes, one occupation area was dated to ca. 1660–1680 and the other was dated to ca. 1680–1740 (Custer, et al. 2019:4–7). Table 2 shows the results of two applications of the difference-of-mean test to pipe dates from 18QU30. In both cases the archaeological contexts of the pipe samples, and dates of artifacts other than the pipes from those contexts, indicate that the pipe samples should date to different points in time even though the calculated pipe date ranges are quite large and show significant overlap. The fact that the actual mean values are quite different has little statistical significance here because it is the range of values that most accurately describes the potential dates. In both cases the results of the difference-of-mean test statistics and their associated probability values indicate that the dates are indeed statistically significantly different, with confidence levels exceeding 95%. Some might argue that the same conclusion could be reached by simply observing the large differences in the mean dates compared and not even bothering to think about the ranges of the age estimates. However, in this case you would be reaching the correct conclusion for the wrong reasons, and intuitive examination of the actual ranges of dates produced by the method would not necessarily always lead to the correct conclusion. In these cases, using the correct procedure of examining date ranges calculated by using the distributions’ standard deviations yields a potentially ambiguous set of data regarding the comparative ages of the samples.
However, that ambiguity is resolved by application of the difference-of-mean test. Resolution of such ambiguity is indeed the goal of this kind of inferential statistics. Furthermore, the probability of ambiguous results would increase as the differences in mean dates decreased, making application of the difference-of-mean tests even more useful and important.
Table 2. Example difference-of-mean tests of pipe stem date estimates from 18QU30.
*Value is statistically significant at least at the 1% level (p < .01).
Source: Custer (2019:45).
Conclusion
In conclusion, the misuse of pipe stem bore diameter dating in archaeology is probably just as misleading and prevalent as the misuse of radiocarbon dates. Nevertheless, the misuses of both kinds of dates are significant errors in archaeological chronological analyses. Based on the past behavior of archaeologists, even in the face of explicit admonitions, the misuses are likely to continue.
Acknowledgments
I thank Andrea Anderson for her work helping to compile the individual pipe data and Henry Miller, Michael Stewart, and Dennis Curry for their comments on early drafts of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
