Abstract
The simplified viscoelastic continuum damage model has been widely accepted as a tool to predict fatigue performance of asphalt concrete. One key component in the model is the damage characteristic curve that results from a cyclic fatigue test. This curve characterizes the relationship between material integrity (stiffness) and the level of damage in the material. As with any experimental measurement, it is important to know and quantify the variability of the damage curve, but traditional statistical methods are ill-suited for experiments that yield functional data as opposed to univariate data. In this study, a variance index of the damage characteristic curve is first proposed and compared with the expert judgment of the variance of a set of nine different asphalt mixtures. Then, an example analysis for establishing the repeatability limit of a specific mixture as the application of the variance index is presented using the resampling method and hypothesis test. The major findings are as follows: 1) the proposed variance index can match the expert judgment of variability; 2) the shape of the damage characteristic curve can affect the performance of the variance index; 3) the resampling method and hypothesis test can be applied to flag inconsistent data in multi-user or multi-laboratory results; and 4) the resampling method can also be used to construct the repeatability limit of the variance index.
The mechanical properties of a material, such as the modulus of elasticity, tensile strength, and elongation, are almost always characterized by experiments. Multiple test replicates are usually required in the experimental procedure to reduce the effect of measurement error, which is the difference between a measured value of a quantity and its true value. The variance of these test results is usually an indicator of the measurement error and can be used to gauge the familiarity of the users with the test, identify the sensitivity of the experimental procedure, compare two test results for equality, set up Monte Carlo sampling, or perform error propagation in uncertainty analysis. The standard way to calculate the sample variance for univariate data is to average the squared differences from the mean, which measures how far a set of numbers is spread out from its average value.
Asphalt concrete, a material used widely in pavements, exhibits viscoelastic behavior. Thus, the complex modulus of the material is highly temperature and time dependent over the range of conditions that the material is subjected to in the field. The simplified viscoelastic continuum damage (S-VECD) model incorporates the dynamic modulus property and characterizes the changes of the material property in the constitutive relationship as fatigue damage accumulates. The experiments related to the model produce two results. One is the dynamic modulus characterized at different temperatures and frequencies in accordance with the AASHTO TP 132 or AASHTO T 378 protocol. The other is the damage characteristic curve, obtained from the AASHTO TP 133 or AASHTO TP 107 protocol, which captures the change of material stiffness as damage accumulates.
If these results are considered as univariate data, then statistical analysis, such as the analysis of variance (ANOVA) test, must be applied at the same temperature and frequency for the dynamic modulus or at the same damage condition for the damage characteristic curve, as is done for univariate data such as specific gravity and mixture volumetrics (1–3). This approach has been taken in past research studies. For the dynamic modulus data, researchers have generally focused on analyzing one temperature and frequency combination (4–6), which ignores the inherent relationship between results at different temperatures and frequencies. Others have performed an ANOVA test at each measured temperature and frequency (7, 8), which generates a large volume of results and can lead to contradictory conclusions. Bonaquist (9) studied the precision of the dynamic modulus measured with the Asphalt Mixture Performance Tester and applied ASTM E691, the standard practice for conducting an interlaboratory study to determine the precision of a test method, to the data for each combination of temperature and frequency (10). For the damage characteristic curve, little analysis has been conducted to quantitatively characterize the variation of the results or to develop a precision statement because of the lack of theory. Researchers have focused mainly on the variation of the fatigue life prediction of asphalt material from the S-VECD model incorporating the dynamic modulus and damage characteristic curve, ignoring that the damage characteristic curve is a fundamental mechanical property of the material (11–13).
While univariate approaches prevail, these data can be treated as multivariate data for the purposes of analysis if each combination of temperature and frequency or each damage condition is treated as one variable (14). However, this treatment ignores that the underlying measured material property is a curve, and it can encounter difficulties if the data points are not evenly distributed, are not the same across different subjects/specimens, or outnumber the subjects/specimens (15).
Here, it is argued more broadly that these data can be treated as “functional data,” defined as data obtained by observing several subjects over time, space, or other continua. In this case, functional data analysis (FDA), a branch of statistics that analyzes data that provide information about curves, surfaces, or anything else over a continuum, offers a more accurate framework from which to assess variation in the test outcomes (16). However, direct adoption of FDA is not well suited to dynamic modulus and damage characteristic curve data. First, the core technique of FDA is nonparametric smoothing of the data points, which adds complexity to ANOVA, whereas the dynamic modulus and damage characteristic curve data are usually represented by fitted functions for further analysis. Second, the average of the replicates is calculated differently. In FDA, the average of multiple replicates is calculated as the mean value at each sampling point, whereas in the S-VECD analysis, the average is a single curve fitted to the data points of all the replicates. Third, FDA theory offers no developed framework for a precision statement.
In this paper, damage characteristic curve data are adopted to demonstrate how to quantitatively characterize variation and build a precision statement of the test results for functional data.
Objectives
The first objective of this study is to propose a variance (v) index of the functional data and validate it by comparing with expert judgment on data variation. The second objective is to present the application of the v index with an example of a framework to analyze the repeatability limit of the test results as functional data. This framework is established following ASTM E691 using the resampling method and hypothesis test. This paper aims to bridge the gap between a robust statistical analysis and functional data from some mechanical test results.
Background
S-VECD Modeling
The S-VECD model is a mechanistic model that relates material integrity to the amount of damage in the material (17). This model predicts the stiffness of asphalt mixtures, denoted by the variable C, as a consequence of the damage growth, S, under cyclic loading. This C versus S relationship, the so-called damage characteristic curve, is independent of mode of loading, loading history, and test temperature. Consequently, the damage response under any given loading condition can be predicted from a limited number of tests, which makes fatigue cracking performance prediction of asphalt mixtures efficient compared with empirical methods.
The S-VECD model incorporates the time–temperature superposition principle, the elastic–viscoelastic correspondence principle, and the work potential theory. The linear viscoelastic properties of asphalt mixture are required to implement the S-VECD model. In this study, the 2S2P1D model, the combination of two springs, two parabolic creep elements, and one dashpot, is used to characterize the dynamic modulus as a function of loading frequency and temperature ( 18 ). Equations 1–3 represent the storage modulus from the 2S2P1D model.
where E′(ω,T)=storage modulus at a particular temperature and angular frequency (kPa or pounds per square inch [psi]); E′(ωR)=storage modulus at a particular reduced angular frequency (kPa or psi); ωR=reduced angular frequency (rad/s); max E′=maximum storage modulus; b, d, g=fitting coefficients; E0=maximum storage modulus value (kPa or psi); κ, δ, γ, h, β, τE=fitting coefficients; and E′2S2P1D=storage modulus from the 2S2P1D model.
The damage characteristic curve, one of the key material functions, is represented by the power law model in this study based on the results of Lee et al. ( 19 ).
where C=pseudo stiffness, S=damage parameter, and C11 and C12=fitting coefficients for power law model.
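Equation 4 is not reproduced in this text. A power law form commonly used for the damage characteristic curve in the S-VECD literature, and consistent with the coefficients C11 and C12 defined above, is sketched below; it should be read as an assumed form rather than as the exact equation of this paper.

```latex
C = 1 - C_{11}\, S^{C_{12}}
```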
The other key material function of the S-VECD model is the pseudo strain energy-based fatigue failure criterion (D^R). The D^R is determined as the slope of the line formed by the relationship between the summation of (1–C) up to failure versus the number of cycles to failure (Nf), as shown in Equation 5. The Nf in this study is defined as the cycle where the product of peak-to-peak stress and cycle number reaches a peak value.
where D^R=failure criterion, C=pseudo stiffness, N=cycles, and Nf=number of cycles to failure.
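Equation 5 is likewise missing from this text. Based on the verbal definition above (the summation of 1–C up to failure divided by the number of cycles to failure), one plausible form is the following; the integral notation is an assumption chosen for illustration.

```latex
D^{R} = \frac{\int_{0}^{N_f} \left(1 - C\right)\, dN}{N_f}
```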
Hypothesis Test
A hypothesis test is used to determine whether to reject the null hypothesis with the evidence from a study at a pre-specified level of significance. Commonly, two sets of data are compared. The null hypothesis (H0), initially assumed to be true, proposes no relationship between these two datasets. The alternative hypothesis (H1), the assertion contradictory to H0, proposes a statistical relationship between these two datasets. If the sample data suggest that H0 is false, the null hypothesis will be rejected; if the sample data do not strongly contradict H0, the plausibility of the null hypothesis remains.
There are two equivalent testing processes for a hypothesis test. Both use the concept of test statistic in the process. A statistic is a quantity computed from sample data and used for statistical purposes (e.g., average, variance of sample data). One test process is more traditional and was advantageous in the past when only tables of test statistics at common probability thresholds were available (e.g., t-test and F-test). It requires certain statistical assumptions to fit into a well-known distribution (e.g., t-distribution, normal distribution) so that a decision can be made without the calculation of a probability. The other test process requires more computational support and the explicit calculation of a probability.
In this paper, the latter process is adopted with the following detailed steps:
1. Compute the observed value of the test statistic from the data.
2. Calculate the p-value, which is the probability, under the null hypothesis, of sampling a test statistic at least as extreme as the one observed.
3. Reject the null hypothesis if and only if the p-value is less than the pre-specified significance level α.
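When the null distribution is built by resampling rather than taken from a table, the p-value in the second step can be estimated as the fraction of resampled statistics that are at least as extreme as the observed one. The expression below is a sketch for an upper-tail test; the symbols X_obs, X_i, and N are notation assumed here for the observed statistic, the resampled statistics, and the number of resamples.

```latex
p = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[X_i \ge X_{obs}\right],
\qquad \text{reject } H_0 \text{ if } p < \alpha
```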
Resampling Method
Resampling methods have become more practical with increases in computing power and the availability of built-in functions in many commercial software packages. They can be used to quantify the uncertainty by calculating standard errors and confidence intervals and by performing a hypothesis test. These modern methods have several advantages: (1) they have fewer assumptions than the theoretical sampling methods as they do not assume the normal distribution nor do they require large sample sizes; (2) they are more accurate than the classical sampling methods and can address questions that cannot be answered with traditional parametric or nonparametric methods, such as comparison of medians or ratios; (3) they are usually similar for a wide range of statistics and do not require new formulas for every statistic; and (4) they can be analogized to the theoretical sampling method and intuitively understood ( 20 ).
In this paper, permutation resampling and bootstrap resampling are introduced and utilized. The permutation test is usually used to conduct a hypothesis test of “no effect”. The bootstrap test is usually used to construct confidence intervals ( 21 ).
Permutation Resampling
The permutation resampling method randomly redistributes or redraws from all the observed data without replacement. It can be described in detail as follows. First, a test statistic is defined and computed for the experiment data that can potentially reflect the deviations from the null hypothesis. Then, the data are randomly sampled to fill the first comparison group; then, the data are resampled to fill the second comparison group. Both groups have the same number of samples as the experiment. In this way, data in the original experiment sample are replaced by resampled data. As the resampling is done without replacement, the new resampled data are just a permutation of the original data. Finally, the test statistic is recomputed using the new resampled data to construct a distribution of the test statistic. This procedure is repeated until all the possible permutations are accounted for.
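The procedure can be illustrated with a minimal Python sketch for two groups of scalar observations. The function name `permutation_p_value` and the choice of the absolute difference in means as the test statistic are assumptions made for illustration only; the paper applies the same idea to statistics built from fitted curves.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact permutation test using |mean(a) - mean(b)| as the statistic.

    Every way of splitting the pooled data into two groups of the
    original sizes (sampling without replacement) is enumerated.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    count = 0
    total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        new_a = [pooled[i] for i in chosen]
        new_b = [pooled[i] for i in indices if i not in chosen_set]
        stat = abs(mean(new_a) - mean(new_b))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            count += 1
        total += 1
    return count / total


# Two clearly separated groups: only the original split and its mirror
# image are as extreme as the observed one, out of 20 possible splits.
p = permutation_p_value([1.0, 2.0, 3.0], [7.0, 8.0, 9.0])
```

With only three observations per group, the smallest achievable p-value is 2/20, which shows why a small number of replicates limits the resolution of an exact permutation test.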
Bootstrap Resampling
The bootstrap method randomly redraws from all the observed data with replacement. The idea is to mimic the process of selecting many samples from an infinite-sized population, with each resample having the same size as the original sample. The test statistic is computed for each resample, and the resulting distribution of the test statistic can then be used to construct the confidence interval.
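A minimal sketch of a percentile bootstrap confidence interval for scalar data is given below. The function name `bootstrap_ci`, the default of 5,000 resamples, and the use of the percentile method are assumptions for illustration; other interval constructions exist.

```python
import random
from statistics import mean


def bootstrap_ci(data, statistic=mean, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic`.

    Each resample draws len(data) observations with replacement.
    """
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(
        statistic([rng.choice(data) for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


data = [2.1, 2.4, 1.9, 2.6, 2.2, 2.3]
lower, upper = bootstrap_ci(data)
```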
Material and Data
Nine mixtures of various nominal maximum aggregate sizes (NMAS), binder types, and recycled asphalt pavement contents were used in this study. Table 1 provides details of the mixtures used. The mixture A results were obtained by multiple users targeting the same mixture design and test conditions. The results of mixtures B–F include different aging levels for the same mixture, and those of mixtures G–I include different test conditions. The term “dataset” is used to describe data from one test for a specific material that has multiple replicates. The term “individual data” is used to describe data from one replicate. The number of replicates in one dataset is two, three, or four. A total of 116 datasets are used: nine have a sample size of four, three have a sample size of two, and 104 have a sample size of three. The total number of individual functional data is 354. The 104 datasets with a sample size of three are used in developing the variance index. The eight datasets in mixture A are used in the example analysis to establish the repeatability limit. The 91 datasets from mixtures A–G are used to verify the validity of the repeatability analysis.
Table 1. Mixture Information
Note: NA = not available (do not exist in data); NMAS = nominal maximum aggregate sizes; RAP=recycled asphalt pavement.
Number of replicates of each test.
Figure 1 presents a typical damage characteristic curve result with three replicates. Each replicate yields one individual data result, and the failure point of each specimen depends on the test conditions (input strain level, etc.) and is thus specimen dependent. The failure point of each specimen is marked as Cfail and Sfail. For variance index development, one curve is fitted to each individual data result to represent that replicate. All the data points from the three specimens are used to generate one fitted curve as the final material property for further analysis. The general equation of the damage characteristic curve is shown in Equation 4.

Figure 1. Typical test result of the damage characteristic curve.
Method
A summary of the overall methodology followed in the analysis presented in this paper is shown in Figure 2, where it can be seen that the analysis consists of two parts: (1) variance index development and (2) an example analysis of repeatability limit establishment as the application of the variance index.

Figure 2. Schematic of the overall methodology.
The variance index development process includes three steps. First, each dataset is classified using expert judgment into categories (1–5, with 1 being very good and 5 being very poor) based on the variation of the individual specimens in the test set. Second, a general form of a variance index is defined for the functional data based on the standard variance of univariate data. Finally, specific variants of this variance index form are evaluated, and the results of each variant are compared with the expert classification.
The example analysis includes four steps. First, a consistency test is performed to identify users whose results are inconsistent with the results from other users. Second, the inconsistent data are investigated to decide whether they should be excluded from further analysis. Third, the variance index distribution function is constructed using the accepted data. Finally, the repeatability limit is selected at a predefined threshold.
Variance Index Development
Initial Classification of Data
The variance of each dataset is subjectively assigned to one of five categories (“very good,” “good,” “fair,” “poor,” and “very poor”) based on the collective expert judgment of the authors of this paper. Because it is difficult to visually compare the variance of two datasets with different sample sizes, only the 104 datasets that have a sample size of three are selected. The category is determined by checking the area between the top-most and bottom-most damage characteristic curves of the dataset and visually comparing datasets within the same range. First, all the datasets are examined to collectively define what “good” and “poor” variation is, such that “good” represents acceptable test results and “poor” represents unacceptable test results. These two reference datasets are plotted together with each dataset being classified, as shown in Figure 3. Then, each dataset is examined by drawing a vertical line and measuring the width of the given dataset, labeled as tc, and comparing that with the width of the “good” and “poor” datasets, labeled as tg and tp, respectively. The details of the grouping rules are as follows:
If tc<tg, and the area of the compared dataset is smaller than the good dataset, it is in the “very good” category, as shown in Figure 3, a and b.
If tc≈tg, and the area is very close to the good dataset, it is in the “good” category, as shown in Figure 3, c and d.
If tg<tc<tp, and the area is bigger than the good dataset but smaller than the poor dataset, it is in the “fair” category, as shown in Figure 3, e and f.
If tc≈tp, and the area is very close to the poor dataset, it is in the “poor” category, as shown in Figure 3, g and h.
If tc>tp, and the area is bigger than the poor dataset, it is in the “very poor” category, as shown in Figure 3, i and j.

Figure 3. Grouping by expert judgment: (a) a “very good” example; (b) a “very good” example zoomed in; (c) a “good” example; (d) a “good” example zoomed in; (e) a “fair” example; (f) a “fair” example zoomed in; (g) a “poor” example; (h) a “poor” example zoomed in; (i) a “very poor” example; and (j) a “very poor” example zoomed in.
The Form of Variance Index
The form of the variance index proposed in this paper, Equation 6, is motivated by the variance function for univariate data, Equation 7.
where v=the variance index of functional data, fi(x)=the general fitted function of individual data i, f̄(x)=the general fitted function of all the data in the dataset, n=the number of individual data results in the dataset, and xend=the selected end point of the integration.
where s^2=the sample variance, xi=the value of one observation, x̄=the mean of all observations, and n=the number of observations.
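Equations 6 and 7 are not reproduced in this text. The first expression below is the standard univariate sample variance. The second is only a sketch of a functional analogue that is consistent with the symbol definitions above (the integrated squared deviation of each individual fitted function from the pooled fitted function, summed over the individual data); the normalization by n–1 and the lower integration limit of zero are assumptions, and \bar{f}(x) is notation chosen here for the pooled fitted function.

```latex
s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}
\qquad
v = \frac{1}{n-1}\sum_{i=1}^{n}\int_{0}^{x_{end}}\left(f_i(x) - \bar{f}(x)\right)^{2}\, dx
```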
For Equation 6, two variations are considered: (1) the choice of x and (2) the choice of xend.
1. The choice of x: As shown in Equation 4, the choice of x in the v index of the damage characteristic curve could be either the pseudo stiffness C or the damage parameter S. If it is S, then f(x) takes the form in Equation 8. When this choice is used, the corresponding v index is referred to as vs. This v index quantifies the cumulative difference in C within the dataset as S changes. If the choice is C, then f(x) takes the form in Equation 9. The corresponding v index is referred to as vc. These two methods of calculating the v index can give very different results in characterizing the variance of functional data, depending on the shape of the dataset.
2. The choice of xend: In the v index calculation, it is also important to select the range of integration because different mixtures or even specimens of the same mixture can have different failure points. The xend should not be so big that it requires too much extrapolation from the data, nor should it be so small that it fails to represent the spread of the data. For the initial development of these candidate indices, the frequency distributions of C and S at failure from all the tests in the database were evaluated. The frequencies of 1–Cfail and Sfail for all the individual data used in this study are shown in Figure 4. 1–Cfail is mostly concentrated in the range between 0.7 and 0.8, while Sfail is more spread across different S values. In the end, xend values of 2 × 10^5 and 0.7 are selected for the vs and vc calculations, respectively, because they (1) represent most of the datasets and (2) lie near the middle of the whole distribution range.
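A minimal sketch of the vs calculation is given below. It assumes the power law form C = 1 − C11·S^C12 for the fitted curves and a variance index equal to the sum over replicates of the integrated squared deviation from the pooled fit, divided by n − 1; both are assumptions, since Equations 4, 6, and 8 are not reproduced here. The function names and the coefficient values are hypothetical. The vc index would follow the same pattern with C as the integration variable and the inverse of the power law as the curve.

```python
import numpy as np


def power_law_c(s, c11, c12):
    """Pseudo stiffness C as a function of damage S (assumed power law)."""
    return 1.0 - c11 * np.power(s, c12)


def variance_index(curve, replicate_coeffs, pooled_coeffs,
                   x_start, x_end, n_points=2001):
    """Integrated squared deviation of each replicate curve from the
    pooled curve, summed over replicates and divided by (n - 1)."""
    x = np.linspace(x_start, x_end, n_points)
    pooled = curve(x, *pooled_coeffs)
    total = 0.0
    for coeffs in replicate_coeffs:
        sq = (curve(x, *coeffs) - pooled) ** 2
        # Trapezoidal rule written out explicitly.
        total += np.sum((sq[1:] + sq[:-1]) * np.diff(x)) / 2.0
    return total / (len(replicate_coeffs) - 1)


# Hypothetical coefficients for three replicates and the pooled fit.
replicates = [(0.9e-3, 0.5), (1.0e-3, 0.5), (1.1e-3, 0.5)]
pooled_fit = (1.0e-3, 0.5)
vs = variance_index(power_law_c, replicates, pooled_fit, 0.0, 2e5)
```

In this constructed example the squared deviation grows linearly with S, so the integral can be checked by hand: two replicates each contribute 200 and the third contributes zero, giving vs = 200 after division by n − 1 = 2.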

Figure 4. Histogram of failure point: (a) 1–Cfail and (b) Sfail.
Example Analysis
The example analysis demonstrates the application of the v index to build the repeatability limit of v following ASTM E691, which is the standard practice for conducting an interlaboratory study to determine the precision of a test method for univariate data. The three main steps in the ASTM E691 procedure are as follows:
Consistency test: Statistical theory with a normality assumption is used to determine whether the collected data are adequately consistent to form the basis for a test method precision statement, that is, whether any laboratory has significantly different test results or a much higher variance than the other laboratories.
Investigation of the inconsistent data: If there are inconsistent data, those data should be investigated carefully to decide whether they should be deleted to form the precision statement.
Obtain the precision statistics: The repeatability limit and reproducibility limit statements are developed using the mean and standard deviations of consistent data measured from multiple users, laboratories, or both.
The resampling method and hypothesis test are used to build the framework of statistical analysis for functional data under the direction of ASTM E691. The repeatability limit used in this paper has a slightly different meaning from the term used in ASTM E691. If a test procedure requires n replicates and takes their average as the test result, then the repeatability limit in ASTM E691 describes the variance between different test results. This paper instead treats the repeatability limit more practically as the variance between different replicates, because, considering the time and cost required to obtain a damage characteristic curve, users are expected to produce one test result with replicates rather than multiple test results.
Consistency Test Statistic
The consistency test consists of a between-user test and a within-user test. The between-user consistency test identifies whether the data obtained by one operator are significantly different from those of the others (t-test in ASTM E691). The within-user consistency test identifies whether the data obtained by one operator have a significantly higher variance than those of the others (F-test in ASTM E691). Accordingly, a t-statistic and an F-statistic for the two tests are constructed for functional data, as shown in Equations 10 and 11 and Equations 12–14, respectively.
where
where
Resampling Method
The resampling method is used to conduct the consistency test and construct the v index distribution function. The null hypothesis (H0) for the between-user consistency test is that all the datasets are equivalent across different operators. The null hypothesis for the within-user consistency test is that all the datasets have the same variance across different operators. The resampling method is used to test whether the data provide enough evidence to reject H0. The general steps for the resampling method are as follows:
Step 1: Pool the data together under the null hypothesis or equality assumption.
Step 2: Determine the test statistic X (X=t, F, or v).
Step 3: Randomly select n individual data results with replacement (bootstrap resampling for v distribution function) or without replacement (permutation resampling for consistency test).
Step 4: Calculate the test statistic for the selected sample as X1.
Step 5: Repeat Steps 3 and 4 N times to obtain X1, X2, …, XN.
Step 6: Calculate the test statistic for each operator to obtain Xop1, Xop2, …, Xop8.
Step 7: Calculate the p-value for each operator using Equation 15.
where k is the kth operator.
The consistency test uses permutation resampling with Steps 1–7 above, and N is the number of all possible permutations of the dataset. In the between-user consistency test, three individual data results are selected for each resampling. In the within-user consistency test, all the individual data are divided into eight groups for each resampling, with the number in each group corresponding to that of each of the eight operators. The v distribution function uses bootstrap resampling with three individual data results selected for each sampling; in this case, only Steps 1–5 are performed, and N is set to 5,000 after trials with different values.
If H0 is true, there is no evidence of statistical inconsistency between the operators. After all the datasets are pooled together and individual data results are repeatedly selected to calculate the test statistic Xi, all those values should be similar to Xop1, Xop2, …, Xop8. However, if one operator has different results from the others, its X value will be significantly bigger than the resampled Xi values because pooling and resampling reduce Xi. In the end, if the p-value is smaller than the chosen significance level (0.05, corresponding to a 95% confidence level, in the current study), H0 is rejected and the operator is considered inconsistent with the others.
The v Index Distribution Function
The inconsistent data from the consistency test are investigated to determine whether they should be deleted in constructing the v index distribution function. Then, all the accepted data are used following Steps 1–5 in the resampling method. The 95% repeatability limit is thus selected from the distribution function.
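The bootstrap construction of the v index distribution and the selection of a 95% limit can be sketched as follows. This is an illustration under stated assumptions, not the procedure of this paper in full: the individual curves are assumed to be already evaluated on a common grid of S values, the pointwise mean of the resampled curves stands in for the curve refitted to all data points, and the function name `repeatability_limit` and the example coefficients are hypothetical.

```python
import numpy as np


def repeatability_limit(curves, x, n_draw=3, n_resamples=5000,
                        quantile=0.95, seed=0):
    """Bootstrap the variance index and return (limit, all v values).

    curves: array-like of shape (n_individual, len(x)) holding fitted
    C values of the accepted individual data on the common grid x.
    """
    rng = np.random.default_rng(seed)
    curves = np.asarray(curves, dtype=float)
    dx = np.diff(np.asarray(x, dtype=float))
    v = np.empty(n_resamples)
    for j in range(n_resamples):
        # Draw n_draw individual curves with replacement.
        idx = rng.integers(0, len(curves), size=n_draw)
        sample = curves[idx]
        # Assumption: the pointwise mean replaces the refitted pooled curve.
        sq = (sample - sample.mean(axis=0)) ** 2
        integrals = np.sum((sq[:, 1:] + sq[:, :-1]) * dx, axis=1) / 2.0
        v[j] = integrals.sum() / (n_draw - 1)
    return np.quantile(v, quantile), v


# Hypothetical pool of 12 accepted individual curves.
x = np.linspace(0.0, 2e5, 201)
c11_values = np.linspace(0.9e-3, 1.1e-3, 12)
curves = [1.0 - c11 * np.sqrt(x) for c11 in c11_values]
limit, v_values = repeatability_limit(curves, x)
```

A new dataset of three replicates whose variance index exceeds `limit` would then be flagged as less repeatable than 95% of the resampled reference cases.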
Results of v Index Development
Classification of Data
The distribution of data quality as assessed by expert judgment is shown in Figure 5 for mixtures of varying NMAS. As expected, the test results for the 9.5-mm mixture have a higher overall quality than those for the 19-mm and 25-mm mixtures. Note that the 19-mm mixture has a higher proportion of very poor cases than the 25-mm mixture. This is because the 19-mm mixture cases were measured using AASHTO TP 133, which uses a specimen of 38 mm × 110 mm (diameter × height). The 25-mm mixture, on the other hand, was tested using AASHTO TP 107, which uses a specimen of 100 mm × 130 mm.

Figure 5. Frequency of the subjectively ranked categories for all the datasets.
Comparison of v Index
The results of variance ranking by expert judgment, the vc index, and the vs index are presented in Figure 6. All 104 datasets are grouped by expert judgment into five categories from “very good” to “very poor,” with the color coding presented in Figure 6a. Then, the variance index of each dataset is calculated using vc or vs, and the datasets are sorted from the smallest to the largest variance index. Therefore, Figure 6, b and c, shows the variance ranking calculated using the corresponding index, with the color coding representing the category determined by expert judgment. The “best” method is the one whose ranking most closely matches the color mapping in Figure 6a.

Figure 6. Ranking comparison using different methods: (a) expert judgment; (b) vc index; and (c) vs index.
Based on this color map, the vs index matches the expert judgment better than the vc index does, but neither matches it perfectly. The cases where the vs results contradict the expert judgment have been investigated, and examples are shown in Figure 7. The vs index of the dataset in Figure 7a is 566 and that in Figure 7b is 475. However, the dataset in Figure 7a appears to have a smaller variance than that in Figure 7b. These two datasets have very different Sfail values. This difference means that vs is partly determined from an extrapolated portion of the fitted damage characteristic curve.

Figure 7. Cases in which the vs index gives contradictory conclusions to those of the expert judgment: (a) higher vs with “fair” category and (b) lower vs with “very poor” category.
To solve this issue, a correction factor is applied for both vc and vs so that each dataset can use its own xend. The basis of the correction factor is to reduce the effect of extrapolation by assuming the ratio between the v index in Equation 6 at two different xend is universal across all the mixtures.
The data from mixture A have the most stable results and late failure points. They are used to construct the correction factor with the bootstrap resampling method. Steps 1–5 of the resampling method are used to resample 5,000 v index values at different xend. Then, the ratio is calculated using Equation 17 for each resampled dataset.
where v(xend)=the variance index calculated at xend; xr,end=the reference failure point, which is 0.7 for vc,corr and 2 × 10^5 for vs,corr; and xk,end=the selected xend. The average β(xk,end) of all 5,000 samples is calculated and fitted as a universal correction factor function. Finally, the corrected variance index can be calculated using Equation 18.
where

Figure 8. Ranking comparison with different methods: (a) expert judgment; (b) vc corrected; and (c) vs corrected.
The results indicate that vs with correction matches expert judgment very well. The datasets that do not match were investigated, and it was found that their variance was hard to justify by simply observing the data.
Discussion
The variance index is related to either the vertical or the horizontal difference between individual damage characteristic curves. The vertical difference, vs, amplifies the difference when the curve has a high slope compared with the reference “good” or “poor” dataset. The horizontal difference, vc, amplifies the difference when the curve has a low slope compared with the reference “good” or “poor” dataset. Two datasets that both have small variance are presented in Figure 9, a and c, to show the difference in the vs and vc calculations. Figure 9b shows the zoomed-in version of Figure 9a, in which the vertical line indicates that the dataset has visually similar variance to that of the “good” reference and the horizontal line indicates that it has much less variance. Thus, the vc index is a relatively better indicator of the variance for this test. Figure 9d shows the zoomed-in version of Figure 9c, in which the vertical line shows that the dataset is less variant than the “good” reference and the horizontal line shows that it is more variant. Thus, the vs index performs better in this case.

Figure 9. The effect of the shape of the datasets on the variance index: (a) an example of a “good” dataset; (b) dataset in (a) zoomed in; (c) an example of a “very good” dataset; and (d) dataset in (c) zoomed in.
A review of the datasets used in this paper shows that most have a damage characteristic curve with a low slope, so vs is better suited to capturing the variance of the test results. This also seems consistent with other datasets that were not evaluated specifically in this paper and seems to be a basic characteristic of damage characteristic curves. This effect is also the reason why the correction factor markedly improves the performance of vs but not of vc. Datasets in which vs overestimates the variance compared with expert judgment usually have an early Sfail, whereas datasets in which vc overestimates the variance usually already have a late 1–Cfail. In sum, these observations suggest that vs,corr is a reasonable variance index for the damage characteristic curve results.
Example Analysis
Eight datasets of mixture A are used for the example calculation and to demonstrate the development of a repeatability assessment methodology for functional data. They were obtained from different operators conducting the test on the same mixture.
Consistency Test
Between-User Consistency (t-Statistic)
The between-user consistency test results are shown in Figure 10. The corresponding t-statistic takes the form in Equations 19 and 20 with vs corr. The p-value for each operator is presented in Table 2. The data from Operators 5 and 8 have p-values smaller than 0.05, which indicates that the damage characteristic curve results from these two operators are significantly different from the others (either higher or lower). In fact, these two operators were found to have used a different compactor than the others in fabricating the material specimens.
p-Value for Consistency Test

A t-statistic distribution.
where Sref = 2 × 10⁵.
Therefore, these data are flagged and excluded from the resampling of the variance distribution function. It should be noted that this particular example evaluates multi-user, single-laboratory repeatability. For multi-laboratory repeatability, a difference in gyratory compactor alone may not cause data to be flagged as inconsistent, and the test method should not be interpreted as being overly sensitive to the compactor.
Within-User Consistency (F-Statistic)
The within-user consistency test results are shown in Figure 11. The corresponding F-statistic takes the form in Equations 21–23 with vs corr. The p-value for each operator is presented in Table 2. The data of Operator 6 have a p-value smaller than 0.05, which indicates that the damage characteristic curve results from Operator 6 have a significantly higher variance than the others. A review of the testing record of Operator 6 shows that this operator does not have much experience in testing asphalt mixtures. Therefore, this dataset is also excluded from further resampling of the variance distribution function.

An F-statistic distribution.
The v Index Distribution Function
Data that meet the requirements of the consistency test are used to establish the distribution function of the variance index (vs corr) using the resampling method. The raw and fitted distributions are shown in Figure 12. The fitted function is a gamma probability distribution with the form in Equation 24, and the result is v ∼ Gamma(1.01, 40). The 95% repeatability limit of the v index is 121.
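The text does not state how the gamma parameters were estimated. As one possible approach, a minimal sketch of a method-of-moments fit is given here. The function name `fit_gamma_moments` is hypothetical, and the synthetic values stand in for the resampled vs corr values, which are not reproduced in this section.

```python
import random
import statistics


def fit_gamma_moments(values):
    """Method-of-moments estimates (k, theta) for a gamma distribution.

    Uses the identities mean = k * theta and variance = k * theta**2,
    with k the shape and theta the scale parameter.
    """
    mean = statistics.fmean(values)
    var = statistics.variance(values)  # sample variance (n - 1 denominator)
    theta = var / mean
    k = mean / theta
    return k, theta


# Synthetic stand-in for resampled index values (NOT the paper's data).
# random.gammavariate(alpha, beta) takes shape alpha and scale beta.
rng = random.Random(0)
synthetic_v = [rng.gammavariate(1.01, 40.0) for _ in range(50_000)]

k_hat, theta_hat = fit_gamma_moments(synthetic_v)
print(f"k = {k_hat:.2f}, theta = {theta_hat:.1f}")
```

A maximum-likelihood fit would be an equally reasonable choice; the method of moments is used here only because it needs nothing beyond the sample mean and variance.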

A v index distribution.
Datasets of mixtures A–G with 9.5-mm NMAS are used to check the validity of the v distribution function and the repeatability limit. First, the values of vs corr at different percentiles (10%, 30%, 50%, 70%, 90%, 95%, and 99%) of the gamma distribution are calculated. Then, the corresponding percentiles of these vs corr values in the datasets are computed. The results are presented in Figure 13. The two sets of percentiles match closely, which indicates that the repeatability limit is valid.

Percentile comparison between the v index distribution and the datasets.
where k, θ = parameters in the probability density function of the gamma distribution.
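The reported 95% repeatability limit can be checked numerically from the fitted parameters. The sketch below assumes that Gamma(1.01, 40) uses the shape-scale convention (k = 1.01, θ = 40) and that Equation 24 is the standard gamma density; both are assumptions, although this reading does reproduce the reported limit. The helper names `gamma_cdf` and `gamma_quantile` are hypothetical, and only the Python standard library is used.

```python
import math


def gamma_cdf(v, k, theta):
    """Gamma CDF: regularized lower incomplete gamma P(k, v/theta).

    Evaluated with the power series
    gamma(k, z) = z**k * exp(-z) * sum_n z**n / (k * (k+1) * ... * (k+n)).
    """
    if v <= 0:
        return 0.0
    z = v / theta
    term = 1.0 / k
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (k + n)
        total += term
        n += 1
    return total * math.exp(-z + k * math.log(z) - math.lgamma(k))


def gamma_quantile(p, k, theta, tol=1e-9):
    """Invert the CDF by bisection to find the value at probability p."""
    lo, hi = 0.0, k * theta
    while gamma_cdf(hi, k, theta) < p:  # expand until the target is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, k, theta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed shape-scale reading of the fitted distribution.
limit = gamma_quantile(0.95, k=1.01, theta=40.0)
print(round(limit))  # rounds to 121
```

The same `gamma_quantile` call with other probabilities gives the theoretical values used in a percentile comparison such as the one in Figure 13.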
Conclusions
This paper proposes a variance index for assessing the variation of the damage characteristic curve from AASHTO TP 133 and AASHTO TP 107. The application of the variance index is presented with an example to establish the repeatability limit of the test results as functional data. The following conclusions can be drawn.
Of the variance indices evaluated, vs corr matches expert judgment best and can be used to calculate the variance of the damage characteristic curve results of asphalt mixtures.
The shape of the damage characteristic curve data can affect the performance of the variance index.
The resampling method and hypothesis test can be used to flag inconsistent damage characteristic curve data in multi-user or multi-laboratory results.
The resampling method can also be used to construct the repeatability limit of the variance index for a specific material.
Further study is needed on how the variance of the damage characteristic curve affects the pavement cracking calculation and on whether the correction factor can be eliminated. The variance index and the framework of repeatability limit analysis will be used in the interlaboratory study as the next step. Finally, while this methodology was applied using the S-VECD damage characteristic curve, the authors believe that it is generally applicable to any functional relationship. As such, this method can be applied to other test methods that yield functional relationships, such as dynamic modulus, empirical fatigue functions, permanent deformation models, and creep compliance standards.
Footnotes
Acknowledgements
The authors acknowledge the test data support from Zhe Zeng, Felipe Pivetta, Lei Xue, Mukesh Ravichandran, Yongchang Wu, and Nooralhuda Saleh at North Carolina State University, Raleigh, NC.
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: J. Ding, B. S. Underwood, C. Castorena, Y. R. Kim; data collection: K. C. Lee, J. Ding; analysis and interpretation of results: J. Ding, B. S. Underwood, C. Castorena, Y. R. Kim; draft manuscript preparation: J. Ding, B. S. Underwood, K. C. Lee. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
