Abstract
To determine the potency of a test preparation relative to a standard preparation, it is often necessary to test parallelism between the dose-response curves of the reference standard and the test sample. Optimal designs are known to be more powerful than classical designs for testing parallelism. In this study, D-optimal design was implemented to study parallelism and compare
Introduction
Scientists often need to determine whether sets of dose-response data are parallel, typically to compare the potency of a test preparation with that of a standard preparation. For example, a decrease or increase in biomarker concentration can be interpreted reliably only if parallelism between the endogenous biomarker and the concentration-response curve has been demonstrated. Moreover, parallel testing can increase throughput and reduce test execution time.
The method used to test parallelism influences the efficiency of the whole process and even its rate of success. Traditional parallelism tests assess the equality of corresponding pairs of parameters of the two dose-response curves. One way to test this equality is to compute a joint confidence region, but this is complicated when a nonlinear curve model is used. Approximations are therefore made to simplify the task, such as using the intersection of marginal confidence intervals as an approximate confidence region (Callahan & Sajjadi, 2003; Lansky, 2003). However, this approximation makes the confidence region much larger than it should be, so curves may be labeled as parallel when they are not.
Another approach, referred to as equivalence testing, assumes a lack of similarity and seeks evidence to establish similarity (see Callahan & Sajjadi, 2003; Hauck et al., 2005). In connection with equivalence tests, Berger (1982), Casella and Berger (1990), and Berger and Hsu (1996) gave a more complete discussion of Intersection-Union Test (IUT) theory. The IUT for practical parallelism is often used in bioequivalence testing and can be implemented easily using a sequence of one-sided approximate
The design of the experiment is an important factor in the success of a parallelism test. An optimal design specifies the dose levels at which to take observations and how to distribute resources over those doses in the most efficient manner. Because they make data collection and subsequent analysis cost-effective, optimal designs are more flexible and efficient, whereas classical designs require more experimental runs to estimate the parameters of interest with the same precision.
In practical terms, optimal designs can provide accurate statistical inference at minimum cost. They minimize the variances of the estimates of the parameters of interest and allow prediction without bias. Different optimality criteria are used depending on the goal of the experiment. In general, finding optimal designs for linear models with normal errors is relatively straightforward: the Fisher information matrix of a linear model does not depend on the model parameters, so optimal designs can be obtained in explicit form. In practice, however, many natural phenomena follow nonlinear models, and efficient designs for nonlinear models are needed in many application areas. Under a nonlinear model, the Fisher information matrix depends on the unknown model parameters (Chernoff, 1953), so a design can only be locally optimal, that is, optimal for an assumed value of the parameters.
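The following small numerical illustration makes this contrast concrete. All of its choices are hypothetical and none come from this study: a straight-line model, an exponential-decay model y = a*exp(-b*x), unit error variance, and arbitrary dose and parameter values. The information matrix of the linear model is the same for every parameter value, while that of the nonlinear model changes with the parameters.

```python
import numpy as np

def info(grad_fn, x, theta):
    """Per-observation information matrix F^T F / n, assuming unit error variance."""
    F = grad_fn(np.asarray(x, dtype=float), theta)
    return F.T @ F / len(x)

def grad_line(x, beta):
    # y = b0 + b1*x: the gradient (1, x) does not involve beta at all.
    return np.column_stack([np.ones_like(x), x])

def grad_exp(x, theta):
    # y = a*exp(-b*x): the gradient involves both a and b.
    a, b = theta
    e = np.exp(-b * x)
    return np.column_stack([e, -a * x * e])

x = np.array([0.0, 0.5, 1.0, 2.0, 4.0])  # hypothetical design points

M_line_1 = info(grad_line, x, (0.0, 1.0))
M_line_2 = info(grad_line, x, (5.0, -2.0))
M_exp_1 = info(grad_exp, x, (1.0, 0.5))
M_exp_2 = info(grad_exp, x, (1.0, 2.0))

print(np.allclose(M_line_1, M_line_2))  # True: same matrix for any beta
print(np.allclose(M_exp_1, M_exp_2))    # False: depends on (a, b)
```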
Although many optimal design methods could be used for testing parallelism, this paper studies only D-optimal design. This type of design is constructed to minimize the generalized variance of the estimated regression coefficients. D-optimal design is a powerful tool for determining parallelism in biological applications because it minimizes the variance of the parameter estimates, which helps increase the power of the IUT (Fedorov, 1972; Silvey, 1980; Atkinson & Donev, 1992; Pukelsheim, 1993). D-optimal designs for four-parameter logistic models were introduced by Li and Majumdar (2007). We modify the D-optimal design appropriately for use with the IUT. To check the performance of the modified D-optimal design, we conduct simulation studies comparing the power of the IUT under several scenarios with that obtained using the design of Jonkman and Sidik (2009).
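The sketch below shows what "generalized variance" means in practice. It evaluates the 4PL information matrix of a design and the relative D-efficiency (det M_a / det M_b)^(1/p) of two designs. It assumes the common parameterization f(x) = D + (A - D) / (1 + (x/C)^B) with unit error variance. The parameter values, the 10 log-spaced doses, and the four-point comparison design are hypothetical and are not the designs of this paper.

```python
import numpy as np

def grad_4pl(x, theta):
    """Gradient of f(x) = D + (A - D) / (1 + (x/C)**B) w.r.t. (A, B, C, D)."""
    A, B, C, D = theta
    u = (x / C) ** B
    g = 1.0 / (1.0 + u)
    s = u / (1.0 + u) ** 2
    return np.column_stack([
        g,                               # df/dA
        -(A - D) * s * np.log(x / C),    # df/dB
        (A - D) * B * s / C,             # df/dC
        1.0 - g,                         # df/dD
    ])

def info(design, theta):
    """Normalized information matrix sum_i w_i f(x_i) f(x_i)^T (sigma = 1)."""
    doses, weights = (np.asarray(v, dtype=float) for v in design)
    F = grad_4pl(doses, theta)
    return F.T @ (weights[:, None] * F)

def generalized_variance(design, theta):
    # D-optimality maximizes det M, i.e. minimizes det(M^{-1}).
    return 1.0 / np.linalg.det(info(design, theta))

def d_efficiency(design_a, design_b, theta):
    """Relative D-efficiency of design_a vs design_b: (det M_a / det M_b)^(1/p)."""
    Ma, Mb = info(design_a, theta), info(design_b, theta)
    p = Ma.shape[0]
    return (np.linalg.det(Ma) / np.linalg.det(Mb)) ** (1.0 / p)

theta = (0.5, 1.0, 1.0, 2.0)                            # hypothetical (A, B, C, D)
classical = (np.logspace(-2, 2, 10), np.full(10, 0.1))  # 10 log-spaced doses, equal weights
four_point = ([0.01, 0.3, 3.0, 100.0], [0.25] * 4)      # hypothetical alternative design
print(d_efficiency(four_point, classical, theta))
```

A value above 1 would mean that the first design estimates (A, B, C, D) more precisely, in the determinant sense, than the second at the assumed parameter value. Because M depends on the parameters, such a comparison is only local.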
Section 1 introduces the definition of D-optimal design and the motivation for using it to test parallelism. In Section 2, we suggest a simple method for testing the hypothesis based on the IUT and explain the algorithm used to search for D-optimal designs. Section 3 presents two examples that illustrate the D-optimal design procedure and contrast it with the classical design. Section 4 presents a simulation study involving cases based on one of the examples and discusses some implications of its results. Section 5 gives a brief discussion and summary.
Background
Parallelism holds when the dose-response curve of the test sample is a horizontal shift of that of the reference standard on the logarithmic dose axis. Mathematically, two functions are parallel if one can be obtained from the other by rescaling the dose axis.
Mathematical function
Consider the 4-parameter logistic (4PL) model
where
Considering a test group and a standard group, if
In an experiment, the goal of optimal design is to make the variances of the estimated parameters of interest and of predictions as small as possible. To minimize these variances, the dose levels and the allocation of subjects over those doses must be chosen, given the number of subjects available and the range of admissible doses.
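Once an approximate design (doses with weights) has been found, it must be turned into whole numbers of subjects per dose. The following is a minimal sketch that assumes the weights sum to 1. It uses a simple largest-remainder rule, which is not necessarily the rounding used in this study; more refined rounding schemes exist.

```python
import math

def round_design(weights, n_total):
    """Largest-remainder rounding of approximate design weights (summing to 1) to counts."""
    raw = [w * n_total for w in weights]
    counts = [math.floor(r) for r in raw]
    shortfall = n_total - sum(counts)
    # Give the leftover subjects to the doses with the largest fractional parts
    # (ties keep their original order because Python's sort is stable).
    order = sorted(range(len(weights)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:shortfall]:
        counts[i] += 1
    return counts

print(round_design([0.25, 0.25, 0.25, 0.25], 36))    # [9, 9, 9, 9]
print(round_design([0.375, 0.125, 0.25, 0.25], 20))  # [8, 2, 5, 5]
```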
Suppose the design space is denoted by
The Fisher information matrix for the 4PL model is
where the function
The D-optimality criterion is to choose a design maximizing the information on
where
A state-of-the-art algorithm, the YBT algorithm, was proposed for finding locally optimal designs for a single objective and was shown to outperform other current algorithms. Starting from a randomly selected initial design, the YBT algorithm selects the dose that maximizes the sensitivity function and adds it to the current design. At the same time, the optimal weights are obtained directly using the Newton-Raphson method (Quinn, 2016). However, if the initial design points are far from the optimal design points, YBT requires much more time to converge to an optimal design and sometimes fails to converge at all. In this paper, a modified YBT algorithm is employed to obtain the D-optimal designs (see Hyun et al., 2018). The procedure was modified by selecting better starting design points via the V-algorithm, which improved the speed of the search for optimal designs (Hyun, Wong, & Yang, 2013). The modified algorithm performed well in obtaining all the optimal designs in this paper.
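The YBT and modified YBT algorithms themselves are not reproduced here. As a simpler, well-known stand-in, the sketch below runs the classical multiplicative weight-update algorithm for D-optimality on a fixed grid of doses. Each weight is multiplied by d(x_i, xi)/p, where d is the sensitivity function, and this update is known to increase det M monotonically. The 4PL parameterization, parameter values, and dose grid are hypothetical assumptions.

```python
import numpy as np

def grad_4pl(x, theta):
    """Gradient of f(x) = D + (A - D) / (1 + (x/C)**B) w.r.t. (A, B, C, D)."""
    A, B, C, D = theta
    u = (x / C) ** B
    g = 1.0 / (1.0 + u)
    s = u / (1.0 + u) ** 2
    return np.column_stack([g, -(A - D) * s * np.log(x / C),
                            (A - D) * B * s / C, 1.0 - g])

def log_det_info(F, w):
    """log det of the information matrix sum_i w_i f_i f_i^T."""
    return np.linalg.slogdet(F.T @ (w[:, None] * F))[1]

def sensitivity(F, w):
    """d(x_i, xi) = f(x_i)^T M(xi)^{-1} f(x_i) at every grid point."""
    Minv = np.linalg.inv(F.T @ (w[:, None] * F))
    return np.einsum("ij,jk,ik->i", F, Minv, F)

def multiplicative_d_optimal(grid, theta, iters=5000):
    """Multiplicative weight algorithm for a locally D-optimal design on a grid."""
    F = grad_4pl(np.asarray(grid, dtype=float), theta)
    p = F.shape[1]
    w = np.full(F.shape[0], 1.0 / F.shape[0])  # uniform starting design
    for _ in range(iters):
        # Since sum_i w_i d_i = p, the update keeps the weights summing to 1.
        w = w * sensitivity(F, w) / p
    return w, F

theta = (0.5, 1.0, 1.0, 2.0)    # hypothetical (A, B, C, D)
grid = np.logspace(-2, 2, 101)  # hypothetical candidate doses
w, F = multiplicative_d_optimal(grid, theta)
for x, wx in zip(grid[w > 0.02], w[w > 0.02]):
    print(f"dose {x:8.4f}  weight {wx:.3f}")
print("max d =", sensitivity(F, w).max(), "(equals p = 4 at the optimum)")
```

This update converges slowly compared with point-selection methods such as YBT, and mass can spread over neighboring grid points near each support point. Dedicated algorithms are preferred for these reasons.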
In the context of the four-parameter logistic curve, establishing practical equivalence enables the assessment of relative potency via the parameters
versus
where
According to IUT theory (Berger, 1982; Casella & Berger, 1990; Berger & Hsu, 1996), this method is useful when the null hypothesis is expressed as a union and the alternative hypothesis as an intersection. Because the null hypothesis is a union, it is true if any of its components is true; because the alternative is an intersection, it is false if any of its components is false. Thus, by constructing the test so that each component of
Let
Propose
If some design points between
We statistically test the assumption of parallelism prior to estimating the relative potency, since the potency of the test sample compared to the standard is defined simply in terms of the parameter
versus
For
Reject
The remaining four tests proceed in analogous fashion. Reject
Examples given by Dykstra (1971) are used to illustrate the difference between the classical design and the D-optimal design. One experiment is a ten-dose bioassay whose dose-response curves are parallel, with 3 replicates, 10 doses, and
In the ten-dose bioassay, the parameters from Jonkman and Sidik (2009) are
This is the classical design, in which 10 doses are equally spaced with equal replication.
Designs for parallel and non-parallel examples
The simulated data and fitted curves for the ten-dose bioassay (parallel).
For the standard line, also use
Set
We used nonlinear least squares to estimate the parameters from the simulated data. Least-squares regression chooses the parameter values that minimize the sum of squared vertical distances between the observed points and the fitted curve. The estimated parameters are as follows:
We tested parallelism using these simulated data. For parameter
For parameter
For parameter
Next, the D-optimal design was applied to testing parallelism. In Table 1, the first two rows give the D-optimal design for the test line, the next two rows give the D-optimal design for the control line, and the last two rows give the modified D-optimal design for testing parallelism, which can be used to fit both the test and control lines. In each design, the first row,
For the test line, use the modified D-optimal design
The simulated data and fitted curves for the toxicity assay (non-parallel).
We tested parallelism using these simulated data. For parameter
For parameter
For parameter
The classical design, in which 12 doses are equally spaced with equal replication, was applied to test parallelism.
For the standard line, also use
The simulated data are plotted in Fig. 2.
For this classical design, we again used nonlinear least squares to estimate the parameters from the simulated data. The estimates are as follows:
Parallelism was then tested under the classical design using these new simulated data. For parameter
The D-optimal designs for the two lines and the modified D-optimal design are given in Table 1. In this example, a total of 36 responses are used to fit both lines, so the number of replicated responses of the modified D-optimal design
Next, the D-optimal design was applied to test parallelism. For the test line, use
We tested parallelism using these simulated data. For parameter
As Table 2 shows, under neither design does the proposed test provide compelling evidence that the response profiles of the toxicity assay are parallel. While comparing the
In the first example, inspection of Fig. 1 suggests that the response profiles are approximately parallel, and the tests under both designs indicated that the response lines are parallel. In contrast, the plot for the second example gives some indication that the response profiles are not parallel, as the horizontal distance between the test observations and the standard observations appears to decrease somewhat as the response level increases. To confirm that the modified optimal design improves the performance of the IUT, we conduct simulation studies under several different scenarios.
The previous section gives some insight into the benefit of using D-optimal design for the IUT. To assess the properties of the D-optimal design relative to the classical design more precisely, we performed a simulation study based on the simulation set-up of Jonkman and Sidik (2009), who studied parallelism testing under five different scenarios, referred to below as cases 1 to 5.
In the first part, we used the classical design in Table 1 to simulate the responses and then used the IUT to test, separately for each simulated data set, whether the dose-response curves are parallel. We repeated this 10,000 times to estimate the power of detecting parallelism with the IUT. The same procedure was followed for each of the five cases.
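The following sketch illustrates the simulation loop described above. Every detail is an assumption for illustration and is not the procedure of this paper. Those details are:

- the 4PL parameterization f(x) = D + (A - D) / (1 + (x/C)^B);
- the hypothetical parameter values and doses;
- hypothetical equivalence limits (0.8, 1.25) on the ratios of A, B and D;
- a large-sample normal critical value;
- a linearization of each ratio hypothesis as theta_T - delta * theta_S;
- a hand-rolled Levenberg-Marquardt fit in place of the nonlinear regression software actually used.

Each fit starts at the true parameters to keep the sketch short.

```python
import numpy as np
from statistics import NormalDist

def f4pl(x, th):
    A, B, C, D = th
    return D + (A - D) / (1.0 + (x / C) ** B)

def grad_4pl(x, th):
    A, B, C, D = th
    u = (x / C) ** B
    g = 1.0 / (1.0 + u)
    s = u / (1.0 + u) ** 2
    return np.column_stack([g, -(A - D) * s * np.log(x / C),
                            (A - D) * B * s / C, 1.0 - g])

def fit_4pl(x, y, th0, iters=50):
    """Levenberg-Marquardt nonlinear least squares; returns (estimate, covariance)."""
    th = np.array(th0, dtype=float)
    sse, lam = np.sum((y - f4pl(x, th)) ** 2), 1e-3
    for _ in range(iters):
        J, r = grad_4pl(x, th), y - f4pl(x, th)
        JtJ = J.T @ J
        cand = th + np.linalg.solve(JtJ + lam * np.diag(np.diag(JtJ)), J.T @ r)
        new_sse = np.sum((y - f4pl(x, cand)) ** 2) if cand[2] > 0 else np.inf
        if new_sse < sse:
            th, sse, lam = cand, new_sse, lam / 10   # accept step, trust model more
        else:
            lam *= 10                                # reject step, damp more
    J = grad_4pl(x, th)
    cov = sse / (len(y) - 4) * np.linalg.inv(J.T @ J)
    return th, cov

def iut_parallel(th_t, cov_t, th_s, cov_s, limits=(0.8, 1.25), alpha=0.05, idx=(0, 1, 3)):
    """Declare parallelism only if all six one-sided tests reject (IUT).

    For each ratio r = th_t[k] / th_s[k] (k indexes A, B, D), test r <= lo and
    r >= hi via th_t - delta * th_s, assuming independent fits and th_s[k] > 0.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)   # large-sample critical value
    lo, hi = limits
    for k in idx:
        for delta, sign in ((lo, 1.0), (hi, -1.0)):
            est = sign * (th_t[k] - delta * th_s[k])
            se = np.sqrt(cov_t[k, k] + delta ** 2 * cov_s[k, k])
            if not est / se > z:            # fail to reject: no parallelism
                return False
    return True

def iut_power(th_t, th_s, doses, reps, sigma, n_sim=1000, seed=0):
    """Proportion of simulated data sets in which the IUT declares parallelism."""
    rng = np.random.default_rng(seed)
    x = np.repeat(np.asarray(doses, dtype=float), reps)
    hits = 0
    for _ in range(n_sim):
        y_t = f4pl(x, th_t) + rng.normal(0.0, sigma, x.size)
        y_s = f4pl(x, th_s) + rng.normal(0.0, sigma, x.size)
        est_t, cov_t = fit_4pl(x, y_t, th_t)
        est_s, cov_s = fit_4pl(x, y_s, th_s)
        hits += iut_parallel(est_t, cov_t, est_s, cov_s)
    return hits / n_sim

doses = np.logspace(-2, 2, 10)   # hypothetical: 10 log-spaced doses
std = (0.5, 1.0, 1.0, 2.0)       # hypothetical standard (A, B, C, D)
par = (0.5, 1.0, 0.5, 2.0)       # parallel test curve: only the EC50 differs
print(iut_power(par, std, doses, reps=3, sigma=0.1, n_sim=200))
```

Running the same loop with a different dose vector and replicate pattern gives the corresponding comparison for another design.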
In the second part, we repeated all the steps of the first part using the modified D-optimal design instead of the standard design. The “VNM” package in R was used to obtain multiple-objective optimal designs. Its MOPT function was used to maximize the optimality criterion and to verify the optimality of the generated design using the General Equivalence Theorem (see Hyun, Wong, & Yang). All the D-optimal designs and the modified D-optimal design obtained for each case are given in Tables 3 and 4.
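The VNM interface is not reproduced here. The sketch below is a generic version of the underlying check, and it covers the single-objective D-criterion only. For that criterion the General Equivalence Theorem says a design xi is D-optimal if and only if d(x, xi) = f(x)^T M(xi)^{-1} f(x) <= p over the whole design space. The multiple-objective criterion used by VNM requires a correspondingly weighted sensitivity function, which is not implemented here. The function names are hypothetical. A straight-line model with a known answer serves as the sanity check; a 4PL gradient could be passed in the same way.

```python
import numpy as np

def sensitivity(grad_fn, support, weights, grid):
    """Return (d(x, xi) on the grid, number of parameters p)."""
    F = grad_fn(np.asarray(support, dtype=float))
    w = np.asarray(weights, dtype=float)
    Minv = np.linalg.inv(F.T @ (w[:, None] * F))
    G = grad_fn(np.asarray(grid, dtype=float))
    return np.einsum("ij,jk,ik->i", G, Minv, G), F.shape[1]

def passes_equivalence_check(grad_fn, support, weights, grid, tol=1e-6):
    """General Equivalence Theorem for D-optimality: max over grid of d(x, xi) <= p."""
    d, p = sensitivity(grad_fn, support, weights, grid)
    return bool(d.max() <= p + tol)

# Known answer: for a straight line on [-1, 1] the D-optimal design puts
# weight 1/2 at each endpoint, giving d(x) = 1 + x**2 <= 2 = p.
line = lambda x: np.column_stack([np.ones_like(x), x])
grid = np.linspace(-1.0, 1.0, 201)
print(passes_equivalence_check(line, [-1.0, 1.0], [0.5, 0.5], grid))  # True
print(passes_equivalence_check(line, [-0.5, 0.5], [0.5, 0.5], grid))  # False
```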
Modified D-optimal design for two lines
Modified D-optimal design for two lines
The results of the simulations are shown in Table 5. For each test and each simulation case, the table value is the power of detecting parallelism, that is, the proportion of the 10,000 replicates in which the test declared parallelism. In case 1, the response curves are not exactly parallel, but they are well within the equivalence limits, so we regard them as approximately parallel. In this case, when the standard deviation of the two preparations was high, the classical design declared parallelism relatively rarely (16.89%), while the D-optimal design with the same test method declared parallelism most of the time (85.14%). As expected, the null hypotheses are easily rejected when the standard deviation is small, because the simulated data then lie close to the mean curve. However, note that for a given value of
The power for D-optimal design and classical design
In case 2, the response curves are exactly parallel, and only the potency differs. In this case, the test with the D-optimal design declared parallelism between 88.14% and 99.98% of the time. Because the alternative hypothesis (parallelism) is true in this case, this means that the test maintained a high rejection rate, that is, high power. For both design approaches, the percentage of declarations of parallelism increased as the standard deviation decreased. However, for the standard design, the rejection rate fell much more steeply as the standard deviation increased (from 87.62% to 20.53%). Overall, the results from cases 1 and 2 suggest that parallelism tests under the classical design may be more sensitive to the standard deviation of the data than tests under the D-optimal design. Even for small
In case 3, the ratios of the slopes and of the lower and upper plateaus were all set on the boundary of the equivalence limits. In this case, the tests with the D-optimal design never declared parallelism in any of the 10,000 simulation replicates, regardless of the standard deviation. Similarly, the tests with the classical design almost never declared parallelism. Since the null hypothesis is true in this case, the D-optimal design result represents an empirical type I error rate of zero. The results indicate that changing the design approach does not materially affect the rejection rate when the dose-response curves are not parallel.
In case 4, the ratio of the lower plateaus was set just outside the equivalence limits (
Overall, the simulation results indicate that the IUT for equivalence performs better in all cases with the modified D-optimal design than with the classical design. The modified D-optimal design appears to be clearly more effective in the situations illustrated by the examples: precise assays in which the true response curves are approximately or exactly parallel, and assays in which the standard deviation is relatively high. In practice, true curves are more likely to be only approximately parallel and to have a high standard deviation, so using the modified D-optimal design for testing parallelism should be considerably more efficient.
In this article, we discussed the problem of testing the parallelism of the response curves for a test preparation and a standard preparation using the 4PL model. We argued that parallelism may be detected more accurately by the IUT when the D-optimal design is implemented. This conclusion is based on comparing the rates of rejecting the null hypothesis, and the underlying tests can be readily constructed using the output of standard nonlinear regression software. The IUT was easily implemented using a sequence of one-sided approximate
The D-optimal design minimizes the variance of the parameter estimates, so it appeared to be clearly more effective. The simulation results suggest that the modified D-optimal design gives precise results when testing parallelism in situations where the true response curves are exactly or approximately parallel. The classical design does little to reduce the variance of the parameter estimates, whereas the D-optimal design reduces it substantially. The classical design becomes more problematic when the data have large variance. In the simulation cases where the true response curves were clearly nonparallel, both designs failed to support parallelism the vast majority of the time. As noted before, approximately parallel lines with high standard deviation are more likely to be encountered in practice, and this study suggests that the modified D-optimal design can be employed for testing parallelism in real bioassays.
One limitation is that we used only one test method to compare the D-optimal design and the classical design. Further study is therefore recommended to contrast the two designs under other testing methods, such as the F test or the chi-square test.
