Testing parallelism for the four-parameter logistic model with D-optimal design

Abstract

To determine the potency of a test preparation relative to a standard preparation, it is often important to test parallelism between the dose-response curves of the reference standard and the test sample. Optimal designs are known to be more powerful in testing parallelism than classical designs. In this study, a D-optimal design was implemented to study parallelism, and its performance was compared with that of a classical design. We modified the D-optimal design to test parallelism in the four-parameter logistic (4PL) model using the Intersection-Union Test (IUT). The IUT method is appropriate when the null hypothesis is expressed as a union of sets, and with this method complicated tests involving several parameters are easily constructed. Since the D-optimal design minimizes the generalized variance of the model parameter estimates, it can bring more power to the IUT. A simulation study is presented to compare the empirical properties of the two designs.

Keywords

D-optimal design; dose-response curves; intersection-union test; four-parameter logistic model

1. Introduction

It is often important for scientists to determine the parallelism between sets of dose-response data, typically to compare the potency of a test preparation relative to a standard preparation. For example, a decrease or increase in biomarker concentration is reliable only when parallelism between the endogenous biomarker and the concentration-response curve has been demonstrated. Moreover, parallel testing can increase throughput and reduce test execution time.

The method of testing parallelism influences the efficiency of the whole process, and even the rate of success. Traditional parallelism tests are intended to test equality of pairs of parameters between the two dose-response curves. One way to test equality is to compute the joint confidence region, but this is complicated when a nonlinear curve model is used. Therefore, approximations are made to simplify the task, such as using the intersection of marginal confidence intervals as an approximate confidence region (Callahan & Sajjadi, 2003; Lansky, 2003). However, this approximation causes the confidence region to be much larger than it should be, resulting in curves being labeled as parallel when they are not.

An alternative approach, referred to as equivalence testing, assumes lack of similarity and seeks evidence to prove similarity (see Callahan & Sajjadi, 2003; Hauck et al., 2005). Building on equivalence tests, Berger (1982), Casella and Berger (1990), and Berger and Hsu (1996) gave a more complete discussion of Intersection-Union Test (IUT) theory. The IUT for practical parallelism is often used in bioequivalence testing and can be easily implemented as a sequence of one-sided approximate $t$-tests, which can be readily constructed from the output of standard nonlinear regression software. It may provide more reliable and satisfying inference than the other methods, in the sense that rejecting the null hypothesis establishes evidence in favor of practically parallel response curves. The IUT is therefore used for testing parallelism in this research.

One important factor for the success of testing parallelism is how the experiment is designed. An optimal design specifies the dose levels at which to take observations and how to distribute resources over those doses in the most efficient manner. Optimal designs, by facilitating the data-collection process and subsequent data analysis in a cost-effective manner, are more flexible and efficient, while classical designs require a greater number of experimental runs to estimate the parameters of interest with the same precision as an optimal design.

In practical terms, optimal designs can provide accurate statistical inference with minimum cost. They minimize the variances of the estimates of the parameters of interest and allow prediction without bias. Optimal designs use different criteria depending on the goal of the experiment. In general, searching for optimal designs for linear models with normal errors is not very complicated. The Fisher information matrix for a linear model is independent of the model parameters, so the optimal designs are obtained in an explicit form. In practical situations, however, many natural phenomena follow nonlinear models, and efficient designs for nonlinear models are needed in a multitude of application areas. Under nonlinear models, the Fisher information matrix depends on the unknown model parameters (Chernoff, 1953).

Although many optimal design methods could be used to test parallelism, we study only the D-optimal design in this paper. This type of optimal design is constructed to minimize the generalized variance of the estimated regression coefficients. D-optimality is a powerful experimental design criterion for the determination of parallelism in biological applications, because it minimizes the generalized variance of the model parameter estimates, which helps to increase the power of the IUT (Fedorov, 1972; Silvey, 1980; Atkinson & Donev, 1992; Pukelsheim, 1993). D-optimal designs for logistic models with four parameters were introduced by Li and Majumdar (2007). We modify the D-optimal design appropriately so that it can be used for the IUT. To check the performance of the modified D-optimal design, we conduct simulation studies comparing the power of the IUT under several scenarios with the design used by Jonkman and Sidik (2009).

Section 1 introduces the D-optimal design and the motivation for using it to test parallelism. In Section 2, we suggest a simple method of testing the hypothesis based on the IUT and explain the algorithm used to search for the D-optimal design. Two examples that illustrate the D-optimal design procedure and contrast it with the classical design are considered in Section 3. A simulation study based on one of the examples is presented, and some implications of the simulation results are discussed, in Section 4. A brief discussion and summary are presented in Section 5.

2. Background

Parallelism is observed where the dose-response curve of the test sample is a horizontal shift of that of the reference standard on the logarithmic dose axis. Mathematically, two functions are parallel if one function can be obtained from the other by a scaling of the dose axis.

2.1 Mathematical function

Consider the 4-parameter logistic (4PL) model

$\displaystyle y=a+{\displaystyle\frac{d-a}{1+{\rm e}^{[b(c-x)]}}}+\varepsilon,$ (1)

where $\varepsilon\stackrel{\text{iid}}{\sim}N\left(0,\sigma^{2}\right)$ with unknown $\sigma^{2}$. We have generally found this model useful in working with drug discovery scientists. We typically work in the log base 10 scale for the dose level. Here $y$ is the response of interest and $x$ is the logarithm of the dose of a given preparation. The model has four parameters, $\theta=(a,b,c,d)$, where $a$ is the lower asymptote of the curve, $d$ is the upper asymptote of the curve, $b$ is the slope factor, and $c$ is the logarithm of the dose corresponding to a mean response midway between the lower and upper plateaus. Let $a_{1},b_{1},c_{1},d_{1}$ be the model parameters for the test group and $a_{2},b_{2},c_{2},d_{2}$ the model parameters for the standard group.

Consider a test group and a standard group. If $a$, $b$, and $d$ are equivalent, i.e., $a_{1}=a_{2}$, $b_{1}=b_{2}$, $d_{1}=d_{2}$, then the horizontal distance between the response curves on the $\log X$ scale is the constant $c_{1}-c_{2}$. Hence the relative potency of the test sample compared to the standard may be estimated as $e^{c_{1}-c_{2}}$. If the two response curves are not parallel, then the relative potency changes depending on the level of response, since the distance between the two curves is not constant. Thus, it is standard practice to statistically test the assumption of parallelism prior to estimating the relative potency. In this paper, we are interested in assessing the equivalence of three pairs of parameters, and we suggest a simple method of testing the hypothesis based on the intersection-union test (IUT) principle (see Berger, 1982; Berger & Hsu, 1996; Casella & Berger, 1990).
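The parallelism condition can be checked numerically. The following minimal Python sketch evaluates the mean of model (1) and verifies that, when $a$, $b$, and $d$ are shared, one curve is the other shifted horizontally by the difference in $c$. The function name `fpl_mean` and the round parameter values are illustrative assumptions, not part of the original analysis.

```python
import math

def fpl_mean(x, a, b, c, d):
    """Mean response of the 4PL model (1): a + (d - a) / (1 + exp(b * (c - x)))."""
    return a + (d - a) / (1.0 + math.exp(b * (c - x)))

# Illustrative values (assumed): common a, b, d and different c for the two lines.
a, b, d = 2.0, -1.4, 10.0
c_test, c_std = 2.3, 2.6

# Shifting x by (c_std - c_test) maps the test curve onto the standard curve.
shift = c_std - c_test
grid = [1.0 + 0.1 * i for i in range(31)]
max_gap = max(
    abs(fpl_mean(x, a, b, c_test, d) - fpl_mean(x + shift, a, b, c_std, d))
    for x in grid
)
print(max_gap)
```

The printed gap should be at the level of floating-point rounding. If any of $a$, $b$, or $d$ differed between the two lines, no single horizontal shift would align the curves at every response level.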

2.2 D-optimal design for four-parameter logistic model

In an experiment, the goal of an optimal design is to make the variances of the parameter estimates of interest and of the predictions as small as possible. To minimize these variances, the dose levels and the distribution of subjects over the doses must be decided, depending on how many subjects are available and on the range of dose levels.

Suppose the design space is denoted by $\chi$. Let $\mathcal{H}$ be the class of probability measures on the Borel sets of $\chi$ (Kiefer, 1974). A design $\xi=\begin{pmatrix}x_{1},x_{2},\ldots,x_{s}\\ p_{1},p_{2},\ldots,p_{s}\end{pmatrix}\in\mathcal{H}$ contains dose levels $x_{1},x_{2},\ldots,x_{s}$ and corresponding weights $p_{1},p_{2},\ldots,p_{s}$, where $s\geqslant 4$, $p_{t}>0$, and $\sum_{t=1}^{s}p_{t}=1$. When the total number of observations in the experiment is $N$, the number of replications at dose $x_{t}$ is $n_{t}$, the nearest integer to $p_{t}N$, $t=1,2,\ldots,s$.

The Fisher information matrix for the 4PL model is

$\displaystyle I(\theta,\upxi)=-E\left[\frac{\partial^{2}}{\partial\theta\,\partial\theta^{\rm T}}\log L(\theta,\upxi)\right],$ (2)

where $L\left(\theta,\upxi\right)$ is the likelihood function for the data. The information matrix defined above plays a central role in the traditional optimality criteria. This matrix depends not only on the design $\upxi$ but also on the unknown parameters $\theta$. The design $\upxi$ is usually chosen to optimize a certain function of the Fisher information matrix.

The D-optimality criterion chooses a design that maximizes the information on $\theta$ by minimizing the generalized variance of its estimate. Denote by $\hat{\theta}$ the maximum likelihood estimate of $\theta$; then the asymptotic variance matrix of $\hat{\theta}$ is the inverse of the Fisher information matrix $I(\theta,\upxi)$. The commonly used D-optimality criterion, which minimizes the generalized variance of $\hat{\theta}$, is equivalent to maximizing the determinant of the Fisher information matrix, i.e.,

$\displaystyle\upxi^{*}=\arg\max\limits_{\upxi\in\mathcal{H}}\left|\frac{I(\theta,\upxi)}{N}\right|,$ (3)

where $\mathcal{H}$ is the set of all possible designs. Since $N$ is a fixed constant, the criterion amounts to maximizing the determinant of the Fisher information matrix itself. Because this matrix depends on $\theta$, the resulting design $\xi^{*}$ is D-optimal for the assumed parameter values, that is, locally D-optimal.
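To make the criterion concrete, the sketch below evaluates the determinant in (3) for a given design. It relies on a standard result for model (1) with normal errors: the normalized information matrix is proportional to $\sum_{t}p_{t}f(x_{t})f(x_{t})^{\rm T}$, where $f$ is the gradient of the mean response with respect to $(a,b,c,d)$. The factor $1/\sigma^{2}$ is dropped because it does not change the maximizer. The function names are illustrative, and the code only evaluates the criterion for fixed designs; it is not the search algorithm used in this paper. The parameter values and designs are those reported for the standard line of the parallel example.

```python
import math

def fpl_gradient(x, a, b, c, d):
    """Gradient of the 4PL mean in (1) with respect to (a, b, c, d)."""
    u = math.exp(b * (c - x))
    g = 1.0 / (1.0 + u)          # weight on d; (1 - g) is the weight on a
    k = (d - a) * g * (1.0 - g)  # equals (d - a) * u / (1 + u)^2
    return [1.0 - g, -k * (c - x), -k * b, g]

def information_matrix(points, weights, theta):
    """Normalized information sum_t p_t f(x_t) f(x_t)^T (factor 1/sigma^2 dropped)."""
    m = [[0.0] * 4 for _ in range(4)]
    for x, p in zip(points, weights):
        f = fpl_gradient(x, *theta)
        for i in range(4):
            for j in range(4):
                m[i][j] += p * f[i] * f[j]
    return m

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return det

# Standard-line parameters and designs as reported for the parallel example.
theta_std = (2.04, -1.35, 2.59, 9.86)
classical = ([1.1 + 0.3 * i for i in range(10)], [0.1] * 10)
d_optimal = ([1.0, 1.99, 3.08, 4.0], [0.25] * 4)

det_classical = determinant(information_matrix(*classical, theta_std))
det_d_optimal = determinant(information_matrix(*d_optimal, theta_std))
print(det_classical, det_d_optimal)
```

A larger determinant corresponds to a smaller generalized variance, so comparing the two printed values indicates which design is preferable under this criterion for the assumed $\theta$.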

A state-of-the-art algorithm (the YBT algorithm) was proposed to find locally optimal designs for a single objective and was shown to outperform other current algorithms. Starting from a randomly selected initial design, the YBT algorithm selects the dose that maximizes the sensitivity function and adds it to the previously selected design points. At the same time, the optimal weights are obtained directly using the Newton-Raphson method (Quinn, 2016). However, if the selected initial design points are far from the optimal design points, the YBT algorithm requires much more time to converge to an optimal design and sometimes fails to do so. In this paper, a modified YBT algorithm is employed to obtain the D-optimal designs (see Hyun et al., 2018). The procedure was modified by selecting better starting design points via the V-algorithm, which improved the speed of the search for the optimal designs (S.W. Hyun, and W.K. Wong. Yang, 2013). The modified algorithm performed well in obtaining all the optimal designs in this paper.

2.3 Intersection-union test

In the context of the four-parameter logistic curve, establishing practical equivalence enables the assessment of relative potency via the parameters $c_{1}$ and $c_{2}$. We test whether the lower and upper plateaus $a$ and $d$, and the slope factor $b$, are equivalent for the test and control groups. Thus, stating the hypotheses in terms of unions and intersections, we are interested in testing

$\displaystyle H_{0}:\begin{aligned} &a_{1}-D_{L}a_{2}\leqslant 0~~\text{or}~~a_{1}-D_{U}a_{2}\geqslant 0~~\text{or}\\ &b_{1}-D_{L}b_{2}\leqslant 0~~\text{or}~~b_{1}-D_{U}b_{2}\geqslant 0~~\text{or}\\ &d_{1}-D_{L}d_{2}\leqslant 0~~\text{or}~~d_{1}-D_{U}d_{2}\geqslant 0\end{aligned}$ (4)

versus

$\displaystyle H_{1}:\begin{aligned} &a_{1}-D_{L}a_{2}>0~~\text{and}~~a_{1}-D_{U}a_{2}<0~~\text{and}\\ &b_{1}-D_{L}b_{2}>0~~\text{and}~~b_{1}-D_{U}b_{2}<0~~\text{and}\\ &d_{1}-D_{L}d_{2}>0~~\text{and}~~d_{1}-D_{U}d_{2}<0\end{aligned}$ (5)

where $D_{L}$ and $D_{U}$ are the lower and upper limits for the ratios of the upper plateaus, the lower plateaus, and the slopes. The same limits are used for all three ratios (Yang et al., 2012).

According to IUT theory (Berger, 1982; Casella & Berger, 1990; Berger & Hsu, 1996), this method is useful when the null hypothesis is expressed as a union and the alternative hypothesis is expressed as an intersection. Because the null hypothesis is a union, it is true if any of its components is true; because the alternative is an intersection, it is false if any of its components is false. Thus, by constructing the test so that each component of $H_{0}$ is tested separately at level $\alpha$, and $H_{0}$ is rejected only if all the component tests reject, the IUT has level at most $\alpha$ without requiring any multiplicity adjustment.

2.4 Modified D-optimal design for the IUT test

Let $\xi_{T}$ and $\xi_{C}$ denote the D-optimal designs for the test group (T) and the reference standard group (C). Each works best for estimating the model parameters of its own line, so combining both designs and using the result for testing parallelism could be useful.

$\displaystyle\xi_{T}=\begin{pmatrix}x_{T1},x_{T2},\ldots,x_{Ts}\\ p_{T1},p_{T2},\ldots,p_{Ts}\end{pmatrix}$ (6)

$\displaystyle\xi_{C}=\begin{pmatrix}x_{C1},x_{C2},\ldots,x_{Cs}\\ p_{C1},p_{C2},\ldots,p_{Cs}\end{pmatrix}$ (7)

We propose $\xi_{\textit{IUT}}=\alpha\,\xi_{T}+(1-\alpha)\,\xi_{C}$, where $\alpha\in[0,1]$ is the relative importance of $\xi_{T}$ to $\xi_{C}$ (not to be confused with the significance level). We assume that both designs are equally important, so we set $\alpha=0.5$. The modified D-optimal design $\xi_{\textit{IUT}}$ is given by

$\displaystyle\xi_{\textit{IUT}}=\begin{pmatrix}x_{T1},&x_{T2},&\ldots,&x_{Ts},&x_{C1},&x_{C2},&\ldots,&x_{Cs}\\ \alpha p_{T1},&\alpha p_{T2},&\ldots,&\alpha p_{Ts},&(1-\alpha)p_{C1},&(1-\alpha)p_{C2},&\ldots,&(1-\alpha)p_{Cs}\end{pmatrix}.$ (8)

If some design points of $\xi_{T}$ and $\xi_{C}$ coincide, only one design point is kept, with the sum of the two weights (i.e., if $x_{Ti}=x_{Cj}$, then use only $x_{Ti}$ with corresponding weight $\alpha p_{Ti}+(1-\alpha)p_{Cj}$).
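The combination rule in (8), including the merging of coincident design points, can be sketched as follows. The function name `combine_designs` and the dictionary representation are illustrative assumptions; coincident points are detected by exact equality of the dose values, which may need a tolerance in practice. The input designs are those reported in Table 1 for the parallel example.

```python
def combine_designs(design_t, design_c, alpha=0.5):
    """Mix two designs {dose: weight}; weights of shared doses are summed."""
    combined = {}
    for x, p in design_t.items():
        combined[x] = combined.get(x, 0.0) + alpha * p
    for x, p in design_c.items():
        combined[x] = combined.get(x, 0.0) + (1.0 - alpha) * p
    return dict(sorted(combined.items()))

# Designs reported for the parallel example (test and control lines).
xi_t = {1.0: 0.25, 1.88: 0.08, 1.89: 0.17, 2.95: 0.25, 4.0: 0.25}
xi_c = {1.0: 0.25, 1.99: 0.25, 3.08: 0.25, 4.0: 0.25}

xi_iut = combine_designs(xi_t, xi_c, alpha=0.5)
for dose, weight in xi_iut.items():
    print(dose, round(weight, 3))
```

With $\alpha=0.5$, the two shared endpoints keep weight 0.25 each, while every interior point receives half of its original weight, which matches the $P_{M}$ row of Table 1.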

3. Test for parallelism and efficient design

We statistically test the assumption of parallelism prior to estimating the relative potency, since the potency of the test sample compared to the standard is defined simply in terms of the parameter $c_{i}$ if the response curves for the two preparations are parallel.

$\displaystyle H_{0}:\begin{aligned} &a_{1}-D_{L}a_{2}\leqslant 0~~\text{or}~~a_{1}-D_{U}a_{2}\geqslant 0~~\text{or}\\ &b_{1}-D_{L}b_{2}\leqslant 0~~\text{or}~~b_{1}-D_{U}b_{2}\geqslant 0~~\text{or}\\ &d_{1}-D_{L}d_{2}\leqslant 0~~\text{or}~~d_{1}-D_{U}d_{2}\geqslant 0\end{aligned}$ (9)

versus

$\displaystyle H_{1}:\begin{aligned} &a_{1}-D_{L}a_{2}>0~~\text{and}~~a_{1}-D_{U}a_{2}<0~~\text{and}\\ &b_{1}-D_{L}b_{2}>0~~\text{and}~~b_{1}-D_{U}b_{2}<0~~\text{and}\\ &d_{1}-D_{L}d_{2}>0~~\text{and}~~d_{1}-D_{U}d_{2}<0\end{aligned}$ (10)

$D_{L}$ , $D_{U}$ are the equivalence limits of the ratios of upper plateaus, lower plateaus, and slopes, and the choice of the two numbers should be carefully considered, based on historical data and information from scientists at the assay laboratory, etc. For example, the FDA Division of Bioequivalence (FDA 1992) uses $D_{L}=0.8$ and $D_{U}=1.25$ for bioequivalence hypotheses about ratios.

For $H_{01}$: $a_{1}-D_{L}a_{2}\leqslant 0$ versus $H_{11}$: $a_{1}-D_{L}a_{2}>0$, we use a statistic of the form

$\displaystyle T_{1}=\frac{\hat{a}_{1}-D_{L}\hat{a}_{2}}{\text{s.e.}(\hat{a}_{1}-D_{L}\hat{a}_{2})},$ (11)

where

$\displaystyle\text{s.e.}\left(\hat{a}_{1}-D_{L}\hat{a}_{2}\right)=\sqrt{\text{var}\left(\hat{a}_{1}\right)+D^{2}_{L}\,\text{var}\left(\hat{a}_{2}\right)}.$ (12)

We reject $H_{01}$ if $T_{1}>t_{N-8,\alpha}$, where $N$ is the total sample size and $\alpha$ is the type I error rate (significance level). Similarly, for $H_{02}$: $a_{1}-D_{U}a_{2}\geqslant 0$ versus $H_{12}$: $a_{1}-D_{U}a_{2}<0$, we use $T_{2}=\frac{\hat{a}_{1}-D_{U}\hat{a}_{2}}{\text{s.e.}(\hat{a}_{1}-D_{U}\hat{a}_{2})}$ and reject $H_{02}$ if $T_{2}<-t_{N-8,\alpha}$.

The remaining four tests proceed in analogous fashion. We reject $H_{0}$ only if all six one-sided approximate $t$-tests reject $H_{01},\ldots,H_{06}$ at the same level $\alpha$ (Berger, 1982; Berger & Hsu, 1996; Casella & Berger, 1990).
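The decision rule can be sketched in a few lines of Python. The code below computes the statistics in (11) and (12) for each parameter and rejects $H_{0}$ only if all six one-sided tests reject. The function names are illustrative, the critical value is passed in rather than computed, and the inputs are the rounded estimates and standard errors reported in Table 2 for the toxicity assay under the classical design, so small differences from the reported $T$ values are to be expected.

```python
import math

def one_sided_t(est1, est2, var1, var2, limit):
    """Statistic (11)-(12): (est1 - limit*est2) / sqrt(var1 + limit^2 * var2)."""
    return (est1 - limit * est2) / math.sqrt(var1 + limit ** 2 * var2)

def iut_parallelism(estimates, d_lower, d_upper, t_crit):
    """Return (reject_H0, per-parameter statistics) for the six one-sided tests.

    estimates maps a parameter name to (est1, se1, est2, se2).
    """
    stats = {}
    reject_all = True
    for name, (e1, se1, e2, se2) in estimates.items():
        t_low = one_sided_t(e1, e2, se1 ** 2, se2 ** 2, d_lower)
        t_up = one_sided_t(e1, e2, se1 ** 2, se2 ** 2, d_upper)
        stats[name] = (t_low, t_up)
        # Each pair must reject: T_low > t_crit and T_up < -t_crit.
        reject_all = reject_all and (t_low > t_crit) and (t_up < -t_crit)
    return reject_all, stats

# Estimates and standard errors reported for the toxicity assay, classical design.
toxicity_classical = {
    "a": (17.51, 1.60, 12.96, 2.04),
    "b": (0.85, 0.06, 0.81, 0.08),
    "d": (84.17, 1.75, 92.71, 3.58),
}
parallel, stats = iut_parallelism(toxicity_classical, 0.8, 1.25, t_crit=1.67)
print(parallel)
for name, (t_low, t_up) in stats.items():
    print(name, round(t_low, 2), round(t_up, 2))
```

Because a single failed component is enough to retain $H_{0}$, no multiplicity adjustment is applied to the individual tests.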

Dykstra (1971) gave examples illustrating the difference between classical and D-optimal designs. One experiment is a ten-dose bioassay whose dose-response curves are parallel, with 3 reps, 10 doses, and $\log_{10}(\text{dose})\in[1.1,3.8]$; its responses are $Y_{1jk}$, $j=1,2,\ldots,10$, $k=1,2,3$. The other experiment is a toxicity assay whose dose-response curves are non-parallel, with 4 reps, 12 doses, and $\log_{10}(\text{dose})\in[3.5,9]$; its responses are $Y_{2jk}$, $j=1,2,\ldots,12$, $k=1,2,3,4$.

In the ten-dose bioassay, the parameters from Jonkman and Sidik (2009) are $\sigma_{1}^{2}=\sigma_{2}^{2}=0.04^{2}$ , $\epsilon_{1}\sim N(0,{0.04}^{2})$ , $\epsilon_{2}\sim N(0,0.04^{2})$ , $a_{1}=2.02$ , $b_{1}=-1.42$ , $c_{1}=2.31$ , $d_{1}=10.12$ ; $a_{2}=2.04$ , $b_{2}=-1.35$ , $c_{2}=2.59$ , $d_{2}=9.86$ .

In the classical design, the 10 doses are equally spaced with equal replication. Here $X_{1j}$ is the logarithm of dose and $n_{t}$ is the number of replicated responses. We then simulated data from the 4PL model with the known parameter values to create Fig. 1. For the test line, $X_{1j}$, $j=1,2,\ldots,10$, is used to generate $Y_{1jk}$ based on

$\displaystyle Y_{1jk}=2.02+\frac{10.12-2.02}{1+\exp[-1.42(2.31-X_{1j})]}+\epsilon_{1jk}.$ (13)

Table 1

Designs for parallel and non-parallel examples

Classical design for parallel example
$X_{{1j}}$	1.1	1.4	1.7	2.0	2.3	2.6	2.9	3.2	3.5	3.8
$n_{t}$	3	3	3	3	3	3	3	3	3	3
D-optimal design for parallel example
$X$	Test line	1	1.88	1.89	2.95	4
$P_{T}$		0.25	0.08	0.17	0.25	0.25
$X$	Control line	1	1.99	3.08	4
$P_{C}$		0.25	0.25	0.25	0.25
$X$	Modified D-optimal	1	1.88	1.89	1.99	2.95	3.08	4
$P_{M}$		0.25	0.04	0.085	0.125	0.125	0.125	0.25
$n_{M}$		7	1	3	4	4	3	8
Classical design for non-parallelism example from Ding and Bailey (2003)
$X_{{2j}}$	3.5	4.0	4.5	5.0	5.5	6.0	6.5	7.0	7.5	8	8.5	9.0
$n_{t}$	3	3	3	3	3	3	3	3	3	3	3	3
D-optimal design for non-parallel example
$X$	Test line	3.5	5.35	7.25	9
$P_{T}$		0.25	0.25	0.25	0.25
$X$	Control line	3.5	5.56	7.46	9
$P_{C}$		0.25	0.25	0.25	0.25
$X$	Modified D-optimal	3.5	5.35	5.56	7.25	7.46	9
$P_{M}$		0.25	0.125	0.125	0.125	0.125	0.25
$n_{M}$		9	4	5	5	4	9

Figure 1.

The simulated data and fitted curves for ten-dose bioassay (parallel).

For the standard line, also use $X_{{1j}}$ to get $Y_{1jk}^{\prime}$ based on the formula

$\displaystyle Y_{1jk}^{\prime}=2.04+\frac{9.86-2.04}{1+\exp[-1.35(2.59-X_{1j})]}+\epsilon_{2jk}.$ (14)
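The data-generating step for the two lines of the ten-dose bioassay can be sketched as follows. This is a minimal illustration using Python's standard library, with an arbitrary seed and illustrative function names; it is not the code used to produce the results in this paper, so the simulated values will differ from those behind Fig. 1.

```python
import math
import random

def fpl_mean(x, a, b, c, d):
    """Mean response of the 4PL model (1)."""
    return a + (d - a) / (1.0 + math.exp(b * (c - x)))

def simulate_line(doses, reps, theta, sigma, rng):
    """Return a list of (x, y) pairs with `reps` noisy responses per dose."""
    return [
        (x, fpl_mean(x, *theta) + rng.gauss(0.0, sigma))
        for x in doses
        for _ in range(reps)
    ]

rng = random.Random(2024)  # arbitrary seed, assumed only for reproducibility
doses = [round(1.1 + 0.3 * j, 1) for j in range(10)]  # 1.1, 1.4, ..., 3.8

theta_test = (2.02, -1.42, 2.31, 10.12)  # (a1, b1, c1, d1), test line
theta_std = (2.04, -1.35, 2.59, 9.86)    # (a2, b2, c2, d2), standard line

data_test = simulate_line(doses, 3, theta_test, 0.04, rng)
data_std = simulate_line(doses, 3, theta_std, 0.04, rng)
print(len(data_test), len(data_std))
```

Each line receives 10 doses with 3 replicates, and the same dose grid is used for both preparations, as in the classical design of Table 1.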

We set $\alpha=0.05$ throughout this paper.

We used nonlinear least squares to estimate the parameters from the simulated data; that is, the fitted curve minimizes the sum of squared vertical distances between the observations and the regression curve. The estimates are $\hat{a}_{1}=2.08$, $\hat{b}_{1}=-1.45$, $\hat{d}_{1}=10.04$, $\hat{a}_{2}=2.06$, $\hat{b}_{2}=-1.34$, $\hat{d}_{2}=10.15$, with $\text{var}(\hat{a}_{1})=0.09^{2}$, $\text{var}(\hat{b}_{1})=0.05^{2}$, $\text{var}(\hat{d}_{1})=0.11^{2}$, $\text{var}(\hat{a}_{2})=0.11^{2}$, $\text{var}(\hat{b}_{2})=0.04^{2}$, $\text{var}(\hat{d}_{2})=0.09^{2}$.

We test parallelism using these simulated data. For parameter $a$: $H_{01}$: $a_{1}-D_{L}a_{2}\leqslant 0$ versus $H_{11}$: $a_{1}-D_{L}a_{2}>0$; $T_{1}=3.58$, so we reject $H_{01}$ since $T_{1}>t_{58,0.05}=1.67$. $H_{02}$: $a_{1}-D_{U}a_{2}\geqslant 0$ versus $H_{12}$: $a_{1}-D_{U}a_{2}<0$; $T_{2}=-2.94$, so we reject $H_{02}$ since $T_{2}<-t_{58,0.05}=-1.67$.

For parameter $b$: $H_{01}$: $b_{1}-D_{L}b_{2}\leqslant 0$ versus $H_{11}$: $b_{1}-D_{L}b_{2}>0$; we reject $H_{01}$ since $T_{1}=6.32>t_{58,0.05}$. $H_{02}$: $b_{1}-D_{U}b_{2}\geqslant 0$ versus $H_{12}$: $b_{1}-D_{U}b_{2}<0$; we reject $H_{02}$ since $T_{2}=-3.10<-t_{58,0.05}$.

For parameter $d$: $H_{01}$: $d_{1}-D_{L}d_{2}\leqslant 0$ versus $H_{11}$: $d_{1}-D_{L}d_{2}>0$; we reject $H_{01}$ since $T_{1}=14.34>t_{58,0.05}$. $H_{02}$: $d_{1}-D_{U}d_{2}\geqslant 0$ versus $H_{12}$: $d_{1}-D_{U}d_{2}<0$; we reject $H_{02}$ since $T_{2}=-16.77<-t_{58,0.05}$. Since all six one-sided approximate $t$-tests reject $H_{01},H_{02},\ldots,H_{06}$ at the 0.05 level, the dose-response curves are declared parallel.

Next, the D-optimal design was used for testing parallelism. In Table 1, the first two rows of the D-optimal block show the D-optimal design for the test line, the next two rows show the D-optimal design for the control line, and the last rows show the modified D-optimal design for testing parallelism, which can be used for fitting both the test and control lines. In each design, the first row, $X$, gives the dose level on a logarithmic scale, and the second row, $P$, gives the proportion of subjects at each selected dose level. The proportions of the modified D-optimal design were obtained from the formula $P_{M}=0.5\,P_{T}+(1-0.5)\,P_{C}$, where $P_{T}$ and $P_{C}$ are the proportions of subjects in the two lines. The number of replicated responses of the modified D-optimal design is $n_{M}$, the nearest integer to $30\,P_{M}$, where 30 is the number of subjects used in each of the test and control lines. For example, the modified D-optimal design assigns seven replications at the first dose level of 1, one replication at the second dose level of 1.88, three replications at the third dose level of 1.89, and so on.
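As a worked check of this allocation, the short sketch below computes the unrounded targets $30\,P_{M}$ for the parallel example and compares them with the replications listed in Table 1. Rounding each target independently does not by itself guarantee that the replications add up to 30, so some adjustment convention is needed; the specific convention is not stated here, and the code only verifies the reported values rather than reproducing a rounding rule.

```python
# Weights of the modified design for the parallel example, as listed in Table 1.
doses = [1.0, 1.88, 1.89, 1.99, 2.95, 3.08, 4.0]
p_m = [0.25, 0.04, 0.085, 0.125, 0.125, 0.125, 0.25]
n_total = 30

# Unrounded targets P_M * N for each dose level.
targets = [p * n_total for p in p_m]
for x, target in zip(doses, targets):
    print(x, round(target, 2))

# Replications reported in Table 1; they are integers near the targets
# and are constrained to add up to n_total.
n_m = [7, 1, 3, 4, 4, 3, 8]
max_gap = max(abs(n - t) for n, t in zip(n_m, targets))
print(sum(n_m), round(max_gap, 2))
```

Every reported replication lies within one unit of its target, and the total equals the 30 subjects available per line.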

For the test line, use the modified D-optimal design ${\xi}=(X,n_{t})$ to generate the response $Y_{1jk}$ . For the control line, also use ${\xi}=(X,n_{t})$ to get ${Y}_{1jk}^{\prime}$ . We also used nonlinear least-squares to obtain estimated parameters, which are as follows: ${\hat{a}}_{1}=1.98$ , $\hat{b}_{1}=-1.41$ , $\hat{d}_{1}=10.21$ , $\hat{a}_{2}=2.06$ , $\hat{b}_{2}=-1.36$ , $\hat{d}_{2}=9.82$ , $var(\hat{a}_{1})={0.03}^{2}$ , $var({\hat{b}}_{1})={0.03}^{2}$ , $var({\hat{d}}_{1})={{0.05}}^{2}$ , $var(\hat{a}_{2})={0.09}^{2}$ , $var({\hat{b}}_{2})={{0.04}}^{2}$ , $var(\hat{d}_{2})={{0.08}}^{2}$ .

Figure 2.

The simulated data and fitted curves for toxicity (non-parallel).

We test parallelism using these simulated data. For parameter $a$: $H_{01}$: $a_{1}-D_{L}a_{2}\leqslant 0$ versus $H_{11}$: $a_{1}-D_{L}a_{2}>0$; $T_{1}=4.11$, so we reject $H_{01}$ since $T_{1}>t_{58,0.05}=1.67$. $H_{02}$: $a_{1}-D_{U}a_{2}\geqslant 0$ versus $H_{12}$: $a_{1}-D_{U}a_{2}<0$; $T_{2}=-5.02$, so we reject $H_{02}$ since $T_{2}<-t_{58,0.05}=-1.67$.

For parameter $b$: $H_{01}$: $b_{1}-D_{L}b_{2}\leqslant 0$ versus $H_{11}$: $b_{1}-D_{L}b_{2}>0$; we reject $H_{01}$ since $T_{1}=7.60>t_{58,0.05}$. $H_{02}$: $b_{1}-D_{U}b_{2}\geqslant 0$ versus $H_{12}$: $b_{1}-D_{U}b_{2}<0$; we reject $H_{02}$ since $T_{2}=-4.87<-t_{58,0.05}$.

For parameter $d$: $H_{01}$: $d_{1}-D_{L}d_{2}\leqslant 0$ versus $H_{11}$: $d_{1}-D_{L}d_{2}>0$; we reject $H_{01}$ since $T_{1}=29.05>t_{58,0.05}$. $H_{02}$: $d_{1}-D_{U}d_{2}\geqslant 0$ versus $H_{12}$: $d_{1}-D_{U}d_{2}<0$; we reject $H_{02}$ since $T_{2}=-18.59<-t_{58,0.05}$. All six component null hypotheses are rejected, so the dose-response curves are declared parallel.

For the toxicity assay, data were simulated from the 4PL model in the same way and are shown in Fig. 2. The parameters, also from Jonkman and Sidik (2009), are $\sigma^{2}_{1}=\sigma^{2}_{2}=1.6^{2}$, $\epsilon_{1}\sim N(0,1.6^{2})$, $\epsilon_{2}\sim N(0,1.6^{2})$, $a_{1}=16.44$, $b_{1}=0.83$, $c_{1}=6.35$, $d_{1}=85.19$; $a_{2}=13.61$, $b_{2}=0.82$, $c_{2}=6.83$, $d_{2}=93.01$.

Classical design was applied to test parallelism, in which 12 doses are equally spaced with equal replication. $X_{{2j}}$ is logarithm of dose and $n_{t}$ is the number of dose replication (See Ding & Bailey, 2003). For the test line, use $X_{{2j}}$ to generate $Y_{2jk}$ based on

$\displaystyle Y_{2jk}=16.44+\frac{85.19-16.44}{1+\exp[0.83(6.35-X_{2j})]}+\epsilon_{1jk}.$ (15)

For the standard line, also use $X_{2j}$ to obtain response ${Y}_{2jk}^{\prime}$ based on the formula

$\displaystyle Y_{2jk}^{\prime}=13.61+\frac{93.01-13.61}{1+\exp[0.82(6.83-X_{2j})]}+\epsilon_{2jk}.$ (16)

The simulated data and fitted curves are plotted in Fig. 2.

For this classical design, we again used nonlinear least squares to estimate the parameters from the simulated data. The estimates are $\hat{a}_{1}=17.51$, $\hat{b}_{1}=0.85$, $\hat{d}_{1}=84.17$, $\hat{a}_{2}=12.96$, $\hat{b}_{2}=0.81$, $\hat{d}_{2}=92.71$, with $\text{var}(\hat{a}_{1})=1.60^{2}$, $\text{var}(\hat{b}_{1})=0.06^{2}$, $\text{var}(\hat{d}_{1})=1.75^{2}$, $\text{var}(\hat{a}_{2})=2.04^{2}$, $\text{var}(\hat{b}_{2})=0.08^{2}$, $\text{var}(\hat{d}_{2})=3.58^{2}$.

Parallelism was then tested under the classical design using these simulated data. For parameter $a$: $H_{01}$: $a_{1}-D_{L}a_{2}\leqslant 0$ versus $H_{11}$: $a_{1}-D_{L}a_{2}>0$; $T_{1}=3.12$, so we reject $H_{01}$ since $T_{1}>t_{64,0.05}=1.67$. $H_{02}$: $a_{1}-D_{U}a_{2}\geqslant 0$ versus $H_{12}$: $a_{1}-D_{U}a_{2}<0$; $T_{2}=0.44$, so we fail to reject $H_{02}$ since $T_{2}>-t_{64,0.05}=-1.67$. For parameter $b$: $H_{01}$: $b_{1}-D_{L}b_{2}\leqslant 0$ versus $H_{11}$: $b_{1}-D_{L}b_{2}>0$; we reject $H_{01}$ since $T_{1}=2.29>t_{64,0.05}$. $H_{02}$: $b_{1}-D_{U}b_{2}\geqslant 0$ versus $H_{12}$: $b_{1}-D_{U}b_{2}<0$; we fail to reject $H_{02}$ since $T_{2}=-1.38>-t_{64,0.05}$. For parameter $d$: $H_{01}$: $d_{1}-D_{L}d_{2}\leqslant 0$ versus $H_{11}$: $d_{1}-D_{L}d_{2}>0$; we reject $H_{01}$ since $T_{1}=2.98>t_{64,0.05}$. $H_{02}$: $d_{1}-D_{U}d_{2}\geqslant 0$ versus $H_{12}$: $d_{1}-D_{U}d_{2}<0$; we reject $H_{02}$ since $T_{2}=-6.61<-t_{64,0.05}$. The dose-response curves are declared non-parallel because not all component null hypotheses are rejected.

The D-optimal designs for the two lines and the modified D-optimal design are given in Table 1. In this example, a total of 36 responses is used to fit each line, so the number of replicated responses of the modified D-optimal design, $n_{M}$, is the nearest integer to $36\,P_{M}$. As Table 1 shows, the modified D-optimal design has 6 selected dose levels, and the number of replications varies across dose levels.

Then, D-optimal design was applied to test parallelism. For the test line, use ${X}$ to generate $Y_{2jk}$ . For the standard line, also use ${X}$ to obtain response ${Y}_{2jk}^{\prime}$ . Estimated parameters were obtained, which are as follows: $\hat{a}_{1}=14.34$ , $\hat{b}_{1}=0.76$ , $\hat{d}_{1}=87.05$ , $\hat{a}_{2}=12.86$ , $\hat{b}_{2}=0.76$ , $\hat{d}_{2}=95.24$ , $var(\hat{a}_{1})={{1.45}}^{2}$ , $var(\hat{b}_{1})={0.05}^{2}$ , $var({\hat{d}}_{1})={{1.63}}^{2}$ , $var(\hat{a}_{2})={1.06}^{2}$ , $var({\hat{b}}_{2})={0.04}^{2}$ , $var(\hat{d}_{2})={{2.01}}^{2}$ .

We test parallelism using these simulated data. For parameter $a$: $H_{01}$: $a_{1}-D_{L}a_{2}\leqslant 0$ versus $H_{11}$: $a_{1}-D_{L}a_{2}>0$; $T_{1}=2.41$, so we reject $H_{01}$ since $T_{1}>t_{58,0.05}=1.67$. $H_{02}$: $a_{1}-D_{U}a_{2}\geqslant 0$ versus $H_{12}$: $a_{1}-D_{U}a_{2}<0$; $T_{2}=-0.88$, so we fail to reject $H_{02}$ since $T_{2}>-t_{58,0.05}=-1.67$. For parameter $b$: $H_{01}$: $b_{1}-D_{L}b_{2}\leqslant 0$ versus $H_{11}$: $b_{1}-D_{L}b_{2}>0$; we reject $H_{01}$ since $T_{1}=2.58>t_{58,0.05}$. $H_{02}$: $b_{1}-D_{U}b_{2}\geqslant 0$ versus $H_{12}$: $b_{1}-D_{U}b_{2}<0$; we reject $H_{02}$ since $T_{2}=-2.65<-t_{58,0.05}$. For parameter $d$: $H_{01}$: $d_{1}-D_{L}d_{2}\leqslant 0$ versus $H_{11}$: $d_{1}-D_{L}d_{2}>0$; we reject $H_{01}$ since $T_{1}=4.74>t_{58,0.05}$. $H_{02}$: $d_{1}-D_{U}d_{2}\geqslant 0$ versus $H_{12}$: $d_{1}-D_{U}d_{2}<0$; we reject $H_{02}$ since $T_{2}=-10.69<-t_{58,0.05}$. The dose-response curves are declared non-parallel because not all component null hypotheses are rejected. The results are summarized in Table 2.

As seen from Table 2, for the toxicity assay neither design provides compelling evidence that the response profiles are in fact parallel. For the ten-dose bioassay, comparing the $t$ values with the critical value ($t_{58,0.05}=1.67$) indicates that the slopes and the plateaus are practically equivalent, and thus that the response profiles may be considered parallel. Moreover, the absolute $t$ values for the D-optimal design are larger in all but one comparison (the $D_{L}$ test for $a$ in the toxicity assay), which suggests that the D-optimal design benefits the experimenter by improving the precision of the assay.

Table 2

$T$ value results for the two examples

Parameter	Estimate	SE	$T$ value for test using $D_{L}$	$T$ value for test using $D_{U}$
Ten-dose bioassay (parallel) with classical design
$a_{1}$	2.08	0.09	3.58	$-2.94$
$a_{2}$	2.06	0.11
$b_{1}$	$-1.45$	0.05	6.32	$-3.10$
$b_{2}$	$-1.34$	0.04
$d_{1}$	10.04	0.11	14.34	$-16.77$
$d_{2}$	10.15	0.09
Ten-dose bioassay (parallel) with modified D-optimal design
$a_{1}$	1.98	0.03	4.11	$-5.02$
$a_{2}$	2.06	0.09
$b_{1}$	$-1.41$	0.03	7.60	$-4.87$
$b_{2}$	$-1.36$	0.04
$d_{1}$	10.21	0.05	29.05	$-18.59$
$d_{2}$	9.82	0.08
Toxicity assay (non-parallel) with classical design
$a_{1}$	17.51	1.60	3.12	0.44
$a_{2}$	12.96	2.04
$b_{1}$	0.85	0.06	2.29	$-1.38$
$b_{2}$	0.81	0.08
$d_{1}$	84.17	1.75	2.98	$-6.61$
$d_{2}$	92.71	3.58
Toxicity assay (non-parallel) with modified D-optimal design
$a_{1}$	14.34	1.45	2.41	$-0.88$
$a_{2}$	12.86	1.06
$b_{1}$	0.76	0.05	2.58	$-2.65$
$b_{2}$	0.76	0.04
$d_{1}$	87.05	1.63	4.74	$-10.69$
$d_{2}$	95.24	2.01

In the first example, inspection of Fig. 1 suggests that the response profiles are approximately parallel, and testing parallelism under both designs indicated that the response lines are parallel. In contrast, the plot for the second example does give some indication that the response profiles are non-parallel, as the horizontal distance between the test observations and the standard observations appears to decrease somewhat as the response level increases. To validate the advantage of the modified optimal design for the IUT, we conduct simulation studies under several different scenarios.

4. Simulation study

The previous section gives some insight into the benefit of using the D-optimal design for the IUT. To assess the properties of the D-optimal design relative to the classical design more precisely, we performed a simulation study based on the simulation set-up of Jonkman and Sidik (2009), who tested parallelism under five different scenarios:

1. The parameters are set to $a_{1}=2.02$, $b_{1}=-1.42$, $c_{1}=2.31$, $d_{1}=10.12$ for the test preparation, and $a_{2}=2.04$, $b_{2}=-1.35$, $c_{2}=2.59$, $d_{2}=9.86$ for the standard preparation. The curves for the test and standard preparations are approximately parallel: that is, all three ratios lie between $D_{L}$ and $D_{U}$. For this case, $a_{1},b_{1},c_{1},d_{1},a_{2},b_{2},c_{2},d_{2}$ are set equal to the parameter estimates from the example of Section 3.
2. The parameters are set to $a_{1}=2.0$, $b_{1}=-1.4$, $c_{1}=2.3$, $d_{1}=10.0$ for the test preparation, and $a_{2}=2.0$, $b_{2}=-1.4$, $c_{2}=2.6$, $d_{2}=10.0$ for the standard preparation. The curves are exactly parallel, and the only difference is the potency. The values $a_{1},b_{1},c_{1},d_{1},a_{2},b_{2},c_{2},d_{2}$ are close to those from the example of Section 3.
3. The parameters are set to $a_{1}=1.6$, $b_{1}=-1.5$, $c_{1}=2.3$, $d_{1}=8.0$ for the test preparation, and $a_{2}=2.0$, $b_{2}=-1.2$, $c_{2}=2.6$, $d_{2}=10.0$ for the standard preparation. The ratios for the plateaus and the slope are all on the boundary for the equivalence test: that is, $\frac{a_{1}}{a_{2}}=D_{L}=0.8$, $\frac{b_{1}}{b_{2}}=D_{U}=1.25$, and $\frac{d_{1}}{d_{2}}=D_{L}=0.8$.
4. The parameters are set to $a_{1}=1.5$, $b_{1}=-1.4$, $c_{1}=2.3$, $d_{1}=10.0$ for the test preparation, and $a_{2}=2.0$, $b_{2}=-1.4$, $c_{2}=2.6$, $d_{2}=10.0$ for the standard preparation. The ratio of the lower plateaus is outside the equivalence limits for the IUT ( $\frac{a_{1}}{a_{2}}<D_{L}$ ), but the slopes and the upper plateaus are equal.
5. The parameters are set to $a_{1}=2.0$, $b_{1}=-1.5$, $c_{1}=2.3$, $d_{1}=10.0$ for the test preparation, and $a_{2}=2.0$, $b_{2}=-1.16$, $c_{2}=2.6$, $d_{2}=10.0$ for the standard preparation. The ratio of the slopes is outside the equivalence limits for the IUT ( $\frac{b_{1}}{b_{2}}>D_{U}$ ), but both plateaus are equal.
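
As a quick check on the five scenarios, the parameter ratios can be computed directly and compared with the equivalence limits $D_{L}=0.8$ and $D_{U}=1.25$ stated for case 3. The following Python sketch does only this arithmetic; the names `CASES`, `ratios` and `strictly_inside` are hypothetical and introduced for illustration.

```python
# Equivalence limits as stated for case 3 in the text.
D_L, D_U = 0.8, 1.25

# (a, b, d) for the test preparation (index 1) and the standard preparation (index 2).
CASES = {
    1: ((2.02, -1.42, 10.12), (2.04, -1.35, 9.86)),
    2: ((2.0, -1.4, 10.0), (2.0, -1.4, 10.0)),
    3: ((1.6, -1.5, 8.0), (2.0, -1.2, 10.0)),
    4: ((1.5, -1.4, 10.0), (2.0, -1.4, 10.0)),
    5: ((2.0, -1.5, 10.0), (2.0, -1.16, 10.0)),
}


def ratios(test, standard):
    """Return (a1/a2, b1/b2, d1/d2)."""
    return tuple(t / s for t, s in zip(test, standard))


def strictly_inside(ratio, tol=1e-9):
    """True if the ratio lies strictly between D_L and D_U (boundary excluded)."""
    return D_L + tol < ratio < D_U - tol


for case, (test, standard) in CASES.items():
    r = ratios(test, standard)
    flags = ["inside" if strictly_inside(x) else "boundary/outside" for x in r]
    print(case, [round(x, 3) for x in r], flags)
```

The small tolerance only guards against floating-point noise when a ratio sits exactly on a limit, as in case 3.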

In the first part, we used the classical design in Table 1 to simulate the responses and then tested whether the two dose-response curves were parallel using the IUT. We repeated this 10,000 times to estimate the power of detecting parallelism via the IUT. The investigation was conducted in the same way for each of the five cases above.

In the second part, we repeated all the steps of the first part using the modified D-optimal design instead of the classical design. The “VNM” package in R was used to obtain the multiple-objective optimal designs. The MOPT function was used to maximize the optimality criterion and to verify the optimality of the generated design using the General Equivalence Theorem (see Hyun, Wong, & Yang, 2018). The D-optimal designs obtained for each curve and the resulting modified D-optimal design for each case are given in Tables 3 and 4.
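
The replicate-and-count procedure described above can be sketched as follows. This is a deliberately simplified illustration under several assumptions that are not taken from the source: the estimates of $a$, $b$ and $d$ are drawn directly from normal distributions with a common, hypothetical standard error instead of simulating responses at the design points and refitting the 4PL model; the approximate $t$ statistic is assumed to have the form $(\hat{\theta}_{1}-D\hat{\theta}_{2})/\sqrt{SE_{1}^{2}+D^{2}SE_{2}^{2}}$; and the critical value 1.67 is reused. All function names are hypothetical, and the output is not expected to reproduce Table 5.

```python
import math
import random

D_L, D_U = 0.8, 1.25   # equivalence limits from the text
T_CRIT = 1.67          # critical value quoted for Table 2


def t_stat(est1, est2, se1, se2, limit):
    """ASSUMED form of the approximate t statistic for the ratio est1/est2.

    Multiplying by the sign of est2 keeps the direction of the one-sided
    comparison correct for negative parameters such as the slope b.
    """
    sign = 1.0 if est2 > 0 else -1.0
    return sign * (est1 - limit * est2) / math.sqrt(se1**2 + (limit * se2)**2)


def declares_parallel(est1, est2, se):
    """IUT: every lower and every upper one-sided null must be rejected."""
    return all(
        t_stat(e1, e2, se, se, D_L) > T_CRIT
        and t_stat(e1, e2, se, se, D_U) < -T_CRIT
        for e1, e2 in zip(est1, est2)
    )


def estimate_power(true1, true2, se, n_rep=10_000, seed=0):
    """Proportion of replicates in which parallelism is declared."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # SIMPLIFICATION: draw estimates directly rather than refitting the model.
        est1 = [rng.gauss(p, se) for p in true1]
        est2 = [rng.gauss(p, se) for p in true2]
        hits += declares_parallel(est1, est2, se)
    return hits / n_rep


# (a, b, d) for test and standard preparations in cases 2 and 3.
case2 = ((2.0, -1.4, 10.0), (2.0, -1.4, 10.0))
case3 = ((1.6, -1.5, 8.0), (2.0, -1.2, 10.0))
print("case 2:", estimate_power(*case2, se=0.05))
print("case 3:", estimate_power(*case3, se=0.05))
```

In a full study, the direct draws would be replaced by simulated responses at the doses of each design followed by a nonlinear fit, so that the standard errors reflect the design being evaluated.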

Table 3
Modified D-optimal design for two lines

Modified D-optimal design for two lines in case 1
$X$	Control line	1	1.88	1.89	2.95	4
$P_{C}$		0.25	0.08	0.17	0.25	0.25
$X$	Test line	1	1.99	3.08	4
$P_{T}$		0.25	0.25	0.25	0.25
$X$	Modified D-optimal	1	1.88	1.89	1.99	2.95	3.08	4
$P_{M}$		0.25	0.04	0.085	0.125	0.125	0.125	0.25
$n_{M}$		7	1	3	4	4	3	8
Modified D-optimal design for two lines in case 2
$X$	Control line	1	1.88	2.95	4
$P_{C}$		0.25	0.25	0.25	0.25
$X$	Test line	1	2.01	3.08	4
$P_{T}$		0.25	0.25	0.25	0.25
$X$	Modified D-optimal	1	1.88	2.01	2.95	3.08	4
$P_{M}$		0.25	0.125	0.125	0.125	0.125	0.25
$n_{M}$		7	4	4	4	3	8

Table 4
Modified D-optimal design for two lines

Modified D-optimal design for two lines in case 3
$X$	Control line	1	1.88	2.93	4
$P_{C}$		0.25	0.25	0.25	0.25
$X$	Test line	1	1.97	3.1	4
$P_{T}$		0.25	0.25	0.25	0.25
$X$	Modified D-optimal	1	1.88	1.97	2.93	3.1	4
$P_{M}$		0.25	0.125	0.125	0.125	0.125	0.25
$n_{M}$		7	4	4	4	3	8
Modified D-optimal design for two lines in case 4
$X$	Control line	1	1.88	2.95	4
$P_{C}$		0.25	0.25	0.25	0.25
$X$	Test line	1	2.01	3.08	4
$P_{T}$		0.25	0.25	0.25	0.25
$X$	Modified D-optimal	1	1.88	2.01	2.95	3.08	4
$P_{M}$		0.25	0.125	0.125	0.125	0.125	0.25
$n_{M}$		7	4	4	4	3	8
Modified D-optimal design for two lines in case 5
$X$	Control line	1	1.88	2.93	4
$P_{C}$		0.25	0.25	0.25	0.25
$X$	Test line	1	1.96	3.1	4
$P_{T}$		0.25	0.25	0.25	0.25
$X$	Modified D-optimal	1	1.88	1.96	2.93	3.1	4
$P_{M}$		0.25	0.125	0.125	0.125	0.125	0.25
$n_{M}$		7	4	4	4	3	8
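
The weights $P_{M}$ in Tables 3 and 4 are numerically consistent with averaging the control-line and test-line designs over the union of their dose levels. This is an observation inferred from the tabulated values, not a construction stated in the text, and the Python sketch below (with the hypothetical helper `merge_designs`) only reproduces that arithmetic for case 1. The integer allocation $n_{M}$ is not derived here.

```python
def merge_designs(control, test):
    """Average two designs (dose -> weight) over the union of their doses.

    ASSUMPTION: equal weight 1/2 is given to each single-curve design.
    """
    doses = sorted(set(control) | set(test))
    return {x: 0.5 * control.get(x, 0.0) + 0.5 * test.get(x, 0.0) for x in doses}


# Case 1 designs copied from Table 3 (dose level -> weight).
control_line = {1.0: 0.25, 1.88: 0.08, 1.89: 0.17, 2.95: 0.25, 4.0: 0.25}
test_line = {1.0: 0.25, 1.99: 0.25, 3.08: 0.25, 4.0: 0.25}

merged = merge_designs(control_line, test_line)
for dose, weight in merged.items():
    print(f"{dose:<5} {weight:.3f}")
```

Doses shared by both curves (here 1 and 4) keep their full weight of 0.25, while doses that appear in only one design receive half of their original weight.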

The results of the simulations are shown in Table 5. For each design and each simulation case, the table entry is the power of detecting parallelism, that is, the proportion of the 10,000 replicates in which the test declared parallelism. In case 1, the response curves are not exactly parallel, but they are well within the equivalence limits, so we regard them as approximately parallel. In this case, when the standard deviation of the two preparations was high ( $\sigma=0.07$ ), the test with the classical design declared parallelism relatively rarely (16.89%), while the same test with the D-optimal design declared parallelism a majority of the time (85.14%). The null hypotheses are also more easily rejected when the standard deviation is small, because the simulated data are then closely distributed around their mean values. Nevertheless, for any given value of $\sigma$, the proportion of replicates declaring parallelism was always greater with the D-optimal design than with the classical design. Thus, whether $\sigma$ is high or low, the D-optimal design provides the preferred inference.

Table 5
The power for D-optimal design and classical design

Case	Description	Design	$\sigma=0.04$	$\sigma=0.05$	$\sigma=0.06$	$\sigma=0.07$
Case 1	Parallel	Classical	0.8554	0.6069	0.3476	0.1689
		D-optimal	0.9996	0.9910	0.9481	0.8514
Case 2	Exactly parallel	Classical	0.8762	0.6462	0.3913	0.2053
		D-optimal	0.9998	0.9928	0.9548	0.8814
Case 3	Boundary values	Classical	0	0	0	0
		D-optimal	0	0	0	0
Case 4	Unequal lower plateaus	Classical	0.0060	0.0045	0.0014	0.0006
		D-optimal	0.0008	0.0019	0.0047	0.0036
Case 5	Unequal slopes	Classical	0.0006	0	0	0
		D-optimal	0.0047	0.0050	0.0029	0.0003

In case 2, the response curves are exactly parallel, and only the potency differs. In this case, the test with the D-optimal design declared parallelism between 88.14% and 99.98% of the time. Because the null hypothesis of non-parallelism is false in this case, this means that the test maintained a high rejection rate, that is, high power. For both designs, the percentage of replicates declaring parallelism increased as the standard deviation decreased. For the classical design, however, this percentage fell far more sharply as the standard deviation increased (from 87.62% at $\sigma=0.04$ to 20.53% at $\sigma=0.07$). Overall, the results from cases 1 and 2 suggest that tests to establish parallelism with the classical design may be more sensitive to the standard deviation of the data than tests with the D-optimal design. Even for small $\sigma$, the D-optimal design still performs better than the classical design.

In case 3, the ratios of the slopes and of the lower and upper plateaus were all set on the boundary of the equivalence limits. For this case, the test with the D-optimal design never declared parallelism in any of the 10,000 simulation replicates, regardless of the value of the standard deviation. Likewise, the test with the classical design never declared parallelism. Since the null hypothesis is true in this case, this represents an empirical type I error rate of zero. The results indicate that changing the design does not inflate the rejection rate when the dose-response curves are not parallel.

In case 4, the ratio of the lower plateaus was set just outside the equivalence limits ( $\frac{a_{1}}{a_{2}}=0.75$ ), while the slopes and the upper plateaus were equal. Similarly, in case 5 the ratio of the slopes was outside the equivalence limits ( $\frac{b_{1}}{b_{2}}=1.29$ ), but the upper and lower plateaus were equal. In both cases, the test seldom declared parallelism under either design, regardless of the value of the standard deviation: the largest proportions in Table 5 are 0.60% for the classical design and 0.50% for the D-optimal design. Since the null hypothesis is true in both cases, these proportions are empirical type I error rates, and all of them are well below the nominal level of 0.05.

Overall, the simulation results indicate that the IUT for equivalence performs better with the modified D-optimal design than with the classical design: it declares parallelism far more often when the curves are approximately or exactly parallel (cases 1 and 2), while both designs rarely or never declare parallelism in cases 3 to 5. The modified D-optimal design appears to be clearly more effective in the cases illustrated by the examples: precise assays where the true response curves are approximately or exactly parallel, and assays where the standard deviation is relatively high. In practical situations the true curves are more likely to be only approximately parallel and to have a high standard deviation, and thus it will be much more efficient to use the modified D-optimal design for testing parallelism in practice.

5. Summary

In this article, we discussed the problem of testing the parallelism of the response curves for a test preparation and a standard preparation using the 4PL model. We argued that parallelism may be detected more reliably by the IUT when the D-optimal design is implemented. This result was obtained by comparing the proportions of rejected null hypotheses under the two designs; the test itself can be readily constructed using the output from standard nonlinear regression software. The IUT was easily implemented by using a sequence of one-sided approximate $t$ -tests, and rejecting the null hypothesis established evidence in favor of practically parallel response curves.

The D-optimal design minimizes the generalized variance of the parameter estimates, so it appeared to be clearly more effective. The simulation results suggested that the modified D-optimal design provides precise results for testing parallelism in situations where the true response curves are exactly or approximately parallel. The classical design does little to reduce the variance of the parameter estimates, whereas the D-optimal design does, and the classical design becomes more problematic when there is large variance in the dataset. In the simulation cases where the true response curves were clearly nonparallel, both designs failed to support parallelism the vast majority of the time. As noted before, approximately parallel lines with a high standard deviation are the more likely situation in practice, and this study suggests that the modified D-optimal design can be employed for testing parallelism in real bioassays.

A potential limitation is that we used only one test method to compare the D-optimal design and the classical design. Further study is therefore recommended to contrast the two designs under other testing methods, such as the F test or the chi-square test.

References

1. Atkinson, A. C., & Donev, A. N. (1992). Optimum experimental designs. Oxford University Press, Oxford.
2. Berger, R. L. (1982). Multiparameter hypothesis testing and acceptance sampling. Technometrics, 24, 295-300.
3. Berger, R. L., & Hsu, J. C. (1996). Bioequivalence trials, intersection-union tests, and equivalence confidence sets. Statist Sci, 11, 283-319.
4. Callahan, J. D., & Sajjadi, N. C. (2003). Testing the null hypothesis for a specified difference the right way to test for parallelism. BioProcessing J. 71-77.
5. Casella, & Berger, R. L. (1990). Statistical inference. Belmont, CA: Duxbury Press.
6. Chernoff (1953). Locally optimal design for estimating parameters. Ann Math Statist, 24, 586-602.
7. Dykstra (1971). The augmentation of experimental data to maximize $|X^{\prime}X|$. Technometrics, 13, 682-688.
8. Fedorov, V. V. (1972). Theory of optimal experiments. Academic Press, New York.
9. Finney, D. J. (1976). Radioligand assay. Biometrics, 32, 721-740.
10. Pukelsheim (2006). Optimal design of experiments. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
11. Gottschalk, P. G., & Dunn, J. R. (2005). Measuring parallelism, linearity, and relative potency in bioassay and immunoassay data. J Biopharm Statist, 15, 437-463.
12. Yang, Hyun, J. K., Zhang, Strouse, R. J., Schenerman, & Jiang (2012). Implementation of parallelism testing for four-parameter logistic model in bioassays. doi: 10.5731/pdajpst.2012.00867.
13. Hauck, W. W., Capen, R. C., Callahan, J. D., De Muth, J. E., Hsu, Lansky, Sajjadi, N. C., Seaver, S. S., Singer, R. H., & Weisman (2005). Assessing parallelism prior to determining relative potency. PDA J Pharm Sci Technol, 59, 127-137.
14. Kpamegan, E. P. (2005). A comparative study of statistical methods to assess dilutional similarity. Bio Pharm Inter, 18(10), 60-63.
15. & Majumdar (2007). D-optimal designs for logistic models with three and four parameters. Journal of Statistical Planning and Inference, 138(2008), 1950-1959.
16. Hebble, T. L., & Mitchell, T. J. (1972). ‘Repairing’ response surface designs. Technometrics, 14, 767-779.
17. Hyun, S. W., & Wong, W. K. (2015). Multiple-objective optimal designs for studying the dose response function and interesting dose levels. Int J Biostat. doi: 10.1515/ijb-2015-0044.
18. Hyun, S. W., Wong, W. K., & Yang (2018). VNM: An R package for finding multiple-objective optimal designs for the 4-parameter logistic model. Journal of Statistical Software. doi: 10.18637/jss.v083.i05.
19. Jonkman, J. N., & Sidik (2009). Equivalence testing for parallelism in the four-parameter logistic model. Journal of Biopharmaceutical Statistics, 19(5), 818-837.
20. McGree, J. M., Eccleston, J. A., & Duffull, S. B. (2008). Compound optimal design criteria for nonlinear models. Journal of Biopharmaceutical Statistics, 18, 641-661.
21. Mitchell, T. J. (2000). An algorithm for the construction of ‘D-optimal’ experimental designs. Technometrics, 42(1), 48-54.
22. Mitchell, T. J. (1972). An algorithm for the construction of ‘D-optimal’ experimental designs. Applied to First-Order Models, ORNL-4777, Oak Ridge National Laboratory.
23. Mitchell, T. J., & Miller, F. L., Jr. (1970). Use of ‘Design Repair’ to construct designs for special linear models. In: Mathematical Division Annual Progress Report (ORNL-4661), Oak Ridge National Laboratory.
24. Pukelsheim (1993). Optimal designs of experiments. Wiley, New York.
25. Silvey, S. D. (1980). Optimal design: An introduction to the theory for parameter estimation. Chapman & Hall, London.
26. Wynn, H. P. (1970). The sequential generation of D-optimum experimental designs. The Annals of Mathematical Statistics, 41, 1655-1664.
27. Fedorov, V. V. (1972). Theory of optimal experiments. Academic Press.
28. Fedorov, V. V., & Hackl (1977). Model-oriented design of experiments. Springer, New York.
