Logistic Regression Method for Ligand Discovery

Abstract

Protein-based virtual screening is integral to the modern drug discovery process. Most protein-based virtual screening experiments are performed using docking programs. The accuracy of a docking program relies strongly on its scoring function, which is based on various energy terms. Existing scoring functions combine these energy terms with equal weights or with other weights that do not depend on the characteristics of the protein. To improve on these methods, Lu and Wang proposed a protein-specific scoring function based on regression analysis, which was shown to outperform the existing methods. In this study, we propose a protein-specific scoring approach to select potential ligands based on logistic regression analysis. The performance of our method was evaluated using the Directory of Useful Decoys docked data set, which contains 40 protein targets. The results showed that the proposed method can increase the enrichment factors for most of the 40 protein targets.

1. Introduction

In recent years, protein–ligand docking methodologies have developed rapidly and now play an important role in the design of new drugs. The main goals of these methods are binding affinity estimation, pose prediction, and support for virtual screening (Jain and Nicholls, 2008). Moreover, when exploring large compound libraries, the method must be able to distinguish binding from nonbinding compounds and to rank the binding ligands correctly in the database (Kolb and Irwin, 2009). With advances in computational tools, the docking process can now be performed using software such as AutoDock (Morris et al., 1998), DOCK (Kuntz et al., 1982; Ewing et al., 2001), and GOLD (Verdonk et al., 2003); other related programs have been reviewed by Pagadala et al. (2017). In general, a useful docking process consists of two components: an efficient search algorithm and an appropriate scoring function. In this study, we focus on the scoring function of protein–ligand docking.

During the docking process, search algorithms are used to investigate numerous ligand conformations. Scoring functions are used to evaluate the quality of the docking poses and to guide the search methods toward relevant ligand conformations. First, a scoring function must be able to distinguish the observed binding modes and associate them with the lowest energy values of the energy landscape. Second, it must classify ligands and decoys (or inactive ligands) correctly. Its most important goal is to predict the binding affinity and to rank compounds according to their estimated binding affinities.

Three main classes of scoring functions have been described in the related literature: force field-based, empirical, and knowledge-based functions (Wang et al., 2003; Huang et al., 2010). In general, force field-based functions are derived from a classical force field and consist of a sum of energy terms. Empirical functions determine the scoring function and estimate binding affinity by using a regression approach (de Azevedo and Dias, 2008). Knowledge-based scoring functions are developed from the statistical analysis of interacting atom pairs in protein–ligand complexes with available three-dimensional structures (Velec et al., 2005; Muegge, 2006). Although many scoring functions have been studied, no universal scoring function is reliable and efficient for all proteins. Several studies have suggested that the performance of scoring functions could be improved by changing scoring strategies (Feher, 2006; Houston and Walkinshaw, 2013). Another way to improve the accuracy of binding affinity prediction is to rescore each target protein individually, for example, by developing a protein-specific scoring function (Lu and Wang, 2012). The weights of a protein-specific scoring function depend on the protein, so different proteins have different scoring functions. Hence, this type of scoring function is expected to be more effective for a given protein or protein family (Lu and Wang, 2012).

In this study, we propose a procedure to determine a protein-specific scoring function based on logistic regression, building on the work of Lu and Wang (2012). The Directory of Useful Decoys (DUD) data set (Huang et al., 2006) was used to validate the proposed method.

2. Materials and Statistical Methods

2.1. The DUD data set

The DUD data set is a published data set providing active compounds and decoys for crystal structures of ligand–target complexes. The DUD is designed for evaluating docking programs. This data set contains 2950 active compounds for a total of 40 target proteins. In addition, for every ligand, the data set contains 36 decoys that have similar physicochemical properties, such as molecular weight, calculated logP, and number of hydrogen-bonding groups, but are structurally dissimilar (Huang et al., 2006). The docking procedure was performed and validated on these 40 protein targets with all ligands and decoys by using the DOCK program (Meng et al., 1992).

2.2. Enrichment factor and existing statistical method

The docking enrichment factor (EF) measures the ability of the docking calculations to identify true positives in the database. Specifically, the EF is defined with respect to a given percentage of the database screened. Let x be the percentage of the compounds screened: $x = \frac{N_{select}}{N_{total}} \times 100,$ (1)

where $N_{total}$ represents the total number of compounds in the database and $N_{select}$ is the number of screened compounds. For a given x, $EF_{x}$ denotes the EF value calculated after $x\%$ of the database has been screened. The EF for a given x is calculated as follows: $EF_{x} = \frac{ligand_{select}}{ligand_{total}} \times \frac{100}{x},$ (2)

where $ligand_{total}$ represents the total number of ligands and $ligand_{select}$ represents the number of ligands among the $N_{select}$ compounds. In this study, we set $x = 1$ or $x = 20$.
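Equations (1) and (2) can be illustrated with a minimal Python sketch. The function name `enrichment_factor`, the toy scores, and the rule for rounding the top x% to a whole number of compounds are assumptions made for illustration; the text does not state how fractional counts are handled.

```python
def enrichment_factor(scores, is_ligand, x):
    """Return EF_x of Eq. (2); a lower score means a better rank."""
    n_total = len(scores)
    # Assumption: the top x% is rounded to a whole number of compounds (at least 1).
    n_select = max(1, round(n_total * x / 100))
    # Indices of the n_select compounds with the smallest scores.
    top = sorted(range(n_total), key=lambda i: scores[i])[:n_select]
    ligand_select = sum(is_ligand[i] for i in top)
    ligand_total = sum(is_ligand)
    return ligand_select / ligand_total * 100 / x


# Toy database (invented): 10 compounds, 2 of them ligands (flag 1).
scores = [-5.0, -4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
is_ligand = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(scores, is_ligand, 20)  # top 2 compounds hold 1 of 2 ligands
```

Under Eq. (2), the largest attainable value is 100/x, reached when every ligand falls inside the screened subset. For example, an $EF_{1}$ of 41.67 for a target with 12 ligands is consistent with 5 of the 12 ligands appearing in the top 1%.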

In the DOCK program (Meng et al., 1992), four energy terms (electrostatic interaction energy, van der Waals interaction energy, polar component of the ligand desolvation energy, and apolar component of the ligand desolvation energy) are typically used to build a scoring function. The four energy terms are denoted as $E_{ele}, E_{vdw}, E_{pol}$, and $E_{apol}$, respectively. A typical scoring function is defined as the equal weight sum of the four energy terms: $E_{ele} + E_{vdw} + E_{pol} + E_{apol}.$ (3)

The compounds with smaller values of Eq. (3) are chosen as potential ligands. Because each protein has a different biological background, Lu and Wang (2012) suggested a protein-specific scoring function that sums the four energy terms with unequal weights: $\beta_{ele} E_{ele} + \beta_{vdw} E_{vdw} + \beta_{pol} E_{pol} + \beta_{apol} E_{apol},$ (4)

where $\beta_{ele}, \beta_{vdw}, \beta_{pol}$, and $\beta_{apol}$ are unknown parameters to be estimated. Lu and Wang (2012) applied a tolerance interval (TI) method to detect outliers for the four energy terms. TI is widely used in pharmaceutical applications and can be adopted to detect outliers (Wang, 2007; Cai and Wang, 2009; Wang and Tsung, 2009, 2017; Meeker et al., 2017). After dealing with outliers, Lu and Wang (2012) employed a regression analysis to estimate $\beta_{ele}, \beta_{vdw}, \beta_{pol}$, and $\beta_{apol}$ in Eq. (4). The compounds with lower rescored function values are selected as potential ligands.

2.3. Logistic regression analysis

In this study, we use the form: $\beta_{ele} E_{ele} + \dots + \beta_{apol} E_{apol} + \beta_{ele \times vdw} E_{ele \times vdw} + \dots + \beta_{pol \times apol} E_{pol \times apol},$ (5)

as a protein-specific scoring function, where $E_{ele \times vdw}, E_{ele \times pol}, E_{ele \times apol}, E_{vdw \times pol}, E_{vdw \times apol}$, and $E_{pol \times apol}$ are the two-way interaction terms of the four energies. The weights $\beta$ are unknown parameters. Consequently, the compounds with smaller values of Eq. (5) are considered potential ligands. For each protein, we propose a method to estimate $\beta$ such that the scoring function [Eq. (5)] maximizes the EF value. We adopt the logistic regression method to estimate $\beta$.
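The ten terms of Eq. (5) can be assembled as in the following sketch. It assumes that each two-way interaction term is the product of the two corresponding energies, which the text does not state explicitly; the names `design_row` and `score` and the numerical values are hypothetical.

```python
from itertools import combinations

TERMS = ("ele", "vdw", "pol", "apol")


def design_row(energies):
    """Map the four energies of one compound to the ten terms of Eq. (5)."""
    main = [energies[t] for t in TERMS]
    # Assumption: the interaction term E_{a x b} is the product E_a * E_b.
    # combinations() yields the pairs in the order listed in the text:
    # ele*vdw, ele*pol, ele*apol, vdw*pol, vdw*apol, pol*apol.
    inter = [energies[a] * energies[b] for a, b in combinations(TERMS, 2)]
    return main + inter


def score(row, weights):
    """Weighted sum of the ten terms; smaller values suggest ligands."""
    return sum(w * z for w, z in zip(weights, row))


# Invented energies for one compound.
row = design_row({"ele": -2.0, "vdw": -3.0, "pol": 1.0, "apol": 0.5})
# Unit weights on the main terms and zero on the interactions recover Eq. (3).
eq3_score = score(row, [1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
```

With this layout, Eq. (3) and Eq. (4) are special cases of Eq. (5) in which the six interaction weights are zero.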

In logistic regression analysis, we first define the response variable $Y_{i}$ to indicate whether compound i is a decoy ($Y_{i} = 1$) or a ligand ($Y_{i} = 0$) for $i = 1, 2, \dots, n$. The logistic regression model is written as follows: $\log \left( \frac{p_{i}}{1 - p_{i}} \right) = Z_{i} \gamma = \gamma_{ele} E_{ele, i} + \dots + \gamma_{apol} E_{apol, i} + \gamma_{ele \times vdw} E_{ele \times vdw, i} + \dots + \gamma_{pol \times apol} E_{pol \times apol, i},$ (6)

where $Z_{i} = \{E_{ele, i}, E_{vdw, i}, \dots, E_{pol \times apol, i}\}$ and $\gamma = \{\gamma_{ele}, \gamma_{vdw}, \dots, \gamma_{pol \times apol}\}$, and $p_{i} = \Pr(Y_{i} = 1 \mid Z_{i})$ is the probability that compound i is a decoy given $Z_{i}$. The maximum likelihood estimator (MLE) of $\gamma$ is denoted as $\hat{\gamma}$. We define a new scoring function by replacing $\beta$ with $\hat{\gamma}$ in Eq. (5): $E_{total, i}^{s} \equiv \hat{\gamma}_{ele} E_{ele, i} + \dots + \hat{\gamma}_{apol} E_{apol, i} + \hat{\gamma}_{ele \times vdw} E_{ele \times vdw, i} + \dots + \hat{\gamma}_{pol \times apol} E_{pol \times apol, i}.$ (7)
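The fit in Eq. (6) and the rescoring in Eq. (7) can be sketched in plain Python. This is an illustration only: plain gradient ascent on the log-likelihood stands in for whatever MLE routine was actually used, the learning rate and iteration count are arbitrary, and the toy data are invented. As in Eq. (6), no intercept is included.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(Z, y, lr=0.1, n_iter=2000):
    """Approximate the MLE of gamma in Eq. (6) by gradient ascent."""
    n, k = len(Z), len(Z[0])
    gamma = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for z, yi in zip(Z, y):
            p = sigmoid(sum(g * v for g, v in zip(gamma, z)))
            for j in range(k):
                grad[j] += (yi - p) * z[j]  # gradient of the log-likelihood
        gamma = [g + lr * gj / n for g, gj in zip(gamma, grad)]
    return gamma


def rescore(Z, gamma):
    """Eq. (7): weighted sum of the terms with the fitted coefficients."""
    return [sum(g * v for g, v in zip(gamma, z)) for z in Z]


# Toy data with two features per compound; in the setting of Eq. (5) each
# row would hold ten terms. y = 1 marks a decoy, y = 0 a ligand.
Z = [[-3.0, -1.0], [-2.5, -0.5], [-1.0, 0.5], [0.5, 1.0], [1.0, -0.2], [2.0, 0.3]]
y = [0, 0, 1, 1, 1, 1]
gamma_hat = fit_logistic(Z, y)
scores = rescore(Z, gamma_hat)  # smaller values suggest ligands
```

Because decoys are coded as 1, a larger fitted log-odds means a more decoy-like compound, which is why small values of $E_{total, i}^{s}$ point to ligands. In practice, a statistical package's logistic regression routine would normally replace the hand-written optimizer.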

We then select the compounds with small values of $E_{total, i}^{s}$ as potential ligands. However, this yields unsatisfactory results for some proteins compared with the equal weight scoring function [Eq. (3)] (Table 1). We therefore modify the scoring function as described below.

Table 1.

Enrichment Factors for Each Protein Target Listed by the Equal Weight Scoring Function, Lu and Wang's Regression-Based Scoring Function, Logistic Regression Scoring Function, and Procedure 1 for the Top 1% Subset of the Entire Database

Protein	No. of ligands	No. of decoys	Equal weight $EF_{1}$	Lu and Wang $EF_{1}$	Logistic $EF_{1}$	Procedure 1 $EF_{1}$
AR	67	2592	17.91	22.39	17.91	22.39
ER_agonist	67	2346	5.97	22.39	14.93	14.93
ER_antagonist	38	1395	10.53	13.16	23.68	18.42
GR	65	2544	20	24.61	10.77	24.61
MR	12	497	33.33	41.67	33.33	41.67
PPARg	80	2511	0	5	0	1.25
PR	26	949	0	19.23	15.28	19.23
RXRa	20	624	15	25	5	25
CDK2	50	1549	8	12	10	12
EGFr	444	13112	3.38	6.98	17.12	14.64
FGFr1	118	4204	0	11.86	24.58	8.47
HSP90	24	782	8.33	16.67	20.83	16.67
P38 MAP	256	7824	1.17	7.81	6.25	13.67
PDGFrb	152	5008	0	1.32	3.95	13.16
SRC	155	5322	0.65	4.52	13.55	15.48
TK	22	772	0	0	0	0
VEGFr2	74	2637	2.7	6.76	10.81	9.46
FXa	142	5079	5.63	16.9	21.83	19.72
Thrombin	65	2289	4.62	10.77	15.38	10.77
Trypsin	44	1541	0	13.64	15.91	11.36
ACE	49	1711	6.12	20.41	16.33	20.41
ADA	19	809	15.79	36.84	36.84	42.11
COMT	10	428	0	10	20	30
PDE5	51	1808	5.88	13.72	11.76	13.72
DHFR	201	7017	23.38	34.33	34.33	35.82
GART	21	603	4.76	9.52	4.76	9.52
AChE	105	3226	3.81	3.81	2.86	4.76
ALR2	26	918	15.38	26.92	23.08	26.92
AmpC	21	731	0	19.04	19.04	19.04
COX1	25	824	8	16	12	16
COX2	336	10240	14.88	15.77	6.85	17.56
GPB	49	1767	6.12	28.57	18.37	28.57
HIVPR	49	1863	2.04	14.29	8.16	6.12
HIVRT	37	1400	5.41	10.81	8.11	13.51
HMGR	35	1239	25.71	34.29	34.29	34.29
Inha	85	3032	0	31.76	27.06	29.41
NA	49	1737	10.2	6.12	6.12	10.2
PARP	33	1140	6.06	12.12	9.09	15.15
PNP	23	642	17.39	17.39	21.74	17.39
SAHH	33	751	6.06	15.15	0	9.09
Average			7.86	16.49	15.05	17.81
>Equal weight				36	29	37
= Equal weight				3	5	3
<Equal weight				1	6	0
>Lu and Wang			1		12	15
= Lu and Wang			3		6	17
<Lu and Wang			36		22	8

For each protein, let $E_{total, (i)}$ denote the ith smallest total energy among the n compounds for the protein, and let $X_{ele, i}, X_{vdw, i}, X_{pol, i}$, and $X_{apol, i}$ be the energy terms corresponding to $E_{total, (i)}$. The logistic regression model is rewritten as follows: $\log \left( \frac{p_{i}^{*}}{1 - p_{i}^{*}} \right) = X_{i} \gamma^{*} = \gamma_{ele}^{*} X_{ele, i} + \dots + \gamma_{apol}^{*} X_{apol, i} + \gamma_{ele \times vdw}^{*} X_{ele \times vdw, i} + \dots + \gamma_{pol \times apol}^{*} X_{pol \times apol, i},$ (8)

where $X_{i} = \{X_{ele, i}, X_{vdw, i}, \dots, X_{pol \times apol, i}\}$ and $\gamma^{*} = \{\gamma_{ele}^{*}, \gamma_{vdw}^{*}, \dots, \gamma_{pol \times apol}^{*}\}$, and $p_{i}^{*}$ is the probability that the compound is not a potential ligand given $X_{i}$. The MLE of $\gamma^{*}$ is denoted as $\hat{\gamma}^{*}$, which is an alternative estimator of $\gamma$, and we replace $\hat{\gamma}$ with $\hat{\gamma}^{*}$ in Eq. (7) to calculate the EF.

We use Eq. (8) to find an alternative estimator of $\gamma$ because we regard compounds with lower total energy under the scoring function [Eq. (3)] as likely active compounds. Thus, instead of using the true ligand labels directly, we set the response variable to zero for the compounds with the lowest total energy under Eq. (3). In other words, we treat the compound with energy terms $X_{ele, 1}, X_{vdw, 1}, X_{pol, 1}$, and $X_{apol, 1}$ as the most likely active compound, the compound with energy terms $X_{ele, 2}, X_{vdw, 2}, X_{pol, 2}$, and $X_{apol, 2}$ as the second most likely active compound, and so on. We can therefore refit model Eq. (8) $n - 1$ times by changing the vector of response variables from $\{0, 1, \dots, 1\}$ and $\{0, 0, 1, \dots, 1\}$ through $\{0, \dots, 0, 1\}$, which gives $n - 1$ estimates $\hat{\gamma}^{*}$ and their corresponding EF values. The largest EF value and the corresponding estimate $\hat{\gamma}^{*}$ are recorded. The details of this methodology for each protein are summarized in Procedure 1.

Procedure 1

Step 1

First, we assume that the compound with energy terms $X_{ele, 1}, X_{vdw, 1}, X_{pol, 1}$, and $X_{apol, 1}$ is a potential ligand, and code the corresponding response variable as 0 and all others as 1. By fitting this logistic model, we obtain $\hat{\gamma}^{*}$, and then define $E_{total, i}^{s} = \hat{\gamma}^{*}_{ele} E_{ele, i} + \dots + \hat{\gamma}^{*}_{apol} E_{apol, i} + \hat{\gamma}^{*}_{ele \times vdw} E_{ele \times vdw, i} + \dots + \hat{\gamma}^{*}_{pol \times apol} E_{pol \times apol, i},$ (9)

for $i = 1, \dots, n$. We rank $E_{total, i}^{s}$ from smallest to largest and select the $n x\%$ compounds with the lowest weighted total energy as the potential ligands. Then, we calculate $EF_{x}$ at $x = 1$ and $x = 20$.

Step 2

We assume that the compounds with energy terms $X_{ele, 1}, X_{vdw, 1}, X_{pol, 1}$, and $X_{apol, 1}$ as well as $X_{ele, 2}, X_{vdw, 2}, X_{pol, 2}$, and $X_{apol, 2}$ are potential ligands. The corresponding response variables are coded as 0, and the others are coded as 1. $\hat{\gamma}^{*}$ is determined by fitting the logistic model [Eq. (8)], and the EF is then calculated as described in Step 1. We repeat this step, changing the values of the response variables each time, until $n - 1$ response variables are coded as 0. Through these steps, $n - 1$ EF values and the corresponding estimates $\hat{\gamma}^{*}$ are calculated. The maximum EF and the corresponding $\hat{\gamma}^{*}$ are recorded; this $\hat{\gamma}^{*}$ is the desired coefficient vector. If more than one fit attains the maximum EF, we randomly select one of them and use its $\hat{\gamma}^{*}$.
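The loop in Procedure 1 can be sketched as follows. This is a hedged illustration, not the implementation used in the study: gradient ascent stands in for the MLE, the rounding of the top x% is an assumption, all function names and hyperparameters are hypothetical, and the toy data use two features per compound instead of the ten terms of Eq. (8).

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))


def fit_logistic(X, y, lr=0.05, n_iter=300):
    # Gradient-ascent stand-in for the MLE of gamma* in Eq. (8); no intercept.
    k = len(X[0])
    g = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for row, yi in zip(X, y):
            r = yi - sigmoid(sum(a * b for a, b in zip(g, row)))
            for j in range(k):
                grad[j] += r * row[j]
        g = [a + lr * b / len(X) for a, b in zip(g, grad)]
    return g


def enrichment_factor(scores, is_ligand, x):
    # Eq. (2); assumption: the top x% is rounded to at least one compound.
    n_select = max(1, round(len(scores) * x / 100))
    top = sorted(range(len(scores)), key=lambda i: scores[i])[:n_select]
    return sum(is_ligand[i] for i in top) / sum(is_ligand) * 100 / x


def procedure_1(X, total_energy, is_ligand, x, seed=0):
    n = len(X)
    # Rank compounds by the equal weight total energy of Eq. (3).
    order = sorted(range(n), key=lambda i: total_energy[i])
    results = []
    for k in range(1, n):  # code the k lowest-energy compounds as 0
        y = [1] * n
        for i in order[:k]:
            y[i] = 0
        gamma = fit_logistic(X, y)
        scores = [sum(a * b for a, b in zip(gamma, row)) for row in X]  # Eq. (9)
        results.append((enrichment_factor(scores, is_ligand, x), gamma))
    best_ef = max(ef for ef, _ in results)
    ties = [g for ef, g in results if ef == best_ef]
    # Break ties at random, as described in Step 2.
    return best_ef, random.Random(seed).choice(ties)


# Invented toy data: 10 compounds, 3 true ligands.
X = [[-3.0, -1.0], [-2.0, -2.5], [-1.0, -0.5], [0.5, -1.5], [1.0, 0.5],
     [-0.5, 1.0], [2.0, 0.0], [0.0, 2.0], [1.5, 1.5], [-1.5, 0.8]]
is_ligand = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
total_energy = [sum(row) for row in X]
best_ef, best_gamma = procedure_1(X, total_energy, is_ligand, x=20)
```

The pseudo-labels drive each fit, whereas the true ligand labels enter only through the EF used to choose among the $n - 1$ fits. The cost grows with the number of compounds because the model is refitted $n - 1$ times.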

3. Results and Discussion

In this study, the DUD data set is used as our data source and Procedure 1 is applied to obtain protein-specific scoring functions in virtual screening. The DUD data set contains 40 proteins, and each protein has its own experimentally confirmed ligands. The numbers of ligands and decoys for each protein are listed in Table 1. To compare the performance of Lu and Wang's scoring function with the proposed method, we first evaluate the functions by using the top 1% of compounds in the ranked database, as ranked by each scoring function. We compare the EFs of the equal weight scoring method [Eq. (3)], Lu and Wang's method [Eq. (4)], the logistic regression scoring method, and Procedure 1 for the top 1% of compounds for each protein. The corresponding averages for the 40 proteins are 7.86, 16.49, 15.05, and 17.81, respectively. On average, the logistic regression scoring method is slightly inferior to Lu and Wang's method. However, Procedure 1 has the highest average EF of the four methods.

In addition, we evaluate these methods for the top 20% of compounds in the ranked database (Table 2). The average EFs for the top 20% of compounds determined by using the equal weight scoring method, Lu and Wang's scoring method, the logistic regression scoring method, and Procedure 1 are 2.21, 2.73, 3.35, and 3.36, respectively. These results indicate that both proposed methods provide more satisfactory results than Lu and Wang's method.

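Table 2.

Enrichment Factors for Each Protein Target Listed by the Equal Weight Scoring Function, Lu and Wang's Regression-Based Scoring Function, Logistic Regression Scoring Function, and Procedure 1 for the Top 20% Subset of the Entire Database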

Protein	No. of ligands	No. of decoys	Equal weight $EF_{20}$	Lu and Wang $EF_{20}$	Logistic $EF_{20}$	Procedure 1 $EF_{20}$
AR	67	2592	3.21	3.43	4.03	3.96
ER_agonist	67	2346	3.06	3.51	3.96	3.81
ER_antagonist	38	1395	1.05	3.16	3.55	3.15
GR	65	2544	2.15	2.62	4.00	3.69
MR	12	497	4.58	4.17	5.00	5
PPARg	80	2511	0.13	0.75	0.06	2.38
PR	26	949	2.31	3.27	3.46	3.65
RXRa	20	624	2.75	3.25	4.75	4.5
CDK2	50	1549	1.4	1.6	2.5	2.3
EGFr	444	13112	2.15	2.41	2.56	2.51
FGFr1	118	4204	0.08	2.63	3.69	1.99
HSP90	24	782	2.08	2.29	2.71	2.92
P38 MAP	256	7824	1.58	1.84	1.91	1.95
PDGFrb	152	5008	0.23	2.17	1.81	2.2
SRC	155	5322	0.48	1	2.26	3.03
TK	22	772	2.5	2.05	2.73	3.41
VEGFr2	74	2637	1.01	1.82	1.76	1.49
FXa	142	5079	2.39	4.12	4.44	4.4
Thrombin	65	2289	2.54	1.77	4.23	3.15
Trypsin	44	1541	2.27	3.98	4.66	3.98
ACE	49	1711	2.14	2.24	2.65	2.76
ADA	19	809	2.89	3.42	4.21	4.21
COMT	10	428	3.5	2.6	3.5	4.5
PDE5	51	1808	1.86	1.96	2.75	2.55
DHFR	201	7017	3.56	4.1	4.38	4.43
GART	21	603	4.05	4.28	3.57	4.76
AChE	105	3226	2.48	2.71	3.62	3.33
ALR2	26	918	2.31	2.12	3.65	3.65
AmpC	21	731	0.95	2.14	2.14	2.38
COX1	25	824	2	1.4	3.6	2.8
COX2	336	10240	3.39	3.74	3.30	4.02
GPB	49	1767	3.47	4.18	4.29	4.49
HIVPR	49	1863	0.92	1.53	3.57	2.14
HIVRT	37	1400	2.16	2.16	2.3	2.7
HMGR	35	1239	2.14	2.14	3.57	2.86
Inha	85	3032	0	2.35	3.18	3
NA	49	1737	3.16	3.47	3.98	3.98
PARP	33	1140	3.94	3.64	4.55	4.39
PNP	23	642	1.96	3.26	3.91	3.7
SAHH	33	751	3.64	3.78	3.18	4.09
Average			2.21	2.73	3.35	3.36
>Equal weight				31	35	40
= Equal weight				2	1	0
<Equal weight				7	4	0
>Lu and Wang			7		33	36
= Lu and Wang			2		1	1
<Lu and Wang			31		6	3

In Table 1, Procedure 1 yields EF values equal to or better than those of Lu and Wang's scoring method for 32 of the 40 systems, although the average improvement is modest. For the top 20% of compounds (Table 2), both the logistic regression method and Procedure 1 show a clear improvement over Lu and Wang's method. Although the logistic regression method provides inferior results compared with Lu and Wang's method for the top 1% of compounds, it has the advantage of requiring less time. Therefore, the logistic regression method is competitive with Lu and Wang's method.
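The better/equal/worse counts in the summary rows of Tables 1 and 2 follow from a column-by-column comparison. The sketch below applies that comparison to the first eight rows of Table 1 only (Lu and Wang versus Procedure 1, $EF_{1}$); the function name `compare` is hypothetical.

```python
# (protein, EF_1 of Lu and Wang, EF_1 of Procedure 1), first eight rows of Table 1.
rows = [
    ("AR", 22.39, 22.39),
    ("ER_agonist", 22.39, 14.93),
    ("ER_antagonist", 13.16, 18.42),
    ("GR", 24.61, 24.61),
    ("MR", 41.67, 41.67),
    ("PPARg", 5.0, 1.25),
    ("PR", 19.23, 19.23),
    ("RXRa", 25.0, 25.0),
]


def compare(rows):
    """Count targets where the new method is better than, equal to, or worse than the baseline."""
    better = sum(1 for _, base, new in rows if new > base)
    equal = sum(1 for _, base, new in rows if new == base)
    worse = sum(1 for _, base, new in rows if new < base)
    return better, equal, worse


better, equal, worse = compare(rows)  # counts for these eight targets only
```

Applied to all 40 rows, the same comparison gives the 15 better and 17 equal targets that make up the 32 systems cited above.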

4. Conclusion

In this study, we adopt a method based on logistic regression analysis to increase the EF by developing a procedure for obtaining protein-specific weights for the energy terms. Our results show that this protein-specific scoring method can improve on the equal weight scoring function and the regression-based protein-specific scoring function for most of the 40 protein targets in the DUD data set. It can also be extended to larger databases. Furthermore, this method is not limited to the DOCK scoring function. It can be applied to modify other scoring functions, such as the GOLD score and Glide score. We believe that this method can raise the hit rate, which can benefit the modern drug discovery process.

Footnotes

Author Disclosure Statement

The authors declare they have no competing financial interests.

Funding Information

This study was supported by the Ministry of Science and Technology, Grant No. 107-2118-M-009-002-MY2, Taiwan.

References

Cai, T.T., and Wang. 2009. Tolerance intervals for discrete distributions in exponential families. Stat. Sin. 19, 905–923.

de Azevedo, W.F., and Dias. 2008. Computational methods for calculation of ligand-binding affinity. Curr. Drug Targets, 9, 1031–1039.

Ewing, T.J.A., Makino, Skillman, A.G., et al. 2001. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J. Comput. Aided Mol. Des. 15, 411–428.

Feher. 2006. Consensus scoring for protein-ligand interactions. Drug Discov. Today, 11, 421–428.

Houston, D.R., and Walkinshaw, M.D. 2013. Consensus docking: Improving the reliability of docking in a virtual screening context. J. Chem. Inf. Model, 53, 384–390.

Huang, Shoichet, B.K., and Irwin, J.J. 2006. Benchmarking sets for molecular docking. J. Med. Chem. 49, 6789–6801.

Huang, S.Y., Grinter, S.Z., and Zou, X.Q. 2010. Scoring functions and their evaluation methods for protein-ligand docking: Recent advances and future directions. Phys. Chem. Chem. Phys. 12, 12899–12908.

Jain, A.N., and Nicholls. 2008. Recommendations for evaluation of computational methods. J. Comput. Aided Mol. Des. 22, 133–139.

Kolb, and Irwin, J.J. 2009. Docking screens: Right for the right reasons? Curr. Top. Med. Chem. 9, 755–770.

Kuntz, I.D., Blaney, J.M., Oatley, S.J., et al. 1982. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 161, 269–288.

Lu, I.L., and Wang. 2012. Protein-specific scoring method for ligand discovery. J. Comput. Biol. 19, 1215–1226.

Meeker, W.Q., Hahn, G.J., and Escobar, L.A. 2017. Statistical Intervals: A Guide for Practitioners and Researchers. Wiley, Hoboken, NJ.

Meng, E.C., Shoichet, B.K., and Kuntz, I.D. 1992. Automated docking with grid-based energy evaluation. J. Comput. Chem. 13, 505–524.

Morris, G.M., Goodsell, D.S., Halliday, R.S., et al. 1998. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comput. Chem. 19, 1639–1662.

Muegge. 2006. PMF scoring revisited. J. Med. Chem. 49, 5895–5902.

Pagadala, N.S., Syed, and Tuszynski. 2017. Software for molecular docking: A review. Biophys. Rev. 9, 91–102.

Velec, H.F.G., Gohlke, and Klebe. 2005. DrugScore (CSD)-knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. J. Med. Chem. 48, 6296–6303.

Verdonk, M.L., Cole, J.C., Hartshorn, M.J., et al. 2003. Improved protein-ligand docking using GOLD. Proteins Struct. Funct. Genet. 52, 609–623.

Wang. 2007. Estimation of the probability of passing the USP dissolution test. J. Biopharm. Stat. 17, 407–413.

Wang, and Tsung, F.G. 2009. Tolerance intervals with improved coverage probabilities for binomial and Poisson variables. Technometrics, 51, 25–33.

Wang, and Tsung. 2017. Constructing tolerance intervals for the number of defectives using both high- and low-resolution data. J. Qual. Technol. 49, 354–364.

Wang, R.X., Lu, Y.P., and Wang, S.M. 2003. Comparative evaluation of 11 scoring functions for molecular docking. J. Med. Chem. 46, 2287–2303.