Trivariate generalized non linear mixed model with transformations for meta analysis of diagnostic accuracy

Abstract

In this paper, a trivariate generalized non linear mixed model (TGLMM) using probit and complementary log-log transformations is considered. These models are helpful in studying the complex relationship among sensitivity (SN), specificity (SP) and disease prevalence (DP). The non-linear mixed (NLMIXED) approach is used to estimate SN, SP, DP, the positive and negative predictive values (PPV and NPV) and the positive and negative likelihood ratios. Model selection techniques are used to identify the best-fitting model for statistical inference. The proposed trivariate non linear random effects models prove to be very useful in practice for meta-analysis of diagnostic accuracy studies.

Keywords

Meta-analysis, sensitivity, specificity, disease prevalence, positive predictive value, negative predictive value

1. Introduction

Diagnosing a disease condition accurately helps in its control and prevention. In assessing the performance of a diagnostic test, sensitivity (SN), specificity (SP), positive and negative predictive values (PPV and NPV) and positive and negative diagnostic likelihood ratios (Zhou et al., 2009; Pepe, 2003) play a very important role. Because SN and SP vary among studies and depend on the cut-off point used, they may not fully convey the clinical usefulness of a diagnostic test, which also depends on disease prevalence (Li et al., 2007). PPV and NPV are important indicators of the predictive accuracy of a test. The relationship between PPV, NPV and sensitivity, specificity and disease prevalence may be non linear.

Owing to the rapid development of evidence-based medicine, there is an emphasis on evidence-based diagnosis through meta-analysis of diagnostic test accuracy studies (Egger et al., 2008). Meta-analysis helps in summarizing quantitatively the results of similar diagnostic test accuracy studies. When a diagnostic test is compared with its gold standard, many methods are available to take into account the heterogeneity between studies (Rutter & Gatsonis, 2001; Song et al., 2002; Van Houwelingen et al., 2002; Macaskill, 2004; Reitsma et al., 2005; Mallett et al., 2006), which arises from differences in disease prevalence and study design as well as laboratory and other errors. Because of this heterogeneity, random effects models, namely the hierarchical summary receiver operating characteristic model (Rutter & Gatsonis, 2001) and the bivariate random effects model on sensitivities and specificities (Van Houwelingen et al., 2002; Reitsma et al., 2005), have been recommended (Chu, 2009). Riley et al. (2007, 2008) suggested that bivariate random-effects meta-analysis offers numerous advantages over univariate meta-analysis.

Heterogeneity can distort a meta-analysis and is associated with differences in indicators such as disease prevalence and population stratification. In clinical research, reporting of the positive and negative diagnostic odds ratios, PPV and NPV of a diagnostic test is of utmost importance and in many instances supersedes sensitivity and specificity, particularly in meta-analysis studies. Classical research is sparse on positive and negative odds ratios and on PPV and NPV, whether in univariate meta-analysis or in bivariate random effects meta-analysis. In the literature, only Zwinderman and Bossuyt (2008) considered bivariate random effects meta-analysis of positive and negative diagnostic likelihood ratios, and they suggested that these should not be pooled in systematic reviews.

In situations where studies compare a diagnostic test with its gold standard reference test, only the logit transformation has been used in practice for the trivariate random effects meta-analysis of sensitivity, specificity and disease prevalence (Chu et al., 2009). Other transformations, such as the probit and the complementary log-log, have not been explored in this setting. Some of these transformations may provide a better goodness of fit than the logit transformation and hence better statistical inference for sensitivity, specificity and disease prevalence. Complementary log-log models are known to be more suitable when the probability of an event is very small or very large, as is the case for a diagnostic test with very high sensitivity and specificity. Moreover, the asymmetrical complementary log-log transformation is likely to provide a better goodness of fit for skewed data than the symmetrical logit and probit transformations. If $Se_{i}$ , $Sp_{i}$ and $\pi_{i}$ denote the sensitivity, specificity and disease prevalence of study $i$ , then the trivariate normal distribution assumption on $(Se_{i},1-Sp_{i},\pi_{i})$ , $(Se_{i},Sp_{i},\pi_{i})$ , $(1-Se_{i},Sp_{i},\pi_{i})$ , $(1-Se_{i},1-Sp_{i},\pi_{i})$ , $(Se_{i},1-Sp_{i},1-\pi_{i})$ , $(Se_{i},Sp_{i},1-\pi_{i})$ , $(1-Se_{i},Sp_{i},1-\pi_{i})$ , and $(1-Se_{i},1-Sp_{i},1-\pi_{i})$ in the transformed scale will provide the same goodness of fit and inference under the logit or probit transformation, but the goodness of fit differs across these choices under the complementary log-log transformation.
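The symmetry argument above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and the illustrative probability 0.98 are assumptions. It shows that the logit and probit links satisfy $g(1-p)=-g(p)$, whereas the complementary log-log link does not, which is why modelling $Se_{i}$ or $1-Se_{i}$ gives different fits only under the latter.

```python
import math
from statistics import NormalDist

def logit(p):
    """Logit link: log-odds of p."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Probit link: inverse of the standard normal CDF."""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """Complementary log-log link."""
    return math.log(-math.log(1.0 - p))

p = 0.98  # illustrative value for a test with very high sensitivity
for name, g in [("logit", logit), ("probit", probit), ("cloglog", cloglog)]:
    # For a symmetric link the last column is (numerically) zero.
    print(f"{name:8s} g(p)={g(p): .4f}  g(1-p)={g(1.0 - p): .4f}  sum={g(p) + g(1.0 - p): .4f}")
```

For a symmetric link, switching from $p$ to $1-p$ only flips the sign of the linear predictor, so the fitted model is unchanged; under the complementary log-log link the two scales are genuinely different models.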

In this article, we focus on situations where the reference test can be considered as a gold standard and consider a trivariate generalized non linear mixed effects model (TGLMM) for meta-analysis of diagnostic accuracy studies with logit, probit and complementary log-log transformation as special cases.

Some notation and basic definitions are given in Section 2. In Section 3, we present the generalized trivariate random effects model and derive the likelihood function under the parameterization; median estimation for the logit, probit and complementary log-log link functions is also discussed in Section 3. The nonlinear mixed (NLMIXED) estimation approach is presented in Section 4. In Section 5, we analyze a data set. Simulation results are presented in Section 6 and a brief conclusion follows in Section 7.

Table 1
Cell counts and cell probabilities

Gold standard
Diagnostic test	Disease	Healthy	Total
Positive ( $+$ )	$n_{i11}$	$n_{i01}$	$n_{i.1}$
	$\pi_{i}Se_{i}$	$(1-\pi_{i})(1-Sp_{i})$	$\pi_{i}Se_{i}+(1-\pi_{i})(1-Sp_{i})$
	$\textit{PPV}_{i}\times P_{i}$	$(1-\textit{PPV}_{i})\times P_{i}$	$P_{i}$
Negative ( $-$ )	$n_{i10}$	$n_{i00}$	$n_{i.0}$
	$\pi_{i}(1-Se_{i})$	$(1-\pi_{i})Sp_{i}$	$\pi_{i}(1-Se_{i})+(1-\pi_{i})Sp_{i}$
	$(1-\textit{NPV}_{i})\times(1-P_{i})$	$\textit{NPV}_{i}\times(1-P_{i})$	$1-P_{i}$
Total	$n_{i1.}=n_{i11}+n_{i10}$	$n_{i0.}=n_{i01}+n_{i00}$	$n_{i}$
	$\pi_{i}$	$(1-\pi_{i})$	1
	$\textit{PPV}_{i}\times P_{i}+(1-\textit{NPV}_{i})\times(1-P_{i})$	$(1-\textit{PPV}_{i})\times P_{i}+\textit{NPV}_{i}\times(1-P_{i})$	1

2. Notations and definitions

For the $i^{\rm th}$ study, let $n_{i11}$ , $n_{i00}$ , $n_{i01}$ and $n_{i10}$ denote the numbers of true positives, true negatives, false positives and false negatives, respectively. Let $n_{i1.}=n_{i11}+n_{i10}$ and $n_{i0.}=n_{i01}+n_{i00}$ denote the numbers of diseased and healthy subjects, and $n_{i.1}=n_{i11}+n_{i01}$ and $n_{i.0}=n_{i10}+n_{i00}$ the numbers of test-positive and test-negative subjects. The total number of subjects in the $i^{\rm th}$ study is $n_{i}=n_{i1.}+n_{i0.}=n_{i.1}+n_{i.0}$ . The sensitivity, specificity and disease prevalence of study $i$ are denoted by $Se_{i}$ , $Sp_{i}$ and $\pi_{i}$ , respectively. $P_{i}$ denotes the test prevalence, i.e. the probability of a positive test result. PPV and NPV denote the positive and negative predictive values. $T=1,0$ represents a positive or negative test result and $D=1,0$ indicates diseased or non-diseased status. The second and third lines in each cell of Table 1 present the cell probabilities based on the parameterizations ( $\pi_{i}$ , $Se_{i}$ , $Sp_{i}$ ) and $(P_{i},\textit{PPV}_{i},\textit{NPV}_{i})$ , respectively.

Definition 1. Sensitivity (the true positive fraction), is defined as the conditional probability of testing positive in diseased subjects i.e. $Se_{i}=\Pr(T=1|D=1)$ . Specificity (the true negative fraction), is the conditional probability of negative test result in non-diseased subjects i.e. $Sp_{i}=\Pr(T=0|D=0)$ .

Definition 2. Disease prevalence is defined as $\pi_{i}=\Pr(D=1)$ , the probability of having disease.

Definition 3. Positive predictive value (PPV) is the probability that a subject with positive test result is truly diseased i.e. $\Pr(D=1|T=1)$ and Negative predictive value (NPV) is defined as the probability of not having disease in people with a negative test i.e. $\Pr(D=0|T=0)$ .
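As an illustration of Definitions 1 to 3, the sketch below computes the empirical versions of these quantities from the four cell counts of a single study, using the index convention of Table 1. The function name and the counts are hypothetical and are not taken from any data set in this paper.

```python
def accuracy_from_counts(n11, n10, n01, n00):
    """Empirical accuracy measures from one study's 2x2 table.

    n11: true positives, n10: false negatives,
    n01: false positives, n00: true negatives.
    """
    n_diseased = n11 + n10
    n_healthy = n01 + n00
    n_total = n_diseased + n_healthy
    return {
        "Se": n11 / n_diseased,           # Pr(T=1 | D=1)
        "Sp": n00 / n_healthy,            # Pr(T=0 | D=0)
        "prevalence": n_diseased / n_total,  # Pr(D=1)
        "PPV": n11 / (n11 + n01),         # Pr(D=1 | T=1)
        "NPV": n00 / (n10 + n00),         # Pr(D=0 | T=0)
    }

# Hypothetical counts, for illustration only.
example = accuracy_from_counts(n11=90, n10=10, n01=20, n00=80)
print(example)
```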

3. Random effect model based on parameterization of $\pi_{i}$ , $Se_{i}$ and $Sp_{i}$

In order to take correlation among the disease prevalence, sensitivity and specificity between studies into consideration, we extend the bivariate generalized linear mixed model approach for diagnostic measures ( $Se_{i}$ , $Sp_{i}$ ) to jointly model ( $\pi_{i}$ , $Se_{i}$ , $Sp_{i}$ ). Specifically, in the first stage, the trivariate generalized linear mixed effects model can be specified as follows:

$n_{i1.}\mid\pi_{i}\sim Bin(n_{i},\pi_{i}),\quad n_{i11}\mid(n_{i1.},Se_{i})\sim Bin(n_{i1.},Se_{i})\quad\text{and}\quad n_{i00}\mid(n_{i0.},Sp_{i})\sim Bin(n_{i0.},Sp_{i})$ (1)

If ( $\pi_{i}$ , $Se_{i}$ , $Sp_{i}$ ) are known, then the test prevalence ( $P$ ), PPV and NPV can be estimated by using

$\displaystyle P_{i}=\pi_{i}Se_{i}+(1-\pi_{i})(1-Sp_{i}),\quad\textit{PPV}_{i}=\frac{\pi_{i}Se_{i}}{\pi_{i}Se_{i}+(1-\pi_{i})(1-Sp_{i})},$ $\displaystyle\textit{NPV}_{i}=\frac{(1-\pi_{i})Sp_{i}}{\pi_{i}(1-Se_{i})+(1-\pi_{i})Sp_{i}}.$

In each cell of Table 1, the first line shows the number of subjects, and the second and third lines present the corresponding probabilities based on the parameterizations $(\pi_{i},Se_{i},Sp_{i})$ and $(P_{i},\textit{PPV}_{i},\textit{NPV}_{i})$ , respectively.
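The mapping from $(\pi_{i},Se_{i},Sp_{i})$ to $(P_{i},\textit{PPV}_{i},\textit{NPV}_{i})$ can be sketched in a few lines. The function name is hypothetical; the input values are the median DP, Se and Sp reported in the logit column of Table 3, and the resulting PPV and NPV agree with the values in that column to the reported precision.

```python
def predictive_values(pi, se, sp):
    """Test prevalence, PPV and NPV from prevalence, sensitivity and specificity."""
    p_test = pi * se + (1.0 - pi) * (1.0 - sp)       # probability of a positive test
    ppv = pi * se / p_test                           # Pr(D=1 | T=1)
    npv = (1.0 - pi) * sp / (pi * (1.0 - se) + (1.0 - pi) * sp)  # Pr(D=0 | T=0)
    return p_test, ppv, npv

# Medians from the logit column of Table 3 (saturated model).
p_test, ppv, npv = predictive_values(pi=0.7374, se=0.9794, sp=0.7578)
print(f"P={p_test:.4f}  PPV={ppv:.4f}  NPV={npv:.4f}")
```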

To take heterogeneity and potential between-study correlations of $\pi_{i}$ , $Se_{i}$ and $Sp_{i}$ into consideration, we specify random effects models in the second stage as follows:

$\displaystyle g(\pi_{i})=\eta+\epsilon_{i},\quad g(Se_{i})=\alpha+\mu_{i},\quad g(Sp_{i})=\beta+\nu_{i},$

where $g(.)$ is a monotone link function namely logit, probit or complementary log-log transformation and $(\eta,\alpha,\beta)$ are fixed effect parameters. If $g^{-1}(.)$ is the inverse link function, then medians of $\pi$ , $S e$ and $S p$ can be approximated by $g^{-1}(\eta)$ , $g^{-1}(\alpha)$ and $g^{-1}(\beta)$ respectively. The random effects $(\epsilon_{i},\mu_{i},\nu_{i})^{T}$ are jointly distributed as Trivariate Normal with mean vector $\mathbb{0}$ and variance-covariance matrix

$\mathbf{\Sigma}=\left[\begin{array}{ccc}\sigma_{\epsilon}^{2}&\rho_{\epsilon\mu}\sigma_{\epsilon}\sigma_{\mu}&\rho_{\epsilon\nu}\sigma_{\epsilon}\sigma_{\nu}\\ &\sigma_{\mu}^{2}&\rho_{\mu\nu}\sigma_{\mu}\sigma_{\nu}\\ &&\sigma_{\nu}^{2}\end{array}\right]$

where the diagonal entries in $\mathbf{\Sigma}$ give the between-study variances of $\pi$ , $S e$ and $S p$ on the transformed scale and the off-diagonal elements capture potential correlations among the three parameters. When there is scientific evidence of homogeneity among studies (i.e. $\sigma_{\epsilon}^{2},\sigma_{\mu}^{2},\sigma_{\nu}^{2}\approx 0$ ), the corresponding random effects $(\epsilon_{i},\mu_{i},\nu_{i})$ can be dropped from the model. $\displaystyle(\rho_{\epsilon\mu},\rho_{\epsilon\nu},\rho_{\mu\nu})$ denote the correlations among the random effects. When all three correlations are assumed to be zero, fitting the trivariate model is equivalent to independently fitting three separate univariate models for $\pi_{i}$ , $Se_{i}$ and $Sp_{i}$ . Furthermore, when $(\rho_{\epsilon\mu},\rho_{\epsilon\nu})$ are assumed to be zero, fitting the trivariate model is equivalent to independently fitting a univariate model for $\pi_{i}$ and a bivariate model for ( $Se_{i},Sp_{i}$ ). To improve the normal approximation and the computational performance, each of the three correlation coefficients $\displaystyle\rho_{\epsilon\mu},\rho_{\epsilon\nu},\rho_{\mu\nu}$ can be transformed using Fisher's transformation $Z=\frac{1}{2}[\log(1+\rho)-\log(1-\rho)]$ .
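A minimal sketch of these two ingredients follows, assuming the standard Fisher transformation $Z=\frac{1}{2}\log\{(1+\rho)/(1-\rho)\}$ with inverse $\rho=\tanh(Z)$. The function names and the numerical values are illustrative assumptions, not part of the fitted models.

```python
import math

def fisher_z(rho):
    """Fisher's z transformation of a correlation in (-1, 1)."""
    return 0.5 * (math.log(1.0 + rho) - math.log(1.0 - rho))

def inverse_fisher_z(z):
    """Back-transform an unconstrained z to a correlation in (-1, 1)."""
    return math.tanh(z)

def covariance_matrix(sd_eps, sd_mu, sd_nu, rho_em, rho_en, rho_mn):
    """Full symmetric 3x3 covariance matrix of (epsilon_i, mu_i, nu_i)."""
    return [
        [sd_eps ** 2, rho_em * sd_eps * sd_mu, rho_en * sd_eps * sd_nu],
        [rho_em * sd_eps * sd_mu, sd_mu ** 2, rho_mn * sd_mu * sd_nu],
        [rho_en * sd_eps * sd_nu, rho_mn * sd_mu * sd_nu, sd_nu ** 2],
    ]

# Illustrative values only.
z = fisher_z(0.7)
sigma = covariance_matrix(0.5, 1.0, 1.0, rho_em=0.0, rho_en=0.0, rho_mn=inverse_fisher_z(z))
print(z, sigma)
```

Optimizing over $Z$ rather than $\rho$ keeps the correlation inside $(-1,1)$ without explicit bounds, which is the usual motivation for this reparameterization.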

3.1 Estimation for link functions

The estimation for medians of disease prevalence, sensitivities and specificities and an approximation to the medians of predictive values and positive and negative likelihood ratios for all studies in a meta-analysis are discussed below for different link functions.

3.2 Logit Link

The median estimates for the logit model are

$\displaystyle\pi=\text{logit}^{-1}(\eta),\quad Se=\text{logit}^{-1}(\alpha)\text{ and }Sp=\text{logit}^{-1}(\beta),$ $\displaystyle P=\text{logit}^{-1}(\eta)\,\text{logit}^{-1}(\alpha)+\{1-\text{logit}^{-1}(\eta)\}\{1-\text{logit}^{-1}(\beta)\},$ $\displaystyle\textit{PPV}=\frac{\text{logit}^{-1}(\eta)\,\text{logit}^{-1}(\alpha)}{\text{logit}^{-1}(\eta)\,\text{logit}^{-1}(\alpha)+\{1-\text{logit}^{-1}(\eta)\}\{1-\text{logit}^{-1}(\beta)\}},$ $\displaystyle\textit{NPV}=\frac{\{1-\text{logit}^{-1}(\eta)\}\,\text{logit}^{-1}(\beta)}{\{1-\text{logit}^{-1}(\eta)\}\,\text{logit}^{-1}(\beta)+\text{logit}^{-1}(\eta)\{1-\text{logit}^{-1}(\alpha)\}},$ $\displaystyle LR+=\text{logit}^{-1}(\alpha)/\{1-\text{logit}^{-1}(\beta)\},$ $\displaystyle LR-=\{1-\text{logit}^{-1}(\alpha)\}/\text{logit}^{-1}(\beta).$

3.3 Probit Link

The Probit model is written as $\Phi^{-1}(\pi_{i})=\eta+\epsilon_{i},\Phi^{-1}(Se_{i})=\alpha+\mu_{i},\text{ % and }\Phi^{-1}(Sp_{i})=\beta+\nu_{i}$ .

The median estimates for this model are

$\displaystyle\pi=\Phi(\eta),\quad P=\Phi(\eta)\Phi(\alpha)+\left\{1-\Phi(\eta)\right\}\left\{1-\Phi(\beta)\right\},\quad Se=\Phi(\alpha),\quad Sp=\Phi(\beta),$ $\displaystyle\textit{PPV}=\frac{\Phi(\eta)\Phi(\alpha)}{\Phi(\eta)\Phi(\alpha)+\left\{1-\Phi(\eta)\right\}\left\{1-\Phi(\beta)\right\}},$ $\displaystyle\textit{NPV}=\frac{\left\{1-\Phi(\eta)\right\}\Phi(\beta)}{\left\{1-\Phi(\eta)\right\}\Phi(\beta)+\Phi(\eta)\left\{1-\Phi(\alpha)\right\}},$ $\displaystyle LR+=\Phi(\alpha)/(1-\Phi(\beta))\text{ and }LR-=(1-\Phi(\alpha))/\Phi(\beta).$

3.4 Complementary log-log link

The Complementary log-log model is written as $\log_{e}[-\log_{e}(1-\pi_{i})]=\eta+\epsilon_{i},\log_{e}[-\log_{e}(1-Se_{i})]% =\alpha+\mu_{i}\text{ and }\log_{e}[-\log_{e}(1-Sp_{i})]=\beta+\nu_{i}$ .

The median estimates for the Complementary log-log model are

$\displaystyle\pi=1-\exp\{-\exp(\eta)\},$ $\displaystyle P=(1-\exp\{-\exp(\eta)\})(1-\exp\{-\exp(\alpha)\})+(\exp\{-\exp(\eta)\})(\exp\{-\exp(\beta)\}),$ $\displaystyle Se=1-\exp\{-\exp(\alpha)\}\text{ and }Sp=1-\exp\{-\exp(\beta)\},$ $\displaystyle\textit{PPV}=\frac{(1-\exp\{-\exp(\eta)\})(1-\exp\{-\exp(\alpha)\})}{(1-\exp\{-\exp(\eta)\})(1-\exp\{-\exp(\alpha)\})+(\exp\{-\exp(\eta)\})(\exp\{-\exp(\beta)\})},$ $\displaystyle\textit{NPV}=\frac{(\exp\{-\exp(\eta)\})(1-\exp\{-\exp(\beta)\})}{(\exp\{-\exp(\eta)\})(1-\exp\{-\exp(\beta)\})+(1-\exp\{-\exp(\eta)\})(\exp\{-\exp(\alpha)\})},$ $\displaystyle LR+=(1-\exp\{-\exp(\alpha)\})/\exp\{-\exp(\beta)\},$ $\displaystyle LR-=\exp\{-\exp(\alpha)\}/(1-\exp\{-\exp(\beta)\}).$
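The median formulas for the three links share one structure: back-transform $(\eta,\alpha,\beta)$ with the inverse link and then combine the results. The sketch below illustrates this under that reading; the function and dictionary names are hypothetical. The inputs are the fixed-effect estimates for the (Se, Sp, $\pi$ ) parameterization in Table 2, and the back-transformed DP, Se and Sp reproduce the corresponding logit and complementary log-log entries of Table 3 to the reported precision.

```python
import math
from statistics import NormalDist

# Inverse link functions g^{-1} for the three transformations.
INVERSE_LINKS = {
    "logit": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "probit": lambda x: NormalDist().cdf(x),
    "cloglog": lambda x: 1.0 - math.exp(-math.exp(x)),
}

def median_summaries(eta, alpha, beta, link):
    """Approximate medians of the accuracy measures from the fixed effects."""
    inv = INVERSE_LINKS[link]
    pi, se, sp = inv(eta), inv(alpha), inv(beta)
    p_test = pi * se + (1.0 - pi) * (1.0 - sp)
    return {
        "pi": pi, "Se": se, "Sp": sp, "P": p_test,
        "PPV": pi * se / p_test,
        "NPV": (1.0 - pi) * sp / (1.0 - p_test),  # 1 - P = pi(1-Se) + (1-pi)Sp
        "LR+": se / (1.0 - sp),
        "LR-": (1.0 - se) / sp,
    }

# Fixed-effect estimates from Table 2, (Se, Sp, pi) parameterization.
logit_fit = median_summaries(1.0324, 3.8615, 1.1407, "logit")
cloglog_fit = median_summaries(0.2627, 1.3743, 0.3146, "cloglog")
print(logit_fit)
print(cloglog_fit)
```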

Assuming a multinomial distribution, the likelihood for $\theta_{i}=(\pi_{i},Se_{i},Sp_{i})$ given cell counts is:

$\displaystyle L(\theta_{i}|Data)\propto\{\pi_{i}Se_{i}\}^{n_{i11}}\{\pi_{i}(1-Se_{i})\}^{n_{i10}}\{(1-\pi_{i})(1-Sp_{i})\}^{n_{i01}}\{(1-\pi_{i})Sp_{i}\}^{n_{i00}}.$ (2)

Up to an additive constant, the log-likelihood of $\theta_{i}$ is

$\displaystyle\log L(\theta_{i}|Data)=n_{i11}\{\log(\pi_{i})+\log(Se_{i})\}+n_{i10}\{\log(\pi_{i})+\log(1-Se_{i})\}$ $\displaystyle\quad{}+n_{i01}\{\log(1-\pi_{i})+\log(1-Sp_{i})\}+n_{i00}\{\log(1-\pi_{i})+\log(Sp_{i})\}$ (3)

Let $\theta=\{\theta_{i}\}$ denote the collection of study-specific parameters. Assuming independence among studies conditional on $\theta_{i}$ , the total log-likelihood of $\theta$ is:

$\displaystyle\log L(\theta|Data)=\sum_{i=1}^{N}\log L(\theta_{i}|Data).$ (4)
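The conditional log-likelihood can be sketched as follows, pairing each count with its cell probability from Table 1: false negatives $n_{i10}$ with $\pi_{i}(1-Se_{i})$ and false positives $n_{i01}$ with $(1-\pi_{i})(1-Sp_{i})$ . The function names and the counts are hypothetical, and this is only the conditional part of the model; the marginal likelihood additionally integrates over the random effects.

```python
import math

def study_loglik(counts, pi, se, sp):
    """Multinomial log-likelihood kernel for one study.

    counts = (n11, n10, n01, n00): true positives, false negatives,
    false positives, true negatives.
    """
    n11, n10, n01, n00 = counts
    return (
        n11 * (math.log(pi) + math.log(se))
        + n10 * (math.log(pi) + math.log(1.0 - se))
        + n01 * (math.log(1.0 - pi) + math.log(1.0 - sp))
        + n00 * (math.log(1.0 - pi) + math.log(sp))
    )

def total_loglik(all_counts, thetas):
    """Sum of study-level log-likelihoods; thetas holds (pi_i, Se_i, Sp_i)."""
    return sum(study_loglik(c, *t) for c, t in zip(all_counts, thetas))

# Hypothetical counts for two studies, evaluated at their empirical proportions.
data = [(90, 10, 20, 80), (45, 5, 30, 120)]
thetas = [(0.5, 0.9, 0.8), (0.25, 0.9, 0.8)]
print(total_loglik(data, thetas))
```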

4. NLMIXED approach for estimating parameters

To model study-specific covariate effects on $(\pi_{i},Se_{i},Sp_{i})$ , we can replace the fixed effects $(\eta,\alpha,\beta)$ by ( $\mathbf{X_{i}}\eta,$ $\mathbf{Z_{i}}\alpha,\mathbf{W_{i}}\beta$ ) where $\bf X_{i}$ , $\bf Z_{i}$ and $\bf W_{i}$ are (possibly overlapping) associated vectors of covariates.

Akaike’s Information Criterion (AIC) (Burnham & Anderson, 1998) is used to compare goodness of fit; a smaller AIC value indicates a better-fitting model. The trivariate generalized non linear mixed model (TGLMM) can be fitted using statistical software such as SAS, S-PLUS, R or Stata. Here it has been implemented through the NLMIXED procedure in SAS, which uses adaptive Gaussian quadrature to approximate the likelihood integrated over the random effects and maximizes the approximate likelihood by a dual quasi-Newton optimization technique (Pinheiro & Bates, 1995). Moreover, this approach computes the population estimates of the back-transformed parameters of interest, including the median sensitivity and specificity, by using the delta method. The predicted values of disease prevalences, sensitivities and specificities, positive and negative predictive values and positive and negative likelihood ratios are computed using empirical Bayes estimates of the random effects. Since the distributions of these parameters are generally skewed, medians are used instead of means.
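The AIC comparison can be sketched with the usual definition $\text{AIC}=-2\log L+2k$ , where $k$ is the number of estimated parameters. The parameter counts used below (9 for the saturated model, 6 for the model with zero correlations) are an assumption inferred from the difference between the AIC and $-2\log L$ rows of Tables 2 and 6; the function name and the model labels are hypothetical.

```python
def aic(minus_two_loglik, n_params):
    """Akaike's Information Criterion from -2 log L and the parameter count."""
    return minus_two_loglik + 2 * n_params

# -2 log L values from the logit columns of Table 2 and Table 6.
print(aic(4819.2, n_params=9))  # saturated model (assumed k = 9)
print(aic(4839.0, n_params=6))  # zero-correlation model (assumed k = 6)

# Selecting the smallest AIC among three zero-correlation fits from Table 6.
candidates = {
    "logit (Se, Sp, pi)": aic(4839.0, 6),
    "probit (Se, Sp, pi)": aic(4838.4, 6),
    "cloglog (1-Se, 1-Sp, pi)": aic(4830.2, 6),
}
best = min(candidates, key=candidates.get)
print(best)
```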

The NLMIXED procedure provides estimates of median prevalence, $S e$ , $S p$ , PPV, NPV, $LR+$ , $LR-$ .

5. Data example

To illustrate the trivariate generalized linear mixed effects model discussed in this article, we apply it to the meta-analysis data set discussed in Gould et al. (2001).

5.1 Diagnostic accuracy of FDG-PET for malignant focal pulmonary lesions

Gould et al. (2001) presented 40 studies estimating the diagnostic accuracy of positron emission tomography (PET) with the glucose analog 18-fluorodeoxyglucose (FDG) for identifying malignant focal pulmonary nodules and mass lesions. Among the 40 studies, six did not report specificity and three examined FDG imaging with a modified gamma camera. To illustrate and compare different models on sensitivity, specificity and disease prevalence for FDG-PET, we have excluded these nine studies.

We fitted the TGLMM on the data of 31 studies on the diagnostic accuracy of FDG-PET of pulmonary nodules and mass lesions. We assumed trivariate normal distribution of ( $Se_{i}$ , $Sp_{i}$ , $\pi_{i}$ ) on the transformed scale using the logit, probit and complementary log-log transformations. Since the complementary log-log transformation is asymmetrical, we also fitted the TGLMM for $(Se_{i},1-Sp_{i},\pi_{i})$ , $(Se_{i},Sp_{i},\pi_{i})$ , $(1-Se_{i},Sp_{i},\pi_{i})$ , $(1-Se_{i},1-Sp_{i},\pi_{i})$ , $(Se_{i},1-Sp_{i},1-\pi_{i})$ , $(Se_{i},Sp_{i},1-\pi_{i})$ , $(1-Se_{i},Sp_{i},1-\pi_{i})$ , and $(1-Se_{i},1-Sp_{i},1-\pi_{i})$ .

Table 2
Summary of parameter estimates with standard errors for saturated model

	Logit	Probit	Complementary log-log
	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, 1-Sp, $\pi$ )	(1-Se, 1-Sp, $\pi$ )	(1-Se, Sp, $\pi$ )	(Se, Sp, $1-\pi$ )	(1-Se, Sp, $1-\pi$ )	(1-Se, 1-Sp, $1-\pi$ )	(Se, 1-Sp, $1-\pi$ )
$\eta_{0}$	1.0324	0.6242	0.2627	0.2628	0.2632	0.2633	$-$ 1.2046	$-$ 1.2054	$-$ 1.2055	$-$ 1.2047
	(0.1199)	(0.0686)	(0.0614)	(0.0612)	(0.0613)	(0.0614)	(0.1048)	(0.1047)	(0.1044)	(0.1043)
$\alpha_{0}$	3.8615	2.0589	1.3743	1.3730	$-$ 3.8101	$-$ 3.8019	1.3729	$-$ 3.7955	$-$ 3.8043	1.3719
	(0.4256)	(0.2088)	(0.1575)	(0.1574)	(0.3785)	(0.3740)	(0.1569)	(0.3715)	(0.3763)	(0.1570)
$\beta_{0}$	1.1407	0.6836	0.3146	$-$ 1.3256	$-$ 1.3256	0.3149	0.3008	0.3012	$-$ 1.3094	$-$ 1.3098
	(0.2106)	(0.1234)	(0.1125)	(0.1787)	(0.1780)	(0.1118)	(0.1173)	(0.1166)	(0.1796)	(0.1803)
$\rho_{\epsilon\nu}$	$-$ 0.07176	$-$ 0.0377	$-$ 0.0099	0.0687	$-$ 0.0633	0.0143	1.0000	1.0000	$-$ 1.0000	$-$ 1.0000
	(0.2449)	(0.2404)	(0.2339)	(0.2456)	(0.2481)	(0.2430)	(0.3372)	(0.3380)	(0.3303)	(0.3275)
$\rho_{\mu\nu}$	$-$ 0.1554	$-$ 0.1459	$-$ 0.1178	$-$ 0.1172	0.1723	0.1782	$-$ 0.0123	0.4505	0.4673	0.0802
	(0.2217)	(0.2255)	(0.2298)	(0.2299)	(0.2351)	(0.2380)	(0.2294)	(0.2157)	(0.2148)	(0.2438)
$-$ 2logL	4819.2	4819.4	4823.8	4818.2	4812.1	4818.7	4828.5	4823.4	4834.8	4823.0
AIC	4837.2	4837.4	4841.8	4836.2	4830.1	4836.7	4846.5	4841.4	4816.8	4841.0

Table 3

Summary of disease prevalence, sensitivities, specificities and predictive values with standard errors for saturated model

	Logit	Probit	Complementary log-log
	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, 1-Sp, $\pi$ )	(1-Se, 1-Sp, $\pi$ )	(1-Se, Sp, $\pi$ )	(Se, Sp, $1-\pi$ )	(1-Se, Sp, $1-\pi$ )	(1-Se, 1-Sp, $1-\pi$ )	(Se, 1-Sp, $1-\pi$ )
DP	0.7374	0.7378	0.7276	0.7276	0.7278	0.7278	0.2590	0.2589	0.2588	0.2590
	(0.0232)	(0.0225)	(0.0225)	(0.0217)	(0.0217)	(0.0217)	(0.0232)	(0.0232)	(0.0231)	(0.0371)
Se	0.9794	0.9802	0.9808	0.9807	0.02190	0.02208	0.9807	0.0222	0.0220	0.9806
	(0.0085)	(0.0100)	(0.0119)	(0.0120)	(0.0082)	(0.0081)	(0.0119)	(0.0081)	(0.0081)	(0.0120)
Sp	0.7578	0.7529	0.7458	0.2333	0.2333	0.7459	0.7410	0.7411	0.2366	0.2365
	(0.0386)	(0.0391)	(0.0391)	(0.0364)	(0.0362)	(0.0389)	(0.0410)	(0.0408)	(0.0370)	(0.0371)
PPV	0.9191	0.9754	0.9115	0.7736	0.0709	0.1886	0.5697	0.0291	0.0099	0.3099
	(0.0086)	(0.0043)	(0.0092)	(0.0254)	(0.0271)	(0.0615)	(0.0632)	(0.0124)	(0.0036)	(0.0208)
NPV	0.9291	0.9975	0.9356	0.8189	0.0819	0.2220	0.9910	0.6846	0.4093	0.9721
	(0.0300)	(0.0041)	(0.0390)	(0.0963)	(0.0087)	(0.0257)	(0.0055)	(0.0202)	(0.0615)	(0.0181)
LR $+$	4.0438	3.9666	1.0756	1.0252	2.2757	4.8491	0.9254	4.7773	2.2811	0.9746
	(0.6444)	(0.6278)	(0.0485)	(0.0156)	(0.0617)	(0.5902)	(0.0478)	(0.5956)	(0.0636)	(0.0156)
LR $-$	0.02719	$-$ 0.3020	$-$ 0.02576	$-$ 0.08278	$-$ 4.1925	$-$ 1.3110	0.0260	$-$ 1.3193	$-$ 4.1333	0.0819
	(0.0113)	(0.0693)	(0.0160)	(0.0537)	(0.6507)	(0.0694)	(0.0161)	(0.0736)	(0.6455)	(0.0531)

Table 4

Summary of parameter estimates with standard errors for the partially reduced model

	Logit	Probit	Complementary log-log
	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, 1-Sp, $\pi$ )	(1-Se, 1-Sp, $\pi$ )	(1-Se, Sp, $\pi$ )	(Se, Sp, $1-\pi$ )	(1-Se, Sp, $1-\pi$ )	(1-Se, 1-Sp, $1-\pi$ )	(Se, 1-Sp, $1-\pi$ )
$\eta_{0}$	1.0412	0.6308	0.2692	0.2692	0.2695	0.2695	$-$ 1.2121	$-$ 1.2123	$-$ 1.2125	$-$ 1.2121
	(0.1172)	(0.0678)	(0.0614)	(0.0614)	(0.0614)	(0.0614)	(0.1018)	(0.1016)	(0.1018)	(0.1018)
$\alpha_{0}$	3.8628	2.0557	1.3720	1.3724	$-$ 3.8090	$-$ 3.7933	1.3719	$-$ 3.7920	$-$ 3.8074	1.3723
	(0.4242)	(0.2065)	(0.1565)	(0.1568)	(0.3759)	(0.3692)	(0.1564)	(0.3685)	(0.3750)	(0.1567)
$\beta_{0}$	1.2766	0.7622	0.3879	$-$ 1.4231	$-$ 1.4230	0.3880	0.3879	0.3880	$-$ 1.4230	$-$ 1.4231
	(0.1960)	(0.1143)	(0.09945)	(0.1782)	(0.1781)	(0.09912)	(0.0994)	(0.09912)	(0.1781)	(0.1782)
$\rho_{\epsilon\mu}$	$-$ 0.0002	$-$ 0.0011	$-$ 0.0004	0.0013	0.0001	0	0.0721	0.0004	0.0008	0.0721
	(0)	(0)	(0)	(0)	(0)	(0)	(0.2160)	(0)	(0)	(0.2160)
$\rho_{\epsilon\nu}$	$-$ 0.0002	$-$ 0.0011	$-$ 0.0005	0.0013	0.0001	0	0.0189	0.0004	0.0008	0.0087
	(0)	(0)	(0)	(0)	(0)	(0)	(0)	(0)	(0)	(0)
$\rho_{\mu\nu}$	$-$ 0.1144	$-$ 0.1012	$-$ 0.0720	$-$ 0.0721	0.1202	0.1202	0.0189	$-$ 0.1159	$-$ 0.1155	0.0087
	(0.2234)	(0.2256)	(0.2274)	(0.2273)	(0.2324)	(0.2350)	(0)	(0.2238)	(0.2218)	(0)
$-$ 2logL	4838.7	4838.2	4840.9	4835.8	4830.0	4836.2	4847.5	4842.8	4836.2	4842.4
AIC	4856.7	4856.2	4858.9	4853.8	4848.0	4854.2	4865.5	4860.8	4854.6	4860.4

Table 5

Summary of disease prevalence, sensitivities, specificities and predictive values with standard errors for the partially reduced model

	Logit	Probit	Complementary log-log
	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, 1-Sp, $\pi$ )	(1-Se, 1-Sp, $\pi$ )	(1-Se, Sp, $\pi$ )	(Se, Sp, $1-\pi$ )	(1-Se, Sp, $1-\pi$ )	(1-Se, 1-Sp, $1-\pi$ )	(Se, 1-Sp, $1-\pi$ )
DP	0.7391	0.7359	0.7299	0.72999	0.7300	0.7300	0.2574	0.2589	0.2588	0.2574
	(0.0226)	(0.0221)	(0.0119)	(0.0217)	(0.0217)	(0.0217)	(0.0224)	(0.0232)	(0.0231)	(0.0224)
Se	0.9794	0.9801	0.9806	0.9806	0.0219	0.0222	0.9806	0.0222	0.0220	0.9806
	(0.0085)	(0.0099)	(0.0119)	(0.0119)	(0.0081)	(0.0081)	(0.0119)	(0.0081)	(0.0081)	(0.0119)
Sp	0.7819	0.7770	0.7710	0.2141	0.2141	0.7710	0.7710	0.7411	0.2366	0.2141
	(0.0334)	(0.0341)	(0.0335)	(0.0337)	(0.0337)	(0.0334)	(0.0335)	(0.0408)	(0.0370)	(0.0337)
PPV	0.9245	0.9793	0.9204	0.7713	0.0701	0.2082	0.5974	0.3019	0.0095	0.0326
	(0.0133)	(0.0058)	(0.0134)	(0.0208)	(0.0259)	(0.0686)	(0.0454)	(0.0266)	(0.0036)	(0.0126)
NPV	1.0000	0.9975	0.9364	0.8037	0.0749	0.2258	0.9914	0.9696	0.3873	0.6947
	(0.0073)	(0.0041)	(0.0377)	(0.1031)	(0.0133)	(0.0206)	(0.0053)	(0.0189)	(0.0467)	(0.0267)
LR+	4.4900	4.3959	0.9154	0.9754	2.2446	5.2694	0.9153	4.7773	2.2811	0.9754
	(0.6890)	(0.6741)	(0.0536)	(0.0152)	(0.0544)	(0.6248)	(0.0536)	(0.5956)	(0.0636)	(0.0152)
LR-	0.0263	$-$ 0.2613	0.0251	0.0904	$-$ 4.5674	$-$ 1.2681	0.0251	$-$ 1.3193	$-$ 4.1333	0.0904
	(0.0109)	(0.0568)	(0.0155)	(0.0577)	(0.7203)	(0.0560)	(0.0155)	(0.0736)	(0.6455)	(0.0576)

Table 6

Summary of parameter estimates with standard errors for the reduced model with zero correlation

	Logit	Probit	Complementary log-log
	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, 1-Sp, $\pi$ )	(1-Se, 1-Sp, $\pi$ )	(1-Se, Sp, $\pi$ )	(Se, Sp, $1-\pi$ )	(1-Se, Sp, $1-\pi$ )	(1-Se, 1-Sp, $1-\pi$ )	(Se, 1-Sp, $1-\pi$ )
$\eta_{0}$	1.0399	0.6302	0.2687	0.2687	0.2687	0.2687	$-$ 1.2114	$-$ 1.2214	$-$ 1.2112	$-$ 1.2114
	(0.1172)	(0.0678)	(0.0615)	(0.0615)	(0.0615)	(0.0615)	(0.1017)	(0.1017)	(0.1015)	(0.1017)
$\alpha_{0}$	3.8354	2.0439	1.3670	1.3673	$-$ 3.7847	$-$ 3.7692	1.3670	$-$ 3.7848	$-$ 3.7692	1.3673
	(0.4120)	(0.2007)	(0.1543)	(0.1545)	(0.3657)	(0.3590)	(0.1543)	(0.3657)	(0.3590)	(0.1545)
$\beta_{0}$	1.2766	0.7622	0.3879	$-$ 1.4231	$-$ 1.4230	0.3880	0.3879	$-$ 1.4230	0.3880	$-$ 1.4231
	(0.1960)	(0.1143)	(0.0994)	(0.1782)	(0.1781)	(0.0991)	(0.0994)	(0.1781)	(0.0991)	(0.1782)
$\rho_{\epsilon\mu}$	0	0	0	0	0	0	0	0	0	0
$\rho_{\epsilon\nu}$	0	0	0	0	0	0	0	0	0	0
$\rho_{\mu\nu}$	0	0	0	0	0	0	0	0	0	0
$-$ 2logL	4839.0	4838.4	4841.0	4835.9	4830.2	4836.4	4847.6	4836.8	4843.1	4842.5
AIC	4851.0	4850.4	4853.0	4847.9	4842.2	4848.4	4859.6	4848.4	4855.1	4854.5

Table 7

Summary of disease prevalence, sensitivities, specificities and predictive values with standard errors for the reduced model with zero correlation

	Logit	Probit	Complementary log-log
	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, Sp, $\pi$ )	(Se, 1-Sp, $\pi$ )	(1-Se, 1-Sp, $\pi$ )	(1-Se, Sp, $\pi$ )	(Se, Sp, $1-\pi$ )	(1-Se, Sp, $1-\pi$ )	(1-Se, 1-Sp, $1-\pi$ )	(Se, 1-Sp, $1-\pi$ )
DP	0.7388	0.7357	0.7297	0.7297	0.7297	4848.4	4859.6	4848.4	4855.1	4854.5
	(0.0226)	(0.0222)	(0.0217)	(0.0217)	(0.0217)	(0.0217)	(0.0224)	(0.0224)	(0.0224)	(0.0224)
Se	0.9789	0.9795	0.9802	0.9803	0.0224	0.0228	0.9802	0.0224	0.0228	0.9803
	(0.0085)	(0.0099)	(0.0119)	(0.0119)	(0.0081)	(0.0080)	(0.0119)	(0.0081)	(0.0080)	(0.0119)
Sp	0.7819	0.7770	0.7710	0.2141	0.2141	0.7710	0.7710	0.7710	0.2141	0.2141
	(0.0334)	(0.0341)	(0.0335)	(0.0337)	(0.0337)	(0.0334)	(0.0335)	(0.0337)	(0.0337)	(0.0337)
PPV	0.9270	0.9793	0.9203	0.7710	0.0716	0.2119	0.5975	0.0334	0.0098	0.3020
	(0.0130)	(0.0058)	(0.0134)	(0.0210)	(0.0253)	(0.0666)	(0.0453)	(0.0129)	(0.0037)	(0.0265)
NPV	0.9290	0.9972	0.9352	0.8007	0.0750	0.2261	0.9912	0.6946	0.3871	0.9690
	(0.0278)	(0.0043)	(0.0373)	(0.1015)	(0.0133)	(0.0207)	(0.0054)	(0.0266)	(0.0466)	(0.0191)
LR+	4.4874	4.3933	0.9137	0.9749	$-$ 0.2439	$-$ 3.2671	0.9137	$-$ 3.2671	$-$ 0.2439	0.9749
	(0.6886)	(0.6737)	(0.0537)	(0.0152)	(0.0543)	(0.6244)	(0.0537)	(0.6244)	(0.0543)	(0.0152)
LR-	0.0270	$-$ 0.2606	0.02564	0.09222	+4.5649	1.2675	0.0256	1.2675	4.5649	0.0922
	(0.0109)	(0.0568)	(0.0155)	(0.0577)	(0.7199)	(0.0560)	(0.0155)	(0.0560)	(0.7199)	(0.0577)

Table 2 compares the regression parameter estimates, standard errors and the goodness of fit measured by Akaike’s Information Criterion (AIC) for the trivariate random effects meta-analysis under the logit, probit and complementary log-log link models (saturated models). Table 4 gives the same comparison for the partially reduced models with $(\rho_{\epsilon\mu},\rho_{\epsilon\nu})=(0,0)$ , which is equivalent to independently fitting a univariate model for $\pi_{i}$ and a bivariate model for $(Se_{i},Sp_{i})$ . Table 6 gives the comparison for the reduced models with zero correlations, which is the same as independently fitting three separate univariate random effects models on $(\pi_{i},Se_{i},Sp_{i})$ . Tables 3, 5 and 7 present the population estimates and standard errors of disease prevalence (DP), sensitivities (Se) and specificities (Sp), PPVs and NPVs and LR $+$ and LR $-$ for the saturated model, the partially reduced model and the reduced model with zero correlations, respectively.

On the basis of the results in Tables 2–7, it is concluded that

(i) For the saturated model, the complementary log-log model (1-Se, 1-Sp, $1-\pi$ ) provides the best fit, as its reported AIC value is lower than that of the other models.

(ii) For the partially reduced model and the reduced model with zero correlation, the complementary log-log model (1-Se, 1-Sp, $\pi$ ) provides the best fit, as its AIC value is the lowest.

(iii) The saturated model provides the best fit, as its AIC is the lowest compared with the others, and suggests that

– There is perfect negative correlation between the random effects of disease prevalence and specificity.

– There is moderate positive correlation between the random effects of sensitivity and specificity.

6. Simulations

We conduct two sets of simulations to evaluate the performance of the trivariate generalized linear mixed model with the complementary log-log transformation. In the first set, ( $\pi_{i}$ , $Se_{i}$ , $Sp_{i}$ ) is assumed to follow a trivariate normal distribution on the transformed scale, with medians (0.25, 0.7, 0.9), correlation coefficients $(\rho_{\epsilon\mu},\rho_{\epsilon\nu},\rho_{\mu\nu})=$ (0, 0, 0.7), and standard deviations $(\sigma_{\epsilon}^{2},\sigma_{\mu}^{2},\sigma_{\nu}^{2})=(0.5,1.0,1.0)$ . In the second set, the same parameter values are used except $(\rho_{\epsilon\mu},\rho_{\epsilon\nu})=$ (0.7, 0.7). For each set, 5000 replicate meta-analysis data sets are generated, each consisting of 30 studies with 200 subjects per study. Each replicate is analyzed by Eq. (1) and the trivariate model on ( $\pi_{i}$ , $Se_{i}$ , $Sp_{i}$ ), with the constraint $(\rho_{\epsilon\mu},\rho_{\epsilon\nu})=$ (0, 0) on the correlation parameters in the first set and without any constraints in the second set. $\pi_{i}$ , $Se_{i}$ and $Sp_{i}$ for each study are generated according to the trivariate normal assumption. True and false positives and true and false negatives are sampled from the multinomial distribution in Table 1. The percentage bias and the standard error of the mean (SEM) of the estimates of the various indicators are reported in Table 8.
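The data-generating step described above can be sketched as follows. This is a standard-library illustration under stated assumptions rather than the code used for the reported simulations: the values (0.5, 1.0, 1.0) are treated as standard deviations on the complementary log-log scale, the cell counts are drawn through the sequential binomials of Eq. (1), which is equivalent to the multinomial in Table 1, and all function names are hypothetical. Model fitting is not shown.

```python
import math
import random

def cloglog(p):
    return math.log(-math.log(1.0 - p))

def inv_cloglog(x):
    return 1.0 - math.exp(-math.exp(x))

def cholesky3(s):
    """Lower-triangular Cholesky factor of a 3x3 positive-definite matrix."""
    low = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(s[i][i] - acc)
            else:
                low[i][j] = (s[i][j] - acc) / low[j][j]
    return low

def simulate_meta_analysis(rng, n_studies=30, n_subjects=200,
                           medians=(0.25, 0.7, 0.9), sds=(0.5, 1.0, 1.0),
                           rhos=(0.0, 0.0, 0.7)):
    """Return one data set as a list of (n11, n10, n01, n00) per study."""
    eta, alpha, beta = (cloglog(m) for m in medians)
    (s_e, s_m, s_n), (r_em, r_en, r_mn) = sds, rhos
    cov = [[s_e * s_e, r_em * s_e * s_m, r_en * s_e * s_n],
           [r_em * s_e * s_m, s_m * s_m, r_mn * s_m * s_n],
           [r_en * s_e * s_n, r_mn * s_m * s_n, s_n * s_n]]
    low = cholesky3(cov)
    studies = []
    for _ in range(n_studies):
        # Correlated random effects: low @ z with z standard normal.
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        eps, mu, nu = (sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(3))
        pi, se, sp = inv_cloglog(eta + eps), inv_cloglog(alpha + mu), inv_cloglog(beta + nu)
        # Sequential binomial draws as in Eq. (1).
        n_dis = sum(rng.random() < pi for _ in range(n_subjects))
        n11 = sum(rng.random() < se for _ in range(n_dis))
        n00 = sum(rng.random() < sp for _ in range(n_subjects - n_dis))
        studies.append((n11, n_dis - n11, n_subjects - n_dis - n00, n00))
    return studies

first_set = simulate_meta_analysis(random.Random(2024))
second_set = simulate_meta_analysis(random.Random(2024), rhos=(0.7, 0.7, 0.7))
print(first_set[:3])
```

Repeating the call 5000 times per setting and fitting the model to each replicate would give the bias and SEM summaries of the kind reported in Table 8.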

Table 8
Bias of median estimates and the standard errors for $(\pi_{i},Se_{i},Sp_{i})$

	$(\rho_{\epsilon\mu},\rho_{\epsilon\nu},\rho_{\mu\nu})=$ (0, 0, 0.7)		$(\rho_{\epsilon\mu},\rho_{\epsilon\nu},\rho_{\mu\nu})=$ (0.7, 0.7, 0.7)
	Bias (%)	SEM	Bias (%)	SEM
Disease prevalence	$-$ 0.13	0.19	$-$ 0.15	0.18
Sensitivity	$-$ 0.31	0.39	$-$ 0.30	0.39
Specificity	$-$ 0.32	0.48	$-$ 0.31	0.47
PPV	$-$ 0.34	0.34	$-$ 0.36	0.34
NPV	$-$ 0.15	0.26	$-$ 0.14	0.26

Table 8 shows that, under the scenarios considered, the estimated median disease prevalence, sensitivity, specificity, PPV and NPV are approximately unbiased under the complementary log-log link function, with absolute bias below 0.4% in every case.

7. Conclusion

In this paper, we use the TGLMM approach to jointly model the sensitivities, specificities and disease prevalence and discuss the estimation of sensitivity, specificity, disease prevalence, positive and negative predictive values and positive and negative likelihood ratios in meta-analysis of diagnostic tests. Earlier studies of this kind have used the logit link function. We explore the use of other link functions, namely the probit and the complementary log-log, and compare our results with existing results on the basis of the Akaike Information Criterion (AIC). We conclude through a practical illustration that the complementary log-log link function results in a better goodness of fit for the saturated models, the partially reduced models and the reduced models with zero correlations. Simulation studies are carried out under two settings to evaluate the performance of the trivariate generalized non linear mixed model with the complementary log-log transformation. Under the scenarios considered, the estimated median disease prevalence, sensitivity, specificity, PPV and NPV are approximately unbiased under the complementary log-log link function. The results show a better fit than in the cases where the logit and probit link functions are used.

Acknowledgments

The first author is grateful to the Department of Science and Technology (DST), Government of India, for providing financial assistance under INSPIRE for carrying out this work. All the authors also acknowledge the support provided by DST under PURSE grants.

References

1. Burnham, K. P., & Anderson, D. R. (1998). Model selection and inference: A practical information-theoretic approach.

2. Chu, Nie, Cole, S. R., & Poole (2009). Meta-analysis of diagnostic accuracy studies accounting for disease prevalence: Alternative parameterizations and model selection. Statistics in Medicine, 28(18), 2384-2399.

3. Chu, H. G. H. (2009). Letter to the editor. Biostatistics, 10(1), 201-203.

4. Egger, Davey-Smith, & Altman (2008). Systematic reviews in health care: Meta-analysis in context. John Wiley & Sons.

5. Gould, M. K., Maclean, C. C., Kuschner, W. G., Rydzak, C. E., & Owens, D. K. (2001). Accuracy of positron emission tomography for diagnosis of pulmonary nodules and mass lesions: A meta-analysis. JAMA, 285(7), 914-924.

6. Li, Fine, J. P., & Safdar (2007). Prevalence-dependent diagnostic accuracy measures. Statistics in Medicine, 26(17), 3258-3273.

7. Macaskill (2004). Empirical Bayes estimates generated in a hierarchical summary ROC analysis agreed closely with those of a full Bayesian analysis. Journal of Clinical Epidemiology, 57(9), 925-932.

8. Mallett, Deeks, J. J., Halligan, Hopewell, Cornelius, & Altman, D. G. (2006). Systematic reviews of diagnostic tests in cancer: Review of methods and reporting. BMJ, 333(7565), 413.

9. Pepe, M. S. (2003). The statistical evaluation of medical tests for classification and prediction. Oxford University Press, USA.

10. Pinheiro, J. C., & Bates, D. M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4(1), 12-35.

11. Reitsma, J. B., Glas, A. S., Rutjes, A. W., Scholten, R. J., Bossuyt, P. M., & Zwinderman, A. H. (2005). Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology, 58(10), 982-990.

12. Riley, R. D., Abrams, K. R., Sutton, A. J., Lambert, P. C., & Thompson, J. R. (2007). Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Medical Research Methodology, 7(1), 1.

13. Riley, R. D., Thompson, J. R., & Abrams, K. R. (2008). An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics, 9(1), 172-186.

14. Rutter, C. M., & Gatsonis, C. A. (2001). A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Statistics in Medicine, 20(19), 2865-2884.

15. Song, Khan, K. S., Dinnes, & Sutton, A. J. (2002). Asymmetric funnel plots and publication bias in meta-analyses of diagnostic accuracy. International Journal of Epidemiology, 31(1), 88-95.

16. Van Houwelingen, H. C., Arends, L. R., & Stijnen (2002). Advanced methods in meta-analysis: Multivariate approach and meta-regression. Statistics in Medicine, 21(4), 589-624.

17. Zhou, X. H., McClish, D. K., & Obuchowski, N. A. (2009). Statistical methods in diagnostic medicine, volume 569. John Wiley & Sons.

18.

Zwinderman