Sure Joint Screening for High Dimensional Cox's Proportional Hazards Model Under the Case-Cohort Design

Abstract

This study develops a sure joint feature screening method for the case–cohort design with ultrahigh-dimensional covariates. Our method is based on a sparsity-restricted Cox proportional hazards model. An iterative reweighted hard thresholding algorithm is proposed to approximate the sparsity-restricted, pseudo-partial likelihood estimator for joint screening. We rigorously show that our method possesses the sure screening property, with the probability of retaining all relevant covariates tending to 1 as the sample size goes to infinity. Our simulation results demonstrate that the proposed procedure has substantially improved screening performance over some existing feature screening methods for the case–cohort design, especially when some covariates are jointly correlated, but marginally uncorrelated, with the event time outcome. A real data illustration is provided using breast cancer data with high-dimensional genomic covariates. We have implemented the proposed method using MATLAB and made it available to readers through GitHub.

1. INTRODUCTION

Sure feature screening is a popular dimension reduction technique for ultrahigh-dimensional problems with statistical guarantees (Fan and Lv, 2008). Among many of its applications, sure feature screening methods have been frequently used to rapidly reduce the ultrahigh-dimensional data to a more manageable reduced-dimension model and consequently improve the computational expediency, statistical accuracy, and algorithm stability of many variable selection, estimation, and inference methods for high-dimensional data. For instance, when coupled with a classical, regularized variable selection method, the resulting two-stage variable selection and estimation procedure is not only computationally more stable and faster but also more accurate for variable selection and parameter estimation (Fan and Lv, 2008; Liu et al, 2020).

Since the seminal work by Fan and Lv (2008), there have been extensive developments of feature screening procedures for a variety of model settings that include linear models (Fan and Lv, 2008), generalized linear models (Fan and Song, 2010; Xu and Chen, 2014), additive models (Fan et al, 2011), varying coefficient models (Fan et al, 2014; Liu et al, 2014), quantile models (He et al, 2013), and model-free screening (Li et al, 2012; Zhu et al, 2011) and for different data types such as survival data (Gorst-Rasmussen and Scheike, 2013; Liu et al, 2021; Liu et al, 2020; Song et al, 2014; Yang et al, 2016), missing data (Wang and Li, 2018), longitudinal data (Zhang et al, 2019), and case–cohort data (Zhang et al, 2021a).

We refer readers to Liu et al (2015), He et al (2019), and Liu and Li (2020), among others, for surveys of recent developments in sure feature screening methods and for further references.

This study considers the problem of feature screening for the case–cohort design. The case–cohort design is a widely used design in epidemiological studies and disease prevention trials, with many appealing properties and utilities (Kalbfleisch and Lawless, 1988; Kulich and Lin, 2004; Prentice, 1986). In a case–cohort study, covariates are only measured for subjects in a randomly selected subcohort and for all cases in the full cohort. It is particularly useful when cases are rare and some covariates such as genomic information are expensive to obtain. It does not require matching and thus is easy to implement. It also has the advantage of enabling one to study several disease outcomes by using the same randomly selected subcohort.

It is worth noting that the case–cohort design also offers a useful tool for analyzing the time to a rare event with massive Electronic Health Record (EHR) data, for which fitting a Cox proportional hazards model can be costly and practically infeasible (Borgan et al, 2000; Kawaguchi et al, 2021). However, despite the extensive statistical literature on either sure feature screening or the case–cohort design, sure screening methods for the case–cohort design remain sparse and underdeveloped.

To the best of our knowledge, the only sure screening tool currently available for the case–cohort design was recently developed by Zhang et al (2021a), who derived a weighted sure independence screening (WSIS) method and an iterative weighted sure independence screening (IWSIS) method for a case–cohort design with ultrahigh-dimensional covariates.

However, as shown by the simulation results in Section 3, although WSIS works well for screening marginal effects, it often fails to retain covariates that are jointly correlated, but marginally uncorrelated, with the response. Its iterative version, IWSIS, may offer some limited improvements, but the overall performance is still unsatisfactory for joint screening. Therefore, there is an unmet need to develop improved sure joint screening methods for the case–cohort design.

The purpose of this study is to develop a sure joint feature screening method for the case–cohort design under a high-dimensional Cox proportional hazards model. Our screening procedure is based on a sparsity-restricted maximum pseudo-partial likelihood estimator (SMPLE), inspired by some recent work for the simple random sampling design (Liu et al, 2021; Liu et al, 2020; Xu and Chen, 2014; Yang et al, 2016).

We first rigorously establish that the SMPLE for the case–cohort design has a sure joint screening property in the sense that, as the sample size increases, it retains all true active covariates with probability tending to 1. Then, because solving for the exact SMPLE is NP-hard, we propose a customized iterative reweighted hard thresholding (IWHT) algorithm to approximate the SMPLE and show that each IWHT update yields a sure joint screening procedure when a Least Absolute Shrinkage and Selection Operator (LASSO) initial estimate is employed.

The proposed SMPLE screening procedure is a natural extension of the procedure used by Yang et al (2016), from the simple random sampling design to the case–cohort design by incorporating a weight function $ρ_{j} (t)$ in the pseudo-partial likelihood [see Equation (3)]. However, nontrivial modifications are required in the technical proof of its sure screening property. For example, the proof of Theorem 1 requires establishing some concentration inequalities, as stated in Lemma 1 of Section 7, which is challenging for the case–cohort design because special considerations are needed to account for the weight in the pseudo-partial likelihood.

Additionally, we establish a new theoretical result for the Cox model under the case–cohort design: each LASSO-initiated IWHT update yields a sure joint screening procedure, which is practically highly desirable and has not been previously established by Yang et al (2016) for the Cox model under the simple random sampling design. Finally, we demonstrate through simulations that the proposed joint screening method can provide substantial improvements over WSIS and IWSIS for the case–cohort design, especially when some covariates are jointly correlated, but marginally uncorrelated, with the event time outcome.

The rest of the study is organized as follows. In Section 2, we describe our SMPLE joint screening method for the ultrahigh-dimensional Cox proportional hazards model for case–cohort data and state its sure screening properties. Section 3 presents simulation studies to assess the finite sample performance of our procedure in comparison with the existing WSIS and IWSIS methods for the case–cohort design. In Section 4, we illustrate our method using breast cancer data to identify relevant genomic features for predicting overall survival.

2. JOINT FEATURE SCREENING FOR COX'S MODEL UNDER THE CASE–COHORT DESIGN

2.1. Case–cohort design and preliminaries

Let $\tilde{T}$ denote a survival time of interest, C a censoring time, and $X = (X_{1}, \dots, X_{p}) \in R^{p}$ a p-dimensional vector of predictors. Define $T = min (\tilde{T}, C)$ and $δ = I (\tilde{T} \leq C)$ , where $I (\cdot)$ denotes the indicator function. Suppose there are n independent subjects in a full cohort and that the complete data would consist of n i.i.d. triplets $(X_{i}, T_{i}, δ_{i})$ , $i = 1, \dots, n$ .

However, under the case–cohort design, the covariate X is only collected from all cases (failures) and from a randomly selected subcohort of fixed size $ñ$ from the full cohort. Let $ξ_{i}$ denote the indicator of the ith subject being selected into the subcohort, and let $α = ñ ∕ n = P (ξ_{i} = 1) \in (0, 1]$ be the selection probability.

Assume the following Cox proportional hazards model (Cox, 1972) for $\tilde{T}$ : $λ (t | X) = λ_{0} (t) exp (β^{T} X),$ (1)

where $λ (t | X)$ is the conditional hazard function of $\tilde{T}$ given X, $β$ is a p-vector of unknown coefficients, and $λ_{0} (t)$ is an unspecified baseline hazard function.

For the classic low-dimensional covariate setting with fixed p, $β$ can be estimated by the maximum pseudo-partial likelihood estimator (Kalbfleisch and Lawless, 1988)

$\tilde{β} = \arg\max_{β} l (β),$ (2)

where

$l (β) = \sum_{i = 1}^{n} \int_{0}^{τ} [β^{T} X_{i} - \log \{\sum_{j = 1}^{n} ρ_{j} (t) Y_{j} (t) \exp (β^{T} X_{j})\}] d N_{i} (t)$ (3)

is the logarithm of a pseudo-partial likelihood (Kalbfleisch and Lawless, 1988), $N_{i} (t) = I (T_{i} \leq t, δ_{i} = 1)$ , $Y_{i} (t) = I (T_{i} \geq t)$ , $τ$ is the time to the end of the study, and $ρ_{i} (t) = δ_{i} + (1 - δ_{i}) ξ_{i} ∕ \hat{α} (t)$ , with $\hat{α} (t) = \sum_{i = 1}^{n} (1 - δ_{i}) ξ_{i} Y_{i} (t) ∕ \sum_{i = 1}^{n} (1 - δ_{i}) Y_{i} (t)$ . It has been well established that $\tilde{β}$ is consistent and asymptotically normal (Kulich and Lin, 2004).
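To make the role of the weight $ρ_{i} (t)$ concrete, the following is a minimal Python sketch of the log pseudo-partial likelihood. It assumes the weighted risk-set form $\sum_{i} δ_{i} [β^{T} X_{i} - \log \sum_{j} ρ_{j} (T_{i}) Y_{j} (T_{i}) \exp (β^{T} X_{j})]$, with $ρ$ and $\hat{α}$ as defined above; the function name and the toy data are hypothetical and are not taken from the authors' MATLAB implementation.

```python
import numpy as np

def log_pseudo_partial_likelihood(beta, X, T, delta, xi):
    """Weighted log pseudo-partial likelihood for case-cohort data (illustrative sketch).

    Rows of X for subjects who are neither cases nor in the subcohort receive
    weight zero, so they must hold finite placeholder values (e.g., zeros).
    """
    eta = X @ beta
    risk = np.exp(eta)
    total = 0.0
    for i in np.flatnonzero(delta == 1):           # sum over observed failures
        at_risk = T >= T[i]                        # Y_j(t) at t = T_i
        noncase_at_risk = at_risk & (delta == 0)
        n_sub = (noncase_at_risk & (xi == 1)).sum()
        # 1 / alpha_hat(t); irrelevant when no sampled non-case is at risk
        inv_alpha = noncase_at_risk.sum() / n_sub if n_sub > 0 else 0.0
        rho = np.where(delta == 1, 1.0, xi * inv_alpha)
        total += eta[i] - np.log(np.sum(rho * at_risk * risk))
    return total

# Toy cohort: one case, three non-cases, two of which are in the subcohort.
T = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1, 0, 0, 0])
xi = np.array([0, 1, 0, 1])
X = np.zeros((4, 1))
print(log_pseudo_partial_likelihood(np.zeros(1), X, T, delta, xi))
```

In the toy cohort, the two sampled non-cases each receive weight 3/2 at the failure time, so with $β = 0$ the weighted risk set has total mass 4, the size of the full cohort at risk.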

Below we develop a joint screening procedure for the high-dimensional setting $p ≫ n$ such that the screened model $ŝ$ satisfies the sure screening property $P (s^{*} \subset ŝ) \to 1$ as $n \to \infty$ , where $s^{*} = \{j : β_{j} \neq 0\}$ denotes the set of indices of the true active covariates.

2.2. Sure joint screening using a sparsity-restricted maximum pseudo-partial likelihood estimator

Define a sparsity-restricted maximum pseudo-partial likelihood estimator (SMPLE) of β by

$\hat{β} = \arg\max_{β} l (β) \quad \text{subject to} \quad ∥ β ∥_{0} \leq k,$ (4)

where the cardinality of the true set satisfies $| s^{*} | = q < k$ . The next theorem states that the SMPLE has a sure screening property under mild regularity conditions. The proof is provided in Section 7.

Theorem 1. The sure screening property of SMPLE. Let $ŝ = \{j : {\hat{β}}_{j} \neq 0\}$ denote the screened model based on the SMPLE $\hat{β}$ . Assume that $\log (p) = O (n^{d})$ for some $0 \leq d < 1$ . Then, under the regularity conditions (C1)–(C6) in Section 6, we have

$P (s^{*} \subset ŝ) \to 1 \quad \text{as} \quad n \to \infty .$

Note that the optimization problem (4) is NP-hard and hence computationally infeasible in a high-dimensional setting. Instead of solving for the exact SMPLE $\hat{β}$ , below we propose an IWHT algorithm to approximate $\hat{β}$ .

IWHT algorithm:

Step 0. Let $β^{(0)}$ be an initial estimate.

Step t. For each $t = 1, 2, \dots$ , we update $β^{(t)}$ to $β^{(t + 1)}$ using the following two-step procedure.

(a) Define

$β_{*}^{(t + 1)} = \arg\max_{γ} h (γ, β^{(t)}) \quad \text{subject to} \quad ∥ γ ∥_{0} \leq k,$ (5)

where

$h (γ, β) = l (β) + (γ - β)^{T} l' (β) - \frac{u}{2} (γ - β)^{T} W (β) (γ - β)$ (6)

is a local quadratic approximation of $l (β)$ , $u > 0$ is a scaling parameter, and $W (β) = \mathrm{diag} \{- l'' (β)\}$ is a diagonal matrix consisting of the diagonal elements of $- l'' (β)$ . Let $γ^{(t + 1)} = β^{(t)} + u^{- 1} W^{- 1} (β^{(t)}) l' (β^{(t)})$ and $r_{j} = w_{j} γ_{j}^{2}$ , where $w_{j}$ is the jth diagonal element of $W (β^{(t)})$ , $γ_{j}$ is the jth component of $γ^{(t + 1)}$ , and $j = 1, 2, \dots, p$ . It can be shown that

$β_{*}^{(t + 1)} = γ^{(t + 1)} I (| r | \geq | r_{(k)} |),$ (7)

where $| r_{(k)} |$ denotes the kth largest component of $| r |$ and the indicator and the product are taken componentwise.

(b) Obtain an updated $β^{(t + 1)}$ by maximizing the pseudo-partial likelihood (3) with the selected variables of $β_{*}^{(t + 1)}$ .
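Step (a) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the gradient `grad` and the diagonal `w` of $W (β^{(t)})$ are supplied by the caller, the function name is hypothetical, and ties in $r_{j}$ are broken by index so that exactly k components are kept. The refit in step (b) is not shown.

```python
import numpy as np

def hard_threshold_step(beta, grad, w, u, k):
    """Scaled gradient step followed by hard thresholding, as in step (a)."""
    gamma = beta + grad / (u * w)                 # gamma = beta + u^{-1} W^{-1} l'(beta)
    r = w * gamma ** 2                            # r_j = w_j * gamma_j^2
    keep = np.argsort(-r, kind="stable")[:k]      # indices of the k largest r_j
    beta_star = np.zeros_like(gamma)
    beta_star[keep] = gamma[keep]                 # all other components are set to zero
    return beta_star, np.sort(keep)

# Toy inputs (hypothetical values, chosen only to show the mechanics).
beta0 = np.zeros(5)
grad = np.array([3.0, -1.0, 0.5, -4.0, 2.0])
w = np.array([1.0, 1.0, 4.0, 2.0, 1.0])
beta_star, support = hard_threshold_step(beta0, grad, w, u=2.0, k=2)
print(beta_star, support)
```

Here $γ = (1.5, -0.5, 0.0625, -1, 1)$ and $r = (2.25, 0.25, 0.015625, 2, 1)$, so the first and fourth components are retained. Ranking by $r_{j}$ rather than by $| γ_{j} |$ is what makes the thresholding "reweighted": a component with a small step but large curvature $w_{j}$ can still be kept.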

To appreciate the above IWHT algorithm, one may regard $h (γ, β)$ in Equation (6) as an approximate quadratic Taylor series expansion of $l (β)$ , with $- l'' (β)$ being replaced by uW, where the latter is introduced to overcome some key shortcomings of the former such as being computationally costly and not invertible when $p > n$ . We also note that the solution in Equation (7) is a direct consequence of the following fact:
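One way to see this is sketched below, assuming that $h$ has the quadratic form described above (a first-order term in $l' (β)$ with curvature $u W (β)$). The symbol $\tilde{γ}$ is introduced here only for illustration. Completing the square in $γ$ gives

```latex
\begin{aligned}
h(\gamma, \beta)
  &= l(\beta) + (\gamma - \beta)^{T} l'(\beta)
     - \frac{u}{2} (\gamma - \beta)^{T} W(\beta) (\gamma - \beta) \\
  &= l(\beta) + \frac{1}{2u}\, l'(\beta)^{T} W^{-1}(\beta)\, l'(\beta)
     - \frac{u}{2} \sum_{j=1}^{p} w_{j} (\gamma_{j} - \tilde{\gamma}_{j})^{2},
\qquad
\tilde{\gamma} = \beta + u^{-1} W^{-1}(\beta)\, l'(\beta).
\end{aligned}
```

Only the last term depends on $γ$. Setting $γ_{j} = \tilde{γ}_{j}$ contributes nothing to it, whereas setting $γ_{j} = 0$ costs $(u ∕ 2) w_{j} \tilde{γ}_{j}^{2}$. Under the constraint $∥ γ ∥_{0} \leq k$, the maximizer therefore keeps the k components with the largest $w_{j} \tilde{γ}_{j}^{2}$ and sets the rest to zero.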

We next state some desirable properties of the IWHT in Theorems 2 and 3. The proofs are provided in Section 7.

Theorem 2. Let $\{β^{(t)}\}$ be the sequence defined by the algorithm. Assume that $u \geq \sup_{β} λ_{\max} (W^{- 1 ∕ 2} (β^{(t)}) \{- l'' (β)\} W^{- 1 ∕ 2} (β^{(t)}))$ , where $λ_{\max} (A)$ denotes the largest eigenvalue of A. Then, $l (β^{(t)}) \leq l (β^{(t + 1)})$ for every t.

Theorem 3. The sure screening property of LASSO-initiated IWHT. Let $β^{(0)} = \arg\max_{β} \{l (β) - n λ ∥ β ∥_{1}\}$ be the LASSO initial estimator, where $λ$ satisfies $λ n^{1 ∕ 2 - d} \to \infty$ and $λ n^{τ_{1} + τ_{2}} \to 0$ . Then, under the regularity conditions (C1)–(C7) in Section 6, and $u > ξ n k$ for some $ξ > 0$ , we have

$P (s^{*} \subset s^{(t)}) \to 1 \quad \text{as} \quad n \to \infty$

for any fixed $t \geq 0$ , where $s^{(t)} = \{j : β_{j}^{(t)} \neq 0\}$ .

3. NUMERICAL STUDIES

In this section, we present some simulations to evaluate the performance of the proposed SMPLE joint screening method and compare it with the WSIS and IWSIS methods by Zhang et al (2021a). For the IWSIS method, we follow the same setup used by Zhang et al (2021a), with $N = 2$ , $q_{1} = [k ∕ 2]$ , and $q_{2} = k - q_{1}$ .

In our simulation, we consider $n = 500$ and $p = 2000, 5000$ . The failure times were generated from the Cox proportional hazards model: $λ (t) = λ_{0} (t) exp (X^{T} β),$ where $λ_{0} (t) = 10$ and X is a p-dimensional covariate. The censoring times were generated from a uniform distribution on $[0, c_{0}]$ , where c₀ is chosen to achieve a desired censoring rate (CR). We set the CR to 80% as the CR is typically high in case–cohort studies. We considered two different case-to-noncase ratios, 1:1 and 1:2, by setting the selection probability $α$ to (1-CR)/CR and 2 (1-CR)/CR, respectively.
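The sampling setup above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, the seed, and the way the fixed-size subcohort is drawn are assumptions, and the covariates `X`, coefficients `beta`, and censoring bound `c0` must be supplied by the user.

```python
import numpy as np

def selection_prob(cr, noncases_per_case):
    """Selection probability alpha = m * (1 - CR) / CR for a 1:m case-to-noncase ratio."""
    return noncases_per_case * (1.0 - cr) / cr

def simulate_case_cohort(X, beta, c0, alpha, lam0=10.0, seed=0):
    """Draw (T, delta, xi) under a constant baseline hazard lam0 and Uniform[0, c0] censoring."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    rate = lam0 * np.exp(X @ beta)             # constant hazard, so failure times are exponential
    t_fail = rng.exponential(1.0 / rate)
    cens = rng.uniform(0.0, c0, size=n)
    T = np.minimum(t_fail, cens)               # observed time
    delta = (t_fail <= cens).astype(int)       # failure indicator
    xi = np.zeros(n, dtype=int)
    subcohort = rng.choice(n, size=int(round(alpha * n)), replace=False)
    xi[subcohort] = 1                          # fixed-size random subcohort
    return T, delta, xi

print(selection_prob(0.8, 1), selection_prob(0.8, 2))  # approximately 0.25 and 0.5
```

With CR = 80% and n = 500, about 100 subjects are cases. A selection probability of 0.25 yields a subcohort of 125 subjects, of whom roughly 100 are expected to be non-cases, which gives the 1:1 ratio.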

We considered the following three model settings:

Case 1: and $β_{s_{c}^{*}} = 0$ . The predictor X is multivariate normal $N (0, Σ)$ , where $Σ = (σ_{i j})$ , with $σ_{i i} = 1$ and $σ_{i j} = 0.15$ for $i, j \in s^{*}$ and $0.3$ for $i \text{ or } j \in s_{c}^{*}$ .

Case 2: and $β_{s_{c}^{*}} = 0$ . The predictor X is multivariate normal $N (0, Σ)$ , where $Σ = (σ_{i j})$ is compound symmetric, with $σ_{i j} = 0.4$ for $i \neq j$ and 1 for $i = j$ .

Case 3: The same as Case 1, except that and $β_{s_{c}^{*}} = 0$ .

It can be shown that in all three settings, $\mathrm{cov} (X_{4}, Y) = 0$ ; thus, X₄ is marginally uncorrelated with Y, but jointly correlated with Y. We note that the setting of Case 2 is similar to Case 1, except that Case 2 has stronger signals with larger coefficients for the active variables. Case 3 is similar to Case 1, but with both positive and negative marginally correlated coefficients.

The performance of the screening methods is evaluated and compared using P_j, the proportion of the 100 Monte Carlo replications in which the jth predictor is selected for a given model size k, and $P_{All}$ , the proportion of the 100 Monte Carlo replications in which all active predictors are selected for a given model size k.
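These two summaries are simple to compute from the screened index sets. The sketch below is illustrative only; the function name and the toy selections are hypothetical, and indices are zero-based.

```python
import numpy as np

def retention_rates(selected_sets, active):
    """Return (P_j for each active index, P_All) over a list of screened index sets."""
    active = list(active)
    hits = np.array(
        [[j in s for j in active] for s in map(set, selected_sets)],
        dtype=bool,
    )
    p_j = hits.mean(axis=0)                 # share of replications retaining each active predictor
    p_all = hits.all(axis=1).mean()         # share of replications retaining all of them
    return p_j, p_all

# Four toy replications with active set {0, 1, 2, 3}.
selected = [{0, 1, 2, 3}, {0, 1, 5}, {0, 2, 3, 7}, {0, 1, 2, 3}]
p_j, p_all = retention_rates(selected, active=[0, 1, 2, 3])
print(p_j, p_all)
```

Because $P_{All}$ requires every active predictor to be retained in the same replication, it can never exceed the smallest P_j, which is why a single poorly retained variable drives $P_{All}$ toward zero.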

We set $k = [5 n_{c}^{1 ∕ 5 - 1 ∕ 500}]$ , where $n_{c}$ denotes the total number of cases, take $β^{(0)}$ to be the LASSO estimate defined in Theorem 3, and set $u = 1$ . The IWHT iteration stops when $∥ β^{(t)} - β^{(t - 1)} ∥ < 10^{- 3}$ or when the number of iterations exceeds 10. This value of k has been suggested in the case–cohort study literature (Ni et al, 2016; Zhang et al, 2021a) and satisfies the regularity condition (C2). It worked well for all scenarios in our simulations. The results are summarized in Table 1.
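As a quick check of the model size rule, the sketch below evaluates $k$ for two case counts. It assumes that $[\cdot]$ denotes the integer part; the function name is hypothetical, and the case count of 69 is an illustrative value rather than one reported in the text.

```python
import math

def screening_size(n_cases):
    """Model size k = [5 * n_c^(1/5 - 1/500)], with [.] taken as the integer part."""
    return math.floor(5 * n_cases ** (1 / 5 - 1 / 500))

print(screening_size(100))  # n = 500 with an 80% censoring rate gives about 100 cases
print(screening_size(69))   # illustrative case count
```

For about 100 cases the rule gives $k = 12$, and any case count between 54 and 83 gives $k = 11$.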

Table 1.
Simulated Probabilities P_j and $P_{All}$ of Retaining an Individual Variable X_j and All Four Active Variables ( $n = 500$ ; Each Entry Is Based on 100 Monte Carlo Replicates)

Model setting	p	Case-to-noncase ratio	Method	P_1 (X₁)	P_2 (X₂)	P_3 (X₃)	P_4 (X₄)	$P_{All}$
Case 1	2000	1:1	SMPLE	0.58	0.65	0.59	0.75	0.21
			WSIS	0.54	0.65	0.55	0.00	0.00
			IWSIS	0.49	0.52	0.51	0.73	0.05
		1:2	SMPLE	0.90	0.92	0.84	0.82	0.58
			WSIS	0.79	0.80	0.78	0.00	0.00
			IWSIS	0.80	0.81	0.80	0.77	0.36
	5000	1:1	SMPLE	0.58	0.48	0.53	0.73	0.17
			WSIS	0.56	0.53	0.46	0.00	0.00
			IWSIS	0.47	0.48	0.39	0.66	0.04
		1:2	SMPLE	0.79	0.79	0.79	0.81	0.47
			WSIS	0.80	0.69	0.66	0.00	0.00
			IWSIS	0.75	0.70	0.61	0.86	0.22
Case 2	2000	1:1	SMPLE	1.00	1.00	1.00	1.00	1.00
			WSIS	1.00	1.00	1.00	0.00	0.00
			IWSIS	0.99	1.00	0.99	1.00	0.98
		1:2	SMPLE	1.00	1.00	1.00	1.00	1.00
			WSIS	1.00	1.00	1.00	0.00	0.00
			IWSIS	1.00	1.00	1.00	1.00	1.00
	5000	1:1	SMPLE	1.00	1.00	1.00	1.00	1.00
			WSIS	1.00	1.00	1.00	0.00	0.00
			IWSIS	0.98	0.99	0.99	1.00	0.96
		1:2	SMPLE	1.00	1.00	1.00	1.00	1.00
			WSIS	1.00	1.00	1.00	0.00	0.00
			IWSIS	1.00	1.00	1.00	1.00	1.00
Case 3	2000	1:1	SMPLE	1.00	1.00	1.00	0.81	0.81
			WSIS	0.96	1.00	0.02	0.00	0.00
			IWSIS	0.95	1.00	0.92	0.09	0.07
		1:2	SMPLE	1.00	1.00	1.00	0.87	0.87
			WSIS	0.98	1.00	0.02	0.00	0.00
			IWSIS	0.99	1.00	1.00	0.12	0.12
	5000	1:1	SMPLE	1.00	1.00	1.00	0.82	0.82
			WSIS	0.86	1.00	0.00	0.00	0.00
			IWSIS	0.82	1.00	0.93	0.05	0.01
		1:2	SMPLE	1.00	1.00	1.00	0.81	0.81
			WSIS	0.97	1.00	0.00	0.00	0.00
			IWSIS	0.99	1.00	1.00	0.08	0.07

IWSIS, iterative weighted sure independence screening; SMPLE, sparsity-restricted maximum pseudo-partial likelihood estimator; WSIS, weighted sure independence screening.

Table 1 shows that the proposed SMPLE performs at least as well as WSIS and IWSIS in every scenario, and usually much better, in terms of the probability $P_{All}$ of retaining all active variables. Under all three cases, the marginal method, WSIS, failed completely to retain X₄, which is jointly correlated, but marginally uncorrelated, with the response. Under Case 1, its iterated version, IWSIS, increased the chance of selecting X₄, but at the cost of lowering the probabilities, P₁, P₂, and P₃, of selecting the other marginal effects, which resulted in a low probability $P_{All}$ of retaining all important variables.

As expected, the performance of all methods generally improves as the case-to-noncase ratio changes from 1:1 to 1:2, that is, as the subcohort grows. Furthermore, under Case 2, with much larger signals, both SMPLE and IWSIS work well with $P_{All}$ close to 1. Finally, under Case 3, with both positive and negative correlated variables, the performance of SMPLE is quite satisfactory with $P_{All}$ greater than 80%, whereas IWSIS still performed poorly with $P_{All}$ ranging from 1% to 12%.

As suggested by a referee, we conducted more simulations to investigate the performance of the proposed method under a number of other scenarios, including a nonproportional hazards model, discrete covariates, weaker signals, highly correlated covariates, and higher CR. The results are consistent with those reported in Table 1. Details are presented in sections S1–S5 of Supplementary Data.

4. A REAL DATA EXAMPLE

As an illustration, we apply our methodology to a breast cancer dataset that contains 24,885 genes for 295 female patients between 1984 and 1995 at the Netherlands Cancer Institute (Annest et al, 2009). After an initial screening by the Rosetta error model (Annest et al, 2009) and discarding patients with missing values, we used 289 patients and 4919 genes for our analysis. The overall CR is 76%, and we set the case-to-noncase ratio to 1:1.

First, we applied the SMPLE, WSIS, and IWSIS methods to perform screening for important genes related to overall survival. All predictors were standardized with mean zero and variance 1. We set $k = [5 n_{c}^{\frac{1}{5} - \frac{1}{500}}] = 11$ . The screened genes are listed in Table 2. It is worth noting that NM.003258, selected by both SMPLE and WSIS, is known to be related to the organization of the human TK1 gene and the position of the protein functional domains (Gilles et al, 2003).

Table 2.
Genes Selected by the Sparsity-Restricted Maximum Pseudo-Partial Likelihood Estimator, Weighted Sure Independence Screening, and Iterative Weighted Sure Independence Screening Methods for Breast Cancer Data

SMPLE	WSIS	IWSIS
Contig56759.RC	NM.003258	NM.004095
NM.003258	NM.004095	Contig58214.RC
NM.020142	Contig31288.RC	Contig52095.RC
NM.004487	NM.003920	NM.004701
NM.004726	NM.004701	NM.004774
Contig1982.RC	M96577	NM.006579
NM.014583	NM.005804	Contig3359.RC
NM.006787	NM.006579	Contig56843.RC
AK002088	Contig56843.RC	NM.018219
AF161417	NM.001168	NM.001168
NM.000146	NM.002266	Contig58212.RC

Second, we further performed variable selection with the Smoothly Clipped Absolute Deviation (SCAD) penalty on the models obtained by each screening method. The selected genes, along with the degrees of freedom and Bayesian Information Criterion (BIC) score of each resulting model, are summarized in Table 3. SMPLE-SCAD yielded the most favorable model, with the lowest BIC score (443.32, versus 508.22 for WSIS-SCAD and 483.35 for IWSIS-SCAD). We also note that NM.003258 is retained by SMPLE-SCAD, but not by the other two methods.

Table 3.

Genes Selected by Applying Smoothly Clipped Absolute Deviation (SCAD) After Screening for Breast Cancer Data

	SMPLE-SCAD	WSIS-SCAD	IWSIS-SCAD
	Contig56759.RC	NM.004095	NM.004095
	NM.003258	NM.006579	Contig52095.RC
	NM.020142		NM.004701
	NM.004487		NM.018219
	Contig1982.RC
	AF161417
	NM.000146
DF	7	2	4
BIC	443.32	508.22	483.35

BIC, Bayesian information criterion; DF, degrees of freedom.

We also point out that replacing the SCAD penalty with either LASSO or Minimax Concave Penalty (MCP) will lead to a similar conclusion (see Table S6 of Supplementary Data).

5. DISCUSSION

We have developed a sure joint screening procedure with rigorous theoretical justification for the ultrahigh-dimensional Cox proportional hazards model for the case–cohort design. The proposed procedure is based on the SMPLE and a computationally efficient IWHT algorithm. Empirical studies show that the proposed SMPLE screening procedure generally outperforms the existing marginal screening method (WSIS) and its iterative version (IWSIS) (Zhang et al, 2021a) when some covariates are marginally uncorrelated, but jointly correlated, with the event time.

Although this study only considers time-independent covariates for simplicity, the proposed method can be easily extended to time-dependent covariates that are continuously measured over time. It is also worth noting that oftentimes, a time-dependent covariate is measured only intermittently and/or with measurement error, which can be handled using a joint model framework for longitudinal and time-to-event outcomes (Elashoff et al, 2017). It would be interesting to investigate if the proposed method can be extended to joint models in future research.

Furthermore, our joint screening method for the Cox proportional hazards model is developed for the case–cohort design with a simple, random subcohort sample. It would be useful to further extend this approach to the case–cohort design with other sampling schemes such as the stratified, simple random sampling (Borgan et al, 2000) and counter-matched design (Langholz and Borgan, 1995). It would also be of interest to extend this approach to other models such as Accelerated Failure Time (AFT) models for the case–cohort design (Chiou et al, 2014; Kong and Cai, 2009), other related designs such as the nested case–control design (Goldstein and Langholz, 1992), and other types of survival data such as left-truncated data (Volovics and van den Brandt, 1997).

Finally, it would be interesting to investigate if some existing model-free screening methods such as the sure independence screening procedure based on distance correlation (Li et al, 2012) and sufficient variable screening/selection (Yang et al, 2022; Yang et al, 2019) for the standard random sample setting can be extended to the case–cohort design.

6. REGULARITY CONDITIONS

The following notations and regularity conditions are needed to establish the sure screening properties of the proposed joint screening procedure.

For each n, we define

Regularity conditions:

(C1) $\int_{0}^{τ} λ_{0} (t) d t < \infty$ and $E (Y (τ)) > 0$ .

(C2) $\min_{j \in s^{*}} | β_{j}^{*} | \geq ω_{1} n^{- τ_{1}}$ and $q < k \leq ω_{2} n^{τ_{2}}$ for some positive constants $ω_{1}, ω_{2}$ and nonnegative constants $τ_{1}, τ_{2}$ satisfying $2 τ_{1} + 3 τ_{2} < 1$ and $4 τ_{2} < 1$ .

(C3) $log (p) = O (n^{d})$ for some $0 \leq d < (1 - 2 τ_{1} - 3 τ_{2}) ∕ 2$ .

(C4) There exists a positive constant c₁ such that for sufficiently large n, $λ_{\min} (- n^{- 1} l'' (β_{s})) \geq c_{1}$ for $s \in S_{+}^{2 k}$ , where $λ_{\min} (\cdot)$ denotes the smallest eigenvalue of a matrix and $S_{+}^{k} = \{s : s^{*} \subset s; ∥ s ∥_{0} \leq k\}$ denotes the collection of overfitted models of size less than or equal to k.

(C5) There exist positive constants, c₂ and c₃, such that $| X_{i j} | \leq c_{2}$ and $| β^{* T} X_{i} | \leq c_{3}$ .

(C6) There exists a neighborhood $B_{n}$ of $β^{*}$ such that for all $β \in B_{n}$ and $t \in [0, τ]$ , $\partial s^{(0)} (β, t) ∕ \partial β = s^{(1)} (β, t)$ and $\partial^{2} s^{(0)} (β, t) ∕ \partial β \partial β^{T} = s^{(2)} (β, t)$ . The functions $s^{(k)} (β, t)$ ( $k = 0, 1, 2$ ) are continuous and bounded, and $s^{(0)} (β, t)$ is bounded away from zero.

(C7) There exists a positive constant $ν$ , such that for sufficiently large n,

δ^{T} (- l'' (β)) δ \geq ν n ∥ δ_{s^{*}} ∥_{2}^{2}

for any $δ \neq 0$ satisfying $∥ δ_{s_{c}^{*}} ∥_{1} \leq 3 ∥ δ_{s^{*}} ∥_{1}$ and for any $β$ within some neighborhood of $β^{*}$ , where for any p-dimensional vector $δ$ , $δ_{s}$ denotes the $| s |$ -dimensional vector confined to a submodel s.

Note that (C1) guarantees a finite, baseline cumulative hazard and a nonempty risk set at the end of the study. Conditions (C2)–(C5) are similar to those used by Xu and Chen (2014) for the sure screening property. Condition (C6) essentially requires $exp (β^{T} X_{i})$ to be integrable under a diverging dimension so that integration and differentiation with respect to $S^{(k)} (β, t)$ ( $k = 0, 1$ ) can be interchanged (Ni et al, 2016). Condition (C7) is needed for deriving an error bound for the LASSO estimator.

7. TECHNICAL PROOFS

Lemma 1. Let $S_{-}^{k} = \{s : s^{*} \not\subset s; ∥ s ∥_{0} \leq k\}$ be the collection of underfitted models. Under conditions (C1)–(C6), there exist positive constants, $C, D$ , and K, such that

and

for any $s \in S_{-}^{k}$ , $s' = s \cup s^{*} \in S_{+}^{2 k}$ , $x > 0$ , and $j = 1, \dots, p$ .

Proof of Lemma 1. For simplicity, we only provide the proof of Equations (10) and (11). The other two inequalities follow similarly.

We first prove (10). Write $R_{j} = \sup_{t} | S_{j}^{(1)} (β_{s'}^{*}, t) - s_{j}^{(1)} (β_{s'}^{*}, t) |$ . We adapt the arguments used by Lin and Lv (2013); the main idea is to apply a functional Hoeffding-type inequality (Massart, 2000). Under condition (C5), it is clear that the class of functions $\{Y (t) X_{j} \exp (β_{s'}^{* T} X_{s'}) : t \in [0, τ]\}$ has a bounded uniform entropy integral. By lemma 19.38 of van der Vaart (1998), we have $E R_{j} \leq C n^{- 1 ∕ 2}$ for some constant $C > 0$ . It then follows from theorem 9 of Massart (2000) that $P (R_{j} > C n^{- 1 ∕ 2} (1 + x)) \leq P (R_{j} > E R_{j} + C n^{- 1 ∕ 2} x) \leq \exp (- K x^{2}) .$

Next, we prove (11). Rewrite

and we have

Similar to the proof of Equation (10), we have

and

which completes the proof of Equation (11).□

Proof of Theorem 1. Let ${\hat{β}}_{s}$ denote the estimate of $β$ based on the model s. Clearly, the result of the theorem holds if $P \{ŝ \in S_{+}^{k}\} \to 1$ . Hence, it is sufficient to show that as $n \to \infty$ ,

We now prove (12). Consider $∥ β_{s'} - β_{s'}^{*} ∥_{2} = ω_{1} n^{- τ_{1}}$ . By the Taylor expansion and condition (C4), we have

where ${\tilde{β}}_{s'}$ lies between $β_{s'}$ and $β_{s'}^{*}$ . Thus,

where $c = ω_{2}^{- 1 ∕ 2} c_{1} ω_{1} ∕ 2 \sqrt{2}$ . Note that

By theorem 3.1 of Bradic et al (2011), we have $P (| I_{j 1} | \geq \frac{c}{2} n^{1 - τ_{1} - \frac{1}{2} τ_{2}}) \leq b_{1} exp (- b_{2} n^{0.5 - τ_{1} - \frac{1}{2} τ_{2}}),$ (15)

where $b_{1}$ and $b_{2}$ are some generic positive constants.

For $I_{j 2}$ , we have

We further note that

and

By (C6), $s^{(0)} (β, t)$ is bounded away from zero. Setting $U = \inf_{t \in [0, τ]} s^{(0)} (β, t)$ , $Ω_{1} = \{\inf_{t \in [0, τ]} S^{(0)} (β, t) \geq \frac{U}{2}\}$ , and $Ω_{2} = \{\inf_{t \in [0, τ]} {\tilde{S}}^{(0)} (β, t) \geq \frac{U}{2}\}$ , we have $\begin{matrix} P (Ω_{1}^{c}) \leq P (| S^{(0)} (β, t) - s^{(0)} (β, t) | > \frac{U}{2}) \leq D \exp (- K n) \quad \text{and} \\ P (Ω_{2}^{c}) \leq P (| {\tilde{S}}^{(0)} (β, t) - s^{(0)} (β, t) | > \frac{U}{2}) \leq D \exp (- K n) . \end{matrix}$

It follows immediately that $I_{j 21} \leq D exp (- K n^{1 - 2 τ_{1} - τ_{2}}) + D exp (- K n)$ and $I_{j 22} \leq D exp (- K n^{1 - 2 τ_{1} - τ_{2}}) + D exp (- K n)$ .

Combining (13) to (16) gives $P (l (β_{s'}) - l (β_{s'}^{*}) \geq 0) \leq 2 k b_{1} exp (- b_{2} n^{0.5 - τ_{1} - \frac{1}{2} τ_{2}}) .$

Hence,

for some generic positive constants $b_{3}$ and $b_{4}$ . Because $l (β_{s'})$ is concave in $β_{s'}$ , the above result holds for any $β_{s'}$ such that $∥ β_{s'} - β_{s'}^{*} ∥_{2} \geq ω_{1} n^{- τ_{1}}$ .

For any $s \in S_{-}^{k}$ , let ${\check{β}}_{s'}$ be ${\hat{β}}_{s}$ augmented with zeros corresponding to the elements in $s^{*} \setminus s$ . By condition (C2), we have $∥ {\check{β}}_{s'} - β_{s'}^{*} ∥_{2} \geq ∥ β_{s^{*} \setminus s} ∥ \geq ω_{1} n^{- τ_{1}}$ . Thus,

This completes the proof. □

Proof of Theorem 2. Note that for every $t \geq 0$ ,

where $\tilde{β}$ lies between $β^{(t)}$ and $β_{*}^{(t + 1)}$ , and the last inequality follows from the definition of $β^{(t + 1)}$ , which is a maximal pseudo-partial likelihood estimate. □

To prove Theorem 3, we first establish the following lemma.

Lemma 2. Let $β_{L} = \arg\max_{β} \{l (β) - n λ ∥ β ∥_{1}\}$ for some $λ > 0$ . Then, under the conditions of Theorem 3, we have $P (∥ β_{L} - β^{*} ∥_{1} \leq 16 ν^{- 1} λ q) \to 1 \quad \text{as} \quad n \to \infty,$ (19)

where the constant $ν$ is defined in condition (C7).

Proof of Lemma 2. By definition of $β_{L}$ , it is easy to see that $l (β_{L}) - l (β^{*}) \geq n λ ∥ β_{L} ∥_{1} - n λ ∥ β^{*} ∥_{1} .$

Let $δ = β_{L} - β^{*}$ . Then, by the Taylor series expansion, we have $l (β_{L}) - l (β^{*}) = δ^{T} l' (β^{*}) + (1 ∕ 2) δ^{T} l'' (\tilde{β}) δ$

for some intermediate value $\tilde{β}$ between $β_{L}$ and $β^{*}$ . Hence, we have $δ^{T} (- \frac{1}{n} l'' (\tilde{β})) δ \leq - 2 λ ∥ β_{L} ∥_{1} + 2 λ ∥ β^{*} ∥_{1} + \frac{2}{n} | δ |^{T} | l' (β^{*}) | .$ (20)

We next establish an upper bound for $l' (β^{*})$ . Define

where for each $j \in \{1, \dots, p\}$ , $l'_{j} (β^{*})$ is the $j$th component of $l' (β^{*})$ . Note that $P (| l'_{j} (β^{*}) | > \frac{n λ}{2}) \leq c \exp (- c' \sqrt{n} λ)$ for generic positive constants $c$ and $c'$ , which can be obtained using arguments similar to those used for Equation (13). Thus, under condition (C3) and $λ n^{(1 - 2 d) ∕ 2} \to \infty$ , we have

as $n \to \infty$ , where $a$ is a generic positive constant. This implies that $P (A) \to 1$ and $∥ l' (β^{*}) ∥_{\infty} = O_{p} (n λ)$ . Clearly, under event $A$ , the left-hand side of Equation (20) is bounded by $δ^{T} (- \frac{1}{n} l'' (\tilde{β})) δ \leq - 2 λ ∥ β_{L} ∥_{1} + 2 λ ∥ β^{*} ∥_{1} + λ ∥ δ ∥_{1},$

which further implies that

Since $δ^{T} (- \frac{1}{n} l'' (\tilde{β})) δ \geq 0$ , inequality (21) implies that $∥ δ ∥_{1} \leq 4 ∥ δ_{s^{*}} ∥_{1}$ . Subsequently, $∥ δ_{s_{c}^{*}} ∥_{1} \leq 3 ∥ δ_{s^{*}} ∥_{1}$ , which together with condition (C7) implies that for sufficiently large $n$, $δ^{T} (- \frac{1}{n} l'' (\tilde{β})) δ \geq ν ∥ δ_{s^{*}} ∥_{2}^{2} .$

Combining the Cauchy–Schwarz inequality and Equation (21), we have $∥ δ_{s^{*}} ∥_{1}^{2} \leq q ∥ δ_{s^{*}} ∥_{2}^{2} \leq q ν^{- 1} [δ^{T} (- \frac{1}{n} l'' (\tilde{β})) δ] \leq 4 q ν^{- 1} λ ∥ δ_{s^{*}} ∥_{1} .$

This leads to $∥ δ_{s^{*}} ∥_{1} \leq 4 q ν^{- 1} λ$ . Subsequently, $∥ δ ∥_{1} = ∥ δ_{s_{c}^{*}} ∥_{1} + ∥ δ_{s^{*}} ∥_{1} \leq 4 ∥ δ_{s^{*}} ∥_{1} \leq 16 q ν^{- 1} λ$

under event $A$ . Because we have shown that $P (A) \to 1$ , the lemma is therefore proved. □
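The passage from the bound under event $A$ to the cone-type inequality $∥ δ ∥_{1} \leq 4 ∥ δ_{s^{*}} ∥_{1}$ can be sketched as follows. This is a reconstruction under stated assumptions, not a quotation of the proof: it assumes that $A$ is the event $\{ ∥ l' (β^{*}) ∥_{\infty} \leq n λ ∕ 2 \}$, that $β^{*}$ is supported on $s^{*}$ (so $β_{s_{c}^{*}}^{*} = 0$), and that inequality (21) takes the usual lasso form shown in the last line.

```latex
\begin{aligned}
\tfrac{2}{n}\, |\delta|^{T} |l'(\beta^{*})|
  &\le \tfrac{2}{n}\, \|\delta\|_{1}\, \|l'(\beta^{*})\|_{\infty}
   \le \lambda \|\delta\|_{1}
   \quad \text{on } \{ \|l'(\beta^{*})\|_{\infty} \le n\lambda/2 \}, \\
\|\beta_{L}\|_{1}
  &= \|\beta^{*}_{s^{*}} + \delta_{s^{*}}\|_{1} + \|\delta_{s_{c}^{*}}\|_{1}
   \ge \|\beta^{*}\|_{1} - \|\delta_{s^{*}}\|_{1} + \|\delta_{s_{c}^{*}}\|_{1}, \\
\delta^{T}\bigl(-\tfrac{1}{n}\, l''(\tilde{\beta})\bigr)\delta
  &\le 2\lambda \bigl( \|\delta_{s^{*}}\|_{1} - \|\delta_{s_{c}^{*}}\|_{1} \bigr)
       + \lambda \|\delta\|_{1}
   = 3\lambda \|\delta_{s^{*}}\|_{1} - \lambda \|\delta_{s_{c}^{*}}\|_{1}, \\
\delta^{T}\bigl(-\tfrac{1}{n}\, l''(\tilde{\beta})\bigr)\delta + \lambda \|\delta\|_{1}
  &\le 4\lambda \|\delta_{s^{*}}\|_{1}.
\end{aligned}
```

Because the quadratic form is nonnegative, the last line gives $∥ δ ∥_{1} \leq 4 ∥ δ_{s^{*}} ∥_{1}$; dropping the term $λ ∥ δ ∥_{1}$ instead gives the bound $4 λ ∥ δ_{s^{*}} ∥_{1}$ on the quadratic form that enters the Cauchy–Schwarz step.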

Proof of Theorem 3. Let $w = \min_{j \in s^{*}} | β_{j}^{*} |$ . Then, it is sufficient to show that $P (∥ β^{(t)} - β^{*} ∥_{\infty} < w ∕ 2) \to 1$ , which holds if $∥ β^{(t)} - β^{*} ∥_{\infty} = o_{p} (w)$ . We prove this by mathematical induction on $t$.

When $t = 0$ , by Lemma 2, we have $P (∥ β^{(0)} - β^{*} ∥_{1} \leq 16 ν^{- 1} λ q) \to 1 .$

Under the conditions $λ = o (n^{- τ_{1} - τ_{2}})$ , $q = o (n^{τ_{2}})$ , and $w^{- 1} = O (n^{τ_{1}})$ , we have $λ q = o (n^{- τ_{1}}) = o (w)$ . Because $∥ β^{(0)} - β^{*} ∥_{\infty} \leq ∥ β^{(0)} - β^{*} ∥_{1}$ , the desired result is obtained for $t = 0$ .

Suppose that $∥ β^{(t - 1)} - β^{*} ∥_{\infty} = o_{p} (w)$ . We next show that $∥ β^{(t)} - β^{*} ∥_{\infty} = o_{p} (w)$ . Recall that $β_{*}^{(t)} = H (γ^{(t)}, k)$ where $γ^{(t)} = β^{(t - 1)} + u^{- 1} W^{- 1} (β^{(t - 1)}) l' (β^{(t - 1)}),$ and $H$ is the hard thresholding function. If $∥ γ^{(t)} - β^{*} ∥_{\infty} = o_{p} (w)$ , then $∥ γ_{s_{c}^{*}}^{(t)} ∥_{\infty} = o_{p} (w)$ and $∥ γ_{s^{*}}^{(t)} ∥_{\infty} = O_{p} (w)$ , where the latter is in a strict sense. Hence, the components in $γ_{s^{*}}^{(t)}$ are among the ones with the top $k$ largest absolute values in probability. Note that $∥ γ^{(t)} - β^{*} ∥_{\infty} \leq ∥ β^{(t - 1)} - β^{*} ∥_{\infty} + \frac{1}{u} ∥ W^{- 1} (β^{(t - 1)}) l' (β^{(t - 1)}) ∥_{\infty} .$

Under the induction assumption, $∥ β^{(t - 1)} - β^{*} ∥_{\infty} = o_{p} (w)$ . Then, we have

Under the conditions of this theorem, $λ n^{τ_{1} + τ_{2}} \to 0$ and $u > ξ n k$ , one can show that $\frac{1}{u} ∥ W^{- 1} (β^{(t - 1)}) l' (β^{(t - 1)}) ∥_{\infty} = o_{p} (w)$ using a technique similar to that used by Xu and Chen (2014). Hence, $∥ γ^{(t)} - β^{*} ∥_{\infty} = o_{p} (w)$ is proved.

Finally, from theorem 1 of Ni et al (2016) and condition (C2), we have $∥ β^{(t)} - β^{*} ∥_{\infty} = O_{p} (\sqrt{\frac{k}{n}}) = o_{p} (w)$ . This completes the proof of the theorem. □
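The update $γ^{(t)} = β^{(t - 1)} + u^{- 1} W^{- 1} (β^{(t - 1)}) l' (β^{(t - 1)})$ followed by $H (γ^{(t)}, k)$, which drives the induction above, can be illustrated with a small Python sketch. It is not the authors' MATLAB implementation: the function names `hard_threshold` and `iht_step` are hypothetical, $W$ is assumed to be diagonal with positive entries (passed as `w_diag`), and the gradient $l'$ is supplied as a plain list rather than computed from case–cohort data.

```python
def hard_threshold(gamma, k):
    """H(gamma, k): keep the k entries with largest absolute value, zero the rest."""
    if k <= 0:
        return [0.0] * len(gamma)
    # Indices ordered by decreasing |gamma_j|; ties go to the lower index.
    order = sorted(range(len(gamma)), key=lambda j: (-abs(gamma[j]), j))
    keep = set(order[:k])
    return [g if j in keep else 0.0 for j, g in enumerate(gamma)]


def iht_step(beta, grad, w_diag, u, k):
    """One thresholded update.

    gamma = beta + u^{-1} W^{-1} grad, with W assumed diagonal (w_diag),
    then beta_star = H(gamma, k). Returns (beta_star, gamma).
    """
    gamma = [b + g / (u * w) for b, g, w in zip(beta, grad, w_diag)]
    return hard_threshold(gamma, k), gamma


# Toy inputs (illustrative only): p = 4 covariates, model size k = 2.
beta_prev = [0.0, 0.0, 0.0, 0.0]
grad = [4.0, -0.2, -6.0, 0.4]
beta_star, gamma = iht_step(beta_prev, grad, w_diag=[1.0] * 4, u=2.0, k=2)
print(beta_star)  # [2.0, 0.0, -3.0, 0.0]
```

In the toy call, the two coordinates with the largest $| γ_{j} |$ survive thresholding, which mirrors the argument that the components of $γ_{s^{*}}^{(t)}$ dominate those of $γ_{s_{c}^{*}}^{(t)}$. The sketch stops at $β_{*}^{(t)}$; refitting on the selected support to obtain $β^{(t)}$ is not shown.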

Footnotes

AUTHOR DISCLOSURE STATEMENT

The authors declare they have no conflicting financial interests.

FUNDING INFORMATION

G.L.'s research was supported, in part, by the U.S. National Science Foundation (Grant No. 2205441) and the U.S. National Institutes of Health (Grant Nos. P30 CA-16042, UL1TR000124-02, and P50 CA211015). Y.L.'s research was supported by the National Natural Science Foundation of China (Grant No. 11801567).

SUPPLEMENTARY MATERIAL

References

1. Annest, Bumgarner, Raftery, et al. Iterative Bayesian model averaging: A method for the application of survival analysis to high-dimensional microarray data. BMC Bioinform, 2009; 10(1):72.

2. Borgan, Langholz, Samuelsen, et al. Exposure stratified case-cohort designs. Lifetime Data Anal, 2000; 6(1):39–58.

3. Bradic, Fan, Jiang. Regularization for Cox's proportional hazards model with NP-dimensionality. Ann Stat, 2011; 39(6):3092–3120.

4. Chiou, Kang, Yan. Fast accelerated failure time modeling for case-cohort data. Stat Comput, 2014; 24(4):559–568.

5. Cox DR. Regression models and life-tables. J R Stat Soc Ser B, 1972; 34:187–220.

6. Elashoff, Li. Joint Modeling of Longitudinal and Time-to-Event Data, Volume 151 of Monographs on Statistics and Applied Probability. CRC Press, Boca Raton, FL; 2017.

7. Fan, Feng, Song. Nonparametric independence screening in sparse ultra-high-dimensional additive models. J Am Stat Assoc, 2011; 106:544–557.

8. Fan, Lv. Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Ser B, 2008; 70(5):849–911.

9. Fan, Ma, Dai. Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. J Am Stat Assoc, 2014; 109:1270–1284.

10. Fan, Song. Sure independence screening in generalized linear models with NP-dimensionality. Ann Stat, 2010; 38:3567–3604.

11. Gilles, Romain, Casellas, et al. Mutation analysis in the coding sequence of thymidine kinase 1 in breast and colorectal cancer. Int J Biol Mark, 2003; 18(1):1–6.

12. Goldstein, Langholz. Asymptotic theory for nested case-control sampling in the Cox regression model. Ann Stat, 1992; 20(4):1903–1928.

13. Gorst-Rasmussen, Scheike. Independent screening for single-index hazard rate models with ultrahigh dimensional features. J R Stat Soc Ser B, 2013; 75(2):217–245.

14. He, Xu, Kang. A selective overview of feature screening methods with applications to neuroimaging data. Wiley Interdiscip Rev Comput Stat, 2019; 11(2):e1454.

15. He, Wang, Hong. Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann Stat, 2013; 41(1):342–369.

16. Kalbfleisch, Lawless. Likelihood analysis of multi-state models for disease incidence and mortality. Stat Med, 1988; 7(1–2):149–160.

17. Kawaguchi, Shen, Suchard, et al. Scalable algorithms for large competing risks data. J Comput Graph Stat, 2021; 30(3):685–693.

18. Kong, Cai. Case-cohort analysis with accelerated failure time model. Biometrics, 2009; 65(1):135–142.

19. Kulich, Lin. Improving the efficiency of relative-risk estimation in case-cohort studies. J Am Stat Assoc, 2004; 99(467):832–844.

20. Langholz, Borgan. Counter-matching: A stratified nested case-control sampling method. Biometrika, 1995; 82(1):69–79.

21. Li, Zhong, Zhu. Feature screening via distance correlation learning. J Am Stat Assoc, 2012; 107(499):1129–1139.

22. Lin, Lv. High-dimensional sparse additive hazards regression. J Am Stat Assoc, 2013; 108(501):247–264.

23. Liu, Li, Wu. Feature selection for varying coefficient models with ultrahigh-dimensional covariates. J Am Stat Assoc, 2014; 109(505):266–274.

24. Liu, Zhong, Li. A selective overview of feature screening for ultrahigh-dimensional data. Sci China Math, 2015; 58(10):2033–2054.

25. Liu, Li. Variable selection and feature screening. Advanced Studies in Theoretical and Applied Econometrics, Vol. 52. Springer; 2020; pp. 293–326.

26. Liu, Chen, Li. A new joint screening method for right-censored time-to-event data with ultra-high dimensional covariates. Stat Methods Med Res, 2020; 29(6):1499–1513.

27. Liu, Xu, Li. Sure joint feature screening in nonparametric transformation model for right censored data. Canad J Stat, 2021; 49(2):549–565.

28. Massart. About the constants in Talagrand's concentration inequalities for empirical processes. Ann Probab, 2000; 28(2):863–884.

29. Ni, Cai, Zeng. Variable selection for case-cohort studies with failure time outcome. Biometrika, 2016; 103(3):547–562.

30. Prentice RL. A case-cohort design for epidemiological cohort studies and disease prevention trials. Biometrika, 1986; 73(1):1–11.

31. Song, Lu, Ma, et al. Censored rank independence screening for high-dimensional survival data. Biometrika, 2014; 101(4):799–814.

32. van der Vaart. Asymptotic Statistics. Volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics, pp. xvi+443. Cambridge University Press, Cambridge; 1998.

33. Volovics, van den Brandt. Methods for the analysis of case-cohort studies. Biom J, 1997; 39(2):195–214.

34. Wang, Li. How to make model-free feature screening approaches for full data applicable to the case of missing response? Scand J Stat, 2018; 45(2):324–346.

35. Xu, Chen. The sparse MLE for ultrahigh-dimensional feature screening. J Am Stat Assoc, 2014; 109(507):1257–1269.

36. Yang, Yin, Zhang. Sufficient variable selection using independence measures for continuous response. J Multivariate Anal, 2019; 173:480–493.

37. Yang, Wu, Yin. On sufficient variable screening using log odds ratio filter. Electron J Stat, 2022; 16(1):498–526.

38. Yang, Yu, Li, et al. Feature screening in ultrahigh dimensional Cox's model. Stat Sinica, 2016; 26(3):881–901.

39. Zhang, Zhou, Liu, et al. Feature screening for case-cohort studies with failure time outcome. Scand J Stat, 2021a; 48(1):349–370.

40. Zhang, Zhao, Li, et al. Nonparametric independence screening for ultra-high dimensional generalized varying coefficient models with longitudinal data. J Multivariate Anal, 2019; 171:37–52.

41. Zhu L-P, Li, et al. Model-free feature screening for ultrahigh-dimensional data. J Am Stat Assoc, 2011; 106(496):1464–1475.


Model setting	p	Case-to-noncase ratio	Method	P_j (X₁)	P_j (X₂)	P_j (X₃)	P_j (X₄)	P_All
Case 1	2000	1:1	SMPLE	0.58	0.65	0.59	0.75	0.21
			WSIS	0.54	0.65	0.55	0.00	0.00
			IWSIS	0.49	0.52	0.51	0.73	0.05
		1:2	SMPLE	0.90	0.92	0.84	0.82	0.58
			WSIS	0.79	0.80	0.78	0.00	0.00
			IWSIS	0.80	0.81	0.80	0.77	0.36
	5000	1:1	SMPLE	0.58	0.48	0.53	0.73	0.17
			WSIS	0.56	0.53	0.46	0.00	0.00
			IWSIS	0.47	0.48	0.39	0.66	0.04
		1:2	SMPLE	0.79	0.79	0.79	0.81	0.47
			WSIS	0.80	0.69	0.66	0.00	0.00
			IWSIS	0.75	0.70	0.61	0.86	0.22
Case 2	2000	1:1	SMPLE	1.00	1.00	1.00	1.00	1.00
			WSIS	1.00	1.00	1.00	0.00	0.00
			IWSIS	0.99	1.00	0.99	1.00	0.98
		1:2	SMPLE	1.00	1.00	1.00	1.00	1.00
			WSIS	1.00	1.00	1.00	0.00	0.00
			IWSIS	1.00	1.00	1.00	1.00	1.00
	5000	1:1	SMPLE	1.00	1.00	1.00	1.00	1.00
			WSIS	1.00	1.00	1.00	0.00	0.00
			IWSIS	0.98	0.99	0.99	1.00	0.96
		1:2	SMPLE	1.00	1.00	1.00	1.00	1.00
			WSIS	1.00	1.00	1.00	0.00	0.00
			IWSIS	1.00	1.00	1.00	1.00	1.00
Case 3	2000	1:1	SMPLE	1.00	1.00	1.00	0.81	0.81
			WSIS	0.96	1.00	0.02	0.00	0.00
			IWSIS	0.95	1.00	0.92	0.09	0.07
		1:2	SMPLE	1.00	1.00	1.00	0.87	0.87
			WSIS	0.98	1.00	0.02	0.00	0.00
			IWSIS	0.99	1.00	1.00	0.12	0.12
	5000	1:1	SMPLE	1.00	1.00	1.00	0.82	0.82
			WSIS	0.86	1.00	0.00	0.00	0.00
			IWSIS	0.82	1.00	0.93	0.05	0.01
		1:2	SMPLE	1.00	1.00	1.00	0.81	0.81
			WSIS	0.97	1.00	0.00	0.00	0.00
			IWSIS	0.99	1.00	1.00	0.08	0.07
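
The two kinds of columns in the table can be computed from simulation output along the following lines. This is a minimal sketch, assuming that $P_{j}$ is the proportion of replications in which the screened set contains $X_{j}$ and that $P_{All}$ is the proportion in which it contains all active covariates; the function name `screening_rates` and the toy replications are hypothetical and are not taken from the authors' MATLAB code.

```python
def screening_rates(selected_sets, true_set):
    """Return (P_j for each active covariate, P_All) over simulation replications.

    selected_sets: one collection of retained covariate indices per replication.
    true_set: indices of the truly active covariates.
    """
    n_rep = len(selected_sets)
    if n_rep == 0:
        raise ValueError("need at least one replication")
    active = set(true_set)
    # P_j: share of replications whose screened set contains covariate j.
    p_j = {j: sum(j in s for s in selected_sets) / n_rep for j in sorted(active)}
    # P_All: share of replications whose screened set contains every active covariate.
    p_all = sum(active <= set(s) for s in selected_sets) / n_rep
    return p_j, p_all


# Toy example (made-up replications): active covariates are X1..X4.
reps = [{1, 2, 3, 4, 10}, {1, 2, 3, 7, 9}, {1, 3, 4, 5, 6}, {1, 2, 3, 4, 8}]
p_j, p_all = screening_rates(reps, {1, 2, 3, 4})
print(p_j)    # {1: 1.0, 2: 0.75, 3: 1.0, 4: 0.75}
print(p_all)  # 0.5
```

Under these definitions $P_{All}$ can never exceed the smallest $P_{j}$ in the same row, which is a quick consistency check on any row of the table.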