A unified approach for subgroup identification and individualized treatment recommendation with applications to randomized control trials and observational studies

Abstract

Precision medicine is important in the new era of medical product development. It focuses on optimizing healthcare decisions for each individual patient based on that patient's context information. Traditional statistical methods for precision medicine and subgroup identification primarily focus on a single treatment or on two-arm randomized control trials, and they have limited capability to handle observational studies, where treatment assignments may depend on covariates. In this paper, we describe the limitations of traditional subgroup identification methods and propose a general framework that connects subgroup identification methods with individualized treatment recommendation rules. The proposed framework can handle two or more treatments from both randomized control trials and observational studies. We implement our algorithm in C++ and interface it with R. Its performance is evaluated by simulations, and we apply the method to a dataset from a diabetes study.

Keywords

Multiple treatments; observational studies; personalized medicine; randomized control trials; subgroup identification; value function

1. Introduction

Responses to treatments can vary widely because most patient populations are heterogeneous. A treatment that works for the majority of individuals may not work for a subset of patients with certain characteristics. Thus, treating individuals based on their specific characteristics, rather than taking a one-size-fits-all approach, could substantially improve patient outcomes. Recently, precision medicine has attracted great attention in medical and biostatistical research. As an article in Pharmacogenomics puts it, “therapy with the right drug at the right dose in the right patient” is a fitting description of how precision medicine will affect the future of treatment (Mancinelli et al., 2000).

With new treatments and novel technologies available, precision medicine has become an important part of the new era of medical product development. As drug development has evolved, most therapeutic areas now have multiple treatments available for the same disease. For example, in treating Type II diabetes mellitus, there are different classes of oral antidiabetic agents (Metformin, Sulfonylurea, Thiazolidinedione, DPP-4, SGLT-2, etc.) and multiple classes of injectable treatments such as GLP-1 and Insulin (American Diabetes Association and others, 2014). Each class contains multiple drugs, and each drug often has several doses. This variety of treatments and doses for a single disease makes personalized medicine possible. At the same time, the information revolution has allowed the collection of vast amounts of data. Healthcare claims databases and other electronic medical records now contain multiple years of medical information for millions of patients. With the advent of genomics and our deepening knowledge of translational medicine, we are able to better characterize patients, which opens great opportunities for developing personalized medicine.

There are notable examples of marketed compounds that make tailored therapeutics a reality. For instance, Tamoxifen used to be commonly prescribed to women with ER+ breast cancer, but 65% of women initially taking it developed resistance. Subsequent research found that women with certain mutations in their CYP2D6 gene, which encodes the enzyme that metabolizes Tamoxifen, could not break the drug down efficiently, making it an ineffective treatment for their cancer (Ellsworth et al., 2010). Women are now genotyped for those specific mutations so that they can immediately receive the most effective therapy. Another example is Trastuzumab (Herceptin ${}^{\circledR}$ ), a monoclonal antibody that interferes with the HER2/neu receptor. Its main use is to treat certain breast cancers, and it is used only if a patient's cancer tests positive for overexpression of the HER2/neu receptor (Telli et al., 2007). Scientists have also discovered that metabolic factors may play a role in tailoring, as shown by the recent understanding of how a genetic polymorphism in cytochrome P450 enzyme 2C19 affects patients' ability to metabolize clopidogrel (Plavix ${}^{\circledR}$ ) to its active form, thereby detrimentally impacting platelet aggregation and clinical outcomes for this subgroup of patients (Shuldiner et al., 2009).

Despite these successful examples, tremendous challenges remain in developing solutions for personalized medicine. Besides the medical, operational, ethical, and regulatory issues, there is considerable statistical complexity as well. Much has been written about the dangers of subgroup analyses (Brookes et al., 2004; Lagakos, 2006; Ruberg et al., 2010), and various authors have proposed guidelines for analyzing and reporting subgroups (Rothwell, 2005; Wang et al., 2007). In particular, the FDA has already taken initiatives to integrate personalized medicine into its regulatory policies. In October 2013, it released a report entitled Paving the Way for Personalized Medicine: FDA's Role in a New Era of Medical Product Development, which outlined the steps it would need to take to integrate genetic and biomarker information into clinical use and drug development. The report determined that the FDA would need to develop specific regulatory science standards, research methods, reference materials, and other tools in order to incorporate personalized medicine into its current regulatory practices.

There are many existing methods for evaluating heterogeneity. Traditionally, subgroup analyses are conducted in clinical trials. Statisticians test for differential treatment effects among subgroups of patients in a very straightforward way (i.e. by adding a treatment-by-subgroup interaction term to a model). The consistent recommendations are that a small number of subgroups be specified in advance, that interaction tests be used, that all subgroup analyses (a priori and post hoc) be reported, that multiplicity adjustments be considered, and that findings be interpreted cautiously even in the presence of significant multiplicity-adjusted $p$ -values (Rothwell, 2005; Wang et al., 2007). However, despite our best scientific endeavors and preclinical models, we most often do not know all the potential effects of a new molecule or biologic. Under such common circumstances, the traditional approach described above is not sufficient for finding important factors or subgroups of patients.

Therefore, novel methodologies have been developed for personalized medicine; they generally fall into three categories. The first category focuses on detecting treatment-by-subgroup interactions. For example, Su et al. (2009) and Lipkovich et al. (2011) developed interaction tree methods whose splitting rules are based on covariate-treatment interaction tests. The second category consists of two-step methods (Cai et al., 2011; Zhao et al., 2013; Foster et al., 2011; Faries & Obenchain, 2013). The first step estimates the differential treatment effect for each individual patient, measured by a score function; the second step uses this score as the response to establish its relationship with the covariates. The third category of methods is based on value functions, which evaluate patient benefit. The optimal personalized treatment recommendation rule is obtained by maximizing a particular value function (Qian & Murphy, 2011; Zhao et al., 2012; Zhang et al., 2012).

New methodologies greatly extend our ability to explore solutions for personalized medicine, but these methods also have limitations. First, the concept of a subgroup may seem intuitive, but there is no consistent analytic definition. Instead of providing an analytic definition of subgroups, different researchers use surrogates to search for “subgroups”, such as the largest treatment difference (Lipkovich et al., 2011) or the smallest $p$ -value of treatment-by-covariate interactions (Su et al., 2009). Those methods may not achieve the desired outcomes. For example, although these algorithms maximize the covariate-treatment interactions at each level of their trees, the final trees are not connected to any objective function. Thus, it is hard to define an optimal solution for patients. As a consequence, people often struggle with whether to choose a large subgroup with a moderate treatment effect or a small subgroup with a significant treatment benefit. Second, finding subgroups is rarely the ultimate goal of our research. For example, we often need subgroups for treatment assignment (i.e. to assign the right patients to the right treatments), and the purpose of assigning the right patients to the right treatments is to maximize patient benefit. Similarly, the purpose of searching for subgroups in data from a phase II study is to design a phase III trial that maximizes the probability of study success. These examples illustrate that finding a subgroup is often a means to an end: maximizing some quantity. It is then natural to ask why we do not directly search for a rule that maximizes that quantity instead of searching for a subgroup as a proxy. The two-step methods focus on studies with two treatments, and they are not easily applied to situations with multiple treatments. The value function approach provides a nice framework for personalized medicine.
Although the current solutions based on $l_{1}$ penalized least squares (Qian & Murphy, 2011) or on a support vector machine algorithm (Zhao et al., 2012) provide accurate results in theory, they are not easily applied in clinical trial settings, where simple decision rules are needed. A desirable simple rule is well illustrated by the drug label for prasugrel (Effient ${}^{\circledR}$ ), which carries product safety warnings for patients with body weight $<$ 60 kg and age $\geqslant$ 75. Such simple decision rules are important for personalized medicine because they are easy to implement and to test in future clinical trials, where they can be incorporated into the protocol as inclusion or exclusion criteria; in practice, clinicians also prefer simple rules for their decision making.

In addition, the current methods primarily focus on learning precision medicine solutions from randomized control trials (RCTs). However, there is a growing need to develop personalized medicine solutions based on observational studies, electronic medical records, and insurance claims databases. First, payers are interested not only in efficacy data from RCTs but also in the more generalizable effectiveness results from real world evidence. Second, not all treatments are compared in RCTs, but such comparisons are often available in real world data. Third, most RCTs are not powered for subgroup analysis, whereas real world data typically have much larger sample sizes. However, the use of real world data adds the challenge of addressing the confounding present in such data as part of the personalized medicine solution.

The purpose of this paper is to develop a method that meets the needs of drug development. We may compromise certain theoretical properties in favor of practical requirements. The prioritized needs are:

1. The results are easy to interpret and easy to implement in clinical settings.
2. The method is able to handle data from randomized control trials and also from observational studies.
3. The method is capable of dealing with multiple treatments.
4. The framework is general enough to handle binary, continuous, and time-to-event endpoints.

Therefore, we propose a method that approximates a target function whose value directly reflects correct treatment assignment for patients. We show that our method naturally connects subgroup identification and individualized treatment recommendation methods. The rest of this paper is organized as follows. We propose our method in Section 2: we first introduce the general framework of individualized treatment recommendation and then demonstrate its connection with subgroup identification. We present methods for both two treatments and multiple treatments, and we also discuss how to improve numerical stability. In Section 3, we introduce our search algorithm and present numerical results from multiple simulation studies. Real data analysis is conducted in Section 4. We close the paper with a discussion in Section 5.
2. Method

In this section, we introduce the framework of individualized treatment recommendation, connect it with subgroup identification, extend the subgroup identification method to multiple treatments, and finally discuss numerical stability.

2.1 Individualized treatment recommendation

We have a random sample of size $N$ from a large population. For each unit $i$ in the sample, where $i=1,\ldots,N$ , let $T_{i}$ be the treatment assignment, $Y_{i}$ be the response, and $X_{i}$ be a vector of covariates. Let $(Y,T,X)$ be the generic random variable of $\{(Y_{i},T_{i},X_{i})\}$ , $\mathcal{P}$ be the distribution of $(Y,T,X)$ , and $E$ be the expectation with respect to $\mathcal{P}$ . For any given individualized treatment recommendation $r(\cdot)$ , which is a rule defining a treatment recommendation for each individual in a population, we let $\mathcal{P}^{r}$ be the distribution of $(Y,T,X)$ given that $T=r(X)$ .

The research question of individualized treatment recommendation or subgroup identification is meaningful only when multiple treatment options are available for the same subject. In other words, if only one treatment option is allowed or available for certain subjects, the optimal treatment is simply the only available one, and the research question is trivial. Therefore, without loss of generality, our population space $\Omega_{x}$ is defined as $\Omega_{x}=\left\{x:P(T=t|X=x)\in(0,1),\forall t\in\mathcal{T}\right\}$, where $\mathcal{T}$ is a finite collection of treatment options. Since $d\mathcal{P}=p(y|x,t)p(t|x)p(x)$ and $d\mathcal{P}^{r}=p(y|x,t)I_{t=r(x)}p(x)$, we have

$\displaystyle\frac{d\mathcal{P}^{r}}{d\mathcal{P}}=\frac{I_{t=r(x)}}{p(t|x)}.$ (1)

The expected value of treatment benefit with respect to $r$ is,

$\displaystyle V(r)=E^{r}(Y)=\int Y\,d\mathcal{P}^{r}=\int Y\frac{d\mathcal{P}^{r}}{d\mathcal{P}}\,d\mathcal{P}=E\left\{\frac{I_{T=r(X)}}{p(T|X)}Y\right\}.$ (2)

Our goal is to estimate $r_{o}$ , such that,

$\displaystyle r_{o}\in\arg\max_{r\in R}V(r),$ (3)

where $R$ is a collection of ways to assign treatments.

Figure 1.

Illustration of Individualized Treatment Recommendation. Each subject has a single covariate ranging from $-$10 to 10, and $Y$ is the response; a higher value of $Y$ means a better outcome. Each circle represents a subject receiving treatment, and each “X” represents a subject receiving control. The shaded area is the optimal treatment recommendation. By following the recommendation, each patient receives the maximal benefit, so the total benefit for the whole patient population is maximized.

Figure 2.

Illustration of Individualized Treatment Recommendation. Each subject has a single covariate ranging from $-$10 to 10, and $Y$ is the response; a higher value of $Y$ means a better outcome. Each circle represents a subject receiving treatment, and each “X” represents a subject receiving control. The shaded area is the optimal treatment recommendation. In panel (a), patients on the treatment arm have better outcomes than those on the control arm regardless of their covariates, so the optimal treatment recommendation for every subject is to take treatment. In panel (b), we adjust the outcomes of the control group by adding a nontrivial benefit margin and then apply our individualized treatment recommendation (ITR) method to the adjusted data.

Figure 1 gives an intuitive explanation of Eq. (2). In this simple example, there is only one covariate $X$. For each $X=x$, there are two subjects, one on treatment and one on control. Let $f_{t}(x)$ be the response function for subjects taking treatment and $f_{c}(x)$ the response function for subjects taking control. By Eq. (2), the optimal treatment recommendation is $r(x)=I\{f_{t}(x)>f_{c}(x)\}$, where $I\{\cdot\}$ is an indicator function; $r(x)=1$ recommends treatment and $r(x)=0$ recommends control. Under this recommendation, subject $x$ receives the benefit $f(x)=\max\{f_{t}(x),f_{c}(x)\}$. This example essentially illustrates a randomized control trial with a 1:1 randomization ratio. In reality, the randomization ratio may not be 1:1, and in an observational study, the treatment assignment may depend on covariates. The denominator $p(T|X)$ in Eq. (2) takes these factors into account. In particular, when there are two treatments, $p(T=1|X)$ is the propensity score, which has been widely used in causal inference (Rosenbaum & Rubin, 1983). Since $p(T|X)$ appears in the denominator, Eq. (2) is also connected with inverse probability weighting methods (Horvitz & Thompson, 1952; Robins & Rotnitzky, 1992). Therefore, subgroup identification and patient-level predictions based on real world data using this method are adjusted for biases from measured confounders.
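The value function in Eq. (2) can be estimated by its sample average. The Python sketch below does this for a simulated 1:1 randomized trial in the spirit of Figure 1. The response curves `f_t` and `f_c`, the noise level, the sample size, and the helper names are hypothetical choices for illustration, not the settings behind the figure.

```python
import random


def ipw_value(y, t, x, rule, prop):
    """Sample average of I{T_i = r(X_i)} * Y_i / p(T_i | X_i), the empirical Eq. (2).

    prop(x) returns P(T = 1 | X = x); treatments are coded 1 (treatment) and 0 (control).
    """
    total = 0.0
    for yi, ti, xi in zip(y, t, x):
        if ti == rule(xi):
            p_obs = prop(xi) if ti == 1 else 1.0 - prop(xi)
            total += yi / p_obs
    return total / len(y)


# Hypothetical response curves (assumptions, not taken from the paper).
def f_t(x):
    return 1.0 - 0.1 * x  # mean response under treatment


def f_c(x):
    return 0.1 * x  # mean response under control


def rct_propensity(x):
    return 0.5  # 1:1 randomization: P(T = 1 | X) does not depend on X


rng = random.Random(1)
x, t, y = [], [], []
for _ in range(20000):
    xi = rng.uniform(-10, 10)
    ti = int(rng.random() < 0.5)
    x.append(xi)
    t.append(ti)
    y.append((f_t(xi) if ti == 1 else f_c(xi)) + rng.gauss(0, 1))

rules = {
    "treat all": lambda xi: 1,
    "treat none": lambda xi: 0,
    "I{f_t > f_c}": lambda xi: int(f_t(xi) > f_c(xi)),
}
values = {name: ipw_value(y, t, x, r, rct_propensity) for name, r in rules.items()}
for name, v in values.items():
    print(f"{name}: {v:.3f}")
```

Under these assumed curves, the population values are 1.0 for treating everyone, 0 for treating no one, and 1.125 for the rule $I\{f_{t}(x)>f_{c}(x)\}$, which treats subjects with $x<5$; the estimates should be close to these numbers and reproduce that ordering.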

Figure 2 illustrates another example common to placebo-controlled randomized clinical trials. In panel (a), subjects in the treatment group have better outcomes than those in the control group regardless of their covariates, so the optimal treatment recommendation for every subject is obviously to take treatment. However, in developing a new drug, we often look for a nontrivial benefit over the traditional therapy or placebo. We can therefore shift the outcomes of the control group by adding a nontrivial benefit margin, as shown in panel (b), so that the ITR method identifies the subjects whose benefit exceeds this pre-defined margin. This is important for payers, since new treatments usually cost more than generic drugs, and a nontrivial benefit is needed to justify the price. It is also helpful for designing a future trial because, to increase the chance of success, it is important to choose a study population for which the new treatment is expected to have nontrivial overall benefits that compensate for its risks and/or costs. Figure 2 thus provides an intuitive example of how our ITR method connects with subgroup identification methods.

Qian and Murphy (2011) proposed Eq. (2) and solved it using a two-step procedure. In the first step, a linear model fitted by $l_{1}$ penalized least squares is used to approximate the conditional expectation of the response, and the estimated rule is then derived from this fitted model. Using a sparse $l_{1}$ penalty when modeling the conditional expectation introduces parsimony and facilitates interpretation. One problem with this approach is the mismatch between minimizing prediction error and maximizing the value function (Murphy, 2005): a model that does not accurately predict the conditional expectation cannot guarantee finding the treatment strategy that yields the optimal value. In addition, the approach is prone to overfitting when all interaction effects are included with a high-dimensional covariate space (Zhao & Zeng, 2013). Zhao et al. (2012) showed that solving Eq. (3) is equivalent to solving a classification problem with a weighted 0-1 loss function; they then approximated the 0-1 loss with a hinge loss and developed an algorithm based on support vector machine (SVM) techniques. They also proved that their solutions converge to the truth. However, in a finite sample, the optimizer of the hinge loss may differ from the solution under the 0-1 loss. Furthermore, Foster et al. (2011) pointed out that, instead of finding the optimal or largest subgroup with enhanced treatment effects, it may be more desirable to determine a sub-optimal subgroup defined by simple and interpretable predictive variables. However, the SVM-based algorithm of Zhao et al. (2012) solves the optimization problem in a dual space, which cannot select important variables directly, so its results are difficult to use for identifying easy-to-interpret subgroups.

These limitations motivate us to develop a simple yet direct approach. In this paper, we connect the optimization of the value function in Eq. (2) with subgroup identification and provide an algorithm that directly searches all rectangular subgroups. Our goal is to generate easy-to-interpret subgroups, and the method handles data not only from randomized control clinical trials but also from observational studies.

2.2 Subgroup identification

One key challenge for existing subgroup identification methods is the lack of an analytic definition of subgroups. It is important to have a clear objective for why we need subgroups: subgroup assignment should maximize a defined objective function, and we can then classify patients into different subgroups so as to maximize that objective function. Subgroup identification is therefore a classification problem. However, many methods first estimate the underlying functions and then classify patients into different subgroups (Cai et al., 2011; Zhao et al., 2013; Foster et al., 2011; Faries & Obenchain, 2013). If we can estimate the underlying functions correctly, we can certainly do the classification, but estimating functions is costly and often depends on strong modeling assumptions. Since our ultimate goal is classification, direct classification methods should be more efficient.

We define subgroups through a treatment assignment that maximizes the total patient benefit if it is followed by the entire patient population, which is consistent with Eq. (3).

To illustrate our point, we start with the simple case of a binary treatment. Let $T$ indicate whether the treatment of interest was received, with $T=1$ indicating that subjects receive the active treatment, and $T=0$ for the control. Using the potential outcome notation, let $Y(0)$ denote the outcome under control, and $Y(1)$ the outcome under treatment. We observe $T$ and $Y$ , where $Y\equiv TY(1)+(1-T)Y(0)$ . Because,

$\displaystyle\begin{aligned}V(r)&=E\left\{\frac{I_{T=r(X)}}{p(T|X)}Y\right\}=E\left[E\left\{\left.\frac{I_{T=r(X)}}{p(T|X)}Y\right|T,X\right\}\right]\\&=E\left[E\left\{\left.\frac{I_{r(X)=1}}{p(T=1|X)}Y\right|T=1,X\right\}p(T=1|X)+E\left\{\left.\frac{I_{r(X)=0}}{p(T=0|X)}Y\right|T=0,X\right\}p(T=0|X)\right]\\&=E\left[I_{r(X)=1}\left\{E(Y|T=1,X)-E(Y|T=0,X)\right\}\right]+E\left\{E(Y|T=0,X)\right\},\end{aligned}$

the maximizer of $V(r)$ with respect to $r$ is

$r_{o}(X)=\begin{cases}1,&E(Y|T=1,X)>E(Y|T=0,X),\\ 0,&E(Y|T=1,X)\leqslant E(Y|T=0,X).\end{cases}$ (4)

Eq. (4) has a straightforward interpretation: it simply assigns a treatment to the patients who benefit more from it. In other words, a patient with covariate vector $X$ is recommended to take treatment when $\{E(Y|T=1,X)-E(Y|T=0,X)\}>0$ . Equation (4) also connects our method with other personalized medicine methods: the contrast $\{E(Y|T=1,X)-E(Y|T=0,X)\}$ is the score function used by Cai et al. (2011) and Foster et al. (2011). In the context of personalized medicine, responses $Y$ are often assumed to follow the model

$\displaystyle Y=\beta_{0}+g(X)+Td(X)+\epsilon,$ (5)

where $\beta_{0}$ is the overall mean, $g(X)$ is a function of prognostic markers, and $d(X)$ is a function of predictive markers. Both $g(X)$ and $d(X)$ are centered at 0. Since $E(Y|T=1,X)-E(Y|T=0,X)=d(X)$ under Eq. (5), Eq. (4) gives the optimal solution $r_{o}(X)=I\{d(X)>0\}$; that is, the recommendation is determined by the sign of $d(X)$. Therefore, our method also targets the treatment-by-covariate interactions, which connects it with the interaction tree approaches (Su et al., 2009; Lipkovich et al., 2011).
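As a worked check of this statement, assume in addition that the error in Eq. (5) satisfies $E(\epsilon|T,X)=0$ (an assumption not stated explicitly above). Then $E(Y|T=0,X)=\beta_{0}+g(X)$, and because $g(X)$ is centered at 0, the decomposition of $V(r)$ preceding Eq. (4) becomes

```latex
\begin{aligned}
E(Y|T=1,X)-E(Y|T=0,X) &= d(X),\\
V(r) &= E\left[I_{r(X)=1}\,d(X)\right]+E\{\beta_{0}+g(X)\}
      = E\left[I_{r(X)=1}\,d(X)\right]+\beta_{0}.
\end{aligned}
```

Since $\beta_{0}$ does not depend on $r$, the value is largest when $r(X)=1$ exactly where $d(X)>0$. For instance, with a hypothetical $d(X)=X_{1}$, the rule treats patients with $X_{1}>0$.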

In practice, we are interested not only in developing individualized treatment recommendations but also in identifying subgroups of patients who are suited to a certain treatment. The ideal subgroups are defined by simple, easy-to-interpret rules. Therefore, if our purpose is to provide a treatment recommendation for a group of patients $A$, i.e. $r(X)=1,\forall X\in A$, we can relax condition Eq. (4) to

$r_{o}(X)=\begin{cases}1,&E\left\{E(Y|T=1,X)|X\in A\right\}>E\left\{E(Y|T=0,X)|X\in A\right\},\\ 0,&E\left\{E(Y|T=1,X)|X\in A\right\}\leqslant E\left\{E(Y|T=0,X)|X\in A\right\},\end{cases}$ (6)

which is equivalent to,

$r_{o}(X)=\begin{cases}1,&E\left\{{\displaystyle\frac{Y\cdot T}{P(T=1|X)}}|X\in A\right\}>E\left\{{\displaystyle\frac{Y\cdot(1-T)}{P(T=0|X)}}|X\in A\right\},\\ 0,&E\left\{{\displaystyle\frac{Y\cdot T}{P(T=1|X)}}|X\in A\right\}\leqslant E\left\{{\displaystyle\frac{Y\cdot(1-T)}{P(T=0|X)}}|X\in A\right\}.\end{cases}$ (7)

Our subgroup identification method for the binary treatment setting is therefore summarized by the following proposition.

Proposition 1.

Let,

$\displaystyle r_{o}^{A}=I\left[E\left\{\left.\frac{Y\cdot T}{P(T=1|X)}\right|X\in A\right\}-E\left\{\left.\frac{Y\cdot(1-T)}{P(T=0|X)}\right|X\in A\right\}\right],$ (8)

where $I(x)=1$ if $x>0$ and $I(x)=0$ otherwise. Let $A_{o}^{1}$ and $A_{o}^{0}$ form a partition of the patient population $\Omega_{x}$ , i.e. $A_{o}^{1}\bigcup A_{o}^{0}=\Omega_{x}$ and $A_{o}^{1}\bigcap A_{o}^{0}=\emptyset$ . The subgroup $A_{o}^{1}$ of patients who should be assigned to treatment, i.e. $T=1$ , is

$\displaystyle A_{o}^{1}=\arg\max_{A\in\mathcal{A}}E\left\{\frac{I_{T=r_{o}^{A}(X)}}{p(T|X)}Y\right\},$ (9)

where $\mathcal{A}$ is the collection of possible subgroups, and the remaining patients should be assigned to control, i.e. $A_{o}^{0}=\Omega_{x}-A_{o}^{1}$ .

In practice, Eq. (8) is evaluated with the observed data: $E\left\{\frac{YT}{P(T=1|X)}|X\in A\right\}$ is estimated by

$\displaystyle\sum_{x_{i}\in A}\left.\frac{t_{i}y_{i}}{\hat{e}(x_{i})}\right/\sum_{x_{i}\in A}\frac{t_{i}}{\hat{e}(x_{i})},$

and $E\left\{\frac{Y(1-T)}{P(T=0|X)}|X\in A\right\}$ is estimated by,

$\displaystyle\sum_{x_{i}\in A}\left.\frac{(1-t_{i})y_{i}}{1-\hat{e}(x_{i})}\right/\sum_{x_{i}\in A}\frac{1-t_{i}}{1-\hat{e}(x_{i})},$

where $\hat{e}(x_{i})$ is the estimated propensity score $P(T=1|X=x_{i})$, which can be obtained, for example, from a logistic regression.
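The two normalized inverse-probability-weighted estimators above, and the indicator in Eq. (8), can be sketched in plain Python as follows. This is a minimal illustration that assumes the propensity scores $\hat{e}(x_{i})$ have already been estimated; the function names and the toy data are hypothetical.

```python
def ipw_subgroup_means(y, t, e, in_a):
    """Normalized IPW estimates of the treated and control means within subgroup A.

    y: outcomes; t: 1 = treatment, 0 = control; e: estimated propensity scores P(T=1|X);
    in_a: booleans marking membership in the candidate subgroup A.
    """
    num1 = den1 = num0 = den0 = 0.0
    for yi, ti, ei, ai in zip(y, t, e, in_a):
        if not ai:
            continue  # only subjects with x_i in A contribute
        if ti == 1:
            num1 += yi / ei
            den1 += 1.0 / ei
        else:
            num0 += yi / (1.0 - ei)
            den0 += 1.0 / (1.0 - ei)
    if den1 == 0.0 or den0 == 0.0:
        raise ValueError("subgroup A needs at least one treated and one control subject")
    return num1 / den1, num0 / den0


def r_o_A(y, t, e, in_a):
    """Eq. (8): 1 if the weighted treated mean in A exceeds the weighted control mean, else 0."""
    mean1, mean0 = ipw_subgroup_means(y, t, e, in_a)
    return int(mean1 - mean0 > 0)


# Toy data (hypothetical): the last two subjects lie outside A.
y = [5.0, 3.0, 2.0, 4.0, 1.0, 6.0]
t = [1, 1, 0, 0, 1, 0]
e = [0.8, 0.5, 0.5, 0.25, 0.5, 0.5]
in_a = [True, True, True, True, False, False]

mean1, mean0 = ipw_subgroup_means(y, t, e, in_a)
print(round(mean1, 3), round(mean0, 3), r_o_A(y, t, e, in_a))  # 3.769 2.8 1
```

Here the treated mean in $A$ is $12.25/3.25\approx 3.77$ and the control mean is $2.8$, so the subgroup is recommended for treatment. Normalizing by the summed weights, rather than by the subgroup size, keeps each estimate a weighted average of observed outcomes.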

Proposition 1 provides a unified framework for individualized treatment recommendation and subgroup identification. The framework is very general: the responses $Y$ can be binary, continuous, or time-to-event endpoints, and the covariates $X$ can cover a broad range of variables. The framework also allows treatment assignment to depend on covariates, so it can handle both randomized control trials and observational studies.

If we require a nontrivial benefit for a new treatment, as illustrated in Fig. 2, the modified response can be written as

$\displaystyle Y_{i}^{m}\equiv T_{i}Y_{i}(1)+(1-T_{i})m\{Y_{i}(0)\},$

where the function $m(\cdot)$ defines the additional benefit margin. For a continuous endpoint, $m(\cdot)$ could be defined as $m\{Y_{i}(0)\}=Y_{i}(0)+\Delta$ , where $\Delta>0$ .
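A minimal sketch of the margin adjustment for a continuous endpoint, assuming $m\{Y_{i}(0)\}=Y_{i}(0)+\Delta$: only control outcomes are shifted, and the recommendation flips once $\Delta$ exceeds the estimated benefit. The helper names and toy numbers are hypothetical, and the contrast uses the same normalized weighting as the estimators of Eq. (8), applied to the whole sample.

```python
def apply_margin(y, t, delta):
    """Modified response Y^m: treated outcomes unchanged, control outcomes shifted by delta."""
    return [yi if ti == 1 else yi + delta for yi, ti in zip(y, t)]


def weighted_contrast(y, t, e):
    """Normalized IPW treated mean minus normalized IPW control mean."""
    num1 = sum(yi / ei for yi, ti, ei in zip(y, t, e) if ti == 1)
    den1 = sum(1.0 / ei for ti, ei in zip(t, e) if ti == 1)
    num0 = sum(yi / (1.0 - ei) for yi, ti, ei in zip(y, t, e) if ti == 0)
    den0 = sum(1.0 / (1.0 - ei) for ti, ei in zip(t, e) if ti == 0)
    return num1 / den1 - num0 / den0


y = [5.0, 3.0, 2.0, 4.0]
t = [1, 1, 0, 0]
e = [0.5, 0.5, 0.5, 0.5]  # 1:1 randomization (hypothetical)

for delta in (0.0, 0.5, 1.5):
    c = weighted_contrast(apply_margin(y, t, delta), t, e)
    print(delta, c, int(c > 0))  # recommend treatment only if the contrast is positive
```

With these numbers the contrast falls from 1.0 to 0.5 to $-0.5$ as $\Delta$ grows, so treatment is no longer recommended once the required margin $\Delta=1.5$ exceeds the observed benefit of 1.0.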

Since the treatment $T$ in Eq. (2) is general and allows multiple treatments, a derivation similar to that of Proposition 1 shows that the individualized treatment recommendation for multiple treatments can be obtained as in the following proposition.

Proposition 2.

When there are multiple treatments, the optimal individualized treatment recommendation is

$\displaystyle t_{o}=\arg\max_{t_{p}\in\mathcal{T}}E(Y|T=t_{p},X).$

The optimal individualized treatment recommendation for a subgroup of patients $A$ is

$\displaystyle t_{o}^{A}=\arg\max_{t_{p}\in\mathcal{T}}E\left\{\left.\frac{Y\cdot I_{T=t_{p}}}{P(T=t_{p}|X)}\right|X\in A\right\}.$

Let $\mathcal{G}=\{A_{1},\ldots,A_{g}\}$ be a partition of the patient population, i.e. $\bigcup_{i=1}^{g}A_{i}=\Omega_{x}$ and $A_{i}\bigcap A_{j}=\emptyset,\forall i\neq j$ , so that $t_{o}^{A_{i}}$ is the optimal individualized treatment recommendation for subgroup $A_{i}$ . The optimal subgroups $\mathcal{G}_{o}$ can be found by

$\displaystyle\mathcal{G}_{o}=\arg\max_{\mathcal{G}\in\mathfrak{G}}E\left\{\frac{I_{T=t_{o}^{G_{X}}}}{p(T|X)}Y\right\},$ (10)

where $\mathfrak{G}$ is the collection of possible partitions, and $G_{X}$ is the group to which $X$ belongs.
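The recommendation $t_{o}^{A}$ in Proposition 2 and a sample analogue of the objective in Eq. (10) can be sketched as below for a fixed candidate partition. This is an illustrative sketch, not the authors' implementation: it assumes the generalized propensity $P(T=T_{i}|X_{i})$ of each subject's received treatment is known, it normalizes the within-group means as in the binary estimators of Eq. (8), and the treatment labels and data are hypothetical.

```python
def best_treatment(y, t, p, members, treatments):
    """t_o^A: treatment with the largest normalized IPW mean outcome among subjects in A.

    p[i] is P(T = t[i] | X_i), the probability of the treatment subject i actually received.
    """
    best, best_mean = None, float("-inf")
    for tp in treatments:
        num = sum(y[i] / p[i] for i in members if t[i] == tp)
        den = sum(1.0 / p[i] for i in members if t[i] == tp)
        if den == 0.0:
            continue  # treatment tp not observed in this group
        if num / den > best_mean:
            best, best_mean = tp, num / den
    return best


def partition_value(y, t, p, groups, treatments):
    """Sample analogue of the objective in Eq. (10); groups must partition all subjects."""
    total = 0.0
    for members in groups:
        rec = best_treatment(y, t, p, members, treatments)
        total += sum(y[i] / p[i] for i in members if t[i] == rec)
    return total / len(y)


# Three arms randomized 1:1:1 (hypothetical data).
y = [3.0, 1.0, 2.0, 1.0, 4.0, 2.0]
t = ["A", "B", "C", "A", "B", "C"]
p = [1 / 3] * 6
arms = ["A", "B", "C"]

print(partition_value(y, t, p, [[0, 1, 2], [3, 4, 5]], arms))  # two subgroups
print(partition_value(y, t, p, [list(range(6))], arms))        # one group for everyone
```

In this toy example the two-subgroup partition (arm A for the first group, arm B for the second) attains a value of about 3.5, versus about 2.5 when everyone receives arm B, so Eq. (10) would prefer the partition.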

2.3 Numerical stabilization

Since the distribution $\mathcal{P}$ is unknown, the expectation in Eq. (2) cannot be evaluated directly, so we search for the optimal ITR using the observed data. Therefore, $r_{o}$ is estimated by

$\displaystyle\hat{r}_{o}\in\arg\max_{r\in R}\frac{1}{N}\sum_{i=1}^{N}\frac{Y_{i}I_{T_{i}=r(X_{i})}}{p(T=T_{i}|X_{i})}.$ (11)

Searching for $\hat{r}_{o}$ based on a random sample is a stochastic optimization problem. In various simulation studies, we found that algorithms directly optimizing Eq. (11) with the original $Y_{i}$ were not stable. For example, if our data are generated from Eq. (5), the accuracy varies substantially when only the constant $\beta_{0}$ is changed. The reason is as follows. Under Eq. (5), it can be shown that

$\displaystyle\mathbb{E}_{N}\left\{\frac{I_{T=r(X)}}{p(T|X)}Y\right\}=\frac{1}{N}\sum_{i=1}^{N}\frac{I_{T_{i}=r(X_{i})}T_{i}d(X_{i})}{P\{T=r(X_{i})|X_{i}\}}+\mathcal{N}\left(\frac{1}{N}\sum_{i=1}^{N}h(X_{i}),\frac{1}{N^{2}}\sum_{i=1}^{N}\frac{1-P\{T=r(X_{i})|X_{i}\}}{P\{T=r(X_{i})|X_{i}\}}h^{2}(X_{i})\right)+o_{p}(1),$

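where $h(X)=\beta_{0}+g(X)$, $\mathcal{N}(\cdot,\cdot)$ denotes a normally distributed random term with the given mean and variance, and $\mathbb{E}_{N}(\cdot)$ denotes the sample average. The mean of the second term does not depend on $r$, so this term carries no information about which rule is better; its realized value, however, varies with $r$ through the random indicators $I_{T_{i}=r(X_{i})}$. When $h(X)\gg d(X)$, this random second term dominates the first, which makes the optimizer unstable.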
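A worked illustration of the variance term, under hypothetical values not taken from the paper's simulations: in a 1:1 randomized trial, $P\{T=r(X_{i})|X_{i}\}=1/2$ for every rule $r$, and if additionally $g(X)\equiv 0$, then $h(X)=\beta_{0}$ and the variance of the second term reduces to

```latex
\frac{1}{N^{2}}\sum_{i=1}^{N}\frac{1-1/2}{1/2}\,\beta_{0}^{2}
  =\frac{1}{N^{2}}\cdot N\beta_{0}^{2}
  =\frac{\beta_{0}^{2}}{N},
\qquad\text{with standard deviation } \frac{|\beta_{0}|}{\sqrt{N}}.
```

With $N=200$ and $\beta_{0}=100$, this standard deviation is $100/\sqrt{200}\approx 7.07$, so fluctuations of that size can swamp differences of order one in the first term between competing rules.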

Following the idea in Liu et al. (2014), we introduce the following numerical stabilization procedure. Instead of using the original $Y$ as the response, we fit a model of $Y$ on $X$ and use the residuals as the response. To show that this approach is valid, we note that

$\displaystyle\arg\max_{r\in R}E\left\{\frac{I_{T=r(X)}}{p(T|X)}Y\right\}=\arg\max_{r\in R}E\left[\frac{I_{T=r(X)}}{p(T|X)}\{Y-m(X)\}\right],$

where $m(X)$ is any function of $X$ . This equivalence comes from,

$\displaystyle\begin{aligned}E\left[\frac{I_{T=r(X)}}{p(T|X)}\{Y-m(X)\}\right]&=E\left\{\frac{I_{T=r(X)}}{p(T|X)}Y\right\}-E\left\{\frac{I_{T=r(X)}}{p(T|X)}m(X)\right\}\\&=E\left\{\frac{I_{T=r(X)}}{p(T|X)}Y\right\}-E\left[\frac{I_{T=r(X)}}{p\{T=r(X)|X\}}m(X)\right]\\&=E\left\{\frac{I_{T=r(X)}}{p(T|X)}Y\right\}-E\left[\frac{p\{T=r(X)|X\}}{p\{T=r(X)|X\}}m(X)\right]\\&=E\left\{\frac{I_{T=r(X)}}{p(T|X)}Y\right\}-E\{m(X)\}.\end{aligned}$ (12)

Suppose that $\hat{h}(X)$ is a consistent estimate of $h(X)$. It can then be shown that

$\displaystyle\mathbb{E}_{N}\left[\frac{I_{T=r(X)}}{p(T|X)}\{Y-\hat{h}(X)\}\right]=\frac{1}{N}\sum_{i=1}^{N}\frac{I_{T_{i}=r(X_{i})}T_{i}d(X_{i})}{P\{T=r(X_{i})|X_{i}\}}+o_{p}(1).$

This removes the influence of $h(X)$ and stabilizes the solution. The key difference between searching for $r$ in Eq. (2) and for $\hat{r}$ in Eq. (11) is the random second term in the expansion of $\mathbb{E}_{N}$ above: $\hat{r}$ involves variability from $I_{T=r(X)}$. In our simulations, even a simple linear regression substantially improved the numerical stability of the search algorithm.
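The instability and its remedy can be seen in a small simulation. The Python sketch below is hypothetical: the functions $g(x)=x$ and $d(x)=x$, the sample size, the number of replicates, and $\beta_{0}=50$ are illustrative choices, not the paper's simulation settings. It compares the spread, across replicates, of the estimated value gap between the rule $I\{d(X)>0\}$ and treating everyone, computed once from the raw $Y$ and once from residuals of a least-squares line of $Y$ on $X$, using the known randomization probability 1/2.

```python
import random
import statistics


def ols_fitted(x, y):
    """Fitted values from the least-squares line y ~ a + b * x (single covariate)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [a + b * xi for xi in x]


def empirical_value(y, t, x, rule, e=0.5):
    """Empirical Eq. (11) with a known randomization probability e = P(T = 1 | X)."""
    total = 0.0
    for yi, ti, xi in zip(y, t, x):
        if ti == rule(xi):
            total += yi / (e if ti == 1 else 1.0 - e)
    return total / len(y)


def optimal_rule(xi):
    return int(xi > 0)  # I{d(x) > 0} for the assumed d(x) = x


def treat_all(xi):
    return 1


def spread_of_value_gap(beta0, n=200, reps=300, seed=2024):
    """SD across replicates of V_hat(optimal) - V_hat(treat all), raw Y versus residualized Y."""
    rng = random.Random(seed)
    raw, resid = [], []
    for _ in range(reps):
        x = [rng.uniform(-1, 1) for _ in range(n)]
        t = [int(rng.random() < 0.5) for _ in range(n)]
        # Hypothetical instance of Eq. (5): g(x) = x, d(x) = x, standard normal errors.
        y = [beta0 + xi + ti * xi + rng.gauss(0, 1) for xi, ti in zip(x, t)]
        y_tilde = [yi - fi for yi, fi in zip(y, ols_fitted(x, y))]
        raw.append(empirical_value(y, t, x, optimal_rule)
                   - empirical_value(y, t, x, treat_all))
        resid.append(empirical_value(y_tilde, t, x, optimal_rule)
                     - empirical_value(y_tilde, t, x, treat_all))
    return statistics.stdev(raw), statistics.stdev(resid)


sd_raw, sd_resid = spread_of_value_gap(beta0=50.0)
print(f"SD of value gap, raw Y: {sd_raw:.3f}; residualized Y: {sd_resid:.3f}")
```

Under this setup the expected gap is about 0.25 with either response, since by Eq. (12) subtracting a function of $X$ shifts every rule's value by (approximately) the same amount. The replicate-to-replicate spread with raw $Y$, however, is roughly $\sqrt{2\beta_{0}^{2}/N}\approx 5$, far larger than with the residualized response.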

3. Implementation and numerical evaluation

In this section, we describe our algorithm and its implementation for the two-treatment setting and then evaluate its performance in five simulation studies.

3.1 Implementation

As discussed in the Introduction, our paper focuses on meeting the needs of drug development. We are interested in subgroups with an open rectangular shape (e.g. $\mbox{Age}\leqslant$ 75 and $\mbox{BMI}>$ 18). Although more complicated subgroups could potentially improve accuracy, the results could become difficult to implement in real world clinical settings. Each subgroup is defined by a certain number of variables, and this number is called the depth. In practice, we arguably prefer a depth of at most 3. We therefore propose a comprehensive search algorithm, implemented in C++ and connected with R (R Core Team, 2014) through the Rcpp package (Eddelbuettel et al., 2011).

Our search algorithm for identifying the patients who should receive treatment consists of the following steps:

1.
Given the observed data $(Y,T,X)$, we estimate the propensity scores $e=P(T=1|X)$ by logistic regression. If the data come from an RCT, we instead use the known randomization ratio as input to the next step.
2.
We regress $Y$ on $X$ with a linear model and denote the residuals by $\tilde{Y}$. The input to the algorithm is $(\tilde{Y},T,e,X)$. Two remarks are in order. First, other models can be fitted to $Y$ to improve accuracy. Second, by the derivation in Section 2.3, the linear model can also be fitted to binary or time-to-event outcomes.
3.
From the covariates $X_{1},\ldots,X_{p}$, we select 3 covariates $X_{k1},X_{k2},X_{k3}$, where $k=1,\ldots,C(3,p)$ and $C(3,p)$ is the number of ways to choose 3 elements out of $p$.
4.
For each selected covariate, we choose a split value $c_{k1},c_{k2},c_{k3}$ .
5.
For each split value, we select a direction, either $\leqslant$ or $>$. This completes the definition of one candidate subgroup, e.g. $A_{k,j}=(X_{k1}\leqslant c_{k1}\cap X_{k2}\leqslant c_{k2}\cap X_{k3}>c_{k3})$, where $j=1,\ldots,8$. Since the depth is 3, each choice of 3 variables and 3 split values yields $2^{3}=8$ subgroups, one for each combination of directions.
6.
Given a subgroup $A_{k,j}$ from step 5, we assign treatment to patients in $A_{k,j}$ and control to all other patients. We then evaluate the value function,

$\displaystyle V_{k,j}=\frac{1}{N}\sum_{i=1}^{N}\frac{\tilde{Y}_{i}\,I_{T_{i}=I_{X_{i}\in A_{k,j}}}}{T_{i}e_{i}+(1-T_{i})(1-e_{i})}.$
7.
By iterating over all choices of covariates, split values, and directions, we evaluate every value function $V_{k,j}$ and report the subgroup with the largest $V_{k,j}$.
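The following is a minimal pure-Python sketch of steps 3 to 7, written for illustration only; it is not the authors' C++/Rcpp implementation. It assumes that steps 1 and 2 have already produced the residuals $\tilde{Y}$ and propensity scores $e$, that treatment is coded 0/1, and that a list of candidate split values is supplied for each covariate. The function names `value` and `exhaustive_search` and the eight-patient toy data set are hypothetical.

```python
from itertools import combinations, product

def value(y_res, t, e, in_group):
    """IPW value V_{k,j}: treatment inside the subgroup, control outside."""
    total = 0.0
    for yi, ti, ei, gi in zip(y_res, t, e, in_group):
        if ti == gi:  # indicator I{T_i = I(X_i in A_{k,j})}
            total += yi / (ti * ei + (1 - ti) * (1 - ei))
    return total / len(y_res)

def exhaustive_search(y_res, t, e, x, splits, depth=3):
    """Return (best value, subgroup) over all rectangular subgroups of a given depth.

    splits[c] lists candidate split values for covariate c. A subgroup is a tuple
    of (covariate index, split value, direction) with direction "<=" or ">".
    """
    best = (float("-inf"), None)
    for covs in combinations(range(len(x[0])), depth):           # step 3
        for cuts in product(*(splits[c] for c in covs)):          # step 4
            for dirs in product(("<=", ">"), repeat=depth):       # step 5
                rule = tuple(zip(covs, cuts, dirs))
                group = [int(all(row[c] <= s if d == "<=" else row[c] > s
                                 for c, s, d in rule)) for row in x]
                v = value(y_res, t, e, group)                     # step 6
                if v > best[0]:                                   # step 7
                    best = (v, rule)
    return best

# Toy data (hypothetical): the benefit pattern is driven by X1 <= 0.5 (index 0).
x = [(0.1, 0.2), (0.2, 0.8), (0.3, 0.4), (0.4, 0.6),
     (0.6, 0.1), (0.7, 0.9), (0.8, 0.3), (0.9, 0.7)]
t = [1, 0, 1, 0, 1, 0, 1, 0]
y_res = [1, -1, 1, -1, -1, 1, -1, 1]      # +1 when T matches I(X1 <= 0.5), else -1
e = [0.5] * len(t)                         # known 1:1 randomization
splits = [[0.25, 0.5, 0.75], [0.25, 0.5, 0.75]]
print(exhaustive_search(y_res, t, e, x, splits, depth=1))   # (1.0, ((0, 0.5, '<='),))
```

Because the sketch maximizes $V_{k,j}$, an outcome for which smaller values are better (such as the HbA1c change in Section 4) would need its residuals negated first.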

Figure 3.
Speed test for the ITR algorithm. Panels (a) and (b) show that computation time grows steeply as the number of covariates increases. Panel (c) shows that computation time grows approximately linearly with the sample size.

3.2 Simulation

In this subsection, we design 5 simulation studies to evaluate our algorithm.

•
Speed Test: The comprehensive search is computationally intensive because there are many ways to define a subgroup. For example, with $p=30$ covariates, depth 3, and 10 split values per covariate, there are about 32 million candidate subgroups ($30\times 29\times 28/6\times 10^{3}\times 8$). The first question is therefore whether such a search is feasible. Implementing the search in C++ and calling it from R through the Rcpp package makes it approximately 400 times faster than our original R code. Figure 3 shows the computation time for each setting. We ran the simulations on an ordinary laptop (Intel Core i5 M520 2.4 GHz CPU, 3 GB RAM, 32-bit Windows XP Version 2002 Service Pack 3, R 3.0.0), with 10 split values per variable. As panel (b) shows, with 30 covariates and depth 3, the algorithm takes about 200 seconds to evaluate all subgroups. Computation time grows steeply with the number of covariates, since for a fixed depth $d$ the number of candidate subgroups is proportional to $C(d,p)$ (of order $p^{3}$ when $d=3$), whereas it grows only linearly with the sample size, as shown in panel (c).
•
Improving Numerical Stability: The second simulation demonstrates that the procedure described in Section 2.3 improves numerical stability. We simulate 4 covariates $X_{1},\ldots,X_{4}$, each generated from a uniform distribution $U(0,1)$. The response $Y$ is generated from the following model,

$\displaystyle Y=\beta_{0}+{\rm sign}(X_{2}-0.5)+T\cdot I_{X_{1}\leqslant 0.6}+(1-T)\cdot I_{X_{1}>0.6}.$

We omit the random noise $\epsilon$ from this model so that the randomness comes from the nuisance covariates $X_{3}$ and $X_{4}$, which lets us evaluate the procedure more cleanly. We simulate 200 patients per study, close to the sample size of a Phase II study. Treatment is randomly assigned with probability 0.5. We vary the intercept $\beta_{0}$ from 0 to 20 while keeping the rest of the model fixed. For each intercept, we run 1000 simulations with search depth 1. In each simulation, we record whether $X_{1}$ is correctly selected as the most important variable (i.e. the variable whose rule has the largest value function). As Fig. 4 shows, applying the search to the original $Y$ performs poorly as the intercept increases. In particular, when $\beta_{0}=20$, the probability of selecting $X_{1}$ is close to 1/4, which amounts to random selection among the 4 covariates. In contrast, when the residuals from a linear model are used as responses, the correct variable is selected almost 100% of the time. Hereafter, all simulations and data analyses use the residuals instead of the original $Y$.

Figure 4.
Improving numerical stability. Data are generated from $Y=\beta_{0}+{\rm sign}(X_{2}-0.5)+T\cdot I_{X_{1}\leqslant 0.6}+(1-T)\cdot I_{X_{1}>0.6}$, with $X_{3}$ and $X_{4}$ included as nuisance covariates. The dotted line is the proportion of simulations that correctly select $X_{1}$ to define the treatment rule when ITR is applied to the observed $Y$. The solid line is the same proportion when ITR is applied to the residuals from a linear regression of $Y$ on $\{X_{1},\ldots,X_{4}\}$. Fitting on the residuals substantially improves accuracy.

•
Convergence in RCTs: A natural question is whether the algorithm converges to the truth as the sample size or the signal-to-noise ratio increases. We design a simulation study to answer this question in the RCT setting. We have 4 covariates $X_{1},\ldots,X_{4}$, each generated from a uniform distribution $U(0,1)$. Responses are generated from the following model,

$\displaystyle Y={\rm sign}(X_{2}-0.5)+\theta\cdot T\cdot I_{X\in A}+\theta\cdot(1-T)\cdot I_{X\in A^{c}}+\epsilon,$

where $\epsilon$ is i.i.d. $N(0,1)$. We define the true subgroup $A$ at depths 1 to 3 as follows:

–
When depth is equal to 1, $A=\{X:X_{1}\leqslant 0.6\}$ .
–
When depth is equal to 2, $A=\{X:X_{1}\leqslant 0.7\cap X_{3}>0.3\}$ .
–
When depth is equal to 3, $A=\{X:X_{1}\leqslant 0.8\cap X_{2}>0.2\cap X_{3}>0.2\}$ .

We consider sample sizes of 100, 200, 500, and 1000. Treatment is randomly assigned with probability 0.5. The parameter $\theta$ controls the signal-to-noise ratio; we vary it from 0.1 to 0.5, with larger $\theta$ giving a stronger signal. Accuracy is measured by how often the algorithm correctly picks out the variables that define the true subgroup. The results, shown in Fig. 5, demonstrate that the algorithm converges to the truth as the sample size or the signal-to-noise ratio increases.

Figure 5.
Algorithm convergence. Data are generated from simulated randomized control trials. The three plots show that as the sample size or the signal-to-noise ratio increases, the algorithm becomes more accurate and converges to the truth.

•
Convergence in Observational Studies: This simulation study evaluates how well our method identifies the correct subgroup when treatment assignment depends on covariates. We have 5 covariates $X_{1},\ldots,X_{5}$, each generated from a uniform distribution $U(0,1)$. Responses are generated from the following model,

$\displaystyle Y=1+2\cdot X_{2}+\theta\cdot T\cdot I_{X_{1}>0.5}+\theta\cdot(1-T)\cdot I_{X_{1}\leqslant 0.5}+\epsilon,$

where $\epsilon$ is i.i.d. $N(0,1)$. The treatment assignment probability $p$ follows the model,

$\displaystyle\mbox{logit}(p)=-0.5b+bX_{i},$

where $b=6.5$, so that $p\approx 0.95$ when $X_{i}=0.95$ and $p\approx 0.05$ when $X_{i}=0.05$; that is, treatment assignment is strongly driven by $X_{i}$. We run three sets of simulations, for $i=1,2,3$. When $i=1$, treatment assignment is related to the predictive marker; when $i=2$, it is associated with the prognostic marker; and when $i=3$, it is correlated with a nuisance covariate. We vary the sample size over 500, 2000, and 5000, and $\theta$ over 0.25, 0.5, and 1. Accuracy is measured by the proportion of simulations in which we correctly select the subgroup $A=\{X:X_{1}>0.5\}$. The results, shown in Table 1, demonstrate that the algorithm also converges to the truth in observational studies as the sample size or the signal-to-noise ratio increases.

Table 1
Algorithm convergence for simulated observational studies. The table shows that as the sample size $n$ or the signal-to-noise ratio ($\theta$) increases, the algorithm becomes more accurate and converges to the truth. Entries are the proportion of simulations that capture the true variable with the true cut point. Column groups indicate the covariate ($X_{1}$, $X_{2}$, or $X_{3}$) on which treatment assignment depends.

	$X_{1}$ (predictive)			$X_{2}$ (prognostic)			$X_{3}$ (nuisance)
	$\theta=0.25$	$\theta=0.5$	$\theta=1$	$\theta=0.25$	$\theta=0.5$	$\theta=1$	$\theta=0.25$	$\theta=0.5$	$\theta=1$
$n=500$	0.282	0.805	0.998	0.235	0.723	0.966	0.240	0.675	0.968
$n=2000$	0.839	0.999	1.000	0.678	0.972	1.000	0.648	0.978	1.000
$n=5000$	0.989	1.000	1.000	0.913	0.999	1.000	0.921	0.999	1.000

Figure 6.
Variable importance. There are 10 variables, and the true subgroup is defined by variables 4 and 6. Over 500 simulations, we count in each run the relative frequency with which each variable defines the top 5% of ITRs (those with the highest value functions).

•
Variable Importance: A common question is which depth to choose. Using depth 3 when the true depth is 2 introduces redundancy, while using depth 2 when the true depth is 3 may miss some signal. Answering this question requires a way to evaluate the relative importance of the variables. This simulation demonstrates that our method can identify the variables that define the subgroups. We have 10 covariates $\{X_{1},\ldots,X_{10}\}$, each generated from a uniform distribution $U(0,1)$. Responses are generated from the following model,

$\displaystyle Y={\rm sign}(X_{2}-0.5)+0.5\cdot T\cdot I_{X\in A}+0.5\cdot(1-T)\cdot I_{X\in A^{c}}+\epsilon,$

where $\epsilon$ is i.i.d. $N(0,1)$ and $A=\{X:X_{4}\leqslant 0.7\cap X_{6}\leqslant 0.7\}$. Each simulated study has 500 subjects, and treatment is randomly assigned with probability 0.5. In each simulation, we select the top 5% of subgroups with the largest value functions and count how often each covariate appears among them. We repeat this simulation 1000 times; the results are shown in Fig. 6. The algorithm successfully picks out $X_{4}$ and $X_{6}$ as the most important variables for defining the subgroups.
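For the speed test above, the figure of about 32 million candidate subgroups follows from multiplying the number of covariate triples, split-value combinations, and direction patterns. A short Python check, assuming (as in that speed test) the same number of split values for every covariate; the function name is hypothetical:

```python
from math import comb

def n_candidate_subgroups(p, depth, n_splits):
    """Covariate subsets x split-value combinations x direction patterns.

    comb(p, depth) is the quantity written C(depth, p) in the text.
    """
    return comb(p, depth) * n_splits ** depth * 2 ** depth

print(n_candidate_subgroups(30, 3, 10))   # 32480000, i.e. about 32 million
# For a fixed depth the count grows polynomially in p: doubling p from 15 to 30
# multiplies it by comb(30, 3) / comb(15, 3) = 4060 / 455, roughly 8.9.
print(n_candidate_subgroups(30, 3, 10) / n_candidate_subgroups(15, 3, 10))
```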
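The numerical-stability simulation above can be reproduced on a small scale with the sketch below. It is a scaled-down, pure-Python reconstruction under stated assumptions, not the authors' code: depth 1, split values $0.1,0.2,\ldots,0.9$ (the paper does not state the grid used there), covariates drawn independently, 100 replicates rather than 1000, and a hand-written least-squares fit. The names `ols_residuals`, `best_covariate`, and `simulate` are hypothetical. Exact proportions will differ from Fig. 4, but at $\beta_0=20$ we would expect the search on the raw $Y$ to be far less reliable than the search on residuals.

```python
import random

def ols_residuals(y, x):
    """Residuals of least squares of y on [1, x], via Gauss-Jordan on the normal equations."""
    rows = [[1.0] + list(r) for r in x]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))   # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        for i in range(k):
            if i != col:
                f = a[i][col] / a[col][col]
                a[i] = [aij - f * acj for aij, acj in zip(a[i], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]

def best_covariate(y, t, x, e=0.5, cuts=tuple(i / 10 for i in range(1, 10))):
    """Depth-1 search: index of the covariate defining the rule with the largest IPW value."""
    best_v, best_c = float("-inf"), None
    for c in range(len(x[0])):
        for s in cuts:
            for le in (True, False):
                v = 0.0
                for yi, ti, xi in zip(y, t, x):
                    g = int(xi[c] <= s) if le else int(xi[c] > s)
                    if ti == g:
                        v += yi / (e if ti == 1 else 1 - e)
                if v > best_v:   # the 1/N factor is dropped; it does not change the argmax
                    best_v, best_c = v, c
    return best_c

def simulate(beta0, n=200, reps=100, seed=1):
    """Share of replicates selecting X1 (index 0) with raw Y versus OLS residuals."""
    rng = random.Random(seed)
    hits_raw = hits_res = 0
    for _ in range(reps):
        x = [[rng.random() for _ in range(4)] for _ in range(n)]
        t = [int(rng.random() < 0.5) for _ in range(n)]
        # Y = beta0 + sign(X2 - 0.5) + T*I(X1 <= 0.6) + (1 - T)*I(X1 > 0.6), no noise term.
        y = [beta0 + (1 if xi[1] > 0.5 else -1) + (ti if xi[0] <= 0.6 else 1 - ti)
             for xi, ti in zip(x, t)]
        hits_raw += best_covariate(y, t, x) == 0
        hits_res += best_covariate(ols_residuals(y, x), t, x) == 0
    return hits_raw / reps, hits_res / reps

raw_share, res_share = simulate(beta0=20)
print(raw_share, res_share)
```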
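For the observational-study simulation above, the role of $b$ can be checked directly. The sketch assumes $p$ denotes $P(T=1|X_i)$, which the text does not state explicitly; the function name is hypothetical.

```python
import math

def propensity(x, b=6.5):
    """Assignment probability p under logit(p) = -0.5*b + b*x (taken here as P(T = 1 | X_i = x))."""
    return 1.0 / (1.0 + math.exp(-(-0.5 * b + b * x)))

for x in (0.05, 0.5, 0.95):
    print(x, round(propensity(x), 3))
# 0.05 0.051
# 0.5 0.5
# 0.95 0.949
```

Because the logit is centered at $X_i=0.5$, the assignment probabilities at $X_i=0.05$ and $X_i=0.95$ are symmetric around 0.5.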
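The variable-importance count described above can be sketched as follows. The paper does not specify how the frequencies are normalized; this sketch assumes each covariate's count is divided by the total number of covariate appearances among the top ITRs, so the frequencies sum to one. The function name and the six toy ITRs are hypothetical.

```python
from collections import Counter

def variable_importance(itrs, top_frac=0.05, smaller_is_better=False):
    """Relative frequency of each covariate among the top `top_frac` share of ITRs.

    itrs: list of (value, covariate_indices) pairs, one per candidate ITR.
    """
    ranked = sorted(itrs, key=lambda it: it[0], reverse=not smaller_is_better)
    k = max(1, round(top_frac * len(ranked)))               # number of ITRs kept
    counts = Counter(c for _, covs in ranked[:k] for c in covs)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.most_common()}

# Hypothetical ITRs: (value, covariates defining the subgroup).
itrs = [(0.9, (4, 6)), (0.8, (4, 6)), (0.7, (4, 2)),
        (0.3, (1, 2)), (0.2, (3, 5)), (0.1, (7, 8))]
print(variable_importance(itrs, top_frac=0.5))
# {4: 0.5, 6: 0.3333333333333333, 2: 0.16666666666666666}
```

Setting `smaller_is_better=True` ranks ITRs from the smallest value upward, as needed for the HbA1c analysis in Section 4.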

4. Data analysis


Diabetes mellitus, or simply diabetes, is a disease characterized by elevated blood glucose. It is a major cause of kidney failure, nontraumatic lower-limb amputations, blindness, heart disease and stroke. As a result, diabetes is one of the leading causes of death. According to data from the Centers for Disease Control and Prevention (http://www.cdc.gov/diabetes/data), diabetes affected 29.1 million Americans, 9.3% of the U.S. population, in 2014. Newly diagnosed cases are expected to occur at a rate of 1 million people per year. The goal of treating diabetes patients is to lower their blood glucose. Patients are often first managed with diet and exercise and then prescribed metformin, the first-line oral antidiabetic treatment. For patients in whom metformin fails, gliclazide and pioglitazone are popular choices of second-line oral treatment (American Diabetes Association and others, 2014). It is therefore important to understand which patients are better suited to gliclazide and which to pioglitazone. In our study, we use data from a randomized, double-blind, parallel-group phase III study comparing the efficacy of gliclazide (control) and pioglitazone (treatment). A total of 1270 patients with Type 2 diabetes and poorly controlled HbA1c (7.5%–11%) were randomized in this study. Patients received either pioglitazone up to 45 mg once daily or gliclazide up to 160 mg twice daily. The primary efficacy endpoint was the change in HbA1c from baseline to the end of the study (52 weeks). Charbonnel et al. (2005) provide more details on the study design and analyses, including a table of patient demographics. We define baseline HbA1c as the value at the randomization visit; if that value is missing, we use the value from the screening visit, 2 weeks earlier. If both values are missing, the patient is excluded from the analysis.
The last observation carried forward (LOCF) method is used to impute the final HbA1c value for computing the change from baseline. After processing the data, 593 patients are on pioglitazone and 591 patients are on gliclazide.
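The baseline definition and LOCF imputation described above can be sketched as two small helper functions. The names are hypothetical, visits are assumed to be listed in time order with `None` marking a missing value, and returning `None` (i.e. excluding the patient) when there is no post-baseline HbA1c at all is our assumption, since the text does not cover that case.

```python
from typing import Optional, Sequence

def baseline_hba1c(randomization: Optional[float], screening: Optional[float]) -> Optional[float]:
    """Baseline HbA1c: randomization-visit value, else the screening value 2 weeks earlier."""
    return randomization if randomization is not None else screening

def locf_change(baseline: Optional[float], post: Sequence[Optional[float]]) -> Optional[float]:
    """Change from baseline using the last observed post-baseline HbA1c (LOCF).

    Returns None when the patient is excluded: no baseline value, or (our
    assumption) no post-baseline value at all.
    """
    if baseline is None:
        return None
    observed = [v for v in post if v is not None]
    return observed[-1] - baseline if observed else None

base = baseline_hba1c(None, 8.4)                   # randomization value missing
print(base, locf_change(base, [7.9, 7.5, None]))   # last observed value is 7.5
```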

Figure 7.

Variable importance for the real data. We apply the ITR search with depth 3 and rank the variables by their relative frequency among the top 0.01% of ITRs.

In our analysis, we include 22 biomarkers measured at baseline: Age, ALT, AST, BMI, Diastolic blood pressure, Systolic blood pressure, Cholesterol, Creatinine, Duration of diabetes, Fasting blood glucose, Fasting insulin, GGT, HbA1c, HDL, HomaB, HomaIR, HomaS, LDL, Pulse, Triglycerides, Waist, Weight. We use acronyms for some of these biomarkers because they are the standard names in clinical practice and are used more often than the full names (e.g. BMI vs. body mass index).

As an initial step, we fit the data with depth 3, which gives 35,925,120 candidate ITRs, each defining a partition of the patient population. Each ITR corresponds to a value function that measures the overall benefit under that rule. We select the top 0.01% of ITRs, i.e. those with the smallest value functions (a smaller change in HbA1c from baseline means better efficacy), and plot the frequency of each biomarker among them in Fig. 7. Two biomarkers, fasting insulin and baseline HbA1c, clearly stand out. We therefore refit the data with depth 2.

Table 2

HbA1c reduction before and after following the ITR. Patients with baseline fasting insulin $\geqslant$ 61.12 pmol/L and baseline HbA1c $\geqslant$ 8.1% ($A_{o}^{1}$) are recommended to take pioglitazone; all other patients ($A_{o}^{0}$) are recommended to take gliclazide. Following the ITR changes the overall HbA1c reduction from $-$1.287% to $-$1.473%.

	Original	Follow ITR
Overall	$-$1.287	$-$1.473

	Control	Treatment
Mean (all patients)	$-$1.271	$-$1.303
$A_{o}^{1}$	$-$1.394	$-$1.864*
$A_{o}^{0}$	$-$1.19*	$-$0.932
* Treatment recommended by the ITR for this subgroup.

We fit the final model with depth 2, which gives 224,532 candidate ITRs, and rank them by their value functions. The smallest value function is attained by the ITR that recommends pioglitazone to patients with baseline fasting insulin $\geqslant$ 61.12 pmol/L and baseline HbA1c $\geqslant$ 8.1% ($A_{o}^{1}$) and gliclazide to all other patients ($A_{o}^{0}$). It is worth pointing out that in this example, the top two biomarkers from the first step (depth 3) happen to be the biomarkers defining the ITR with the smallest value function in the second step (depth 2). This may not always be the case; if not, we may need to compromise between the results of the two steps and use clinical judgement to select the ITR. Table 2 presents the comparison, which shows the benefit of the ITR in two ways. First, when patients are randomly assigned to pioglitazone or gliclazide, the overall change in HbA1c is $-$1.287%, whereas under the recommended rule it becomes $-$1.473%. Second, patients in each subgroup have the potential to improve their outcomes substantially. For example, patients in $A_{o}^{1}$ have a mean HbA1c change of $-$1.394% on gliclazide compared with $-$1.864% on pioglitazone. Finally, subgroup $A_{o}^{1}$ contains 39.3% of the patients and $A_{o}^{0}$ contains 60.7%.
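Two small checks make the reported rule and numbers concrete. The function below encodes the depth-2 ITR with the thresholds reported above (the function name and the units in its argument names are ours), and the pooled mean confirms that the original overall value in Table 2 is the size-weighted average of the two arm means, assuming those means are computed over all 591 gliclazide and 593 pioglitazone patients.

```python
def recommend(fasting_insulin_pmol_l: float, hba1c_percent: float) -> str:
    """Depth-2 ITR: pioglitazone for subgroup A_o^1, gliclazide for A_o^0."""
    if fasting_insulin_pmol_l >= 61.12 and hba1c_percent >= 8.1:
        return "pioglitazone"
    return "gliclazide"

# Size-weighted average of the arm means (gliclazide n = 591, pioglitazone n = 593).
pooled = (591 * -1.271 + 593 * -1.303) / (591 + 593)
print(round(pooled, 3))                              # -1.287
print(recommend(75.0, 8.6), recommend(75.0, 7.9))    # pioglitazone gliclazide
```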

This simple rule is not only easy to use but also medically meaningful. Pioglitazone is an insulin sensitizer, which makes patients more sensitive to insulin. Gliclazide belongs to the sulfonylurea class, which stimulates patients' beta cells to produce more insulin. For patients with high fasting insulin, stimulating more insulin secretion with gliclazide may not help, since they already have a good amount of insulin. Pioglitazone, however, makes patients more sensitive to insulin, so they can better utilize their own insulin to lower blood glucose. Patients with higher fasting insulin and higher HbA1c are insulin resistant while still having a good amount of insulin in their body. It is not surprising that insulin sensitizers such as pioglitazone work better for these patients. This finding is consistent with the mechanisms of action of the two drugs and with other results (Charbonnel et al., 2005).

5. Discussion

In this paper, we connect the outcome weighted learning approach of Qian and Murphy (2011) and Zhao et al. (2012) to traditional subgroup identification problems. The framework is general enough to handle data from randomized control trials as well as observational studies with two or more treatments. We also developed a search algorithm tailored to drug development in pharmaceutical companies. The algorithm searches for easy-to-interpret treatment recommendation rules and can select and rank variables.

One limitation of the current algorithm is that it cannot handle a large number of covariates. In current clinical trials, the number of key baseline covariates, including demographic and laboratory data, is often fewer than 50, so the algorithm can handle most current late-phase studies. However, with more comprehensive biomarker panels, including genetic data, the exhaustive search becomes computationally infeasible. In future work, tree-based algorithms (Hastie et al., 2011) could reduce the computational complexity.

We are currently working on an extension that incorporates meta-analysis into this framework. Most clinical trials are powered for their primary objectives, often not for personalized medicine or subgroup identification. Synthesizing evidence from multiple studies could yield more robust ITRs. In particular, since the current framework allows treatment assignment to depend on covariates, $X$ could include the study ID.


Acknowledgments

The authors are grateful to the editor, associate editor, and referees for reviewing this article.

References

American Diabetes Association and others. (2014). Standards of medical care in diabetes 2014. Diabetes Care, 37, S14-S80.

Brookes, S. T., Whitely, Egger, Smith, G. D., Mulheran, P. A., & Peters, T. J. (2004). Subgroup analyses in randomized trials: Risks of subgroup-specific analyses; power and sample size for the interaction test. Journal of Clinical Epidemiology, 57, 229-236.

Cai, Tian, Wong, P. H., & Wei (2011). Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics, 12, 270-282.

Charbonnel, Matthews, Schernthaner, Hanefeld, & Brunetti (2005). A long-term comparison of pioglitazone and gliclazide in patients with Type 2 diabetes mellitus: A randomized, double-blind, parallel-group comparison trial. Diabetic Medicine, 22, 399-405.

Eddelbuettel, François, Allaire, Chambers, Bates, & Ushey (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40, 1-18.

Ellsworth, R. E., Decewicz, D. J., Shriver, C. D., & Ellsworth, D. L. (2010). Breast cancer in the personal genomics era. Current Genomics, 11, 146-161.

Faries, D. E., Chen, Lipkovich, I. A., Zagar, Liu, & Obenchain, R. L. (2013). Local control for identifying subgroups of interest in observational research: Persistence of treatment for major depressive disorder. International Journal of Methods in Psychiatric Research, 22, 185-194.

Foster, J. C., Taylor, J. M., & Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. Statistics in Medicine, 30, 2867-2880.

Hastie, T. J., Tibshirani, R. J., & Friedman, J. H. (2011). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.

Horvitz, D. G., & Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663-685.

Lagakos, S. W. (2006). The challenge of subgroup analyses: Reporting without distorting. New England Journal of Medicine, 354, 1667.

Lipkovich, Dmitrienko, Denne, & Enas (2011). Subgroup identification based on differential effect search: A recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine, 30, 2601-2621.

Liu, Wang, Kosorok, Zhao, & Zeng (2014). Robust hybrid learning for estimating personalized dynamic treatment regimes. Manuscript under review.

Mancinelli, Cronin, & Sadée (2000). Pharmacogenomics: The promise of personalized medicine. AAPS PharmSci, 2, 29-41.

Murphy, S. A. (2005). A generalization error for Q-learning. Journal of Machine Learning Research, 6, 1073-1097.

Qian, & Murphy, S. A. (2011). Performance guarantees for individualized treatment rules. Annals of Statistics, 39, 1180-1210.

R Core Team (2014). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Robins, J. M., & Rotnitzky (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. AIDS Epidemiology: Methodological Issues, 297-331.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.

Rothwell, P. M. (2005). Subgroup analysis in randomised controlled trials: Importance, indications, and interpretation. The Lancet, 365, 176-186.

Ruberg, S. J., Chen, & Wang (2010). The mean does not mean as much anymore: Finding sub-groups for tailored therapeutics. Clinical Trials, 7, 574-583.

Shuldiner, A. R., O'Connell, J. R., Bliden, K. P., Gandhi, Ryan, Horenstein, R. B., Damcott, C. M., Pakyz, Tantry, U. S., Gibson, Pollin, T. I., Post, Parsa, Mitchell, B. D., Faraday, Herzog, & Gurbel, P. A. (2009). Association of cytochrome P450 2C19 genotype with the antiplatelet effect and clinical efficacy of clopidogrel therapy. JAMA, 302, 849-857.

Tsai, C.-L., Wang, Nickerson, D. M., & Li (2009). Subgroup analysis via recursive partitioning. The Journal of Machine Learning Research, 10, 141-158.

Telli, M. L., Hunt, S. A., Carlson, R. W., & Guardino, A. E. (2007). Trastuzumab-related cardiotoxicity: Calling into question the concept of reversibility. Journal of Clinical Oncology, 25, 3525-3533.

Wang, Lagakos, S. W., Ware, J. H., Hunter, D. J., & Drazen, J. M. (2007). Statistics in medicine: Reporting of subgroup analyses in clinical trials. New England Journal of Medicine, 357, 2189-2194.

Zhang, Tsiatis, A. A., Laber, E. B., & Davidian (2012). A robust method for estimating optimal treatment regimes. Biometrics, 68, 1010-1018.

Zhao, Tian, Cai, Claggett, & Wei, L.-J. (2013). Effectively selecting a target population for a future comparative study. Journal of the American Statistical Association, 108, 527-539.

Zhao, & Zeng (2013). Recent development on statistical methods for personalized medicine discovery. Frontiers of Medicine, 7, 102-110.

Zhao, Zeng, Rush, A. J., & Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107, 1106-1118.
