Bayesian reliability optimization for multivariate binary responses

Abstract

Response Surface Methodology is a popular set of statistical techniques used to improve a process. Peterson (2004) proposed a Bayesian multivariate response optimization method that accounts for the dependence structure among the responses when locating the optimal region defined by some loss or desirability function. The main contribution of this paper is to address Bayesian reliability optimization for multivariate binary responses, for which logistic models under the traditional Bayesian reliability approach suffer from computational complexity. This work reduces that complexity by introducing latent variables in the response structure.

Keywords

Bayesian reliability; binary response; Gibbs steps; latent variable; MCMC; response surface method

1. Introduction

Numerous Response Surface and Design of Experiment (DOE) methodologies, including both frequentist and Bayesian approaches, have been proposed for optimizing multiple responses. Frequentist approaches include Harrington (1965), Derringer and Suich (1980) and Del Castillo, Montgomery, and McCarville (1996), all of which optimize a desirability function with respect to the experimental factor levels. Pignatiello (1993), Ames et al. (1997) and Vining (1998) proposed procedures based on minimizing a quadratic loss function. Raghunathan (2000) proposed a Bayesian approach that analyzes the proportion of nonconforming units, but it does not address optimization over response surfaces. Peterson (2004) introduced a Bayesian posterior predictive approach that optimizes, over the input space, the posterior predictive probability that the responses fall in a user-specified target region. Peterson's method also addresses specific weaknesses of the frequentist approaches by taking into account the covariance structure of the responses and the parameter uncertainty of the models. Wang et al. incorporated measures of bias or robustness by combining the idea of a quality loss function (Ko et al., 2005) with Peterson's Bayesian reliability approach, and showed that their method provides more reasonable solutions.

Although Peterson's approach considers only multivariate quantitative responses, the same idea can be extended to qualitative responses. For example, an output of interest can be defined in terms of whether multiple outputs meet certain quality characteristics or pass certain tests, whether the product is free of several possible defects, and so on. Currently, no widely accepted methods exist for optimizing response surfaces derived from experimental designs to address such questions. Moreover, the proposed method, which is based on Bayesian statistical theory, can reach a broader audience, as it applies to many other experimental design problems, such as drug tests in the pharmaceutical industry, tests of the effects of different chemical components (e.g. ignition tests) in the chemical industry, or problems in the manufacturing domain. As a specific example, consider the serum testing experiment on infected mice (Smith, 1932), in which different doses of serum were injected into mice and the response was their survival after seven days. Other qualitative characteristics of those mice could also be recorded as responses to assess the effects of different serum doses.

Recent advances in computational techniques, and the efficiency gained through greater computing power, have opened many opportunities to handle complex modeling scenarios in which the data must be handled indirectly. In many situations, unsupervised learning methods are needed to gain better insight from the data. For example, consider a psychometric test in which patients are given different doses of multiple medicines and, after completing a full course of treatment, answer a questionnaire with questions such as whether they have any pain (yes or no), whether they have developed any rashes (yes or no), or how they feel compared to before the treatment started (on a scale of 1 to 10). Another example is a market survey for a cookie that aims to optimize its quality based on responses such as chewiness, hardness and sweetness. In both scenarios the responses are qualitative but correlated. A typical arcsine transformation would treat the responses independently and would also tend to become skewed if the number of categories grows moderately large. Hence, it is important to handle the responses in a data-driven way that preserves the dependence structure among them.

In this paper, we extend Peterson's (2004) posterior predictive approach to binary responses. Although we limit our approach to the binary case, it can easily be generalized to categorical responses with more than two levels. We illustrate this approach with a case study.

Implementing the Bayesian posterior predictive approach in the binary logistic case leads to a complicated posterior. As a simple example, consider the univariate binary logistic model

$\displaystyle\text{logit}\ P(y=1\mid\underset{\sim}{\beta},\sigma)=\underset{\sim}{\beta}^{\prime}z(\underset{\sim}{x})+e$

where $e$ has a multivariate normal distribution with mean vector $\underset{\sim}{0}$ and covariance matrix $\sigma^{2}I_{n}$. We adopt a setup similar to Peterson (2004), with a Bayesian model for the logit link above and a non-informative prior on $\underset{\sim}{\beta}$ and $\sigma^{2}$, $\pi(\underset{\sim}{\beta},\sigma^{2})\propto\frac{1}{\sigma}$. The joint density of $(\underset{\sim}{y},\underset{\sim}{\beta},\sigma^{2})$ is given by

$\displaystyle\pi(\underset{\sim}{y},\underset{\sim}{\beta},\sigma^{2}\mid\underset{\sim}{x})\propto\frac{e^{\underset{\sim}{\beta}^{\prime}z(\underset{\sim}{x})}}{1+e^{\underset{\sim}{\beta}^{\prime}z(\underset{\sim}{x})}}\times\frac{1}{\sigma}$

Hence closed-form expressions for the posteriors and for the posterior predictive density are difficult to obtain, even in the univariate case.

A convenient approximation in the multivariate logistic regression case is based on the multivariate t distribution (Albert and Chib, 1993),

$\displaystyle P(y_{1}=y_{10},y_{2}=y_{20},\ldots,y_{p}=y_{p0}\mid B,\Sigma,\underset{\sim}{x})=\tau_{p,\nu}(\underset{\sim}{y}\mid\underset{\sim}{\mu}=\underset{\sim}{0},\Sigma)\times\prod_{i=1}^{p}\frac{l(y_{i}\mid\underset{\sim}{\beta}_{i})}{\tau_{1,\nu}(y_{i}\mid\mu_{i}=0,\sigma^{2}_{i}=1)}$

where,

$\displaystyle\tau_{p,\nu}(\underset{\sim}{y}\mid\underset{\sim}{\mu},\Sigma)=\Big(\frac{\Gamma((\nu+p)/2)}{\Gamma(\nu/2)(\nu\pi)^{p/2}|\Sigma|^{1/2}}\Big)\times\Big\{1+\frac{1}{\nu}(\underset{\sim}{y}-\underset{\sim}{\mu})^{\prime}\Sigma^{-1}(\underset{\sim}{y}-\underset{\sim}{\mu})\Big\}^{-(\nu+p)/2}$

and,

$\displaystyle l(y_{i}\mid\underset{\sim}{\beta}_{i})=\frac{e^{\underset{\sim}{x}^{\prime}\underset{\sim}{\beta}_{i}}}{1+e^{\underset{\sim}{x}^{\prime}\underset{\sim}{\beta}_{i}}}$

Here $\underset{\sim}{\beta}_{i}$ is the $i$th row of $B$. With the same prior as Peterson (2004), $\pi(B,\Sigma)\propto|\Sigma|^{-\frac{p+1}{2}}$, the joint posterior can be written as

$\displaystyle P(B,\Sigma\mid\underset{\sim}{y},\underset{\sim}{x})\propto\tau_{p,\nu}(\underset{\sim}{y}\mid\underset{\sim}{\mu}=\underset{\sim}{0},\Sigma)\times\prod_{i=1}^{p}\frac{l(y_{i}\mid\underset{\sim}{\beta}_{i})}{\tau_{1,\nu}(y_{i}\mid\mu_{i}=0,\sigma^{2}_{i}=1)}\times|\Sigma|^{-\frac{p+1}{2}}$

The marginal posteriors do not have closed-form expressions and must be approximated with the Metropolis-Hastings algorithm.
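As a minimal sketch, the multivariate t density $\tau_{p,\nu}$ defined above can be evaluated on the log scale with NumPy. The function name `mvt_logpdf` is hypothetical and not part of any library. Working on the log scale keeps the Gamma functions from overflowing when $\nu$ is large, which matters if the density is used inside a Metropolis-Hastings acceptance ratio.

```python
import math
import numpy as np

def mvt_logpdf(y, mu, Sigma, nu):
    """Log of tau_{p,nu}(y | mu, Sigma), the p-variate t density with nu degrees of freedom."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    p = y.shape[0]
    diff = y - mu
    # Quadratic form (y - mu)' Sigma^{-1} (y - mu), via a linear solve instead of an explicit inverse
    quad = float(diff @ np.linalg.solve(Sigma, diff))
    _, logdet = np.linalg.slogdet(Sigma)
    # log of Gamma((nu + p)/2) / (Gamma(nu/2) (nu pi)^{p/2} |Sigma|^{1/2})
    log_norm = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
                - 0.5 * p * math.log(nu * math.pi) - 0.5 * logdet)
    return log_norm - 0.5 * (nu + p) * math.log1p(quad / nu)
```

For $p=1$, $\nu=1$ and $\Sigma=1$ the density reduces to the standard Cauchy, so `math.exp(mvt_logpdf([0.0], [0.0], [[1.0]], 1))` equals $1/\pi$.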

The method proposed in this paper approaches the problem from a Bayesian classification modeling perspective. We iteratively map latent response variables to the observed binary responses in such a way that we obtain an optimal representation of the binary variables in terms of separating, non-overlapping hyperplanes. In Section 2, we present a binary response optimization model along with the corresponding prior structure, and give Gibbs steps for drawing samples from which MCMC estimates of the posterior predictive probabilities can be obtained. In Section 3, simulation studies are carried out for two scenarios to check the performance of the model, and convergence diagnostics for the MCMC are provided. In Section 4, the model is applied to real data, with numerical optimization over the posterior predictive probabilities. In Section 5, we extend the method to ordered multilevel responses, and in Section 6 we conclude with key findings and the scope of the method's applicability.

2. Bayesian reliability for multivariate binary response

Suppose $\underset{\sim}{y}=(y_{1},y_{2},\ldots,y_{p})^{\prime}$ is the multivariate $(p\times 1)$ response vector and $\underset{\sim}{x}=(x_{1},x_{2},\ldots,x_{k})^{\prime}$ is the $(k\times 1)$ vector of predictors. Each response $y_{i}$, $i=1,2,\ldots,p$, is binary, taking the value 0 or 1. The goal of the analysis is to maximize the probability that $\underset{\sim}{y}$ lies in a user-specified target region $A$ with respect to the input $\underset{\sim}{x}$, i.e. to maximize $P(\underset{\sim}{y}\in A\mid\underset{\sim}{x})$.

Let us define a latent variable $t_{ij}$ such that $y_{ij}=I(t_{ij}>0)$. Hence the $y_{ij}$'s are identified through the latent variables $t_{ij}$. To interpret this representation, we can regard $\underset{\sim}{t}$ as an element of a $p$-dimensional hyperplane induced from the input space. An optimal classification model obtained through $\underset{\sim}{t}$ divides the hyperplane into two disjoint sections in which $\underset{\sim}{y}$ can be classified optimally.

We adapt the classical Bayesian response surface technique to model $\underset{\sim}{t}$, and the probability mass of $\underset{\sim}{t}$ on either side of zero defines the outcome for $\underset{\sim}{y}$. We first assume that the $y_{i}$'s are conditionally independent given the $t_{i}$'s, that is,

$\displaystyle P(Y_{i}=y_{i}\mid t_{i})=\theta_{i}^{y_{i}}(1-\theta_{i})^{1-y_{i}},\quad i=1,2,\ldots,p$ (1)

where $\theta_{i}=P(t_{i}>0)$.

Conditional independence carries the orthogonality among the dimensions of the hyperplane over to the dimensions of the multivariate response. Marginally, however, the $Y_{i}$'s remain dependent through the correlation among the $t_{i}$'s, so conditional independence preserves the marginal dependence while making the model easier to handle and interpret. Since $\underset{\sim}{t}$ takes values in $\mathbb{R}^{p}$ and represents the binary responses in a well-defined way, we can use $\underset{\sim}{t}$ to write the usual regression model

$\displaystyle\underset{\sim}{t}=Bz(\underset{\sim}{x})+\underset{\sim}{e}.$ (2)

where $B$ is a $p\times q$ matrix of regression coefficients and $z(\underset{\sim}{x})$ is a $q\times 1$ vector-valued function of $\underset{\sim}{x}$. A typical choice of $z(\underset{\sim}{x})$ consists of the first- and second-order effects and the interactions; for example, if $x_{1}$ and $x_{2}$ are two model inputs, then $z(\underset{\sim}{x})=(x_{1},x_{2},x_{1}^{2},x_{2}^{2},x_{1}x_{2})^{\prime}$. We assume that $z(\underset{\sim}{x})$ is the same for all response variables and that $\underset{\sim}{e}$ has a multivariate normal distribution with mean vector $\underset{\sim}{0}$ and covariance matrix $\Sigma$. We take a non-informative joint prior on $(B,\Sigma)$,

$\displaystyle\pi(B,\Sigma)\propto{\mid\Sigma\mid}^{-\frac{p+1}{2}}$ (3)

Although the underlying model is a multivariate binary response model, introducing a nested auxiliary model through latent variables simplifies the extraction of the dependence structure between the responses. The covariance matrix $\Sigma$ defines the dependence between the $t_{i}$'s, which translates into the dependence structure between the binary responses $y_{i}$.
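The model in Eq. (2) can be simulated directly: draw the latent responses first and then threshold them at zero, which is also how the binary data in Section 3 are generated. The sketch below assumes $\pm 1$-coded factors and a $z(\underset{\sim}{x})$ made of an intercept, the main effects and, optionally, all two-factor interactions (the layout of Tables 2 to 4) instead of the squared terms in the example above. `design_row` and `simulate_binary` are hypothetical helper names.

```python
import itertools
import numpy as np

def design_row(x, interactions=True):
    """z(x): intercept, main effects and (optionally) all two-factor interactions."""
    x = np.asarray(x, dtype=float)
    terms = [1.0, *x]
    if interactions:
        terms += [x[i] * x[j] for i, j in itertools.combinations(range(len(x)), 2)]
    return np.array(terms)

def simulate_binary(X, B, Sigma, rng, interactions=True):
    """Draw latent t = B z(x) + e with e ~ N_p(0, Sigma), then set y = I(t > 0)."""
    Z = np.vstack([design_row(x, interactions) for x in X])          # n x q design matrix
    E = rng.multivariate_normal(np.zeros(len(Sigma)), Sigma, size=len(Z))
    T = Z @ B.T + E                                                  # n x p latent responses
    Y = (T > 0).astype(int)                                          # observed binary responses
    return Z, T, Y
```

For instance, a $2^{3}$ full factorial with main effects only is `X = list(itertools.product([-1.0, 1.0], repeat=3))` with `interactions=False`, giving the $q=4$ columns of Scenario 1; the $B$ listed there can be passed as a $2\times 4$ array together with any positive definite $\Sigma$ (an identity matrix is a placeholder choice).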

The joint posterior density of $\underset{\sim}{t},B,\Sigma\mid\underset{\sim}{y},\underset{\sim}{x}$ can be written as:

$\displaystyle\pi(\underset{\sim}{t},B,\Sigma\mid\underset{\sim}{y},\underset{\sim}{x})\propto\pi(\underset{\sim}{y}\mid\underset{\sim}{t})\times\pi(\underset{\sim}{t}\mid\underset{\sim}{x},B,\Sigma)\times\pi(B,\Sigma)\propto\prod_{i=1}^{p}\theta_{i}^{y_{i}}(1-\theta_{i})^{1-y_{i}}\times N(\underset{\sim}{t}\mid Bz(\underset{\sim}{x}),\Sigma)\times|\Sigma|^{-\frac{p+1}{2}}$

The posterior expression shows that the underlying model is a simple Bayesian regression model nested within a binary structure. Since the factor involving the binary response $\underset{\sim}{y}$ is free of the model parameters, the conditional posteriors of $B$ and $\Sigma$ given $\underset{\sim}{t}$ have closed forms that follow from standard Bayesian multivariate linear regression.

Since binary responses are considered in this paper, the experimenter's target is of the form $p(\underset{\sim}{x})=P(\underset{\sim}{y}=A\mid\underset{\sim}{x})$, where $A$ is a vector of 0's and 1's. The natural idea is to obtain the posterior predictive distribution of $\underset{\sim}{t}$ and then the posterior predictive distribution of $\underset{\sim}{y}$ conditional on $\underset{\sim}{t}$. However, since we observe $\underset{\sim}{y}$ rather than $\underset{\sim}{t}$, we first need to obtain an optimized $\underset{\sim}{t}$ that can be considered representative of the distribution of $\underset{\sim}{y}$. To this end, we start with an arbitrary $\underset{\sim}{t}$, update the model by drawing $B$ and $\Sigma$ from their posterior distributions, and then update $\underset{\sim}{t}$ by incorporating the observed $\underset{\sim}{y}$ in the updated model. The Gibbs steps for optimizing $\underset{\sim}{t}$ are given below.

Gibbs steps:

(i) Initialize $T$ with respect to $Y$: start with $T=t_{s-1}$ at step $s$.

(ii) Update $\Sigma=\Sigma_{s}$: draw $\Sigma\mid t_{s-1}\sim\textit{InverseWishart}(n-q,\hat{\Sigma})$, where $\hat{\Sigma}=t_{s-1}^{\prime}\big(I-Z(\underset{\sim}{x})(Z(\underset{\sim}{x})^{\prime}Z(\underset{\sim}{x}))^{-1}Z(\underset{\sim}{x})^{\prime}\big)t_{s-1}$.

(iii) Update $B=B_{s}$, using the fact that

$\displaystyle B\mid\Sigma_{s},t_{s-1}\sim N_{q\times p}\big(\hat{B},\Sigma_{s}\otimes(Z(\underset{\sim}{x})^{\prime}Z(\underset{\sim}{x}))^{-1}\big)$

where

$\displaystyle\hat{B}=\big(Z(\underset{\sim}{x})^{\prime}Z(\underset{\sim}{x})\big)^{-1}Z(\underset{\sim}{x})^{\prime}t_{s-1}$ (4)

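(iv) Update $T=t_{s}$ and return to step (ii), using the fact that

$\displaystyle\pi(\underset{\sim}{t}\mid\underset{\sim}{y},\underset{\sim}{x},B,\Sigma)\propto\pi(\underset{\sim}{y}\mid\underset{\sim}{t})\times\pi(\underset{\sim}{t}\mid\underset{\sim}{x},B,\Sigma)=\prod_{i=1}^{p}\theta_{i}^{y_{i}}(1-\theta_{i})^{1-y_{i}}\times N(\underset{\sim}{t}\mid Bz(\underset{\sim}{x}),\Sigma)$

Since $y_{i}=I(t_{i}>0)$, this amounts to drawing $\underset{\sim}{t}$ from $N(Bz(\underset{\sim}{x}),\Sigma)$ truncated so that $t_{i}>0$ when $y_{i}=1$ and $t_{i}\leqslant 0$ when $y_{i}=0$.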
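The four steps above can be sketched as one sweep in NumPy. This is an illustrative implementation under stated assumptions, not the authors' code: $T$ is stored as an $n\times p$ array, $Z$ as the $n\times q$ design matrix with rows $z(\underset{\sim}{x}_{j})^{\prime}$, the inverse Wishart draw in step (ii) is obtained by inverting a Wishart draw built from $n-q$ normal vectors (so $n-q\geq p$ is required), and step (iv) updates one coordinate at a time from its univariate truncated normal conditional, in the spirit of Albert and Chib (1993). The helper names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for inverse-CDF sampling


def _std_normal_above(a, rng):
    """Draw Z ~ N(0, 1) conditioned on Z > a by inverse-CDF sampling."""
    if a <= 0.0:
        lo = _STD.cdf(a)
        u = lo + (1.0 - lo) * rng.random()
        return _STD.inv_cdf(min(max(u, 1e-300), 1.0 - 1e-16))
    # Upper tail: sample -Z below -a instead, so the CDF value stays well away from 1.
    u = _STD.cdf(-a) * rng.random()
    return -_STD.inv_cdf(max(u, 1e-300))


def gibbs_sweep(T, Y, Z, rng):
    """One pass of steps (ii)-(iv).

    T: n x p current latent responses t_{s-1}; Y: n x p binary responses;
    Z: n x q design matrix. Returns (t_s, B_s as a p x q matrix, Sigma_s).
    """
    T = np.array(T, dtype=float)                     # work on a float copy
    n, p = T.shape
    q = Z.shape[1]
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    B_hat = ZtZ_inv @ Z.T @ T                        # q x p least-squares fit to t_{s-1}
    resid = T - Z @ B_hat
    S = resid.T @ resid                              # t'(I - Z(Z'Z)^{-1}Z')t

    # (ii) Sigma | t ~ InverseWishart(n - q, S): invert a Wishart(n - q, S^{-1}) draw.
    L = np.linalg.cholesky(np.linalg.inv(S))
    G = rng.standard_normal((n - q, p)) @ L.T        # rows ~ N_p(0, S^{-1})
    Sigma = np.linalg.inv(G.T @ G)

    # (iii) B' | Sigma, t ~ matrix normal with mean B_hat, covariance Sigma (x) (Z'Z)^{-1}.
    W = rng.standard_normal((q, p))
    Bt = B_hat + np.linalg.cholesky(ZtZ_inv) @ W @ np.linalg.cholesky(Sigma).T

    # (iv) t | y, B, Sigma: N(B z(x_j), Sigma) truncated to the orthant fixed by y_j,
    # updated one coordinate at a time from its univariate conditional.
    M = Z @ Bt                                       # n x p latent means B z(x_j)
    Q = np.linalg.inv(Sigma)                         # precision matrix
    for j in range(n):
        for i in range(p):
            others = np.arange(p) != i
            m = M[j, i] - Q[i, others] @ (T[j, others] - M[j, others]) / Q[i, i]
            s = 1.0 / np.sqrt(Q[i, i])
            a = -m / s                               # t > 0  <=>  z > a, where t = m + s z
            if Y[j, i] == 1:
                z = _std_normal_above(a, rng)
            else:
                z = -_std_normal_above(-a, rng)      # t <= 0  <=>  z <= a
            T[j, i] = m + s * z
    return T, Bt.T, Sigma
```

Repeating `T, B, Sigma = gibbs_sweep(T, Y, Z, rng)` from an initial `T` whose signs agree with `Y` (for example `np.where(Y == 1, 0.5, -0.5)`), and discarding a burn-in, yields MCMC draws of the kind summarized in Sections 3 and 4.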

We repeat the above steps until convergence. Finally, based on the optimized $\underset{\sim}{t}$, we predict a new latent vector $T=\underset{\sim}{t}^{*}$ at input $\underset{\sim}{x}$ from the multivariate t posterior predictive distribution:

$\displaystyle f(\underset{\sim}{t}^{*}\mid\underset{\sim}{x},\underset{\sim}{t})=c\Big\{1+\frac{1}{\nu}\big(\underset{\sim}{t}^{*}-\hat{B}z(\underset{\sim}{x})\big)^{\prime}H\big(\underset{\sim}{t}^{*}-\hat{B}z(\underset{\sim}{x})\big)\Big\}^{-\frac{p+\nu}{2}}$ (5)

where,

$\displaystyle c=\frac{\Gamma(\frac{p+\nu}{2})\sqrt{|H|}}{\Gamma(\frac{\nu}{2})(\pi\nu)^{p/2}},\quad H=\frac{\nu V^{-1}}{1+z(\underset{\sim}{x})^{\prime}D^{-1}z(\underset{\sim}{x})},\quad D=\sum_{i=1}^{n}z(\underset{\sim}{x}_{i})z(\underset{\sim}{x}_{i})^{\prime},\quad V=(T-(\hat{B}Z)^{\prime})^{\prime}(T-(\hat{B}Z)^{\prime})$

Here $n$ is the number of observations, $Z=(z(\underset{\sim}{x}_{1}),z(\underset{\sim}{x}_{2}),\ldots,z(\underset{\sim}{x}_{n}))$, $\hat{B}$ is the least squares estimate of $B$ given in Eq. (4), and $T=(\underset{\sim}{t}_{1},\underset{\sim}{t}_{2},\ldots,\underset{\sim}{t}_{p})^{\prime}$.
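Drawing from Eq. (5) only requires its location $\hat{B}z(\underset{\sim}{x})$ and scale matrix $H^{-1}$, because a multivariate t vector can be generated as a normal vector divided by $\sqrt{g/\nu}$ with $g\sim\chi^{2}_{\nu}$. The text does not state $\nu$; the sketch assumes $\nu=n-q-p+1$, the value we understand Peterson (2004) to use under the prior in Eq. (3), and lets the caller override it. The function names are hypothetical, and $Z$ is taken as the $n\times q$ matrix with rows $z(\underset{\sim}{x}_{i})^{\prime}$, so that $D=Z^{\prime}Z$.

```python
import numpy as np

def predictive_t_params(z, Z, T, nu=None):
    """Location, scale matrix and df of the multivariate t in Eq. (5) at design row z = z(x).

    Z: n x q design matrix (rows z(x_i)'), T: n x p latent responses.
    """
    n, q = Z.shape
    p = T.shape[1]
    if nu is None:
        nu = n - q - p + 1                           # assumed df (Peterson, 2004); override if needed
    B_hat_t, *_ = np.linalg.lstsq(Z, T, rcond=None)  # q x p, i.e. B_hat'
    resid = T - Z @ B_hat_t
    V = resid.T @ resid
    D = Z.T @ Z
    z = np.asarray(z, dtype=float)
    loc = z @ B_hat_t                                # B_hat z(x)
    # H = nu V^{-1} / (1 + z'D^{-1}z), so the t scale matrix is H^{-1}
    scale = (1.0 + z @ np.linalg.solve(D, z)) * V / nu
    return loc, scale, nu

def sample_predictive(z, Z, T, size, rng, nu=None):
    """Draw t* ~ multivariate t(loc, scale, nu) as loc + L w / sqrt(g / nu)."""
    loc, scale, nu = predictive_t_params(z, Z, T, nu)
    L = np.linalg.cholesky(scale)
    w = rng.standard_normal((size, len(loc)))
    g = rng.chisquare(nu, size=size)
    return loc + (w @ L.T) / np.sqrt(g / nu)[:, None]
```

Thresholding these draws at zero, as in Eq. (6), turns them into draws of the binary response vector.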

Figure 1.

Autocorrelation plots for the MCMC samples of the model coefficients in scenario 1.

Since the goal is not the optimized $\underset{\sim}{t}$ itself but its image in the binary response space, we obtain the predicted $y_{i}$ from the conditional distribution of $y_{i}\mid t_{i}$,

$\displaystyle P(Y_{i}=y_{i}\mid t_{i})=I(t_{i}>0)\,y_{i}+I(t_{i}\leqslant 0)\,(1-y_{i}),\quad\forall\,i$ (6)

The final goal is to find the optimal region in the input space, where the probability that $\underset{\sim}{y}=A$, for some user-defined ordered set $A$ of 0's and 1's, is maximized. To this end, we sample from the posterior predictive distribution of $\underset{\sim}{y}$ and obtain a Monte Carlo approximation of $P(\underset{\sim}{y}=A\mid\underset{\sim}{x})$ for each $\underset{\sim}{x}$ in the input space. The task is then to find the $\underset{\sim}{x}$, or a neighborhood of it, that maximizes this probability; that region can be regarded as the most reliable operating region for producing the desired output under the user-defined conditions. There are several ways to solve this optimization problem over the input space. The most naive approach is to form a grid over the input space and compute $P(\underset{\sim}{y}=A\mid\underset{\sim}{x})$ at each grid point. This is the simplest way to obtain the desired $\underset{\sim}{x}$, but its accuracy depends on the grid resolution, and too many grid points slow down the computation. Another option is to pick points at random from the input space and compute $P(\underset{\sim}{y}=A\mid\underset{\sim}{x})$ at each of them. This approach suffers from the same issue as grid search, since many points must be evaluated so that the optimal input region is not missed.
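The Monte Carlo estimate and the naive grid search described above can be sketched as follows. `prob_target` turns latent predictive draws into binary vectors with $y_{i}=I(t_{i}>0)$, as in Eq. (6), and counts how often they equal $A$; `grid_search` accepts any function of $\underset{\sim}{x}$, for example one that draws from the posterior predictive distribution at $\underset{\sim}{x}$. Both names, the coded range $[-1,1]$ and the default of 11 levels per factor are assumptions for illustration.

```python
import itertools
import numpy as np

def prob_target(draws, A):
    """Monte Carlo estimate of P(y = A | x) from latent predictive draws t* (size x p)."""
    y = (np.asarray(draws) > 0).astype(int)          # Eq. (6): y_i = I(t_i > 0)
    return float(np.mean(np.all(y == np.asarray(A), axis=1)))

def grid_search(prob_fn, k, levels=11, lo=-1.0, hi=1.0):
    """Evaluate prob_fn on a regular grid over [lo, hi]^k; return the best point and value."""
    axis = np.linspace(lo, hi, levels)
    best_x, best_p = None, -np.inf
    for x in itertools.product(axis, repeat=k):
        pr = prob_fn(np.array(x))
        if pr > best_p:
            best_x, best_p = np.array(x), pr
    return best_x, best_p
```

The cost is `levels ** k` evaluations of `prob_fn`; with 11 levels and $k=5$ factors that is 161,051 grid points, each needing its own batch of predictive draws, which illustrates the computational concern raised above.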

A mathematically more sophisticated approach is ridge analysis (Hoerl, 1985; Peterson, 2004). The objective is first optimized over the surface of a hypersphere of radius $r$ centered at the design center, giving an optimizer $\underset{\sim}{x}_{r}$. The process is repeated for different values of $r$, and the overall optimum $\underset{\sim}{x}_{0}$ is chosen from the set $\{\underset{\sim}{x}_{r}:r=r_{1},r_{2},\ldots,r_{k}\}$.
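A simple way to sketch ridge analysis is to replace the constrained optimizer on each sphere with a random search over directions. In the code below the spheres are centered at the design center $\underset{\sim}{0}$ of the coded factors, and the number of random directions is an arbitrary assumption; a proper constrained optimizer would give a more precise $\underset{\sim}{x}_{r}$ for the same radius. `ridge_search` is a hypothetical name.

```python
import numpy as np

def ridge_search(prob_fn, k, radii, n_dirs=500, rng=None):
    """For each radius r, maximize prob_fn over random points on the sphere ||x|| = r,
    then return the best x_r over all radii, its value, and the whole ridge path."""
    rng = np.random.default_rng() if rng is None else rng
    path = []
    for r in radii:
        u = rng.standard_normal((n_dirs, k))
        u /= np.linalg.norm(u, axis=1, keepdims=True)   # uniform random directions
        cand = r * u                                     # candidate points on the sphere
        vals = np.array([prob_fn(x) for x in cand])
        i = int(np.argmax(vals))
        path.append((r, cand[i], vals[i]))
    _, x0, p0 = max(path, key=lambda item: item[2])
    return x0, p0, path
```

Plotting the best value against $r$ from the returned `path` gives the usual ridge trace.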

3. Simulation study

To check the performance of modeling multivariate binary responses through latent continuous variables, we perform a simulation study under two scenarios.

3.1 Scenario 1

In the first scenario, we consider a $2^{3}$ design with main effects only and 2 response variables. The model is simulated using the randomly generated coefficient matrix $B=\begin{pmatrix}0.213797&0.222853&-0.4923&-0.22765\\-0.07103&-0.38147&-0.200283&-0.41591\end{pmatrix}$ (rows corresponding to $y_{1}$ and $y_{2}$) and a randomly generated covariance matrix $\Sigma$.

Table 1
Posterior means of the model coefficients (with Geweke Statistics) for scenario 1

Covariates	$y_{1}$	$y_{2}$
intercept	0.51638 ($-$0.76628)	0.01521 ($-$0.32662)
$x_{1}$	0.52466 ($-$0.35490)	$-$0.95418 ($-$0.08344)
$x_{2}$	$-$1.39594 (0.56667)	$-$0.97495 (0.96802)
$x_{3}$	$-$0.56636 ($-$0.68632)	$-$0.94019 ($-$1.16288)

We run the Gibbs sampler for 10,000 iterations with a burn-in period of 8,000. Table 1 shows the posterior means of the model coefficients along with the Geweke $Z$-statistics for convergence diagnostics. Moreover, Fig. 1 shows the autocorrelation plots for the MCMC samples drawn from the posterior distributions of the model coefficients.
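The Geweke statistic compares the mean of an early segment of a chain with that of a late segment. The sketch below uses the conventional first 10% and last 50% but, as a simplification, estimates the variance of each segment mean by batch means rather than by the spectral density estimate used in standard software, so its values can differ somewhat from those of standard implementations. `geweke_z` is a hypothetical name.

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5, n_batches=20):
    """Geweke-style Z-score: (mean of first segment - mean of last segment) / standard error.

    Simplification: each segment's variance of the mean is estimated by batch means.
    """
    x = np.asarray(chain, dtype=float)
    n = len(x)
    a, b = x[: int(first * n)], x[n - int(last * n):]

    def var_of_mean(seg):
        m = len(seg) // n_batches                        # batch length
        means = seg[: m * n_batches].reshape(n_batches, m).mean(axis=1)
        return means.var(ddof=1) / n_batches

    return (a.mean() - b.mean()) / np.sqrt(var_of_mean(a) + var_of_mean(b))
```

Values of $|Z|$ well below about 2 are usually read as no evidence against convergence, which is the reading applied to Tables 1 and 3.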

3.2 Scenario 2

In the second scenario, we consider a $2^{5}$ design with 3 response variables, including all second-order interactions in the model. The model is simulated using randomly generated values of $B$ and $\Sigma$. The true values of the model coefficients in $B$ are reported in Table 2.

Table 2
True values of the model coefficients for scenario 2

Covariates	$y_{1}$	$y_{2}$	$y_{3}$
intercept	0.388107	$-$ 0.23446	$-$ 0.45348
$x_{1}$	$-$ 0.19421	$-$ 0.22327	$-$ 0.08863
$x_{2}$	$-$ 0.27888	$-$ 0.42154	$-$ 0.00405
$x_{3}$	0.27665	$-$ 0.43752	0.16647
$x_{4}$	0.07281	$-$ 0.14129	$-$ 0.41661
$x_{5}$	0.14889	0.14572	$-$ 0.28989
$x_{1}x_{2}$	$-$ 0.46959	$-$ 0.06803	0.09909
$x_{1}x_{3}$	$-$ 0.46483	0.25307	0.31606
$x_{1}x_{4}$	$-$ 0.05085	$-$ 0.48013	0.36415
$x_{1}x_{5}$	$-$ 0.04651	0.25022	0.15794
$x_{2}x_{3}$	0.114258	0.26989	$-$ 0.18978
$x_{2}x_{4}$	0.30119	$-$ 0.00713	0.20135
$x_{2}x_{5}$	0.17896	$-$ 0.48578	$-$ 0.12737
$x_{3}x_{4}$	0.44510	$-$ 0.08644	0.35822
$x_{3}x_{5}$	0.14442	$-$ 0.48677	$-$ 0.38443
$x_{4}x_{5}$	$-$ 0.49158	0.08179	$-$ 0.40416

We run the Gibbs sampler for 25,000 iterations with a burn-in period of 15,000. The posterior means of the model coefficients, along with the Geweke Z-scores for convergence diagnostics, are reported in Table 3. We also show the partial autocorrelation plots for the intercept and the main effects of $y_{1}$ in Fig. 2.

Table 3

Posterior means of the model coefficients (with Geweke Statistics) for scenario 2

Covariates	$y_{1}$	$y_{2}$	$y_{3}$
intercept	0.34604 ($-$0.11212)	$-$0.38784 ($-$0.57180)	$-$0.45293 (1.28526)
$x_{1}$	$-$0.16649 (0.96785)	$-$0.94561 ($-$0.40638)	0.00755 (0.38965)
$x_{2}$	$-$0.28053 (0.19622)	$-$0.27266 (0.17813)	$-$0.39829 ($-$0.82402)
$x_{3}$	0.49028 ($-$1.11020)	$-$0.40402 (0.89770)	0.19362 ($-$0.36783)
$x_{4}$	0.13212 (0.24512)	0.01496 ($-$1.48029)	$-$0.43559 (0.78214)
$x_{5}$	0.24187 ($-$1.03254)	0.00927 (0.55192)	$-$0.21134 ($-$1.06082)
$x_{1}x_{2}$	$-$0.35518 (0.09125)	$-$0.19972 (1.04044)	$-$0.00131 (0.78449)
$x_{1}x_{3}$	$-$0.34136 ($-$1.27221)	0.38820 (0.52100)	0.21234 ($-$1.26620)
$x_{1}x_{4}$	0.04031 ($-$0.26135)	$-$0.38868 (0.18773)	0.41726 ($-$0.89546)
$x_{1}x_{5}$	0.03406 (1.02003)	0.40674 (0.16239)	0.17845 (0.36976)
$x_{2}x_{3}$	$-$0.14451 (0.69536)	0.62541 (0.64980)	$-$0.20136 (0.15692)
$x_{2}x_{4}$	0.33649 (0.56497)	0.22698 ($-$0.30365)	0.37316 (0.13669)
$x_{2}x_{5}$	0.13622 ($-$1.26942)	$-$0.54623 (0.55638)	$-$0.16667 ($-$0.06231)
$x_{3}x_{4}$	0.62216 (0.58762)	$-$0.39444 (0.38841)	0.55030 (0.78882)
$x_{3}x_{5}$	0.08080 (0.06433)	$-$0.41291 ($-$0.11766)	$-$0.30994 ($-$0.46660)
$x_{4}x_{5}$	$-$0.70652 (1.02351)	0.02154 (0.03365)	$-$0.63537 ($-$0.59599)

Figure 2.

Partial autocorrelation plots for the MCMC samples of the intercept and the main effects corresponding to $y_{1}$ in scenario 2.

It is evident from both simulation studies that the proposed algorithm estimates the true underlying model efficiently. To generate the data, we first generate true latent responses from the randomly generated model parameters and then generate the binary responses through the conditional density given these auxiliary latent responses. The proposed model therefore sees the data only through the binary responses and has no knowledge of the underlying latent responses. Nevertheless, the highest posterior density intervals and the posterior means of the model coefficients indicate that the model gets close to the true scenario.

Also, the Geweke Z-scores and the autocorrelation and partial autocorrelation plots indicate that the MCMC has mixed well and converged to the stationary distribution. Thinning was also applied to reduce the autocorrelation between successive samples.
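Sample autocorrelations like those plotted in Fig. 1, together with burn-in removal and thinning, take only a few lines. The sketch below is illustrative and its helper names are hypothetical. With 10,000 iterations and a burn-in of 8,000, as in Scenario 1, a thinning interval of 5 would leave 400 draws.

```python
import numpy as np

def acf(chain, max_lag=20):
    """Sample autocorrelations of an MCMC chain at lags 0, 1, ..., max_lag."""
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

def burn_and_thin(draws, burn_in, thin):
    """Discard the first burn_in draws, then keep every thin-th remaining draw."""
    return np.asarray(draws)[burn_in::thin]
```

For a chain with lag-1 autocorrelation $\phi$, keeping every $m$-th draw reduces the lag-1 autocorrelation to roughly $\phi^{m}$ at the cost of fewer retained samples.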

4. Application to fire research data

In this section, we consider a case study involving cigarette ignition testing on soft furnishings. The goal of the study is to determine whether the potency of cigarettes as an ignition source could be moderated by modifying different cigarette characteristics. We consider part of the 1986 data of John Kransky, Richard Gann and Keith Eberhardt, which consists of 8 responses and 5 predictors. The key aspect of this study is to check the resistance of fabrics to cigarette ignition. Of the 8 responses, 4 are categorical, recording the number of ignitions out of 5 trials on different types of fabric. The 5 predictors represent different characteristics of the cigarettes.

Since the motivation of this paper is to find a reliable operating region for multiple binary responses, we work only with the 4 categorical responses, namely “California Test Fabric/Cotton Batting”, “100% Cotton Splendor Fabric/Polyur, 2045”, “100% Cotton Denim Fabric/Polyur, 2045” and “100% Cotton Splendor Fabric/Polyur, 2045”. We convert each categorical response to a binary one as follows: the converted response takes the value 0 if the observed number of ignitions is less than or equal to 2, and 1 otherwise. As the goal of this experiment is to locate a region within the input space that minimizes the number of ignitions, the response vector of interest is (0,0,0,0). The 5 predictors are, respectively, “Citrate Concentration (0.8 and 0)”, “Paper Porosity (Low and High)”, “Expanded (No Fluff and Fluff)”, “Tobacco Type (Burley and Flu-Cured)” and “Circumference in MM”. The two levels of each predictor are coded as $-1$ (\textit{low}) and $+1$ (\textit{high}) in the data. We include all main effects of the 5 independent variables along with all two-factor interactions. We restrict ourselves to a second-order polynomial to limit the number of model parameters, and we do not include squared main effects in order to avoid multicollinearity: with $\pm 1$ coding, each squared term equals 1 and would be collinear with the intercept.
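The conversion of ignition counts to binary responses, and the reason squared terms are dropped, can be checked in a few lines. The cutoff of 2 follows the text; the function name and the toy $2^{2}$ design are hypothetical. The rank check shows that squared $\pm 1$ columns duplicate the intercept.

```python
import numpy as np

def dichotomize(counts, cutoff=2):
    """0 if at most `cutoff` ignitions out of 5 trials, 1 otherwise."""
    return (np.asarray(counts) > cutoff).astype(int)

# Toy 2^2 design in +/-1 coding: every squared column equals the intercept column.
X = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
Z_sq = np.column_stack([np.ones(4), X, X ** 2])   # intercept, x1, x2, x1^2, x2^2
rank = np.linalg.matrix_rank(Z_sq)               # 3, not 5: x1^2 and x2^2 duplicate the intercept
```

A converted run equals the target $A=(0,0,0,0)$ exactly when all four fabrics have at most 2 ignitions.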

Table 4
Posterior means of model coefficients (with posterior standard deviations)

Covariates	$y_{1}$	$y_{2}$	$y_{3}$	$y_{4}$
intercept	$-$0.746 (0.152)	0.702 (0.141)	1.339 (0.147)	1.098 (0.174)
$x_{1}$	0.093 (0.144)	$-$0.272 (0.150)	$-$0.346 (0.174)	$-$0.092 (0.172)
$x_{2}$	0.260 (0.134)	0.269 (0.126)	0.317 (0.178)	0.522 (0.148)
$x_{3}$	$-$0.890 (0.150)	$-$0.922 (0.172)	$-$0.321 (0.125)	$-$0.326 (0.142)
$x_{4}$	$-$0.114 (0.155)	$-$0.153 (0.122)	0.097 (0.169)	0.068 (0.124)
$x_{5}$	0.274 (0.142)	0.085 (0.147)	$-$0.097 (0.157)	0.496 (0.166)
$x_{1}x_{2}$	0.365 (0.111)	0.177 (0.164)	0.292 (0.176)	0.122 (0.144)
$x_{1}x_{3}$	$-$0.083 (0.140)	$-$0.278 (0.158)	$-$0.313 (0.120)	0.119 (0.171)
$x_{1}x_{4}$	$-$0.167 (0.124)	0.036 (0.148)	0.105 (0.153)	0.122 (0.138)
$x_{1}x_{5}$	$-$0.039 (0.176)	$-$0.118 (0.106)	$-$0.093 (0.155)	0.110 (0.172)
$x_{2}x_{3}$	$-$0.228 (0.152)	0.240 (0.138)	0.299 (0.169)	0.302 (0.180)
$x_{2}x_{4}$	0.067 (0.164)	0.362 (0.150)	$-$0.116 (0.149)	$-$0.121 (0.156)
$x_{2}x_{5}$	$-$0.375 (0.133)	0.126 (0.108)	0.106 (0.176)	$-$0.527 (0.166)
$x_{3}x_{4}$	0.086 (0.153)	$-$0.140 (0.129)	0.134 (0.172)	$-$0.155 (0.164)
$x_{3}x_{5}$	$-$0.248 (0.144)	0.072 (0.155)	$-$0.116 (0.157)	0.300 (0.152)
$x_{4}x_{5}$	$-$0.373 (0.145)	$-$0.344 (0.123)	$-$0.141 (0.172)	$-$0.154 (0.159)

We run the Gibbs sampler for 25,000 iterations, discarding the first 15,000 as burn-in, to estimate the model parameters. Table 4 shows the posterior estimates of the model coefficients. To maintain consistency with the original dataset, we let $x_{1}$, $x_{2}$, $x_{3}$, $x_{4}$, $x_{5}$ represent $V_{9}$, $V_{10}$, $V_{11}$, $V_{12}$, $V_{13}$ respectively, as the first 8 variables correspond to the responses. We work only with the 4 categorical responses $V_{5}$, $V_{6}$, $V_{7}$, $V_{8}$ and denote them by $y_{1}$, $y_{2}$, $y_{3}$, $y_{4}$ respectively. Next, at each point of a grid over the covariate space we draw 1,000 samples from the posterior predictive density to estimate the probability $P(y_{1}=0,y_{2}=0,y_{3}=0,y_{4}=0\mid\underset{\sim}{x})$, and then optimize over the grid. Thinning has been applied to the posterior samples to reduce autocorrelation between successive draws.

The decay of the partial autocorrelations at higher lags for the intercept and the main effects of response $y_{1}$ in Fig. 3 indicates good mixing of the MCMC chain.

Based on the grid search, the optimal point in the input space is $x_{1}=0.926$, $x_{2}=-0.971$, $x_{3}=0.944$, $x_{4}=-0.782$ and $x_{5}=-0.579$, with optimized probability $P(y_{1}=0,y_{2}=0,y_{3}=0,y_{4}=0\mid\underset{\sim}{x})=0.703$. Hence, if the covariates are set to these values, there is a 70.3% chance that the number of ignitions on each of the 4 fabrics is 2 or fewer out of 5 trials, i.e. an ignition proportion of at most 40%.

It is important to run the Gibbs sampler until proper convergence to ensure an optimal representation of the binary responses in terms of the latent continuous variables. The result of the optimization over the input space, whether by ridge analysis or by grid search, depends on the number of grid points or radii chosen, which must be large enough to represent the input space well. To gain a good understanding of the operating region for a future target output, the input space must be partitioned very finely in either case. A logistic model with a higher-degree polynomial can also be used to model the probability surface, so that it captures local maxima or minima in case the output becomes multimodal. We may also encounter a flat probability surface over a large neighborhood of the input space, which makes the estimated optimal input unstable. However, an optimized operating region is easy to obtain if the experimenter seeks the region, rather than a single point, whose probability exceeds some pre-defined cutoff. Typically, the user would retrieve the region with the top 5% or 10% of estimated probabilities. A good understanding of the optimized operating region can be obtained from the plots for different pairs of covariates in Fig. 4.
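Retrieving such a region from grid estimates amounts to thresholding at an empirical quantile. A minimal sketch, with a hypothetical function name, where `top_frac=0.05` keeps the grid points in the top 5% of estimated probabilities:

```python
import numpy as np

def top_region(grid_points, probs, top_frac=0.05):
    """Grid points whose estimated P(y = A | x) lies in the top `top_frac` fraction, plus the cutoff."""
    probs = np.asarray(probs, dtype=float)
    cutoff = np.quantile(probs, 1.0 - top_frac)
    keep = probs >= cutoff
    return np.asarray(grid_points)[keep], cutoff
```

When the probability surface is flat near its maximum, this region is more stable than the single grid point with the highest estimate.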

Figure 3.

Partial autocorrelation plots for the MCMC samples of the intercept and the main effects corresponding to $y_{1}$ .

Figure 4.

Probability plots for the target response region over the domain of pairs of covariates, with the other covariates set to their optimized values.

5. Extension to ordered multilevel responses

Let $\underset{\sim}{y}=(y_{1},y_{2},\ldots,y_{p})$ be a multivariate $p\times 1$ response vector where each $y_{i}$ can take any of its $q_{i}$ different levels, $i=1,2,\ldots,p$. In this scenario, the idea of binary response optimization extends easily if the levels are ordered. For example, suppose the goal is to predict students' grades in a course based on the hours they studied in the week preceding the exam and the hours they slept. The response variable in this model is the grade, which is naturally ordered from low to high.

Suppose the levels of each $y_{i}$ are encoded in order as $\{1,2,\ldots,q_{i}\}$, that is, with the first $q_{i}$ natural numbers. Consider a latent variable $t_{i}$ and increasing cutpoints $c_{1}<c_{2}<\cdots<c_{q_{i}-1}$ such that

$\displaystyle t_{i}\in\left\{\begin{array}{ll}(-\infty,c_{1})&\textit{if}\ y_{i}=1\\ {[}c_{y_{i}-1},c_{y_{i}})&\textit{if}\ y_{i}\in\{2,3,\ldots,q_{i}-1\}\\ {[}c_{q_{i}-1},\infty)&\textit{if}\ y_{i}=q_{i}\end{array}\right.$
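The correspondence between the latent variable and the observed level can be sketched in Python as below. The function names and the cutpoint values in the usage example are hypothetical; the half-open intervals follow the definition above.

```python
import bisect
import math

def level_from_latent(t, cutpoints):
    """Map a latent value t to its ordinal level y in {1, ..., q}.

    cutpoints = [c_1, ..., c_{q-1}] must be strictly increasing. Level 1 is
    (-inf, c_1), level k is [c_{k-1}, c_k), and level q is [c_{q-1}, inf).
    """
    # bisect_right counts the cutpoints <= t, so t == c_k falls in level k + 1,
    # matching the half-open intervals [c_{k-1}, c_k).
    return bisect.bisect_right(cutpoints, t) + 1

def interval_for_level(y, cutpoints):
    """Inverse map: the interval [lo, hi) of latent values that give level y."""
    lo = -math.inf if y == 1 else cutpoints[y - 2]
    hi = math.inf if y == len(cutpoints) + 1 else cutpoints[y - 1]
    return lo, hi

if __name__ == "__main__":
    cuts = [-1.0, 0.0, 1.5]  # hypothetical cutpoints, so q = 4 levels
    for t in (-2.0, -1.0, 0.3, 1.5, 10.0):
        print(t, "->", level_from_latent(t, cuts), interval_for_level(level_from_latent(t, cuts), cuts))
```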

We assume that the $y_{i}$’s are conditionally independent given the $t_{i}$’s, and that

$\displaystyle y_{i}|t_{i}\sim\textit{Multinomial}(1,p_{1},p_{2},\ldots,p_{q_{i}-1})$

where,

$\displaystyle p_{1}=P(t_{i}<c_{1}),$ $\displaystyle p_{k}=P(c_{k-1}\leqslant t_{i}<c_{k})\ \ \forall k\in\{2,3,\ldots,q_{i}-1\},$ $\displaystyle p_{q_{i}}=1-\sum_{k=1}^{q_{i}-1}p_{k}$
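The category probabilities can be computed once a distribution for $t_{i}$ is chosen. The sketch below assumes, purely for illustration, that $t_{i}\sim N(\mu,\sigma^{2})$ with $\mu$ a linear predictor (an ordered-probit choice that this section does not specify), so that $p_{k}=\Phi((c_{k}-\mu)/\sigma)-\Phi((c_{k-1}-\mu)/\sigma)$. The function name and the numbers in the usage example are hypothetical.

```python
from statistics import NormalDist

def level_probabilities(mu, cutpoints, sigma=1.0):
    """Return [p_1, ..., p_q] for t ~ N(mu, sigma^2) and cutpoints c_1 < ... < c_{q-1}.

    The normal latent distribution is an assumption for illustration; mu would
    typically be the linear predictor x'beta for the response in question.
    """
    dist = NormalDist(mu, sigma)
    cdf = [dist.cdf(c) for c in cutpoints]            # P(t < c_k), k = 1..q-1
    probs = [cdf[0]]                                  # p_1 = P(t < c_1)
    probs += [cdf[k] - cdf[k - 1] for k in range(1, len(cutpoints))]  # p_2..p_{q-1}
    probs.append(1.0 - sum(probs))                    # p_q = 1 - sum of the rest
    return probs

if __name__ == "__main__":
    # Hypothetical values: linear predictor 0.5 and cutpoints (-1, 0, 1.5).
    print([round(p, 4) for p in level_probabilities(0.5, [-1.0, 0.0, 1.5])])
```

With a single cutpoint ($q_{i}=2$) the function returns the two probit probabilities of the binary case, which is one way to see that the ordinal model contains the binary model as a special case.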

Hence the modeling scenario is similar to that of Section 2, except that the conditionally independent responses are now multinomial rather than Bernoulli random variables. The posterior densities of the parameters and the posterior predictive density of $\underset{\sim}{y}$ can be derived in much the same way as in Section 2.
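Relative to the binary case, the main change in the Gibbs sampler is the latent-variable update: given the observed $y_{i}$, the latent $t_{i}$ is drawn from its full conditional (normal, by assumption) truncated to the interval that defines the observed level, in the spirit of the latent-variable data augmentation of Chib and Albert (1993). A minimal sketch of that single step using the inverse-CDF method follows; the updates for the regression coefficients, the covariance matrix and the cutpoints are omitted, and the function names and numbers are hypothetical.

```python
import math
import random
from statistics import NormalDist

def interval_for_level(y, cutpoints):
    """Latent interval [lo, hi) that corresponds to ordinal level y."""
    lo = -math.inf if y == 1 else cutpoints[y - 2]
    hi = math.inf if y == len(cutpoints) + 1 else cutpoints[y - 1]
    return lo, hi

def sample_latent(y, mu, sd, cutpoints, rng):
    """One Gibbs update: draw t ~ N(mu, sd^2) truncated to the interval of level y.

    In the multivariate model, mu and sd would be the conditional mean and
    standard deviation of t_i given the other latent variables.
    """
    lo, hi = interval_for_level(y, cutpoints)
    dist = NormalDist(mu, sd)
    # Inverse-CDF method: a uniform draw between F(lo) and F(hi), mapped back.
    u = rng.uniform(dist.cdf(lo), dist.cdf(hi))
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # inv_cdf requires 0 < u < 1
    t = dist.inv_cdf(u)
    # Guard against round-off pushing t outside the half-open interval [lo, hi).
    return min(max(t, lo), math.nextafter(hi, -math.inf))

if __name__ == "__main__":
    rng = random.Random(0)
    cuts = [-1.0, 0.0, 1.5]   # hypothetical cutpoints (q = 4)
    print([round(sample_latent(3, 0.5, 1.0, cuts, rng), 3) for _ in range(5)])
```

The inverse-CDF approach is simple but loses accuracy far in the tails; a dedicated truncated-normal sampler would be preferable when an interval lies many standard deviations from `mu`.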

6. Discussion

The main contribution of the paper lies in obtaining a closed-form Bayesian technique for multivariate binary response optimization problems. Viewing the problem from an unsupervised learning perspective, as classifying the responses through a fully Bayesian computation, has not only made the problem mathematically sound but has also given substantial computational advantages. Because the latent continuous responses carry the covariance structure of any categorical responses over to a continuous domain, their interpretability from a classification perspective allows us to extend the idea beyond design of experiment problems.

This method also provides a measure of parameter uncertainty instead of the point estimate given by frequentist methods. This is a considerable advantage for process control: the experimenter knows not only the optimal input settings but also how often the process can fail even when the inputs are set to those optimal settings. Extending the approach to multivariate categorical responses would further broaden this perspective.
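One way to make "how often the process can fail" concrete is a posterior predictive count of failures over a batch of future runs at the optimal setting $x^{*}$. The sketch below assumes that posterior draws $p_{s}$ of the reliability $P(\underset{\sim}{y}\in\text{target}\mid x^{*},\theta_{s})$ are already available and that future runs are conditionally independent given the parameters; the Beta(90, 10) draws, the batch size of 100 runs, and the function names are hypothetical.

```python
import random

def failure_counts(reliability_draws, n_runs, seed=0):
    """Posterior predictive number of failures in n_runs future runs at x*.

    For each posterior draw s, failures ~ Binomial(n_runs, 1 - p_s), simulated
    as a sum of Bernoulli trials so that only the standard library is needed.
    """
    rng = random.Random(seed)
    # rng.random() > p happens with probability 1 - p, i.e. a failed run.
    return [sum(rng.random() > p for _ in range(n_runs)) for p in reliability_draws]

def summarize(counts, lower=0.05, upper=0.95):
    """Mean and simple empirical quantiles of the simulated failure counts."""
    s = sorted(counts)

    def pick(q):
        return s[int(q * (len(s) - 1))]

    return sum(s) / len(s), pick(lower), pick(upper)

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical posterior draws of the reliability at the optimal setting x*.
    p_draws = [rng.betavariate(90, 10) for _ in range(2000)]
    mean, lo, hi = summarize(failure_counts(p_draws, n_runs=100))
    print(f"expected failures in 100 runs: {mean:.1f} (90% interval {lo} to {hi})")
```

Because each count is drawn under a different posterior draw, the resulting interval reflects both run-to-run variability and parameter uncertainty, which a plug-in point estimate of the reliability would ignore.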

References

Chen & Ye (2011). A Bayesian hierarchical approach to dual response surface modelling. Journal of Applied Statistics, 38, 1963-1975.

Chib & Albert, J. H. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669-679.

Del Castillo, Montgomery, & McCarville, D. R. (1996). Modified desirability functions for multiple response optimization. Journal of Quality Technology, 28, 337-345.

Derringer & Suich (1980). Simultaneous optimization of several response variables. Journal of Quality Technology, 12, 214-219.

Eberhardt, K. R., Levenson, M. S., & Gann, R. G. (1997). Fabrics for testing the ignition propensity of cigarettes. Fire and Materials, 21, 259-264.

Jeong, I. J., Kim, K. J., & Lin, D. K. (2010). Bayesian analysis for weighted mean-squared error in dual response surface optimization. Quality and Reliability Engineering International, 26, 417-430.

Johnson, R. T., & Montgomery, D. C. (2009). Choice of second-order response surface designs for logistic and Poisson regression models. International Journal of Experimental Design and Process Optimisation, 1, 2-23.

Krasny, Gann, & Eberhardt (1986). NIST Dataplot datasets. http://www.itl.nist.gov/div898/software/dataplot/datasets.htm.

Khuri, A. I., & Cornell, J. A. (1996). Response surfaces: designs and analyses. CRC Press, 152.

Ko, Y. H., Kim, K. J., & Jun, C. H. (2005). A new loss function-based method for multiresponse optimization. Journal of Quality Technology, 37, 50-59.

Miro-Quesada, Del Castillo, & Peterson, J. J. (2004). A Bayesian approach for multiple response surface optimization in the presence of noise variables. Journal of Applied Statistics, 31, 251-270.

Myers, R. H., & Carter, W. H. (1973). Response surface techniques for dual response systems. Technometrics, 15, 301-317.

O’Brien, S. M., & Dunson, D. B. (2004). Bayesian multivariate logistic regression. Biometrics, 60, 739-746.

Peterson, J. J. (2008). A Bayesian approach to the ICH Q8 definition of design space. Journal of Biopharmaceutical Statistics, 18, 959-975.

Peterson, J. J. (2004). A posterior predictive approach to multiple response surface optimization. Journal of Quality Technology, 36, 139.

Peterson, J. J., Miro-Quesada, & Del Castillo (2009). A Bayesian reliability approach to multiple response optimization with seemingly unrelated regression models. Quality Technology & Quantitative Management, 6, 353-369.

Pignatiello, J. J., Jr. (1993). Strategies for robust multiresponse quality engineering. IIE Transactions, 25, 5-15.

Roy & Hobert, J. P. (2010). On Monte Carlo methods for Bayesian multivariate regression models with heavy-tailed errors. Journal of Multivariate Analysis, 101, 1190-1202.

Sadilingam, G. K. (2011). Adaptive response surface method for efficient Bayesian reliability based design optimization. Proceedings of the 7th Annual GRASP Symposium, Wichita State University, 2011.

Smith (1932). The titration of antipneumococcus serum. Journal of Pathology, 35, 509-526.

Stockdale, G. W., & Cheng (2009). Finding design space and a reliable operating region using a multivariate Bayesian approach with experimental design. Quality Technology & Quantitative Management, 6, 391-408.

Vining & Myers (1990). Combining Taguchi and response surface philosophies: A dual response approach. Journal of Quality Technology, 22, 38-45.

Wang, Ma, Ouyang, & Tu (2016). A new Bayesian approach to multi-response surface optimization integrating loss function with posterior probability. European Journal of Operational Research, 249, 231-237.

Table 1
Posterior means of the model coefficients (with Geweke Statistics) for scenario 1

Table 2
True values of the model coefficients for scenario 2

Table 4
Posterior means of model coefficients (with posterior standard deviations)