Abstract
Response Surface Methodology is a popular set of statistical techniques used to improve a system process. Peterson (2004) proposed a Bayesian multivariate response optimization method that considers the dependence structure among the responses when locating the optimal region as defined by some loss or desirability function. The main contribution of this paper lies in addressing the Bayesian reliability optimization for multivariate binary responses where the logistic models with traditional Bayesian reliability approach suffers from computational complexities. This work is focused on reducing the computational complexities by introducing latent variables in the response structure.
Introduction
Numerous Response Surface and Design of Experiment (DOE) methodologies including both frequentist and Bayesian approaches have been proposed to address the problem of optimizing multiple responses. Frequentist approaches include Harrington (1965), Derringer and Suich (1980), Del Castillo, Montgomery, and McCarville (1996) which are all based on the optimization of a desirability function with respect to the experimental factor levels. Pignatiello (1993), Ames et al. (1997) and Vinning (1998) proposed procedures based on minimization of a quadratic loss function. A Bayesian approach has been proposed by Raghunathan (2000) which analyzes the proportion of nonconforming units but it does not address optimization over response surfaces. Peterson (2004) introduced a Bayesian posterior predictive approach which optimizes a user specified target region on the response space based on the posterior predictive probability with respect to the input space. Peterson’s method also addressed specific weaknesses of the frequentist approach by taking into account the covariance structure of the responses and the parameter uncertainty of the models. Wang et. al. incorporated the measures of bias or robustness by combining the idea of quality loss function (Ko et. al., 2005) with Peterson’s Bayesian reliability approach and showed that the proposed method provides a more reasonable solution.
Although, Peterson’s approach has considered optimizing multivatiate quantitative responses only, the same idea can be extended to qualitative responses as well. For example, an output of interest can be defined in terms of whether or not multiple outputs meets certain quality characteristics or tests, whether the output product is free of multiple possible defects, and so on. Currently, no widely accepted methods exist for how to optimize response surfaces derived from experimental designs to address such questions. Moreover, the proposed method – which is based on Bayesian statistical theory – can be extended to a broader audience as it can solve many other experimental design problems like drug tests in the pharmaceutical industries or checking the effects of different chemical components (e.g. ignition tests) in the chemical industries or even in manufacturing domain. For a specific research interest, we can consider the serum testing experiment on infected mice (Smith, 1932) where different doses of serum were injected to mice and the response was their survival after seven days. Different other qualitative characteristics of those mice can be obtained as responses to retrieve the effects of different dosage of serum.
Recent advancements is different computation techniques and efficiency gained through higher computing power opened many opportunities to handle complex modeling scenarios where the data need to be handled in an indirect way. In many situations, unsupervised learning methods need to be implemented to gain better insight from the dataset. For example, let us consider a situation of a psychometric test where some patients have been given with different dosages of multiple madicines and after completing a full course of treatment, they are provided with a questionnaire with some sample questions as whether they have any pain (yes or now), whether they have developped any rashes (yes or no) or how they are feeling compared to the time before the treatment started (on a scale of 1 to 10) and so on. One another example could be a market survey for a cookie to optimize it’s quality based some responses like chewiness, hardness, sweetness etc. These are couple of typical scenarios where the responses are qualitative but correlated. A typical arcsine transformation would treat the responses independently and also will tend to skew if number of categories grows moderately large. Hence, it is important to handle the responses in a data driven way such that we can preserve the dependence structure among themselves.
In this paper, we extend the implementation of Peterson’s (2004) posterior predictive approach to binary responses. Although we limit our approach to the binary case, it can, however, be easily generalized to categorical responses with more than two levels. We illustrate this approach with a case study.
Implementation of the Bayesian posterior predictive approach to the binary logistic case induces a complicated posterior formulation. As a simple example, consider a univariate binary logistic model as,
where
Hence a closed form expression of the posteriors as well as the posterior predictive density is difficult to obtain even in the univariate case.
A nice approximation in the multivariate logistic regression case comes from the multivariate t distribution (Albert and Chib, 1993),
where,
and,
Here
The marginal posteriors do not to have closed form expressions and require the Metropolis-Hastings to approximate.
The Bayesian method proposed in this paper approaches the problem from a Bayesian classification modeling perspective. We iteratively map the latent response variables to the original observed binary responses in such a way that we get a optimal representation of the binary variables in terms of a separated non-overlapping hyperplane. In Section 2, we present a binary response optimization model along with the corresponding prior structure. Gibbs steps are proposed on how to obtain samples so that MCMC estimates of the posterior predictive probabilities can be obtained. In Section 3, simulation studies have been carried out for multiple scenatios to check the performance of the model and some convergence diagnostics for the MCMC have been provided. In Section 4, the model is implemented to a real data through different numerical optimizations over the posterior predictive density domain. In Section 5, we conclude with key findings along with the scope of the method’s applicability.
Bayesian reliability for multivariate binary response
Suppose
Let us define a latent variable
We adapt the Bayesian classical response surface technique to model
where,
The conditional independence simply translates the propagation of the orthogonality among the dimensions of the hyperplane to the dimensions of the multivariate responses. Hence, the conditional independence of the
where
Although the underlying model we have considered is a multivariate binary response model, considering a nested auxiliary model through latent variables simplifies the extraction of the dependence structure between the responses. The covariance matrix
The joint posterior density of
The posterior expression shows the underlying model is a simple Bayesian regression model nested within a binary structure. As the expressions containing
Since binary response modeling has been considered in this paper, the target for the experimenter would be of the form
Start with
Use the fact that:
where,
Use the fact that:
We repeat the above steps until convergence. We then finally predict
where,
Here
Autocorrelation plots for the MCMC samples of the model coefficients in scenario 1.
Since the target is not to find the optimized
The final goal is to find the optimum region in the input space, where the probability that
Another process mathematically more sophisticated is to obtain the optimized point through ridge analysis where the input space is optimized first over the circumference of a hypersphere of radius
In order to check the performance of modeling multivariate binary responses through latent continuous variables, we perform simulation study on 2 different scenarios.
Scenario 1
In the first scenario, we consider a
Posterior means of the model coefficients (with Geweke Statistics) for scenario 1
Posterior means of the model coefficients (with Geweke Statistics) for scenario 1
We run the Gibbs steps 10,000 times with a burn-in period of 8,000. Table 1 shows the posterior means for the model coefficients along with the Geweke summary
In the second scenario, we consider a
True values of the model coefficients for scenario 2
True values of the model coefficients for scenario 2
We run the Gibbs sampling 25,000 times with a burn-in period of 15,000. The posteror mean for the model coefficient along with the Geweke summary Z-scores for convergence diagnostics has been reported on Table 3. We also report the partial autocorrelation plot for the intercept and the main effects for
Posterior means of the model coefficients (with Geweke Statistics) for scenario 2
Partial autocorrelation plots for the MCMC samples of the intercept and the main effects corresponding to 
It is evident from both of the simulation studies in Scenario 1 and Scenario 2 that the proposed algorithm is quite efficiently estimating the true underlying model. The technique that has been implemented to generate the true model is to first generate true latent responses from the randomly generated model parameters and then generate the binary responses thrugh the conditional density given the auxiliary latent responses. In that sense, the proposed model is only able to see the data through the binary responses and has no knowledge of the underlying latent responses. Instead of that, the highest posterior density intervals and the posterior means for the model coefficients indicates that the model is well able to get close to the true scenario.
Also, the MCMC diagnostics through Geweke Z-scores and autocorrelation and partial autocorrelation plots indicates the MCMC has attained good mixing and proper convergence to the stationary distribution. Although, thinning has been applied to decrease the autocorrelation between the successive samples.
In this section, we consider a case study which involves cigarette ignition testing on soft furnishing. The goal of the study is to determine whether potency of cigarette as an ignition source could be moderated by modifying different cigarette characteristics. We consider a part of the John Kransky, Richard Gann, Keith Eberhardt 1986 data which consists of 8 responses and 5 predictors. The key aspect of this study is to check the resistance of the fabric to cigarette ignition. Among those 8 responses, 4 are categorical stating the number of ignitions out of 5 trials over different types of fabrics. The 5 predictors represent the different characteristic of cigarettes.
Since, the motivation of this paper is to find a reliable operating region for multiple binary responses, we will consider working only with the 4 categorical responses namely “California Test Fabric/Cotton Batting”, “100% Cotton Splendor Fabric/Polyur, 2045”, “100% Cotton Denim Fabric/Polyur, 2045” and “100% Cotton Splendor Fabric/Polyur, 2045”. We convert the categorical response problem to a binary response in the following way. The converted response variables take the value 0 if their observed value is less than or equal to 2 and 1 otherwise. As the goal for this experiment is to locate a region within the input space that minimizes the number of ignitions, the response vector of interest is (0,0,0,0). The 5 predictors are respectively “Citrate Concentration (0.8 and 0)”, “Paper Porocity (Low and High)”, “Expanded (No Fluff and Fluff)”, “Tobacco Type (Burley and Flu-Cured)”, “Circumference in MM”. The two levels for each predictor have been denoted as
Posterior means of model coefficients (with posterior standard deviations)
Posterior means of model coefficients (with posterior standard deviations)
We run the initial Gibbs steps for estimating the model parameters 25,000 times with 15,000 burn-ins. Table 4 shows the posterior estimates of the model coefficients. To maintain consistency with the original dataset, we define
The decreasing pattern for higher lag in the partial autocorrelation plots for the intercepts and the main effects for response
Based on the grid search method, the optimized points over the input space are given by,
It is quite important to run the Gibbs sampling until proper convergence in order to ensure an optimal representation of the binary responses in terms of the latent continuous variables. The optimization problem in the input space through ridge analysis or grid search method depends on the number of grid points or radius selected while doing the optimization so that they can represent the input space well enough. To have a decent understanding on the operating region for future target output, very fine segments of the input space has to be made in either case. A logistic model can also be used to model the probability surface though a higher degree polynomial so that it can easily capture the local maximas or minimas in case the output become multimodal. We might also encounter the issue of flat probability surface over a large neighborhood of the input space which will lead the estimation of the optimized input space to be unstable. Although, an optimized operational region is easy obtainable if the experimenter tries to retrieve the region instead of a point that has a probability higher than some pre-defined cut off. In general, the user would like to retrieve the region with highest 5% or 10% probability. A good understanding of the optimized operating region can be easily obtained from the different plots with respect to different pairs of covariates in Fig. 4.
Partial autocorrelation plots for the MCMC samples of the intercept and the main effects corresponding to 
Probability plots for the target response region over the domain of a covariate pairs when other covariates are set to their optimized values.
Let us consider
Suppose the different levels of
We can assume,
where,
Hence we have a similar modeling scenario as in Section 2 where we now have conditionally independent responses distributed as multinomial random variables instead of Bernoulli. The posteriors densities of the parameters and the posterior predictive density of
Discussion
The main contribution of the paper lies in obtaining a closed form Bayesian technique for multivariate binary response optimization problems. Looking at the problem from an unsupervised learning perspective of classifying the responses through a fully Bayesian computation not only has made the problem mathematically sound, but also has given tremendous computaional advantages. The interpretability of the latent continuous responses from a classification problem perspective allows us the extend the idea beyond a design of experiment problem as it carries over the covariance structure of any categorical responses to a continuous domain.
This method also provides a parameter uncertainty measure instead of a point estimate as in frequentist methods. This gives a great advantage for process control as the experimenter would not only have a proper knowledge about the optimal input settings, but also would know how many times the process can fail even when the inputs are set to their optimal settings. Extension of the whole idea to incorporate mutivariate categorical responses will simply broaden the above perspective.
