Abstract
Quantile regression models with errors in variables have received a great deal of attention in the social and natural sciences, and some effort has been devoted to developing effective estimation methods for them. In this paper we propose a kernel-based orthogonal quantile regression model that accounts for errors in both the input and response variables. We also provide a generalized cross validation method for choosing the hyperparameters and the ratios of the error variances, both of which affect the performance of the proposed model. The proposed method is evaluated through simulations.
Introduction
A great deal of attention has been focused on the problem of quantile regression (QR) estimation, most of it on data measured exactly, without error. Introductions to QR and overviews of current research areas can be found in Koenker (2005) and Takeuchi et al. (2006). QR analysis with errors in variables (EIV), on the other hand, is evolving, albeit slowly. See, for example, He and Liang (2000), Chesher (2001), Barnes and Hughes (2002), Steinwart and Christmann (2008), Ioannides and Matzner-Løber (2009), Wei and Carroll (2009), Ma and Yin (2011), Montes-Rojas (2011), and Wang et al. (2012). Many areas of applied statistics have become aware of the problem of variables prone to measurement error and of the need to analyze them appropriately. However, less attention has been paid to QR with EIV than to mean regression with EIV, because correcting the bias that EIV causes in QR involves two main difficulties (Wang et al. 2012). One is that a parametric regression-error likelihood is usually not specified in QR. The other is that the quantile of the sum of two random variables is not necessarily the sum of the two marginal quantiles. In addition, most of the literature has centered on the parametric approach, in which the QR function is assumed to take a particular functional form. The desire to investigate the effect of EIV in nonparametric QR leads to the subject of this paper.
In this paper we propose a kernel-based orthogonal QR (KBOQR) model with EIV by applying the quantile loss function of orthogonal residuals to the formulation of support vector QR (SVQR) of Takeuchi and Furuhashi (2004). Unlike He and Liang (2000), the KBOQR avoids the assumption that the random errors in the response variable and the measurement errors in the input variables follow the same symmetric distribution. This is the first paper to use the idea of the support vector machine (SVM) for QR when the input variables have measurement errors. The SVM, first developed by Vapnik (1995) and his group at AT&T Bell Laboratories, has been successfully applied to a number of real-world classification and regression problems. Takeuchi and Furuhashi (2004) first considered SVQR. Takeuchi et al. (2006) discussed several extensions, including an approach to the quantile crossing problem and a method for incorporating prior qualitative knowledge such as monotonicity constraints. Li et al. (2007) proposed an SVQR and derived a simple formula for the effective dimension of the SVQR model, which allows convenient selection of the hyperparameters.
The rest of this paper is organized as follows. Section 2 briefly describes the basic principle of orthogonal QR (OQR). Section 3 proposes the KBOQR and presents a generalized cross validation (GCV) technique for choosing the hyperparameters of the proposed KBOQR. Sections 4 and 5 present a simulation study and concluding remarks, respectively.
Principle of OQR
In this section we briefly illustrate the principle of OQR which utilizes the quantile loss function of orthogonal residuals of Van Gorp et al. (2000).
Suppose that we have a sequence of samples
where
In typical statistical modeling, the form of
In the case of least squares estimation for the conditional mean, some authors proposed methods for correction of the measurement error (Van Gorp et al. 2000; Carroll et al. 2006). For convenience of illustration, we restrict ourselves to the case that
where
can be considered as the orthogonal residual rather than the vertical distance in regression space. In order to implement the estimator given by the OR problem, the error variances’ ratios must be specified a priori. Hence, we posit the parameter
To estimate the conditional quantile function, we apply to the orthogonal residuals the quantile loss function, known as the check function, which is defined as
where
For our purpose we now reexpress the OQR problem Eq. (5) as follows. Since the check function
where
the OQR problem Eq. (5) can be written as
where
In the next section we will use Eq. (7) instead of Eq. (5) when deriving the KBOQR.
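The two ingredients of this section, the check function and an orthogonal residual, can be illustrated with a minimal sketch. It assumes the standard form of the check function, rho_theta(r) = r(theta - I(r < 0)), and a straight-line model whose residual is scaled as in Deming-type regression. The names `check_loss`, `orthogonal_residual`, `oqr_objective`, `b0`, `b1`, and `ratio` are hypothetical, and the way the variance ratio enters is an assumption for illustration rather than the paper's own parameterization.

```python
import numpy as np

def check_loss(r, theta):
    """Standard check (pinball) loss: r * (theta - I(r < 0))."""
    r = np.asarray(r, dtype=float)
    return r * (theta - (r < 0).astype(float))

def orthogonal_residual(x, y, b0, b1, ratio=1.0):
    """Scaled residual for the line y = b0 + b1 * x.

    Assumption: `ratio` is the variance of the response error divided by the
    variance of the input measurement error. With ratio = 1 the result is the
    signed perpendicular distance from (x, y) to the line.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (y - b0 - b1 * x) / np.sqrt(ratio + b1 ** 2)

def oqr_objective(x, y, b0, b1, theta, ratio=1.0):
    """Sum of check losses evaluated at the orthogonal residuals."""
    return np.sum(check_loss(orthogonal_residual(x, y, b0, b1, ratio), theta))
```

With `theta = 0.25`, a residual of +1 costs 0.25 and a residual of -1 costs 0.75, so points below the fitted line are penalized more heavily and the fit is pulled toward the lower quartile.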
In this section we present a learning algorithm and a model selection procedure for KBOQR. For convenience, we illustrate the KBOQR under the setting that
Learning algorithm of KBOQR
Using the connection between Reproducing Kernel Hilbert Space (RKHS) and feature spaces we write the model
where the nonlinear function
where
We now propose an iterative procedure for learning KBOQR. The idea is to obtain
where
where
Let
where
Here
We now describe an iterative re-weighted least squares (IRWLS) procedure for solving the minimization problem Eq. (13). Similar IRWLS procedures were used in Shim and Hwang (2009) and Reiss and Huang (2012). Given the
with weights
are approximately equivalent to the estimating equations for the minimization problem Eq. (13). Here
Since the solutions to the linear equation system Eq. (17) cannot be obtained in a single step, we need to apply an iterative method which starts with initialized values of
Set the initial values
Calculate
Obtain
Iterate steps until convergence.
The algorithm is iterated until the following stop criterion is satisfied:
where
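The general shape of an IRWLS loop for kernel quantile regression can be sketched as follows. This is not the KBOQR algorithm itself: it assumes error-free inputs, no intercept, a Gaussian kernel, and a stopping rule based on the change in the coefficients, and it uses the identity rho_theta(r) = w r^2 with w = |theta - I(r < 0)| / |r| to turn each step into a weighted kernel ridge problem. The function names, `lam`, `sigma`, `eps`, and `tol` are all hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_kernel_qr_irwls(x, y, theta, lam=0.1, sigma=1.0,
                        max_iter=200, tol=1e-6, eps=1e-4):
    """IRWLS for: sum_i rho_theta(y_i - f(x_i)) + lam * alpha' K alpha,
    with f = K alpha. Returns (alpha, fitted values)."""
    n = len(y)
    K = rbf_kernel(x, x, sigma)
    # Start from the ordinary kernel ridge solution.
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    for _ in range(max_iter):
        r = y - K @ alpha
        # Weights that make w * r^2 equal the check loss at the current r;
        # eps guards against division by a residual near zero.
        w = np.abs(theta - (r < 0).astype(float)) / np.maximum(np.abs(r), eps)
        # Weighted ridge step: (W K + lam I) alpha = W y.
        alpha_new = np.linalg.solve(w[:, None] * K + lam * np.eye(n), w * y)
        change = np.max(np.abs(alpha_new - alpha))
        alpha = alpha_new
        if change < tol * (1.0 + np.max(np.abs(alpha))):
            break
    return alpha, K @ alpha
```

Each pass recomputes the weights from the current residuals and solves one linear system, which mirrors the "initialize, compute weights, solve, repeat" structure described above. Extending the sketch to orthogonal residuals would also require updating the quantities tied to the error variance ratio at each step.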
We now illustrate the model selection method which chooses the appropriate values of the error variances’ ratio
where
we have
Then the ordinary cross validation (OCV) function can be obtained as
where
Replacing
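The step from OCV to GCV can be illustrated on the least squares analogue, where the fitted values are a linear function of the responses through a hat matrix H. The sketch below is for plain kernel ridge regression, not for the quantile loss used in KBOQR. It relies on the well-known leave-one-out shortcut OCV = (1/n) sum_i ((y_i - f_i) / (1 - h_ii))^2 and obtains GCV by replacing each h_ii with tr(H)/n. The names `ocv_gcv_kernel_ridge` and `lam` are hypothetical.

```python
import numpy as np

def ocv_gcv_kernel_ridge(K, y, lam):
    """OCV and GCV scores for kernel ridge regression with hat matrix
    H = K (K + lam I)^{-1}."""
    n = len(y)
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    res = y - H @ y
    h = np.diag(H)
    # Leave-one-out shortcut: no refitting is needed.
    ocv = np.mean((res / (1.0 - h)) ** 2)
    # GCV: replace each h_ii by the average tr(H) / n.
    gcv = n * np.sum(res ** 2) / (n - np.trace(H)) ** 2
    return ocv, gcv
```

In practice one would evaluate the GCV score over a grid of candidate hyperparameter values and keep the minimizer, which costs one fit per candidate instead of n fits per candidate for explicit leave-one-out.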
In this section we perform a simulation study to understand the effects of measurement errors and to demonstrate the performance of KBOQR under different error distributions and quantile levels. We are concerned with the KBOQR in which
Design
We generate 100 data sets of size 50 from each of the following 3 nonlinear EIV models:
Model 1:
Model 2:
Model 3:
Here, we assume that
For each simulated data set we compare the proposed KBOQR with SVQR. We are basically interested in estimating
For 3 nonlinear EIV models the
where
Comparison of MSEs for 100
Comparison of MSEs for 100
Comparison of MSEs for 100
The way of computing MSEs for training and test data sets can be explained as follows. First, we obtain the estimated QR function
Tables 1–3 show the mean and standard deviation of the 100 MSEs for each estimated QR function. Standard deviations are given in parentheses. Boldfaced values indicate the best result in the particular categories of
Concluding remarks
In this paper, we addressed the estimation of the QR function of the nonlinear EIV model using KBOQR. We found that the KBOQR provides good results in estimating the QR function for the given examples. The KBOQR also makes model selection easier and faster than a leave-one-out cross validation or
To conclude, the KBOQR has two main advantages. One is that it inherits the strengths of the SVM, which works very well for a number of real-world problems and overcomes the curse of dimensionality. Thus, the KBOQR can be applied easily and effectively to the nonlinear EIV model with a high-dimensional input vector. The other is that this method can estimate the QR function without knowledge of
Acknowledgments
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (grant nos. NRF-2014R1A1A 2054917 and NRF-2015R1D1A1A01056582). This work was also supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2015S1A3A2046715). The present research was conducted with the support of the research fund of Dankook University in 2017.
