Abstract
Mathematical models are widely used to predict removal rates of heavy metals from aqueous solutions. In this study, partial least squares (PLS), wavelet neural network (WNN), and support vector regression (SVR) were used to predict the amount of nickel (Ni) removed by dried sunflower stalks from a synthetic wastewater, based on experimental data sets from laboratory batch experiments. The effects of pH, initial concentration of the adsorbate, contact time, and dose of the adsorbent on the adsorption process were considered. Results showed that the coefficient of determination (R2 for calibration, q2 for cross-validation) between the model-predicted and experimental final concentrations of Ni was 0.87, 0.98, and 0.99 at the calibration stage and 0.73, 0.80, and 0.91 at the cross-validation stage for the PLS, WNN, and SVR models, respectively. It was concluded that the SVR model performed relatively better than the other models because of its capability to capture the nonlinear relationships between the variables. Grid search was a fast and effective method for optimizing the hyperparameters in SVR modeling. The SVR and WNN models were also used to investigate the effect of the different variables on Ni removal efficiency. The results showed that the initial concentration of Ni and the pH of the solution were more important in the adsorption process than contact time and dose of the adsorbent.
Introduction
Adsorption and biosorption are efficient methods for removing heavy metals (HMs). The use of biotechnology to control and remove metal pollution has attracted considerable attention and has gradually become an active topic in the field of pollution control. Biosorption uses biomass to carry out the separation (Basci et al., 2004; Zhang and Banks, 2006; Romera et al., 2008; Khambhaty et al., 2009; Bhatnagar and Sillanpaa, 2010).
Natural materials available in large quantities, or certain waste products from agricultural operations, have potential as inexpensive adsorbents (Bhattacharya et al., 2006; Amarasinghe and Williams, 2007). Plant residues, which are mainly ligno-cellulosic materials, can inherently adsorb waste chemicals such as dyes and cations in water (Sun and Shi, 1998). One of the most effective adsorbents in this regard is sunflower stalks, which have a relatively large surface area that can provide intrinsic adsorptive sites for many adsorbates. The removal of metal ions such as copper, cadmium, zinc, and chromium ions from aqueous solutions has been studied using sunflower stalks as adsorbent (Sun and Shi, 1998; Benaissa and Elouchdi, 2007; Jain et al., 2009).
This article studies the adsorption behavior of dried sunflower biomass in an aqueous solution containing Ni. The adsorption efficiency of a biosorbent can be evaluated quantitatively by a simple series of experiments including isotherms and kinetics (Zhang and Banks, 2006), together with some computational methods. Computational methods are therefore needed to correlate the removal efficiency of Ni(II) from wastewater with the process parameters.
Few studies have modeled the removal efficiency of HMs through the biosorption process. Artificial neural networks have been used successfully to predict the removal efficiency of some HMs from aqueous solutions (Prakash et al., 2008; Yetilmezsoy and Demirel, 2008; Sahinkaya, 2009). However, to the best of our knowledge, there is no report on the use of partial least squares (PLS), wavelet neural network (WNN), and support vector regression (SVR) for this modeling.
The main objective of the present work was to test the PLS, WNN, and SVR models for prediction of Ni removal efficiency from wastewater by utilizing dried sunflower stalks as an economical bioadsorbent material.
Theory
Partial least squares
PLS is a multivariate linear regression technique that extracts the relationship between an array of output variables and an array of input variables. In this method, the dimensionality of the raw data is reduced on the basis of both the input data (X matrix) and the output data (Y matrix), not the input data alone. X and Y are decomposed simultaneously as follows (Geladi and Kowalski, 1986):
$$X = TP^{T} + D$$
$$Y = UQ^{T} + F$$
where T and U are the X- and Y-block score matrices, P and Q are the X and Y loadings, and D and F are the residuals. PLS modeling then builds a relationship between the projections of the dependent and independent variables (U and T, respectively), according to
$$\hat{U} = TB$$
where B is the diagonal matrix of inner-relation regression coefficients.
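As an illustration of how the X-block scores and the inner relation work together, the following is a minimal sketch of PLS with a single response and a single latent variable. It is not the program used in this study; the function names are hypothetical, and the sketch assumes that the inputs and the response are mean-centred before fitting.

```python
# Minimal one-component PLS sketch for a single response (pure Python).
# Assumption: X columns and y are mean-centred before the latent variable is built.

def center(column):
    """Return the mean-centred column and its mean."""
    mean = sum(column) / len(column)
    return [v - mean for v in column], mean

def pls1_one_component(X, y):
    """Fit one latent variable. X is a list of rows, y a list of responses."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    centred_cols, x_means = zip(*(center(c) for c in cols))
    yc, y_mean = center(y)
    # Weight vector w is proportional to X^T y.
    w = [sum(c[i] * yc[i] for i in range(n)) for c in centred_cols]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # X-block scores t = X w.
    t = [sum(centred_cols[j][i] * w[j] for j in range(p)) for i in range(n)]
    # Inner relation: least-squares regression of y on the scores t.
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"w": w, "b": b, "x_means": list(x_means), "y_mean": y_mean}

def pls1_predict(model, row):
    """Project a new sample onto the latent variable and apply the inner relation."""
    t = sum((x - m) * w for x, m, w in zip(row, model["x_means"], model["w"]))
    return model["y_mean"] + model["b"] * t

# Toy data (not from this study): the response is the sum of the two inputs.
toy_X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
toy_y = [2.0, 0.0, 0.0, -2.0]
toy_model = pls1_one_component(toy_X, toy_y)
```

With more than one latent variable, X and y would be deflated after each component and the procedure repeated; that step is omitted here for brevity.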
Wavelet neural network
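A wavelet is a type of transformation that retains both the time and the frequency information of a signal (Zhong et al., 2001). In chemical studies, the time domain can be replaced by other domains such as wavelength. The wavelet transformation (WT) offers versatile basis functions that can be selected according to the type of signal analyzed. In WT, all basis functions $\psi_{a,b}(x)$ are derived from a mother wavelet, $\psi(x)$, through the following dilation and translation processes:
$$\psi_{a,b}(x) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{x-b}{a}\right), \qquad a, b \in \mathbb{R},\ a \neq 0$$
where a and b are the dilation and translation parameters, respectively. The WT of a signal $f(x)$ is defined as
$$W_f(a,b) = \int_{-\infty}^{\infty} f(x)\,\psi^{*}_{a,b}(x)\,dx$$
where the asterisk (*) represents the complex conjugate.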
The WNN consists of three layers: input, hidden, and output. The calibration steps of WNN are described in Zhang et al. (2001). Briefly, the connections between input and hidden units and between hidden and output units are called weights, wti and Wt, respectively. The dilation and translation parameters, at and bt, of the Morlet function for each node in the hidden layer are different, and they need to be optimized.
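The three-layer structure described above can be sketched as a forward pass. The real-valued Morlet form cos(1.75z)·exp(−z²/2) is a common choice in the WNN literature and is assumed here, since the exact form used in this work is not stated; the function and argument names are hypothetical.

```python
import math

def morlet(z):
    """Real-valued Morlet mother wavelet (the 1.75 factor is an assumed, common choice)."""
    return math.cos(1.75 * z) * math.exp(-0.5 * z * z)

def wnn_forward(x, w, W, a, b):
    """Output of a three-layer WNN for one sample.

    x    : list of inputs
    w[t] : input-to-hidden weights of hidden node t (the w_ti)
    W[t] : hidden-to-output weight of hidden node t (the W_t)
    a[t] : dilation parameter of hidden node t
    b[t] : translation parameter of hidden node t
    """
    output = 0.0
    for t in range(len(W)):
        net = sum(w_ti * x_i for w_ti, x_i in zip(w[t], x))
        # Each hidden node applies its own dilated and translated wavelet.
        output += W[t] * morlet((net - b[t]) / a[t])
    return output

# Toy call with one hidden node (values are illustrative only).
toy_output = wnn_forward([1.0, 2.0], [[0.5, 0.25]], [3.0], [2.0], [1.0])
```

Because every hidden node carries its own a and b, the number of adjustable parameters grows with the hidden-layer size, which is why that size has to be tuned.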
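In the WNN, the gradient descent algorithm is employed, and the error is minimized by adjusting the $W_t$, $w_{ti}$, $a_t$, and $b_t$ parameters. These parameters are adjusted using the $\Delta W_t$, $\Delta w_{ti}$, $\Delta a_t$, and $\Delta b_t$ formulae as follows:
$$\Delta W_t^{(j+1)} = -\eta\,\frac{\partial E}{\partial W_t^{(j)}} + \alpha\,\Delta W_t^{(j)}$$
$$\Delta w_{ti}^{(j+1)} = -\eta\,\frac{\partial E}{\partial w_{ti}^{(j)}} + \alpha\,\Delta w_{ti}^{(j)}$$
$$\Delta a_t^{(j+1)} = -\eta\,\frac{\partial E}{\partial a_t^{(j)}} + \alpha\,\Delta a_t^{(j)}$$
$$\Delta b_t^{(j+1)} = -\eta\,\frac{\partial E}{\partial b_t^{(j)}} + \alpha\,\Delta b_t^{(j)}$$
where j is the number of iterations, and η and α are the learning rate and momentum term, respectively. The error function E is written as
$$E = \frac{1}{2}\sum_{n=1}^{N}\left(V_n^{e} - V_n^{c}\right)^{2}$$
where $V_n^{c}$ and $V_n^{e}$ are the calculated and experimental values, respectively, and N is the number of data for calibration.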
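A gradient descent step with a momentum term can be sketched as follows. This is a generic illustration on a one-parameter toy problem, not the WNN training code of this study; the half sum-of-squares error and the helper names are assumptions, and the learning rate and momentum in the demonstration are simply the optimized values reported later in the Results.

```python
def half_sse(calculated, experimental):
    """Half the sum of squared differences (assumed form of the error function E)."""
    return 0.5 * sum((e - c) ** 2 for c, e in zip(calculated, experimental))

def momentum_step(param, grad, prev_delta, eta, alpha):
    """One update: delta = -eta * dE/dparam + alpha * previous delta."""
    delta = -eta * grad + alpha * prev_delta
    return param + delta, delta

# Toy problem: fit a single parameter p so that the prediction p matches the target 3.0.
target = 3.0
p, delta = 0.0, 0.0
for _ in range(500):
    grad = p - target  # derivative of 0.5 * (target - p)**2 with respect to p
    p, delta = momentum_step(p, grad, delta, eta=0.078, alpha=0.0055)
```

In the WNN the same update is applied to every weight, dilation, and translation parameter, with the gradient obtained by differentiating E through the wavelet hidden layer.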
Support vector machine
Support vector machine (SVM) was introduced by Vapnik (1998). For a given regression problem, the goal of SVM is to find the optimal hyperplane, from which the distance to all the data points is minimum (Smola and Scholkopf, 1998). Here, a basic comprehensive description of the concept underlying SVR modeling will be presented.
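Consider a data set consisting of N points $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is an input vector and $y_i$ is the corresponding output. SVR approximates the underlying function in the form
$$f(x) = \sum_{i=1}^{D} \omega_i \varphi_i(x) + b$$
where $\omega_i$ are the coefficients, and b is a threshold value. This approximation can be considered a hyperplane in the D-dimensional feature space F defined by the functions $\varphi_i(x)$, where the dimensionality can be very high, possibly infinite. Since φ is fixed, $\omega_i$ can be determined from the data by minimizing the sum of the empirical risk and a complexity term defined by
$$R[\omega] = \gamma \sum_{i=1}^{N} \left|y_i - f(x_i)\right|_{\varepsilon} + \frac{1}{2}\|\omega\|^{2}$$
where ɛ is a parameter to be set a priori, and an error below ɛ is not penalized according to the following error function:
$$\left|y - f(x)\right|_{\varepsilon} = \max\left(0,\ \left|y - f(x)\right| - \varepsilon\right)$$
The SVM performs linear regression in a high-dimensional feature space using the ɛ-insensitive loss and, at the same time, tries to reduce model complexity by minimizing $\|\omega\|^{2}$. The constant γ>0 is a regularization constant determining the trade-off between training error and model flatness. Introducing the slack variables ξ and ξ*, SVM regression is formulated as the following optimization problem:
$$\min_{\omega,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\|\omega\|^{2} + \gamma \sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \quad \text{subject to} \quad \begin{cases} y_i - f(x_i) \le \varepsilon + \xi_i \\ f(x_i) - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i,\ \xi_i^{*} \ge 0 \end{cases} \tag{14}$$
The solution to this optimization problem is given as follows (Vapnik, 1998):
$$\omega = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right)\varphi(x_i)$$
$$f(x) = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x) + b \tag{16}$$
where the Lagrange multipliers $\alpha_i$ and $\alpha_i^{*}$ satisfy $0 \le \alpha_i, \alpha_i^{*} \le \gamma$ and $\sum_{i=1}^{N}(\alpha_i - \alpha_i^{*}) = 0$, and $K(x_i, x) = \varphi(x_i)\cdot\varphi(x)$ is the kernel function.
The coefficients α and α* are obtained by maximizing the following quadratic form subject to the conditions stated earlier:
$$W(\alpha, \alpha^{*}) = -\varepsilon \sum_{i=1}^{N}\left(\alpha_i + \alpha_i^{*}\right) + \sum_{i=1}^{N} y_i\left(\alpha_i - \alpha_i^{*}\right) - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right)K(x_i, x_j)$$
Once the coefficients are determined, the regression estimate is given by Equation (16). The threshold b is computed from the constraints in Equation (14) using the fact that the first constraint becomes an equality with $\xi_i = 0$ if $0 < \alpha_i < \gamma$, and the second constraint becomes an equality with $\xi_i^{*} = 0$ if $0 < \alpha_i^{*} < \gamma$.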
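The two ingredients of this formulation that are easiest to show in code are the ɛ-insensitive error function and the kernel-expansion form of the regression estimate. The sketch below is illustrative only: the helper names are hypothetical, the coefficients in the example are made up rather than obtained by solving the quadratic problem, and a plain dot product stands in for the kernel.

```python
def eps_insensitive(residual, eps):
    """Errors with magnitude below eps cost nothing; larger errors cost the excess."""
    return max(0.0, abs(residual) - eps)

def dot_kernel(u, v):
    """Linear kernel, used here only to keep the example easy to check by hand."""
    return sum(a * b for a, b in zip(u, v))

def svr_predict(x, support_vectors, alpha_diff, b, kernel):
    """Regression estimate f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    alpha_diff[i] holds the difference (alpha_i - alpha_i*) for support vector i.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, alpha_diff)) + b

# Made-up coefficients for illustration; a real model obtains them from the dual problem.
toy_prediction = svr_predict(
    [3.0, 4.0],
    support_vectors=[[1.0, 0.0], [0.0, 1.0]],
    alpha_diff=[2.0, -1.0],
    b=0.5,
    kernel=dot_kernel,
)
```

Only samples whose multipliers are nonzero contribute to the sum, which is why the estimate depends on a subset of the calibration data.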
Materials and Methods
Batch experiments
A stock solution of Ni2+ (1000 mg/L) was prepared in deionized double-distilled water using nickel nitrate. All working solutions of varying concentrations were obtained by successive dilution. The solution pH was adjusted to the required value by adding either 0.1 M HCl or 0.1 M NaOH and was measured with a pH meter (Metrohm, 827 pH Lab). Residual Ni concentration in the filtrate was determined by atomic absorption spectrophotometry (Perkin Elmer 3030). Batch mode operation was used to study the removal of Ni from the synthetic wastewater. Because Ni was the only metal ion in the wastewater, there was no competition between Ni and other HMs. Adsorption experiments were carried out using 100 mL of Ni solution of the desired concentration (10, 55, and 100 mg/L) at an initial pH of 2, 4.5, and 7, with three adsorbent dosages (0.5, 1.25, and 2 g sunflower stalks per 100 mL) in 150 mL plastic containers at a room temperature of 25°C±1°C and an agitation speed of 180 rpm on a shaker (Edmund Buhler, SM 30 control) for 10, 65, and 120 min. At these predetermined times, the samples were filtered through Whatman 42 filter paper. The average particle size of the adsorbent was 0.5–0.7 mm. The initial and final concentrations of Ni in the solution are denoted by Ci and Ce, respectively.
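For orientation, Ci and Ce are commonly converted to a removal efficiency and an adsorption capacity as sketched below. These are the standard definitions from the adsorption literature and are assumed here, since the formulas are not written out in this section; the function names are hypothetical.

```python
def removal_efficiency(ci, ce):
    """Percentage of the metal removed from solution: 100 * (Ci - Ce) / Ci."""
    if ci <= 0:
        raise ValueError("initial concentration must be positive")
    return 100.0 * (ci - ce) / ci

def adsorption_capacity(ci, ce, volume_l, mass_g):
    """Metal adsorbed per gram of adsorbent (mg/g): (Ci - Ce) * V / m."""
    return (ci - ce) * volume_l / mass_g

# Illustrative numbers only: 100 mg/L reduced to 25 mg/L in 100 mL with 0.5 g of adsorbent.
example_removal = removal_efficiency(100.0, 25.0)
example_capacity = adsorption_capacity(100.0, 25.0, volume_l=0.1, mass_g=0.5)
```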
Data set
The data set consisted of 31 wastewater samples (Table 1). The dependent variable that each of the three models (PLS, WNN, and SVR) would predict was Ce of Ni ion after adsorption by sunflower stalks. The independent variables affecting Ce are dose of the adsorbent, Ci of Ni ion, pH of the solution, and contact time. The data set was randomly divided into two groups: calibration set (15 samples) and cross-validation set (16 samples).
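A random division of this kind can be reproduced with a seeded shuffle, as in the following sketch. The seed and the helper name are arbitrary choices for illustration; the actual assignment of samples used in this study is the one given in Table 1.

```python
import random

def split_samples(n_samples, n_calibration, seed=0):
    """Randomly assign sample indices to a calibration set and a cross-validation set."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(indices)
    calibration = sorted(indices[:n_calibration])
    validation = sorted(indices[n_calibration:])
    return calibration, validation

# 31 samples divided into 15 calibration and 16 cross-validation samples.
calibration_idx, validation_idx = split_samples(31, 15)
```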
C0 = initial concentration of Ni; Ce = final concentration of Ni after adsorption; Δ = absolute error.
PLS, partial least squares; WNN, wavelet neural network; SVM, support vector machine; Ni, nickel.
Data analyses
The calculations were carried out by using a Pentium IV 1400 MHz computer running Windows 2000 operating system. The WNN and PLS models were programmed in our laboratory referring to the literature (Khayamian and Esteki, 2004; Esteki et al., 2007). The SVM software package ChemSVM including SVR was programmed by Suykens et al. (2002). Validation of this software has been tested in some applications in chemistry and chemical technology (Esteki et al., 2010).
PLS model
In order to evaluate the PLS model, the root mean squared error of calibration (RMSEC), root mean squared error of cross-validation (RMSECV), and RMSECVi were calculated. Internal consistency of the training set was confirmed by using leave-one-out (LOO) cross-validation method. The RMSEC and RMSECV included both interpolation and extrapolation information (samples within and beyond the range used for constructing the model), and the RMSECVi used only interpolation information (Quinones-Torrelo et al., 1999). Small differences between these three criteria would mean a robust model.
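The LOO procedure and the root mean squared error can be sketched generically as follows. The model is passed in as a pair of hypothetical `fit` and `predict` callables, and a simple mean predictor stands in for PLS so that the result can be checked by hand.

```python
def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values."""
    n = len(observed)
    return (sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n) ** 0.5

def loo_rmsecv(X, y, fit, predict):
    """Leave-one-out cross-validation: refit without sample i, then predict sample i."""
    predictions = []
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        predictions.append(predict(model, X[i]))
    return rmse(predictions, y)

# Stand-in model for illustration: always predict the mean of the training responses.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x):
    return model

toy_rmsecv = loo_rmsecv([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0], fit_mean, predict_mean)
```

RMSEC is obtained with the same `rmse` function by predicting the calibration samples with a model fitted to all of them.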
One of the most important parameters that should be optimized in PLS modeling is the number of latent variables (LVs). The number of LVs was selected on the basis of minimum RMSECV.
WNN model
The WNN model was constructed using the four effective parameters (i.e., dose of the adsorbent, Ci of Ni ion, solution pH, and contact time) as inputs. The network architecture consists of four neurons in the input layer corresponding to the four mentioned parameters. The number of neurons in the hidden layer was unknown and needed to be optimized. The output layer had one neuron that predicted the Ce of Ni. The WNN parameters consist of learning rate, momentum, and number of iterations. In order to determine the optimum number of neurons in the hidden layer, the RMSE against a different number of neurons in the hidden layer was plotted for calibration and cross-validation stages.
Momentum and learning rate are two other parameters of WNN modeling that should be optimized. In order to optimize these two parameters, all combinations of momentum and learning rate were used to construct the model and then, the RMSEC and RMSECV were calculated.
SVR model
The four aforementioned effective parameters were used to construct the LS-SVR model. To get the best performance from the LS-SVR model, the corresponding parameters needed to be optimized. One of these parameters is the kernel function. There is no systematic methodology for selection of the kernel function (Liu et al., 2008a). The radial basis function (RBF) kernel was used here because it can handle the nonlinear relationships between the affecting parameters and the target parameter (Liu et al., 2008b) and because it is often used for regression analysis owing to its effectiveness and speed in the training process (Pan et al., 2008). In addition to the selection of the kernel function, there are two other parameters that need to be tuned: the regularization parameter γ in Equation (14) and the kernel parameter σ2. These two parameters are usually referred to as hyperparameters. The objective of tuning the hyperparameters is to give the LS-SVR model a better generalization ability, which is usually evaluated by an estimated generalization error (Duan et al., 2003).
Different methods have been used to find the optimized values of the hyperparameters, such as one at a time (Chen, 2008), grid search (García-Reiriz et al., 2008), and genetic algorithm (Kang et al., 2008). In the present work, two optimization methods were used. The first method was grid search. In the second method, the models were constructed with all possible combinations of γ and σ2, and the RMSEC and RMSECV were calculated for each model. The model with minimum values for both RMSEC and RMSECV was then selected, and its parameters were taken as the optimized values of γ and σ2. The γ and σ2 values were scanned from 100 to 15,000 in steps of 100; a finer step was not used because of the time needed to construct the model for each pair of γ and σ2.
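The role of the kernel parameter σ2 can be seen from a direct implementation of the RBF kernel. The form exp(−‖x − z‖²/σ2) is assumed below; some software divides by 2σ2 instead, so the numerical value of an optimized σ2 depends on the convention of the package used.

```python
import math

def rbf_kernel(x, z, sigma2):
    """RBF kernel exp(-||x - z||^2 / sigma2); the exact scaling convention is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / sigma2)

# Identical samples give 1; the value decays with distance at a rate set by sigma2.
same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma2=48.72)
narrow = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma2=2.0)
wide = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma2=200.0)
```

A small σ2 makes the kernel respond only to very close samples, whereas a large σ2 makes distant samples look similar and yields a smoother model.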
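The exhaustive scan over γ and σ2 can be sketched as follows. With values from 100 to 15,000 in steps of 100 there are 150 candidates per hyperparameter, that is, 22,500 models to construct. The `toy_error` function is a made-up stand-in for building an LS-SVR model and computing its error; it is not derived from the data of this study.

```python
from itertools import product

def exhaustive_scan(gammas, sigma2s, error_fn):
    """Evaluate every (gamma, sigma2) pair and keep the one with the lowest error."""
    best_pair, best_err = None, float("inf")
    for gamma, sigma2 in product(gammas, sigma2s):
        err = error_fn(gamma, sigma2)
        if err < best_err:
            best_pair, best_err = (gamma, sigma2), err
    return best_pair, best_err

# Candidate values: 100, 200, ..., 15000.
candidates = list(range(100, 15001, 100))

def toy_error(gamma, sigma2):
    # Hypothetical error surface that decreases toward small gamma and sigma2.
    return 2.0 + 1e-4 * (gamma + sigma2)

best_pair, best_err = exhaustive_scan(candidates, candidates, toy_error)
n_models = len(candidates) ** 2
```

In practice `error_fn` would return a criterion such as RMSECV, or a combination of RMSEC and RMSECV, for the model built with that pair.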
Results and Discussion
PLS model
Figure 1a shows the relationship between experimental and PLS-predicted Ce for the calibration and cross-validation modes. Figure 2a shows the residual plot for the PLS model, which can be more informative about how well a model fits a data set. The residuals in Fig. 2a are randomly distributed, which suggests that the PLS model does not fit the data with a systematic bias.

Figure 1. Experimental versus calculated values of nickel (Ni) concentration for calibration and cross-validation in the (a) PLS, (b) WNN, and (c) SVR models.

Figure 2. Residual plots for the (a) PLS, (b) WNN, and (c) SVR models.
The results of the predicted Ce using LOO are presented in Table 1. It can be seen that the predicted values are not in good agreement with the experimental ones.
In addition, the PLS model was run for the complete calibration data set using five LVs. The model's performance criteria are summarized in Table 2. The five LVs yielded RMSEC and RMSECV values of 5.45 and 6.78, respectively (Table 2), with an R2 of 0.86 and a q2 of 0.73. According to Table 2, the coefficients of determination are quite low. In addition, the large difference between R2 and q2 indicates that the correlation is poor, that the response appears to be nonlinear, and that the PLS model is not well suited to these adsorption data.
RMSEC, root mean squared error of calibration; RMSECV, root mean squared error of cross-validation; RMSECVi, RMSECV calculated from interpolation information only; RMSEP, root mean squared error of prediction.
The calibrated model was applied to the test data set for prediction of Ce of Ni. The root mean squared error of prediction (RMSEP) is shown in Table 2. The PLS model yielded an RMSEP value of 7.89.
WNN model
The experimental values of Ce of Ni were plotted against the predicted ones by the WNN model (Fig. 1b). The R2 of calibration was 0.98, and the q2 of cross-validation was 0.80.
Figure 2b shows the residual plot, which reveals the appropriateness of WNN model in comparison with the PLS model. The values of RMSEC, RMSECV, and RMSECVi for the constructed model are shown in Table 2. According to Table 2, these three values are comparable, which means that the model is appropriate. It can be seen that according to all criteria, the WNN model is better than the PLS model.
In the next step, the constructed model was used to predict the Ce of Ni. The predicted values and the absolute errors are shown in Table 1. According to this table, there is a good agreement between experimental concentrations and the predicted ones using the WNN model. The RMSEP of this model was 6.29, which is lower than the PLS model (Table 2). This means that the correlation between parameters and Ce of Ni is not linear, and there is some nonlinearity in the system which may be modeled better with a nonlinear function.
Figure 3 shows the plot for different combinations of momentum and learning rate. It can be seen that the RMSE has its minimum value for all the tested combinations when the number of neurons in the hidden layer was eight.

Figure 3. RMSE versus number of neurons in the hidden layer for the calibration and cross-validation stages of the WNN model.
The RMSE against different numerical values of momentum and learning rate are plotted in Fig. 4. According to this figure, some large values of RMSEC and RMSECV prevent distinguishing the best point graphically. However, according to the values of errors, the optimized values of momentum and learning rate were 0.0055 and 0.078, respectively.

Figure 4. Plots of RMSEC and RMSECV versus momentum and learning rate for the WNN model.
In the next step, the number of iterations was optimized for constructing the model. Figure 5 shows the RMSEC and RMSECV at different numbers of iterations. For the calibration set, the RMSE decreases as the number of iterations increases from 100 to 15,000. For cross-validation, however, the RMSE increases once the number of iterations exceeds 2,000. Therefore, the optimum number of iterations was selected as 2,000 to prevent over-fitting of the model.
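Selecting the iteration count at the minimum of the cross-validation error can be sketched as follows. The numbers in `toy_history` are invented to mimic the qualitative behavior described above (calibration error falling steadily, cross-validation error rising after 2,000 iterations); they are not the values plotted in Fig. 5.

```python
def select_iterations(history):
    """Return the iteration count with the lowest RMSECV.

    history: list of (n_iterations, rmsec, rmsecv) tuples.
    """
    return min(history, key=lambda record: record[2])[0]

# Hypothetical learning curves, for illustration only.
toy_history = [
    (100, 9.0, 9.5),
    (500, 6.0, 7.2),
    (2000, 3.5, 5.0),
    (5000, 2.8, 5.6),
    (15000, 2.0, 6.9),
]
best_iterations = select_iterations(toy_history)
```

Choosing on RMSEC alone would always favor the largest iteration count, which is exactly the over-fitting that the cross-validation criterion guards against.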

Figure 5. Variation of RMSE versus number of iterations for calibration and cross-validation in the WNN model.
SVR model
The predicted Ce of Ni is plotted against the experimental values for the calibration and cross-validation modes of the SVR model in Fig. 1c. In addition, the main statistical parameters of the LS-SVR model are listed in Table 2. According to this table, the values of RMSEC, RMSECV, and RMSECVi are comparable. These results suggest that both interpolations and extrapolations of Ce values by the SVR model are reasonably adequate. The high calculated q2 of 0.91 and the low RMSECV of 3.64, as compared with the RMSEC of 2.5, suggest good internal consistency as well as good predictive ability of the SVR model. The Ce values of Ni predicted by LOO cross-validation are listed in Table 1. As can be seen, the predicted Ce values are in good agreement with the experimental values. Table 2 shows that the RMSEP is 4.52 for the SVR model. The residual plot of SVR (Fig. 2c) shows a random pattern, which again confirms the suitability of the SVR model.
Figure 6 shows RMSEC and RMSECV for different γ and σ2 values. As shown, the optimized parameters are not the same for all the points in both calibration and cross-validation graphs. However, the optimized values should be selected based on the minimum values for both criteria. The level of errors for RMSEC (Fig. 6a) and RMSECV (Fig. 6b) tends to a minimum value as σ2 and γ decrease toward 100, and, therefore, the optimized σ2 and γ were selected as 100. The RMSEC and RMSECV were 2.75 and 3.74, respectively, for these selected hyperparameters.

Figure 6. Tuning of γ and σ2 for LS-SVR: (a) RMSEC and (b) RMSECV.
In the next step, the parameters were optimized using the grid search method. This method gave the optimized values of 73.47 and 48.72, respectively, for γ and σ2. The corresponding RMSEC and RMSECV to these values of hyperparameters were 2.50 and 3.64. The results are comparable for the proposed method and grid search, but the grid search has slightly better results. Additionally, it can be concluded that the grid search is a fast and effective method to optimize the hyperparameters in SVR modeling.
Comparison of the models
According to Tables 1 and 2, the SVR and WNN models have similar calibration statistics, but they differ in stability and prediction capability as measured by cross-validation and by the external prediction set. The PLS model gave weaker results in this respect than the WNN and SVR models.
The cross-validation statistics of the SVR model were similar to those of calibration, which indicates the stability of this model. A weaker cross-validation performance was observed for the WNN model. The results of external predictions also support the fact that the SVR represented better prediction results than the WNN, whereas the WNN produced better results than the PLS.
According to the explanations just mentioned, it can be concluded that the performance of nonlinear calibration methods (SVR and WNN) in the prediction of adsorption of Ni to sunflower stalks is superior to the linear method (PLS model), whereas among the two nonlinear regression methods, the SVR represented slightly better prediction results.
Effect of the variables
In order to investigate the effect of variables (i.e., pH, contact time, dose of the adsorbent, and Ci of Ni), the SVM and WNN models were used. The models were constructed using all four variables, and then, the effect of each variable was evaluated by omitting it from the model. The RMSECV was calculated for the constructed models.
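The omission procedure can be sketched generically: the model is rebuilt without one variable at a time, and the increase in cross-validation error is taken as a measure of that variable's effect. In the sketch below a leave-one-out nearest-neighbour predictor stands in for the SVR and WNN models, and the toy data are constructed so that the response depends only on the first variable; all names and numbers are illustrative.

```python
def loo_rmse_1nn(X, y):
    """Leave-one-out RMSE of a 1-nearest-neighbour predictor (stand-in for SVR/WNN)."""
    squared_error = 0.0
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = min(
            (j for j in range(len(y)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        squared_error += (y[nearest] - yi) ** 2
    return (squared_error / len(y)) ** 0.5

def omission_effect(X, y, names, cv_error):
    """Increase in cross-validation error when each variable is left out in turn."""
    baseline = cv_error(X, y)
    effects = {}
    for k, name in enumerate(names):
        reduced = [row[:k] + row[k + 1:] for row in X]
        effects[name] = cv_error(reduced, y) - baseline
    return effects

# Toy data: the response follows the first column; the second column is uninformative.
toy_X = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0], [4.0, 0.0], [5.0, 1.0]]
toy_y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
effects = omission_effect(toy_X, toy_y, ["Ci", "dose"], loo_rmse_1nn)
```

A larger increase in error after omission indicates a more influential variable, which is how the ranking of the four variables is read.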
Figure 7 shows the results of this process. For both the SVR and WNN models, the largest increase in RMSECV was caused by omitting Ci, followed by pH, contact time, and dose of the adsorbent. The importance of Ci is consistent with previous reports: at a low concentration, the ratio of available surface to adsorbate ion concentration is larger, so the removal is higher. At higher concentrations, this ratio is low, and the percentage removal is correspondingly lower. The removal of Ni is therefore dependent on Ci (Jain et al., 2009).

Figure 7. Histogram of RMSECV corresponding to omitting different variables from the WNN and SVM models.
The second effective parameter was the solution pH. pH is among the important parameters for adsorption process. Experimental results showed that the amount of Ni removal was relatively low at a pH less than 2.0. This may be due to the fact that at a pH lower than 3.0, high concentrations of H+ ions compete with Ni for active sites, which results in the suppression of Ni adsorption on the surface of sunflower stalks. In addition, this batch experiment showed that adsorption of Ni ions decreased when the pH was higher than 7.0. This can be attributed to the fact that a high pH condition reduces the mobility of Ni due to the decrease in the exchangeable form, resulting in a decrease in the contact probability between adsorbent and adsorbate (Yetilmezsoy and Demirel, 2008).
The third effective factor is contact time. Removal of the adsorbate is rapid at first and then gradually slows until equilibrium is reached. The experimental data showed that a contact time of 60 min is generally sufficient to achieve equilibrium, and the adsorption does not change significantly thereafter. In most cases, equilibrium was almost attained in 10 or 20 min, depending on the values of the operating variables. Contact time therefore has a comparatively small effect in the model.
The fourth variable (dose of adsorbent) is less effective in the Ni adsorption process, which means that probably 0.5 g of sunflower stalks is enough for effective adsorption in this range of Ni concentration in the aqueous solution. According to Fig. 7, both SVR and WNN models gave similar results in the adsorption of Ni using sunflower stalks.
Conclusions
In this study, batch adsorption experiments were performed with four process variables (pH, Ci of adsorbate, contact time, and dose of adsorbent), and an important objective was to obtain a model that could reliably predict Ce of Ni in wastewater treated with sunflower stalks. The linear and nonlinear models included PLS, WNN, and SVR. These models were validated using the LOO cross-validation method. Performance of the selected models was evaluated using criteria such as RMSEC, RMSECV, R2 for calibration, R2 for cross-validation, and RMSE of prediction. All three models predicted Ce of Ni satisfactorily. However, the performance of the nonlinear SVR and WNN models was relatively better than that of the PLS model. The SVR model can be used as a powerful tool for modeling Ni removal using sunflower stalks. There was an acceptable agreement between the SVR model results and the experimental data.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
