The artificial neural network-based QSPR and DFT prediction of lipophilicity for thioguanine

Abstract

Given the importance of exploring the anti-cancer properties of thioguanine (TG), the relationships between quantum chemical indices and the lipophilicity of TG tautomers were investigated using the quantitative structure-property relationship (QSPR) approach in two states: isolated and chitosan-encapsulated. Accordingly, twenty tautomeric forms of TG were selected for predicting logP with the QSPR models. Density functional theory (DFT) calculations, along with the Dragon package, were applied to obtain the required quantum chemical descriptors. The Pearson correlation coefficient test was used to measure the statistical relationships, and the Kennard-Stone algorithm was used to split the data into training and testing sets. Furthermore, the multiple linear regression (MLR) and artificial neural network (ANN) methods were employed for generating the models. Analysis of variance (ANOVA) provided the criterion for testing the significance of the MLR and ANN results, and the leave-one-out (LOO) method was used for examining the prediction efficiency of the selected models. The results indicated that the proposed models are useful for predicting reliable values of logP.

Keywords

Thioguanine, QSPR models, lipophilicity, DFT, artificial neural network

1 Introduction

Thioguanine (TG), or 6-thioguanine, has been used as an anti-cancer drug acting through metabolic inhibition in acute lymphocytic leukemia, acute myeloid leukemia, and chronic myeloid leukemia [1, 2]. TG is a derivative of the purine nucleobases in which a sulfur atom replaces the oxygen atom of guanine [3]. Accordingly, TG has been recognized as a compound with important pharmacological activity in cancer treatment, showing characteristic antitumor and antineoplastic functions [4]. The incorporation of TG into DNA and RNA in place of the original guanine could lead to the death of cancer cells [5]. Being a heterocyclic compound, TG could exist in different tautomeric forms, as is the case for the original nucleobases, with impacts on the electronic and structural features of the parent compound [6]. Generally, tautomerism is a commonly occurring process in biological systems with impacts on the pharmacological activity of drug compounds; hence, exploring the tautomerism of drug compounds is an important issue for drug design purposes [7, 8]. To prevent the occurrence of such a process, the addition of functional groups to the molecular structure could serve as a protecting agent [9–11]. In this regard, evaluating the electronic features of compounds could help to predict their pharmacological properties for developing new therapeutic agents [12, 13]. Electronic features such as ionization energy, chemical potential, electron affinity, frontier molecular orbital energies, chemical hardness and softness, electronegativity, and electrophilicity are among the known quantum chemical descriptors used for learning structure-activity (or structure-property) relationships [14]. 
Accordingly, quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) analyses have been developed for predicting the activities and properties of compounds, with significant roles in various application fields of chemistry, biochemistry, and pharmacology [13]. In both QSAR and QSPR analyses, the relationships between pharmacological activities or properties and microscopic electronic features are investigated [14]. Earlier works showed the benefits of employing such predictive methods in drug design and development processes [15–17]. Indeed, for many problems involving complicated systems, such molecular modelling studies are needed to predict and interpret the results under investigation [18–20].

Lipophilicity, as indicated by logP in this work, is an important physicochemical property governing the efficiency of a drug, as it provides insightful information about the solubility, adsorption, and membrane penetration of a new synthetic compound [21]. Hence, several works have been dedicated to predicting this important property of a compound even prior to its synthesis [22–24]. Accordingly, the present work was done to find relationships between the calculated quantum chemical indices and logP of the TG anti-cancer drug. To this aim, twenty tautomeric structures of TG were involved in the prediction of logP using the QSPR models. In this regard, electronic features including optimized energy, dipole moment, energies of the highest occupied and lowest unoccupied molecular orbitals (HOMO and LUMO), energy gap, molecular polarizability, molecular quadrupole moment, and molecular volume were employed as the main quantum chemical descriptors. As earlier works indicated the importance of computer-based approaches for predicting important features and the activity/property of compounds and processes [25–27], this work was done to examine what happens during the tautomerism of TG. It is worth mentioning that both compounds and processes can be predicted by computer-based tools to show the advantages/disadvantages of the proposed processes [28–30]. In this regard, several approaches have been developed alongside developments in mathematical theories and algorithms [31–33]. For complicated systems, predicting the desired activity/property is essential before performing expensive experiments [34–36]. Indeed, materials design and development is a continuously advancing field that depends on obtaining more insightful information [37–40]. 
Both QSAR and QSPR methods are among the computer-based predictive tools that provide information on chemical/biochemical compounds and processes, and these advantages were employed to perform the current research work.

2 Materials and methods

The materials of this work were all possible tautomeric structures of TG, comprising the twenty structures shown in Fig. 1, in the isolated and chitosan-encapsulated states [41, 42]. All structures were optimized using the Gaussian program package through B3LYP/6-31G* density functional theory (DFT) calculations [43]. Subsequently, the quantum chemical descriptors were evaluated for the optimized structures, including optimized energy, dipole moment, HOMO, LUMO, energy gap, molecular polarizability, molecular quadrupole moment, and molecular volume. Moreover, other required descriptors were obtained from the Dragon program package [44]. The total numbers of descriptors obtained through the DFT calculations and the Dragon package were 572 and 706 for the isolated and chitosan-encapsulated states, respectively. The Pearson correlation coefficient test was used to remove the descriptors that met the R² > 0.8 criterion using the MATLAB program package [45, 46]. This process decreased the numbers of descriptors to 66 and 100 for the isolated and chitosan-encapsulated states, respectively. Finally, the lipophilicity of the tautomers was obtained by calculating the partition coefficients (logP) using the Hyperchem program package [47]. To perform the regression analyses, the multiple linear regression (MLR) and artificial neural network (ANN) methods were selected for generating the models using the SPSS program package [48]. The data were divided into training and testing sets, with fourteen TG structures in the training set and six in the testing set. The data splitting was performed with the Kennard-Stone algorithm using the Chemoface program package [49].
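As an illustration of the data-splitting step, the following is a minimal Python sketch of the Kennard-Stone selection rule; it is not the Chemoface implementation used in this work. The sketch assumes Euclidean distances in descriptor space, and the function name `kennard_stone` and the one-descriptor toy data are hypothetical.

```python
import math


def kennard_stone(points, n_select):
    """Return the indices of n_select points chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must lie between 2 and the number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed the selection with the two mutually most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Add the candidate whose nearest selected point is farthest away.
        best = max(remaining, key=lambda i: min(dist[i][j] for j in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: five structures described by a single descriptor.
toy = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train_idx = kennard_stone(toy, 3)
test_idx = [i for i in range(len(toy)) if i not in train_idx]
print(train_idx, test_idx)
```

In the setting described above, `points` would hold the retained descriptor vectors of the twenty tautomers and `n_select` would be fourteen, leaving six structures for the testing set.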

Fig. 1

Molecular structures of the twenty different tautomeric forms of TG. The white, gray, blue, and yellow balls represent hydrogen, carbon, nitrogen, and sulfur atoms, respectively.

After obtaining the optimized structures, the required quantum chemical descriptors could be mainly evaluated from the values of HOMO and LUMO. Based on Koopmans’ approximation [50], the ionization energy (I) and the electron affinity (A) could be found by Equations (1) and (2). $I \approx - HOMO$ (1) $A \approx - LUMO$ (2)

Accordingly, the Mulliken electronegativity (χ) [51] could be expressed by Equation (3). $χ = \frac{I + A}{2}$ (3)

The chemical hardness (η) and the chemical softness (S) [52] could be approximated by Equations (4) and (5). $η = 0.5 (\frac{\partial^{2} E}{\partial N^{2}}) \approx I - A$ (4) $S = \frac{1}{η}$ (5)

Finally, the global electrophilicity (ω), expressed in terms of the chemical hardness and the chemical potential (μ) [53], could be calculated by Equation (6). $ω = \frac{μ^{2}}{2 η}$ (6)
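Equations (1) to (6) could be chained as in the following minimal Python sketch. The hardness follows Equation (4) as written above (η ≈ I − A); the relation μ = −χ is an assumption taken from common conceptual-DFT usage, since the chemical potential is not defined explicitly here. The function name and the orbital energies in the example are hypothetical.

```python
def reactivity_descriptors(homo, lumo):
    """Global reactivity descriptors from frontier orbital energies (same energy unit)."""
    ionization = -homo                      # Eq. (1): I ≈ -HOMO
    affinity = -lumo                        # Eq. (2): A ≈ -LUMO
    chi = (ionization + affinity) / 2       # Eq. (3): Mulliken electronegativity
    eta = ionization - affinity             # Eq. (4) as written in the text
    softness = 1 / eta                      # Eq. (5)
    mu = -chi                               # assumption: chemical potential = -chi
    omega = mu ** 2 / (2 * eta)             # Eq. (6)
    return {
        "I": ionization,
        "A": affinity,
        "chi": chi,
        "eta": eta,
        "S": softness,
        "mu": mu,
        "omega": omega,
    }


# Hypothetical orbital energies in eV, for illustration only.
descriptors = reactivity_descriptors(homo=-6.0, lumo=-2.0)
print(descriptors)
```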

To deal with the number of calculated descriptors and the number of tautomers, the MLR and ANN methods were employed for generating the models. MLR predicts one dependent (response) variable by describing its relationship with two or more independent (explanatory) variables [54]. Accordingly, ANOVA was used as the criterion for testing the significance of MLR based on the number of explanatory variables, reflected by the degrees of freedom, to show the effects of the parameters [55]. For a linear regression model as described by Equation (7), the ANOVA decomposition could be described by Equation (8). $Y_{i} = β_{0} + β_{1} X_{1 i} + β_{2} X_{2 i} + \dots + β_{α} X_{α i} + ɛ_{i}, \quad i = 1, 2, 3, \dots, n$ (7)

Where Y, X, β₀, β, ɛ, i, and α are the response variable, explanatory variable, intercept term, model parameters, residual or error term, i^th observation, and number of explanatory variables, respectively. $\sum_{i = 1}^{n} {Y_{i}}^{2} = \sum_{j = 1}^{α} \overset{\land}{β_{j}} \sum_{i = 1}^{n} Y_{i} X_{ji} + \sum_{i = 1}^{n} {ɛ_{i}}^{2}$ (8)

Where the left-hand side and the first and second terms on the right-hand side are the total sum of squares (SST), the sum of squares due to regression (SSR), and the sum of squares of error/residuals (SSE), respectively, as summarized in the ANOVA table (Table 1) [55].

Table 1

The analysis of variance (ANOVA) table

Source of variation	Degree of freedom	Sum of square	Mean sum of square	F
Due to regression (ESS)	p	$\sum_{j = 1}^{α} \overset{\land}{β_{j}} \sum_{i = 1}^{n} Y_{i} X_{ji}$	$MSR = \frac{\sum_{j = 1}^{α} \overset{\land}{β_{j}} \sum_{i = 1}^{n} Y_{i} X_{ji}}{p}$	$F = \frac{MSR}{MSE}$
Due to residual (RSS)	n-p-1	$\sum_{i = 1}^{n} {ɛ_{i}}^{2}$	$MSE = \frac{\sum_{i = 1}^{n} {ɛ_{i}}^{2}}{n - p - 1}$
Total	n-1	$\sum_{i = 1}^{n} {Y_{i}}^{2}$

Earlier works have also reported the benefits of employing non-linear approaches such as ANN, which could give more efficient predictions than linear methods [56]. An ANN needs at least three main layers (input, hidden, and output), in which the numbers of input and output neurons depend on the number of input variables and the outputs associated with each input, whereas the number of hidden layers could vary according to the problem [57]. In this work, the perceptron model, a convenient and basic building block of ANN, was applied; it includes four parts, namely input values, weights and bias, weighted sum, and activation function, as described by Equations (9) and (10) (Fig. 2) [58].

Fig. 2

The perceptron rule of ANN.

$y = \sum_{i = 1}^{n} (w_{i} . x_{i}) + bias$ (9) $w_{i, new} = w_{i} + (r . (y_{expected} - y_{calculated}) . x_{i})$ (10)

Where y, w, x, and r are the output, weight, input, and learning rate, respectively. As described by Equation (9), the numerical inputs are multiplied by their respective weights, and the products are summed (the weighted sum) and added to the bias. The activation function then takes the weighted sum plus the bias and returns the final output according to a set of rules. In this way, the bias acts as a threshold that must be met before the perceptron produces an output.
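Equations (9) and (10) could be sketched in Python as follows. This is a minimal illustration of the weighted sum and the weight update rule only, not the SPSS procedure used in this work; the function names and the numerical values are hypothetical.

```python
def weighted_sum(weights, inputs, bias):
    """Eq. (9): sum of the weight-input products plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def update_weights(weights, inputs, y_expected, y_calculated, rate):
    """Eq. (10): shift each weight in proportion to the output error and its input."""
    error = y_expected - y_calculated
    return [w + rate * error * x for w, x in zip(weights, inputs)]


# Hypothetical two-input example.
weights = [0.5, -0.5]
inputs = [1.0, 2.0]
bias = 0.25

y = weighted_sum(weights, inputs, bias)
new_weights = update_weights(weights, inputs, y_expected=1.0, y_calculated=y, rate=0.5)
print(y, new_weights)
```

Each pass moves the weights in the direction that reduces the difference between the expected and calculated outputs, scaled by the learning rate r.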

The weights method was introduced for the interpretation of ANN in order to show the relative importance of the various inputs [59]. The connection weights could provide an understanding of the relative contribution of each input, as well as of each hidden neuron [60]. Hence, this algorithm works by partitioning the connection weights to determine the relative contribution of each input to a hidden neuron with respect to the hidden-output connection weights of that neuron. The relative importance of each input could be calculated by dividing the absolute value of its input-hidden connection weight by the sum of the absolute values of the input-hidden connection weights of all input neurons and multiplying by 100. For a network with one hidden layer, as in this work, the relative importance could be estimated by Equation (11) [61]. $Q_{i} = \frac{| w_{i} |}{\sum_{j = 1}^{n} | w_{j} |} \times 100$ (11)

Where Q, w, and n are relative importance, connecting weight, and number of inputs, respectively.
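As a worked check of Equation (11), the following Python sketch applies it to the isolated-state input-hidden weights reported in Table 4. The bias is left out of the sum, which is an assumption inferred from the fact that this choice reproduces the Q_isolated column of Table 5; the function name is hypothetical.

```python
def relative_importance(weights):
    """Eq. (11): percentage share of each input in the summed absolute weights."""
    total = sum(abs(w) for w in weights.values())
    return {name: 100 * abs(w) / total for name, w in weights.items()}


# Input-hidden weights of the isolated state, copied from Table 4 (bias excluded).
isolated_weights = {
    "Hardness": -0.277,
    "Softness": 0.321,
    "Electrophilicity": -0.367,
    "Electronegativity": 0.135,
    "Gap": 0.305,
    "Energy": -0.412,
    "Polarizability": 0.364,
    "Volume": -0.315,
    "Quadrupole": -0.033,
    "Dipole moment": -0.382,
    "HOMO": 0.322,
    "LUMO": 0.230,
}

q_isolated = relative_importance(isolated_weights)
for name, q in q_isolated.items():
    print(f"{name}: {q:.3f}")
```

The same function applied to the encapsulated-state weights gives the Q_encapsulated column, and the difference of the two dictionaries gives the last column of Table 5.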

Besides developing and employing the models, their validation is of the utmost importance, especially in QSPR analysis; therefore, the leave-one-out (LOO) technique was employed in this work for validating the models [62]. The validation of the developed models was done by estimating two statistical parameters, the determination coefficient (R²) and the cross-validation coefficient of determination $(R_{cv}^{2})$, for external and internal validations, respectively. The value of R², as a statistical measure, helps to analyze how well the model replicates the observed outcomes using Equation (12) [63]. $R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(X_{i} - Y_{i})}^{2}}{\sum_{i = 1}^{n} {(X_{i} - \bar{X})}^{2}}$ (12)

Where $X, \bar{X}, Y$, and n are the observed data, the mean of the observed data, the data calculated by the model, and the number of data points, respectively. Additionally, the value of $R_{cv}^{2}$, evaluated by Equation (13), could provide a reasonable way to validate the predictive power of the proposed regression model without a new data set or data splitting [62, 63]. $R_{cv}^{2} = 1 - \frac{\sum_{i \in \text{training set}} {(X_{i} - Y_{i})}^{2}}{\sum_{i \in \text{training set}} {(X_{i} - \bar{X})}^{2}}$ (13)

Furthermore, in order to validate the robustness and reliability of the models, the prediction coefficient of determination (R²_prediction) was evaluated by Equation (14) using the test-set data, which were not used for developing the prediction models [63]. $R_{prediction}^{2} = 1 - \frac{\sum_{i \in \text{test set}} {(X_{i} - Y_{i})}^{2}}{\sum_{i \in \text{test set}} {(X_{i} - {\bar{y}}_{tr})}^{2}}$ (14)

Where X, Y, and ${\bar{y}}_{tr}$ are the observed value, calculated value, and mean value of the activity in the training set, respectively.
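The three validation statistics share one functional form, as the following minimal Python sketch illustrates. It assumes that the calculated values in Equation (13) are leave-one-out predictions, and it uses a one-descriptor least-squares line in place of the actual QSPR models; all function names and data are hypothetical.

```python
def r_squared(observed, calculated, reference_mean=None):
    """1 - SS_res / SS_tot; reference_mean defaults to the mean of the observed data."""
    if reference_mean is None:
        reference_mean = sum(observed) / len(observed)
    ss_res = sum((x - y) ** 2 for x, y in zip(observed, calculated))
    ss_tot = sum((x - reference_mean) ** 2 for x in observed)
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for a single descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_predictions(xs, ys):
    """Predict each point from a model fitted to all the other points."""
    predictions = []
    for k in range(len(xs)):
        intercept, slope = fit_line(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:])
        predictions.append(intercept + slope * xs[k])
    return predictions


# Hypothetical training and test data (one descriptor, logP-like response).
x_train, y_train = [1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.2, 3.8, 5.1]
x_test, y_test = [1.5, 4.5], [1.6, 4.4]

intercept, slope = fit_line(x_train, y_train)
fitted = [intercept + slope * x for x in x_train]
predicted = [intercept + slope * x for x in x_test]
y_tr_mean = sum(y_train) / len(y_train)

r2 = r_squared(y_train, fitted)                                    # Eq. (12)
r2_cv = r_squared(y_train, loo_predictions(x_train, y_train))      # Eq. (13)
r2_prediction = r_squared(y_test, predicted, reference_mean=y_tr_mean)  # Eq. (14)
print(r2, r2_cv, r2_prediction)
```

For a least-squares fit, the leave-one-out residuals are never smaller in magnitude than the fitted residuals, so R²_cv does not exceed R² on the same training set.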

3 Results and discussion

3.1 Multiple Linear Regression (MLR) model

With ANOVA employed as the criterion for testing the significance of the MLR model, the results of the analyses are listed in Table 2. The F-test is a tool for determining the difference of variances between two populations (between and within the groups) [64]. Accordingly, the ANOVA analysis was run with the H₀ and H₁ hypotheses described by Equation (15) at the significance level (α).

Table 2
ANOVA analysis results

State	Source of variation	Degree of freedom	Sum of square	Mean sum of square	F
Isolated	ESS	9	2.759	0.307	1.624
	RSS	10	1.888	0.189
	Total	19	4.647	–
Encapsulated	ESS	11	60.054	5.459	1.456
	RSS	8	29.997	3.750
	Total	19	90.051	–

$H_{0} : μ_{1} = μ_{2} = μ_{3} = \dots = μ_{k} \quad (k : \text{number of groups}), \qquad H_{1} : \text{means are not equal}$ (15)

The null hypothesis (H₀) states that there is no difference in the mean values, and the alternative hypothesis (H₁) states that the mean values are unequal; H₀ is accepted or rejected at the significance level of 0.05 (95% confidence level) [64]. Considering the degrees of freedom of ESS and RSS, the critical values for the tautomeric forms of TG in the isolated and chitosan-encapsulated states are 3.0204 and 2.9480, respectively [64]. Referring to Table 2 and comparing the F values with the critical values, 1.624 < 3.0204 was found for TG in the isolated state and 1.456 < 2.9480 for TG in the chitosan-encapsulated state, revealing that H₀ could not be rejected. In other words, no significant difference was seen for the mean values, revealing the insufficiency of the MLR model for predicting logP.
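The F values of Table 2 and the comparison above could be reproduced with the following minimal Python sketch. The sums of squares and degrees of freedom are copied from Table 2, and the critical values are those quoted in the text rather than recomputed; the function and variable names are hypothetical.

```python
def anova_f(ss_regression, ss_residual, df_regression, df_residual):
    """F = MSR / MSE, following the layout of Table 1."""
    msr = ss_regression / df_regression
    mse = ss_residual / df_residual
    return msr / mse


# (SS regression, SS residual, df regression, df residual, critical value from the text)
table2 = {
    "isolated": (2.759, 1.888, 9, 10, 3.0204),
    "encapsulated": (60.054, 29.997, 11, 8, 2.9480),
}

results = {}
for state, (ssr, sse, df_r, df_e, f_crit) in table2.items():
    f_value = anova_f(ssr, sse, df_r, df_e)
    reject_h0 = f_value > f_crit
    results[state] = (f_value, reject_h0)
    print(f"{state}: F = {f_value:.3f}, critical = {f_crit}, reject H0: {reject_h0}")
```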

3.2 Perceptron Artificial Neural Network (PANN) model

In contrast with other common statistical modelling techniques, the PANN model does not depend on conditions such as normal distribution, linearity, and homogeneity of variance [65]. Hence, based on such benefits and the insufficiency of the MLR model, the PANN model was employed to derive a relationship between the evaluated descriptors and the corresponding logP of TG in the isolated and chitosan-encapsulated states. After splitting the existing data into training and testing sets, the PANN model was applied using the hyperbolic tangent function as the activation function and setting the initial values of the weights to zero. The weights of the network were adjusted by back-propagating errors to minimize the differences between the target data and the estimated outputs. The sum of squared errors, relative error, and training time of PANN for the training and testing sets of TG are listed in Table 3. Additionally, Table 4 and Fig. 3 show the PANN estimations for TG in the isolated and chitosan-encapsulated states, numerically and schematically, respectively. Regarding the effects of the hidden layer on the evaluated values of logP of the isolated TG, the independent variables with negative weights have positive effects on the evaluated value of logP and vice versa. On the other hand, the effects of the hidden layer on the evaluated values of logP of the chitosan-encapsulated TG were positive, so that negative/positive weights of the independent variables showed negative/positive effects on logP. Moreover, the relative contributions of the independent variables (inputs) are given in Table 5 and Fig. 4 based on the weights method. 
To clarify the presence or absence of effects of chitosan encapsulation on the evaluated values of logP, the differences between the relative contributions in the encapsulated and isolated states (Q_encapsulated – Q_isolated) are summarized in Table 5 and Fig. 5. The PANN results indicated that, among all the independent variables, electronegativity and polarizability showed the largest variations upon chitosan encapsulation in comparison with the isolated state.

Table 3
PANN results

Group	Parameter	Isolated state	Encapsulated state
Training set	Sum of squared errors	7.311	7.504
	Relative error	0.975	0.536
	Training time	00:00:00.012	00:00:00.008
Testing set	Sum of squared errors	1.527	1.918
	Relative error	0.955	0.435

Table 4

PANN estimations

Independent variables		Dependent variables
		Isolated state		Encapsulated state
		Hidden layer	Output layer	Hidden layer	Output layer
		H(1:1)	logP	H(1:1)	logP
Input layer	Bias	0.472	n/a	0.290	n/a
	Hardness	–0.277		–0.375
	Softness	0.321		0.301
	Electrophilicity	–0.367		0.290
	Electronegativity	0.135		–0.478
	Gap	0.305		0.385
	Energy	–0.412		–0.352
	Polarizability	0.364		–0.214
	Volume	–0.315		–0.462
	Quadrupole	–0.033		0.201
	Dipole moment	–0.382		0.411
	HOMO	0.322		0.325
	LUMO	0.230		–0.433
Hidden layer	Bias	n/a	0.018	n/a	–0.010
	H(1:1)		–0.317		–0.199

Fig. 3

The PANN estimation of TG in (a) isolated and (b) chitosan-encapsulated states.

Table 5

The relative contribution (%) of each descriptor in the output

Descriptors	Q_isolated	Q_encapsulated	Q_encapsulated –Q_isolated
Hardness	7.999	8.872	0.873
Softness	9.269	7.121	–2.149
Electrophilicity	10.598	6.861	–3.737
Electronegativity	3.898	11.308	7.410
Gap	8.807	9.108	0.301
Energy	11.897	8.327	–3.570
Polarizability	10.511	5.063	–5.448
Volume	9.096	10.930	1.834
Quadrupole	0.953	4.755	3.802
Dipole moment	11.031	9.723	–1.308
HOMO	9.298	7.689	–1.610
LUMO	6.642	10.244	3.602

Fig. 4

The assessment of relative contribution of TG in (a) isolated and (b) chitosan-encapsulated states.

Fig. 5

The effect of encapsulated state on the logP using difference of relative importance.

3.3 Quantitative Structure-Property Relationship (QSPR) model

Three linear models were derived using the PANN model between the descriptors and logP: one model for TG in the isolated state and two models for TG in the chitosan-encapsulated state.

For the isolated TG, the model was derived as follows: $logP = 70.668 (\pm 20.748) - 3.334 (\pm 0.648) Mor32v + 1.008 (\pm 0.291) Volume$ (16)

where N_training = 14, N_test = 6, R = 0.4617, R² = 0.2132, RMSE = 0.4359, R_cv = 0.0678, R²_cv = 0.0046, RMSE_cv = 0.572, R_prediction = 0.6797, R²_prediction = 0.4620

For the chitosan-encapsulated TG, two models were derived as follows: $logP = - 19.102 (\pm 2.977) + 66.071 (\pm 6.996) Du - 91.875 (\pm 12.288) Dm + 19.943 (\pm 3.773) MATs3p + 13.481 (\pm 3.515) Mor26u$ (17)

where N_training = 14, N_test = 6, R = 0.9675, R² = 0.9360, RMSE = 0.6397, R_cv = 0.9136, R²_cv = 0.8346, RMSE_cv = 1.068, R_prediction = 0.8491, R²_prediction = 0.7210.

$logP = - 27.003 (\pm 4.110) + 74.834 (\pm 6.772) Du - 97.108 (\pm 10.218) Dm + 17.529 (\pm 3.228) MATs3p + 10.651 (\pm 3.092) Mor26u + 5.737 (\pm 2.260) H2u$ (18)

where N_training = 14, N_test = 6, R = 0.9811, R² = 0.9625, RMSE = 0.4897, R_cv = 0.8631, R²_cv = 0.7450, RMSE_cv = 0.9185, R_prediction = 0.9110, R²_prediction = 0.829.

In the reported models, logP is the lipophilicity, Mor32v is the 3D-MoRSE signal 32 weighted by van der Waals volume, Volume is the molecular volume calculated by Gaussian, Du is the D total accessibility index (unweighted), Dm is the D total accessibility index weighted by mass, MATs3p is the Moran autocorrelation of lag 3 weighted by polarizability, Mor26u is the 3D-MoRSE signal 26 (unweighted), and H2u is the H autocorrelation of lag 2 (unweighted). The standard deviation of each coefficient is written in parentheses. In general, a reliable model should possess lower standard errors and higher correlation coefficients. In other words, a developed QSPR model is considered reliable at the satisfactory levels of R ≥ 0.8 or R² ≥ 0.6, R²_cv ≥ 0.5, and R²_prediction ≥ 0.6 for its correlation coefficient, cross-validated coefficient, and prediction coefficient, respectively, with an F value significant at the 95% level [66]. Consequently, the proposed equations of logP were more satisfactory in the chitosan-encapsulated state than in the isolated state. Additionally, the ANOVA analyses of the three proposed models are reported in Table 6. Regarding the degrees of freedom of ESS and RSS, critical values of 3.9823, 3.6331, and 3.6875 were obtained for the first, second, and third models of TG, respectively [64]. Referring to Table 6, the F values of all three models exceed the corresponding critical values, indicating that the proposed models of PANN were reliable for predicting logP of TG.
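The acceptance thresholds quoted above could be applied to the reported statistics of Equations (16) to (18) as in the following minimal Python sketch. The values are copied from the text; the function name and dictionary layout are hypothetical, and the F-test part of the criterion is not included here.

```python
def meets_criteria(stats):
    """(R >= 0.8 or R2 >= 0.6), R2_cv >= 0.5, and R2_prediction >= 0.6."""
    fit_ok = stats["R"] >= 0.8 or stats["R2"] >= 0.6
    return fit_ok and stats["R2_cv"] >= 0.5 and stats["R2_prediction"] >= 0.6


# Statistics reported for the three QSPR models.
models = {
    "Eq. (16), isolated": {"R": 0.4617, "R2": 0.2132, "R2_cv": 0.0046, "R2_prediction": 0.4620},
    "Eq. (17), encapsulated": {"R": 0.9675, "R2": 0.9360, "R2_cv": 0.8346, "R2_prediction": 0.7210},
    "Eq. (18), encapsulated": {"R": 0.9811, "R2": 0.9625, "R2_cv": 0.7450, "R2_prediction": 0.829},
}

verdicts = {name: meets_criteria(stats) for name, stats in models.items()}
for name, ok in verdicts.items():
    print(f"{name}: {'meets' if ok else 'does not meet'} the thresholds")
```

Under these thresholds only the two chitosan-encapsulated models qualify, in line with the statement that the encapsulated state gave the more satisfactory equations.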

Table 6

ANOVA analysis of three proposed QSPR models

Model	Source of variation	Degree of freedom	Sum of square	Mean sum of square	F
1	ESS	2	3.090	1.545	15.097
	RSS	11	1.126	0.102
	Total	13	4.215	–
2	ESS	4	83.724	20.931	32.887
	RSS	9	5.728	0.636
	Total	13	89.452	–
3	ESS	5	86.095	17.219	41.03
	RSS	8	3.357	0.420
	Total	13	89.452	–

4 Conclusion

In the present report, the existence of a relationship between the quantum chemical descriptors and the lipophilicity of TG in the isolated and chitosan-encapsulated states was investigated using the QSPR method. Based on the employed methods and obtained results, the main goal of this work was achieved better by the ANN model than by the MLR model. Accordingly, three QSPR models were obtained based on the ANN model for showing the relationships between the descriptors and logP. The results also indicated that the chitosan-encapsulated TG structures were more suitable than the isolated state for achieving such QSPR models. As a consequence, the values of logP of TG, which are important regarding their roles in drug efficiency, could be predicted by the proposed QSPR models of this work.

Authors’ Contribution Statement

Somaye Mir Mohammad Hoseini Ahari prepared all parts of this manuscript and conducted the whole project, and Mahmoud Mirzaei edited the written manuscript.

References

Schmiegelow

and Bruunshuus

, 6-Thioguanine nucleotide accumulation in red blood cells during maintenance chemotherapy for childhood acute lymphoblastic leukemia, and its relation to leukopenia, Cancer Chemotherapy and Pharmacology 26 (1990), 288–292.

Špačková

N.A.

, Cubero

, Šponer

and Orozco

, Theoretical study of the guanine⟶ 6-thioguanine substitution in duplexes, triplexes, and tetraplexes, Journal of the American Chemical Society 126 (2004), 14642–14650.

Civcir

P.U.

, Tautomerism of 6-thioguanine in the gas and aqueous phases using AM1 and PM3 methods, Journal of Molecular Structure: THEOCHEM 536 (2001), 161–171.

Mirmomtaz

and Ensafi

A.A.

, Voltammetric determination of trace quantities of 6-thioguanine based on the interaction with DNA at a mercury electrode, Electrochimica Acta 54 (2009), 4353–4358.

Heidarnezhad

, Heidarnezhad

, Ganiev

, Obidov

and Sharifi

M.S.

, Theoretical survey on tautomerism of thioguanine tautomers by polarisable continuum method (PCM), Oriental Journal of Chemistry 29 (2013), 53–58.

Kasende

O.E.

, Infrared spectra of 6-thioguanine tautomers. An experimental and theoretical approach, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 58 (2002), 1793–1808.

Martin

Y.C.

, Experimental and pKa prediction aspects of tautomerism of drug-like molecules, Drug Discovery Today: Technologies 27 (2018), 59–64.

Masand

V.H.

, Mahajan

D.T.

, Ben Hadda

, Jawarkar

R.D.

, Alafeefy

A.M.

, Rastija

and Ali

M.A.

, Does tautomerism influence the outcome of QSAR modeling? Medicinal Chemistry Research 23 (2014), 1742–1757.

Yaraghi

, Ozkendir

O.M.

and Mirzaei

, DFT studies of 5-fluorouracil tautomers on a silicon graphene nanosheet, Superlattices and Microstructures 85 (2015), 784–788.

10.

Pari

A.A.

and Yousefi

, Exploring formations of thio-thiol and keto-enol tautomers for structural analysis of 2-thiouracil, Advanced Journal of Science and Engineering 2 (2021), 111–114.

11.

Zandi

and Harismah

, Density functional theory analyses of non-covalent complex formation of 6-thioguanine and coronene, Lab-in-Silico 2 (2021), 57–62.

12.

Lukovits

, Quantitative structure-activity relationships employing independent quantum chemical indexes, Journal of Medicinal Chemistry 26 (1983), 1104–1109.

13.

Thanikaivelan

, Subramanian

, Rao

J.R.

and Nair

B.U.

, Application of quantum chemical descriptor in quantitative structure activity and structure property relationship, Chemical Physics Letters 323 (2000), 59–70.

14.

Pandith

A.H.

, Giri

and Chattaraj

P.K.

, A comparative study of two quantum chemical descriptors in predicting toxicity of aliphatic compounds towards tetrahymena pyriformis, Organic Chemistry International 2010 (2011), 545087.

15.

Basak

S.C.

, Mills

and Mumtaz

M.M.

, A quantitative structure–activity relationship (QSAR) study of dermal absorption using theoretical molecular descriptors, SAR and QSAR in Environmental Research 18 (2007), 45–55.

16.

Sarmah

and Deka

R.C.

, DFT-based QSAR and QSPR models of several cis-platinum complexes: solvent effect, Journal of Computer-Aided Molecular Design 23 (2009), 343–354.

17.

Hui-Ying

, Jian-Wei

, Gui-Xiang

and Wei

, QSPR/QSAR models for prediction of the physico-chemical properties and biological activity of polychlorinated diphenyl ethers (PCDEs), Chemosphere 80 (2010), 665–670.

18.

Yan

, Yao

, Yan

, Gao

, Lu

and He

, Chiral protein supraparticles for tumor suppression and synergistic immunotherapy: an enabling strategy for bioactive supramolecular chirality construction, Nano Letters 20 (2020), 5844–5852.

19.

Jin

, Yan

, Chen

, Wang

, Pan

, Liu

, Lou

, Wang

and Ye

, Multimodal deep learning with feature level fusion for identification of choroidal neovascularization activity in age-related macular degeneration, Acta Ophthalmologica 100 (2022), e512–20.

20.

Wang

, Ning

, Yu

and Wang

, USP14: structure, function, and target inhibition, Frontiers in Pharmacology 12 (2021), 801328.

21.

Mälkiä

, Murtomäki

, Urtti

and Kontturi

, Drug permeation in biomembranes: in vitro and in silico prediction and influence of physicochemical properties, European Journal of Pharmaceutical Sciences 2 (2004), 13–47.

22.

Zhao

T.H.

, Khan

M.I.

and Chu

Y.M.

, Artificial neural networking (ANN) analysis for heat and entropy generation in flow of non-Newtonian fluid between two rotating disks, Mathematical Methods in the Applied Sciences. 2021: in press.

23.

Zou

, Xing

, Wei

and Liu

, Gene2vec: gene subsequence embedding for prediction of mammalian N6-methyladenosine sites from mRNA, RNA 25 (2019), 205–218.

24.

Wang

, Khan

M.N.

, Ahmad

, Abu-Zinadah

and Chu

Y.M.

, Numerical solution of traveling waves in chemical kinetics: time fractional fishers equations, Fractals 30 (2022), 22400051.

25.

Zha

T.H.

, Castillo

, Jahanshahi

, Yusuf

, Alassafi

M.O.

, Alsaadi

F.E.

and Chu

Y.M.

, A fuzzy-based strategy to suppress the novel coronavirus -NCOV) massive outbreak, Applied and Computational Mathematics 20 (2021), 160–176.

26.

Nazeer

, Hussain

, Khan

M.I.

, El-Zahar

E.R.

, Chu

Y.M.

and Malik

M.Y.

, Theoretical study of MHD electro-osmotically flow of third-grade fluid in micro channel, Applied Mathematics and Computation 420 (2022), 126868.

27.

Chen

, Zou

and Li

, DeepM6ASeq-EL: prediction of human N6-methyladenosine (m6A) Sites with LSTM and ensemble learning, Frontiers of Computer Science 16 (2022), 1–7.

28.

Zhao

T.H.

, Qian

W.M.

and Chu

Y.M.

, Sharp power mean bounds for the tangent and hyperbolic sine means, Journal of Mathematical Inequalities 15 (2021), 1459–1472.

29.

, Hu

, Yang

, Zhang

, Wu

J.J.

, Cheng

, Wang

and Zheng

, Capacitive aptasensor coupled with microfluidic enrichment for real-time detection of trace SARS-CoV-2 nucleocapsid protein, Analytical Chemistry 94 (2022), 2812–2819.

30.

Rastogi

and Choudhary

, Face recognition by using neural network, Acta Informatica Malaysia 3 (2019), 7–9.

31.

Wang

, Li

, Zhang

, Yang

, Li

, Dong

and Wang

, Processing characteristics of vegetable oil-based nanofluid MQL for grinding different workpiece materials, International Journal of Precision Engineering and Manufacturing-Green Technology 5 (2018), 327–339.

32.

Song

Y.Q.

, Zhao

T.H.

, Chu

Y.M.

and Zhang

X.H.

, Optimal evaluation of a Toader-type mean by power mean, Journal of Inequalities and Applications 2015 (2015), 408.

33.

, Fan

, Huang

, Zhao

, Xiong

, Yin

and Zhang

, Effect of micro-and nano-starch on the gel properties, microstructure and water mobility of myofibrillar protein from grass carp, Food Chemistry 366 (2022), 130579.

34.

Chu

Y.M.

, Nazir

, Sohail

, Selim

M.M.

and Lee

J.R.

, Enhancement in thermal energy and solute particles using hybrid nanoparticles by engaging activation energy and chemical reaction over a parabolic surface via finite element approach, Fractal and Fractional 5 (2021), 119.

35. Chu Y.M., Shankaralingappa B.M., Gireesha B.J., Alzahrani, Khan M.I. and Khan S.U., Combined impact of Cattaneo-Christov double diffusion and radiative heat flux on bio-convective flow of Maxwell liquid configured by a stretched nano-material surface, Applied Mathematics and Computation 419 (2022), 126883.

36. […], Li, Zhang, Wang, Jia and Yang, Grinding temperature and energy ratio coefficient in MQL grinding of high-temperature nickel-base alloy by using different vegetable oils as base oil, Chinese Journal of Aeronautics 29 (2016), 1084–1095.

37. Yang, Li, Zhang, Jia, Zhang, Hou, Li and Wang, Maximum undeformed equivalent chip thickness for ductile-brittle transition of zirconia ceramics under different lubrication conditions, International Journal of Machine Tools and Manufacture 122 (2017), 55–65.

38. Gao, Li, Zhang, Yang, Jia, Jin, Hou and Li, Dispersing mechanism and tribological performance of vegetable oil-based CNT nanofluids with different surfactants, Tribology International 131 (2019), 51–63.

39. Guo, Li, Zhang, Wang, Li, Yang, Zhang and Liu, Experimental evaluation of the lubrication performance of mixtures of castor oil with other vegetable oils in MQL grinding of nickel-based alloy, Journal of Cleaner Production 140 (2017), 1060–1076.

40. Zhang, Li, Zhang, Yang, Jia, Liu, Hou, Li, Zhang, Wu and Cao, Experimental assessment of an environmentally friendly grinding process using nanofluid minimum quantity lubrication with cryogenic air, Journal of Cleaner Production 193 (2018), 236–248.

41. Yekeler, Ab initio calculations of solvent effects on guanine and thioguanine tautomerism, Indian Journal of Chemistry - Section A 39 (2000), 1231–1240.

42. Contreras J.G. and Alderete J.B., AM1 studies on the prototropic tautomerism of 6-thioguanine, Journal of Molecular Structure: THEOCHEM 283 (1993), 283–287.

43. Frisch M.J., Trucks G.W., Schlegel H.B., Scuseria G.E., Robb M.A., Cheeseman J.R., et al., Gaussian 03 program, Gaussian Inc., Wallingford, CT, 2003.

44. Mauri, Consonni, Pavan and Todeschini, Dragon software: an easy approach to molecular descriptor calculations, Match 56 (2006), 237–248.

45. Kirch, Pearson’s correlation coefficient, Encyclopedia of Public Health 1 (2008), 1090–1091.

46. Gupta A.K., Introduction to MATLAB, Numerical Methods using MATLAB (2014), 1–12.

47. Froimowitz, HyperChem: a software package for computational chemistry and molecular modeling, Biotechniques 14 (1993), 1010–1013.

48. Anthony, Discovering statistics using SPSS, Nurse Researcher 17 (2010), 91–93.

49. Kennard R.W. and Stone L.A., Computer aided design of experiments, Technometrics 11 (1969), 137–148.

50. Koopmans, Ordering of wave functions and eigenenergies to the individual electrons of an atom, Physica 1 (1933), 104–113.

51. Putz M.V., Russo and Sicilia, About the Mulliken electronegativity in DFT, Theoretical Chemistry Accounts 114 (2005), 38–45.

52. Pearson R.G., Maximum chemical and physical hardness, Journal of Chemical Education 76 (1999), 267–275.

53. Parr R.G. and Yang, Density functional approach to the frontier-electron theory of chemical reactivity, Journal of the American Chemical Society 106 (1984), 4049–4050.

54. Crisan, Bora, Kurunczi, Vlaia and Simon, Multiple linear regression and partial least squares QSAR modeling applied to a series of antipsychotic sertindole derivatives, Revue Roumaine de Chimie 55 (2010), 941–946.

55. Jahan, ANOVA procedures for multiple linear regression model with non-normal error distribution: a quantile function distribution approach, American Journal of Mathematics and Statistics 7 (2017), 169–178.

56. Safa, Nejat, Nuthall P.L. and Greig B.J., Predicting CO2 emissions from farm inputs in wheat production using artificial neural networks and linear regression models - case study in Canterbury, New Zealand, International Journal of Advanced Computer Science and Applications 7 (2016), 268–274.

57. Samarasinghe, Neural networks for applied sciences and engineering: from fundamentals to complex pattern recognition, Auerbach Publications, 2016.

58. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review 65 (1958), 386–408.

59. Garson D.G., Interpreting neural network connection weights, Artificial Intelligence Expert 6 (1991), 47–51.

60. Žuvela, David and Wong M.W., Interpretation of ANN-based QSAR models for prediction of antioxidant activity of flavonoids, Journal of Computational Chemistry 39 (2018), 953–963.

61. Gevrey, Dimopoulos and Lek, Review and comparison of methods to study the contribution of variables in artificial neural network models, Ecological Modelling 160 (2003), 249–264.

62. Gadžurić S.B., Kuzmanović S.O., Vraneš M.B., Petrin, Bugarski and Kovačević S.Z., Multivariate chemometrics with regression and classification analyses in heroin profiling based on the chromatographic data, Iranian Journal of Pharmaceutical Research 15 (2016), 725–734.

63. Elidrissi, Ousaa, Ghamali, Chtita, Ajana M.A., Bouachrine and Lakhlifi, QSPR and DFT studies on the melting point of carbocyclic nitroaromatic compounds, Journal of Physical Chemistry & Biophysics 7 (2017), 2161–0398.

64. Dinov, F-Distribution Tables, https://www.socr.ucla.edu/Applets.dir/F_Table.html (accessed on 24 May 2021).

65. Bishop C.M., Neural networks for pattern recognition, Oxford University Press, New York, 1995.

66. Veerasamy, Rajak, Jain, Sivadasan, Varghese C.P. and Agrawal R.K., Validation of QSAR models - strategies and importance, International Journal of Drug Design and Discovery 3 (2011), 511–519.


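Table 2
ANOVA analysis results

| State | Source of variation | Degrees of freedom | Sum of squares | Mean sum of squares | F |
|---|---|---|---|---|---|
| Isolated | ESS | 9 | 2.759 | 0.307 | 1.624 |
| Isolated | RSS | 10 | 1.888 | 0.189 | |
| Isolated | Total | 19 | 4.647 | – | |
| Encapsulated | ESS | 11 | 60.054 | 5.459 | 1.456 |
| Encapsulated | RSS | 8 | 29.997 | 3.750 | |
| Encapsulated | Total | 19 | 90.051 | – | |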
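
The mean sums of squares and F ratios in Table 2 follow from the tabulated sums of squares and degrees of freedom: each mean square is the sum of squares divided by its degrees of freedom, and F is the ratio of the ESS mean square to the RSS mean square. The short sketch below recomputes them as a consistency check. The function name `f_statistic` and the dictionary layout are illustrative assumptions and are not part of the original workflow.

```python
# Sums of squares and degrees of freedom copied from Table 2.
# The dictionary keys are illustrative names, not taken from the paper.
anova = {
    "Isolated": {"ess": 2.759, "df_ess": 9, "rss": 1.888, "df_rss": 10},
    "Encapsulated": {"ess": 60.054, "df_ess": 11, "rss": 29.997, "df_rss": 8},
}


def f_statistic(ess, df_ess, rss, df_rss):
    """Return (ESS mean square, RSS mean square, F ratio)."""
    ms_ess = ess / df_ess  # mean square explained by the regression
    ms_rss = rss / df_rss  # mean square of the residuals
    return ms_ess, ms_rss, ms_ess / ms_rss


for state, row in anova.items():
    ms_ess, ms_rss, f = f_statistic(**row)
    total = row["ess"] + row["rss"]  # should match the "Total" row
    print(f"{state}: MS(ESS)={ms_ess:.3f}, MS(RSS)={ms_rss:.3f}, "
          f"F={f:.3f}, total SS={total:.3f}")
```

To three decimals, the recomputed values agree with the tabulated ones (F = 1.624 for the isolated state and 1.456 for the encapsulated state), and ESS plus RSS reproduces the total sum of squares in each state.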

Table 3
PANN results

| Group | Parameter | Isolated state | Encapsulated state |
|---|---|---|---|
| Training set | Sum of squared errors | 7.311 | 7.504 |
| Training set | Relative error | 0.975 | 0.536 |
| Training set | Training time | 00:00:00.012 | 00:00:00.008 |
| Testing set | Sum of squared errors | 1.527 | 1.918 |
| Testing set | Relative error | 0.955 | 0.435 |