Exploratory and machine learning analysis of the stability constants of Hg^II

Abstract

Knowledge of the stability constants of Hg^II complexes with extracting ligands is important from both environmental and therapeutic standpoints. Because the selectivity of a ligand can be expressed through the stability constant of its cation–ligand complex, a quantitative structure–property relationship (QSPR) study of the binding constants of Hg^II complexes was carried out. Experimental stability constants for the ML₂ complexation of Hg^II with synthesized triazene ligands were used to build and develop QSPR models with support vector machine (SVM) and multiple linear regression (MLR). The final model gave a squared correlation coefficient of 0.917 and a standard error of calibration (SEC) of 0.141 log K units. Leave-one-out cross-validation ($Q_{LOO}^{2}$ = 0.756) indicated accurate prediction, and the model was further validated by Y-randomization and an external test set. The statistical results showed that the proposed models have suitable goodness of fit, predictive ability, and robustness. The results revealed the importance of charge effects and of the topological properties of the ligand in Hg^II–triazene complexation.

Keywords

Stability constants; triazene ligands; QSPR; SVM; MLR

1 Introduction

Mercury is one of the most dangerous heavy metals among the trace elements found in nature, and it is highly toxic to the environment and to microorganisms even at low doses. It is a carcinogen, teratogen, and mutagen with no known biological role and considerable health effects [1, 2]. Hg^II is hazardous in the same way as mercury metal, but methylmercury, a neurotoxin that is biomagnified through the food chain from marine species to humans, is the most harmful form. Complexation with organic ligands strongly alters the toxicity of mercury, and the treatment of chronic or acute mercury poisoning with chelating agents has been studied through Hg^II complexes [3–5]. Furthermore, the binding of Hg^II to natural organic matter has a major influence on the biological availability of Hg^II in ecosystems [6], because it limits the availability of Hg^II to methylating bacteria [7]. From both environmental and medicinal standpoints, knowing the stability constants of Hg^II complexes with extracting ligands is therefore critical.

Triazenes are relatively old compounds, defined by the diazoamino functional group (–N–N=N–). They have been studied for over 150 years because of their interesting activity [8]. Research on transition-metal complexes containing substituted triazene ligands has increased substantially in recent years owing to the reactivity of these ligands, and substituted triazenes have recently attracted considerable attention for their anti-tumor, information-storage, and insecticidal properties [9]. Triazenes can bind mercury ions, and the resulting compounds are resistant to moisture and air. Moreover, spectrophotometric analysis of the complexation between Hg^II and triazene ligands in acetonitrile revealed high stability (binding) constants for the Hg^II complexes. The complex stability constant (here log K_Hg^II-triazene) expresses the ligand's selectivity for the ion; triazene ligands therefore show high complexation selectivity for Hg^II, which gives them excellent recognition properties for mercury ions. These selective complexing capabilities allow triazenes to be employed in a variety of applications, including mercury ion preconcentration and determination, adsorption and separation, and the construction of mercury ion-selective electrodes.

Experimental measurement of the stability constants of Hg^II binding by triazene ligands is time-consuming and costly. It is therefore advantageous to calculate these stability constants without incurring the cost of acquiring additional experimental data [10]. Moreover, occupational exposure to mercuric ion can occur through skin contact and the respiratory system, and it can have negative health consequences, particularly for the kidneys and the central nervous system [2, 11]. In addition, workers regularly exposed to such chemicals have experienced gum inflammation and excessive salivation. Over time, the accumulation of mercuric ions in the human body causes long-term brain damage, manifesting as deafness, headaches, forgetfulness, and cognitive impairment [12, 13]. Hence, in studies of highly toxic materials such as mercuric ion, computational approaches offer a clear advantage over laboratory methods [14, 15], because they protect the health of the researchers and practitioners who would otherwise handle these dangerous chemicals.

Since the selectivity of ligands (as extractants) can be expressed by the stability constants of cation–ligand complexes, a quantitative structure–property relationship (QSPR) investigation of the binding constants of Hg^II complexes was carried out. This research develops and validates a QSPR model for estimating the equilibrium constant of Hg^II binding by triazene ligands. A suitable QSPR model for predicting mercuric ion binding by substituted triazene ligands should require as little information as possible, namely constitutional information such as the topological structure and molecular charges [16, 17].

The complex stability constant is an essential experimental quantity that describes how strongly a metal ion binds to (in)organic ligands. Stability constants are particularly important in controlling complexation processes in the environment [18–21], biology [22], medicine [23–25], and analytical chemistry [26–29]. At present, QSPR, quantum chemical modelling [30–34], and linear free energy relationships [35] are used for the computational prediction of stability constants. In computer-aided stability constant prediction, the QSPR method has been used to model the complex stability constants of organic ligands with various metal ions; it is a significant field of research in computational chemistry and has been applied successfully to the prediction of complex stability constants [36, 37].

Available empirical stability constant data can serve as a guideline in QSPR modeling to predict the stability constants of complexes of newly synthesized ligands [30]; in this approach, each observed physicochemical behavior of a compound is correlated with its numerical descriptors. The QSPR approach identifies and isolates the structural descriptors that most strongly affect the physicochemical properties of a structure, so it typically needs variable selection to build well-fitted models. One of the most significant advantages of this strategy is that it requires only knowledge of the chemical structure and no experimental properties. It can therefore be used to estimate the properties of novel compounds that have yet to be synthesized, discovered, or tested, and so can accelerate the creation of molecules with desirable features that have not yet been produced.

In this paper, we establish new QSPR models for predicting the stability constants of Hg^II–triazene complexes from molecular descriptors and look for the structural variables connected to this property. Two modeling approaches were used to create nonlinear and linear models, SVM and MLR respectively, and the efficiencies of the resulting models were compared. To our knowledge, no QSPR modeling study of the stability constants of Hg^II binding by triazene ligands has previously been published.

2 Materials and methods

2.1 Data set

The substituted triazene ligands (Table 1) were synthesized as described in the literature [38]. To determine the stability and stoichiometry of the resulting Hg^II–triazene complexes, in a typical procedure 2.0 mL of a 5.0×10^–5 mol L^–1 triazene solution in acetonitrile (ACN) at 25 °C was placed in a 10 mm spectrophotometer cell, and its absorbance was measured over the range 250–550 nm. A concentrated Hg^II solution (1.4×10^–3 mol L^–1) in ACN was then added stepwise using a 10 μL Hamilton syringe, and the absorption spectrum was recorded after each addition. The binding constant of the resulting Hg^II–triazene complex was evaluated from the absorbance versus [Hg^II]/[triazene] mole-ratio data with the nonlinear least-squares curve-fitting program KINFIT. The resulting stability constants (log K_Hg^II-triazene) were used as the experimental input data for modeling in this study. The chemical structures of the substituted triazene ligands, together with the observed and predicted binding constants of their Hg^II complexes, are presented in Table 1.
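The KINFIT fit itself is not reproduced here, but the idea of extracting log K from mole-ratio data by nonlinear least squares can be sketched as follows. This minimal Python example assumes a 1:1 (ML) binding model, absorbance read at a single wavelength, a 1 cm path (the 10 mm cell above), a non-absorbing metal ion, and negligible dilution; these simplifications are assumptions, and most complexes in Table 1 are ML₂, so the code is illustrative only. The function names and the molar absorptivities in the usage example are hypothetical. Instead of a general nonlinear optimizer, it scans log K on a grid and, for each trial value, solves for the two molar absorptivities by linear least squares, because the absorbance is linear in them once K is fixed.

```python
import numpy as np


def complex_conc_1to1(m_total, l_total, k):
    """Equilibrium [ML] (mol/L) for M + L <=> ML with formation constant k.

    Solves k * (m_total - x) * (l_total - x) = x and returns the smaller,
    physically meaningful root of the quadratic.
    """
    b = m_total + l_total + 1.0 / k
    return 0.5 * (b - np.sqrt(b * b - 4.0 * m_total * l_total))


def fit_log_k(m_total, l_total, absorbance, log_k_grid=None):
    """Grid search over log K; at each K the molar absorptivities of L and ML
    are found by linear least squares. Returns (log_k, eps, ssr)."""
    if log_k_grid is None:
        log_k_grid = np.linspace(2.0, 8.0, 601)  # steps of 0.01 log units
    best_ssr, best_log_k, best_eps = np.inf, None, None
    for log_k in log_k_grid:
        ml = complex_conc_1to1(m_total, l_total, 10.0 ** log_k)
        # A = eps_L * [L] + eps_ML * [ML] (1 cm path, metal assumed not to absorb)
        design = np.column_stack([l_total - ml, ml])
        eps, *_ = np.linalg.lstsq(design, absorbance, rcond=None)
        ssr = float(np.sum((absorbance - design @ eps) ** 2))
        if ssr < best_ssr:
            best_ssr, best_log_k, best_eps = ssr, float(log_k), eps
    return best_log_k, best_eps, best_ssr


if __name__ == "__main__":
    # Synthetic titration: 5.0e-5 mol/L ligand, Hg/L mole ratio from 0 to 3
    l_tot = np.full(31, 5.0e-5)
    m_tot = np.linspace(0.0, 1.5e-4, 31)
    ml_true = complex_conc_1to1(m_tot, l_tot, 10.0 ** 4.7)
    a_obs = 12000.0 * (l_tot - ml_true) + 25000.0 * ml_true  # hypothetical eps values
    log_k, eps, ssr = fit_log_k(m_tot, l_tot, a_obs)
    print(f"fitted log K = {log_k:.2f}, eps_L = {eps[0]:.0f}, eps_ML = {eps[1]:.0f}")
```

Separating the linear parameters (absorptivities) from the single nonlinear one (K) keeps the search one-dimensional; a 1:2 model for ML₂ complexes would need a second equilibrium and a numerical speciation step.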

Table 1
Ligand structures, values of the selected descriptors, and experimental and predicted formation constants (log K) of the Hg^II complexes

Compound	ATSC3s	GATS6s	Log K_exp	Log K_MLR	Log K_SVM
Hg (ACT)₂	–13.2488	0.375632	4.7	4.91	4.87
Hg (AMT)₂	–12.8356	0.378896	4.87	4.94	4.91
Hg (APT)₂	–14.2286	0.380049	4.78	4.82	4.68
Hg (CAT)₂	–16.7005	0.495465	4.55	4.36	4.45
Hg (NAT)₂	–15.8998	0.618772	4.2	4.18	4.30
Hg (AYT)₂	–12.7645	0.408959	4.99	4.89	4.89
Hg (OClOM)₂	2.628301	1.068207	5.05	4.89	5.15
Hg (OClMM)₂	–1.70019	1.241584	4.1	4.15	4.20
Hg (OClPM)₂	–0.1446	1.153751	4.23	4.47	4.14
Hg (MClOM)₂	1.142049	1.087104	4.2	4.72	4.34
Hg (MClMM)₂	–3.18644	1.251588	4.23	4.00	4.13
Hg (MClPM)₂	–1.63089	1.164624	4.69	4.32	4.36
Hg (PClOM)₂	2.261391	1.107537	5.05	4.78	4.84
Hg (PClMM)₂	–2.0671	1.272838	3.99	4.06	4.17
Hg (PClPM)₂	–0.51154	1.187722	4.22	4.37	4.12
Hg(b-PT)₂	–1.57694	0.872987	5.079	4.92	4.98
Hg (OClPM)	0.414442	1.298062	3.2	3.33	3.30

2.2 Molecular optimization

All complex structures were drawn, checked, and conformationally minimized with HyperChem 8.0. The geometries were first pre-optimized with the MM⁺ molecular mechanics method using the Polak–Ribière algorithm and then refined with the more accurate semi-empirical Austin Model 1 (AM1) method, giving the final minimum-energy geometries at a gradient-norm criterion of 0.005 kcal mol^–1 Å^–1.

2.3 Descriptor calculation

PaDEL-Descriptor calculates 0D–3D molecular descriptors and fingerprints and is available as free, open-source software (http://padel.nus.edu.sg/software/padeldescriptor/). Various 1D and 2D molecular descriptors, including electrotopological state indices, chi indices, autocorrelation descriptors, constitutional descriptors, topological descriptors, and BCUT descriptors, were computed with PaDEL-Descriptor [39].

2.4 Variable selection

The stepwise regression procedure is widely used in QSPR analysis to select an appropriate, limited number of descriptors. It is one of the simplest and most suitable methods for descriptor selection in QSAR research [40], and, compared with other procedures, it can supply several significant descriptors for modeling. Nevertheless, when the descriptor pool is large, stepwise regression does not give satisfactory results [38].

The enhanced replacement method (ERM) selects an appropriate subset of descriptors from a large descriptor pool for N objects using the MLR procedure. It requires far fewer linear regressions than a time-consuming full search while achieving similar results [41]. The ERM algorithm comprises the following steps [42, 43].

An initial set of d descriptors, d = {X₁, X₂, ..., X_d}, is chosen at random from the full pool D = {X₁, X₂, ..., X_d, ..., X_D}, and an MLR model is built from it. The residual standard deviation (RSD) of this model is calculated with the following equation:

RSD = \sqrt{\frac{1}{(N - d - 1)} \sum_{i = 1}^{N} {res}_{i}^{2}}

where N is the number of data points in the training set and res_i is the difference between the predicted and experimental values for data point i of the training set.

Path i starts by replacing the i-th descriptor of the initial set with each of the remaining (D − d) descriptors in turn and keeping the one that gives the lowest RSD. From the resulting set, the descriptor whose coefficient has the largest standard deviation is then selected and, in the same way, replaced one by one by all the remaining (D − d) descriptors. This procedure is repeated until the set no longer changes; in each cycle, the descriptor optimized in the previous cycle is not modified. The candidate subset d_m(i) is thus obtained from path i.

The above process is carried out for all possible paths i = 1, 2, ..., d, and the candidate d_m with the lowest standard deviation (RSD) is kept.
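The replacement procedure described above can be sketched in a few lines of NumPy. This is a simplified version of the basic replacement method of Duchowicz et al. [41], assuming that the "standard deviation in the coefficient" is the usual OLS standard error; the additional refinements that distinguish ERM from the basic method [42] are not implemented, and all function names are hypothetical. The RSD is computed exactly as in the equation above, and the candidates from all d paths are compared by their RSD.

```python
import numpy as np


def fit_mlr(X, y):
    """OLS with intercept. Returns coefficients, RSD = sqrt(sum(res^2)/(N-d-1)),
    and standard errors of the d descriptor coefficients (intercept excluded)."""
    n, d = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    res = y - A @ coef
    rsd = float(np.sqrt(res @ res / (n - d - 1)))
    cov = rsd ** 2 * np.linalg.pinv(A.T @ A)
    return coef, rsd, np.sqrt(np.diag(cov))[1:]


def rsd_of(D, y, subset):
    return fit_mlr(D[:, subset], y)[1]


def best_replacement(D, y, subset, pos):
    """Try every unused descriptor at position `pos`; keep the lowest-RSD choice."""
    best, best_rsd = list(subset), rsd_of(D, y, subset)
    for j in range(D.shape[1]):
        if j in subset:
            continue
        trial = list(subset)
        trial[pos] = j
        r = rsd_of(D, y, trial)
        if r < best_rsd:
            best, best_rsd = trial, r
    return best


def replacement_method(D, y, d, seed=0, max_iter=100):
    """Search for a d-descriptor subset of the columns of D; returns (subset, RSD)."""
    rng = np.random.default_rng(seed)
    start = [int(i) for i in rng.choice(D.shape[1], size=d, replace=False)]
    best_subset, best_rsd = None, np.inf
    for path in range(d):
        subset = best_replacement(D, y, start, path)  # path i begins at position i
        last = path
        for _ in range(max_iter):
            se = fit_mlr(D[:, subset], y)[2]
            if d > 1:
                se[last] = -np.inf  # do not revisit the position just optimised
            pos = int(np.argmax(se))  # coefficient with the largest standard error
            new_subset = best_replacement(D, y, subset, pos)
            if new_subset == subset:  # set unchanged: this path has converged
                break
            subset, last = new_subset, pos
        r = rsd_of(D, y, subset)
        if r < best_rsd:
            best_subset, best_rsd = sorted(subset), r
    return best_subset, best_rsd


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    D = rng.normal(size=(40, 10))  # synthetic pool: 40 objects, 10 descriptors
    y = 1.0 + 2.0 * D[:, 3] - 1.5 * D[:, 7] + 0.05 * rng.normal(size=40)
    print(replacement_method(D, y, 2))
```

On synthetic data in which only two of ten descriptors carry signal, the search is expected to recover that pair after a handful of regressions per path, instead of fitting every possible pair.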

2.5 Development of QSPR model and screening

2.5.1 MLR

Multiple linear regression (MLR) is widely used for linear problems. MLR models the relationship between two or more independent (predictor) variables and a dependent (response) variable by fitting a linear equation to experimental data, so that each set of X values (the independent variables) is associated with a value of Y (the dependent variable). Here, MLR provided a linear equation linking the logarithms of the complex stability constants to the structural descriptors, and the two most relevant descriptors were selected to develop the model. The main advantage of MLR is its simple, interpretable mathematical expression.

2.5.2 Support vector machine

The support vector machine (SVM), developed by Vapnik [44], is a machine learning algorithm that has become increasingly popular and has found wide application in areas such as drug design [45], pattern recognition [46], and QSPR analysis [47], because of its attractive features and strong generalization performance. SVM is a natural development of neural networks and is based on the construction of an optimal hyperplane. The SVM approach avoids over-fitting more efficiently than artificial neural networks [48]. The model with the optimal parameters, i.e. those giving the best SVM performance on the cross-validated training set, was then confirmed on the corresponding test set. The QSPR model was established by the following process: (I) building many base models from diverse descriptor subsets, (II) selecting appropriate base models to create a consensus model, and (III) evaluating the model's performance. Base models are separate predictor models that are combined to construct consensus models. In this work, base models were built from the training set using diverse descriptor subsets, and the performance of the consensus model was then assessed on the validation set.

All calculations were performed in MATLAB, and PLS Toolbox 4.2 was used to create the SVM and MLR models.

3 Results and discussion

MLR and SVM were used to obtain quantitative relationships between the stability constants of Hg^II complexes with substituted triazene ligands and the computed descriptors, and the ERM variable selection technique was used to select the significant variables.

A summary of the statistical results of the MLR and SVM models is given in Table 2. Plots of calculated versus experimental stability constants of the Hg^II–triazene complexes obtained with the MLR and SVM models are shown in Fig. 1.

Table 2
Summary of statistical results of MLR and SVM models

Validation parameter	MLR model	SVM model
R²_c	0.804	0.917
Q²_LOO	0.757	0.756
Q²_LGO (n = 4)	0.721	0.705
SEC	0.215	0.141
SEP	0.251	0.263
Y²_random	0.265±0.102	0.325±0.132

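Q²_LOO: leave-one-out cross-validation correlation coefficient; Q²_LGO (n = 4): leave-four-out cross-validation correlation coefficient; SEC: standard error of calibration; SEP: standard error of prediction (log K units); Y²_random: R² obtained in the Y-randomization test (mean ± standard deviation).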

Fig. 1

Plot of calculated versus experimental stability constants of the Hg^II–triazene complexes for the two-variable MLR and SVM models.

Because of its strong generalization performance, SVM has attracted attention and found widespread application in areas such as QSPR/QSAR [49] and drug design [50]. In most cases, the performance of SVM modeling matches or clearly exceeds that of conventional machine learning methods. In light of the foregoing, we chose MLR and SVM: the MLR model makes the link between the response and the explanatory variables interpretable, and the SVM model, as a machine learning approach, is expected to provide high prediction accuracy. SVM regression was used to create a nonlinear model from the same descriptor subset, and the statistical results of SVM and MLR were compared (Table 2).

The best descriptor subset selected by the ERM variable selection technique consists of ATSC3s and GATS6s. The MLR model relating these descriptors to log K_Hg^II-triazene is the following two-variable equation:

$\log K_{\mathrm{Hg^{II}}-\mathrm{triazene}} = 6.866 + 0.089\,\mathrm{ATSC3s} - 2.064\,\mathrm{GATS6s}$ (1)
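Eq. (1) can be checked directly against Table 1. The short Python script below uses only the descriptor values and coefficients printed in this paper; it evaluates Eq. (1) for every complex and prints the result next to the tabulated Log K_MLR and experimental values. The names `TABLE1` and `log_k_eq1` are illustrative.

```python
# Rows of Table 1: (complex, ATSC3s, GATS6s, log K_exp, log K_MLR)
TABLE1 = [
    ("Hg(ACT)2", -13.2488, 0.375632, 4.70, 4.91),
    ("Hg(AMT)2", -12.8356, 0.378896, 4.87, 4.94),
    ("Hg(APT)2", -14.2286, 0.380049, 4.78, 4.82),
    ("Hg(CAT)2", -16.7005, 0.495465, 4.55, 4.36),
    ("Hg(NAT)2", -15.8998, 0.618772, 4.20, 4.18),
    ("Hg(AYT)2", -12.7645, 0.408959, 4.99, 4.89),
    ("Hg(OClOM)2", 2.628301, 1.068207, 5.05, 4.89),
    ("Hg(OClMM)2", -1.70019, 1.241584, 4.10, 4.15),
    ("Hg(OClPM)2", -0.1446, 1.153751, 4.23, 4.47),
    ("Hg(MClOM)2", 1.142049, 1.087104, 4.20, 4.72),
    ("Hg(MClMM)2", -3.18644, 1.251588, 4.23, 4.00),
    ("Hg(MClPM)2", -1.63089, 1.164624, 4.69, 4.32),
    ("Hg(PClOM)2", 2.261391, 1.107537, 5.05, 4.78),
    ("Hg(PClMM)2", -2.0671, 1.272838, 3.99, 4.06),
    ("Hg(PClPM)2", -0.51154, 1.187722, 4.22, 4.37),
    ("Hg(b-PT)2", -1.57694, 0.872987, 5.079, 4.92),
    ("Hg(OClPM)", 0.414442, 1.298062, 3.20, 3.33),
]


def log_k_eq1(atsc3s, gats6s):
    """Eq. (1): log K = 6.866 + 0.089 * ATSC3s - 2.064 * GATS6s."""
    return 6.866 + 0.089 * atsc3s - 2.064 * gats6s


for name, atsc3s, gats6s, log_k_exp, log_k_mlr in TABLE1:
    pred = log_k_eq1(atsc3s, gats6s)
    print(f"{name:<11} Eq.(1) {pred:5.2f}  Table MLR {log_k_mlr:5.2f}  exp {log_k_exp}")
```

With the rounded coefficients of Eq. (1), the sixteen ML₂ entries agree with the Log K_MLR column to within about 0.01 log units. For the 1:1 complex Hg(OClPM), Eq. (1) gives about 4.22 rather than the tabulated 3.33, so that entry cannot be reproduced from Eq. (1) and the descriptor values as printed.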

The developed model reveals the significance of the GATS6s and ATSC3s descriptors for predicting the logarithm of the stability constants.

Both selected descriptors are autocorrelation descriptors, i.e. topological descriptors that encode both the molecular structure and the physicochemical properties of a molecule. ATSC3s is the centered Broto–Moreau autocorrelation of lag 3 weighted by charges, and it provides information about the molecular charges. GATS6s is the Geary autocorrelation of lag 6 weighted by I-state, and it provides information about the distribution of the intrinsic state along the topological structure. GATS6s contributes negatively to log K_Hg^II-triazene: its value varies inversely with log K_Hg^II-triazene, which suggests that the steric hindrance created in the topological structure by substitution reduces the metal–ligand interaction. Because ATSC3s and GATS6s are weighted by charge and I-state, respectively, these results indicate that charge effects, together with the topological properties of the ligand, strongly influence the formation of the Hg^II–triazene complex.
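To make the two descriptor types concrete, the sketch below computes a centered Broto–Moreau autocorrelation (ATSC) and a Geary autocorrelation (GATS) at a chosen lag for an arbitrary atomic weight on a hydrogen-suppressed molecular graph, with topological distances (bond counts) obtained by breadth-first search. The normalizations used here (ATSC summed over unordered atom pairs; GATS scaled so that spatially uncorrelated weights give values near 1) are our assumptions, and PaDEL-Descriptor's exact conventions (hydrogen handling, pair counting, how the charge or I-state weights are obtained) may differ, so the numbers are not expected to match the ATSC3s and GATS6s values in Table 1. The toy molecule and weights in the usage example are hypothetical.

```python
from collections import deque

import numpy as np


def topological_distances(n_atoms, bonds):
    """All-pairs shortest-path lengths (in bonds) by BFS from every atom."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = np.full((n_atoms, n_atoms), -1, dtype=int)
    for s in range(n_atoms):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s, v] < 0:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist


def atsc(weights, dist, lag):
    """Centered Broto-Moreau autocorrelation: sum of (w_i - mean)(w_j - mean)
    over unordered atom pairs i < j separated by `lag` bonds."""
    c = np.asarray(weights, dtype=float) - np.mean(weights)
    i, j = np.where(np.triu(dist == lag, k=1))
    return float(np.sum(c[i] * c[j]))


def gats(weights, dist, lag):
    """Geary autocorrelation: half the mean squared weight difference over
    pairs at distance `lag`, divided by the sample variance of the weights."""
    w = np.asarray(weights, dtype=float)
    i, j = np.where(np.triu(dist == lag, k=1))
    if len(i) == 0:
        return 0.0  # no atom pairs at this lag
    num = np.mean((w[i] - w[j]) ** 2) / 2.0
    den = np.sum((w - w.mean()) ** 2) / (len(w) - 1)
    return float(num / den)


if __name__ == "__main__":
    # Hypothetical 4-atom chain 0-1-2-3 with arbitrary atomic weights
    dist = topological_distances(4, [(0, 1), (1, 2), (2, 3)])
    w = [1.0, 2.0, 3.0, 4.0]
    print("ATSC1 =", atsc(w, dist, 1), " GATS1 =", round(gats(w, dist, 1), 3))
```

A Geary value below 1 means that atoms separated by the given lag carry similar weights, whereas values above 1 mean that they differ more than randomly placed weights would.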

The performance of the generated models was assessed with leave-one-out (LOO-CV) and leave-group-out (LGO-CV) cross-validation during model development. In LOO-CV, a QSPR model is built from n − 1 samples of the data set, the property of the left-out sample is predicted with that model, and the prediction is compared with the measured value; this is repeated until every sample has been left out once. In LGO-CV, a group of samples (here four) is removed from the data set, the model is rebuilt, and the predictions for the omitted samples are compared with their measured values; this is repeated until all data points have been left out. Other statistical indices were also computed, including the standard error of prediction (SEP), the standard error of calibration (SEC), and the squared correlation coefficient (R²). According to the cross-validation indices and the external test set, the regression models have good internal and external predictive ability: the statistics show a good fit for both the internal and external sets (high R²), good predictability (low SEC and SEP), and a small difference between Q² and R². The models were also tested for robustness by Y-randomization: the vector of the dependent variable (log K_Hg^II-triazene) was randomly shuffled, and a new QSPR model was built with the original matrix of independent variables. As shown in Table 2, the R² values of the original models are much higher than those obtained in the randomization test, indicating that the models are statistically significant and robust.
According to Table 2, although SVM was expected to perform better than MLR because of the nonlinear nature of the modeling approach, it provided only slightly improved modeling ($R_{SVM}^{2} = 0.9171$, $R_{MLR}^{2} = 0.8013$), and its cross-validation results (LOO-CV and LGO-CV) are similar to those of MLR, which may be due to the essentially linear effect of the descriptors on complex formation.
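The cross-validation and Y-randomization procedures described above can be sketched for an MLR model as follows. This minimal NumPy version assumes the common definition Q² = 1 − PRESS/SS_tot, with SS_tot taken about the mean of all responses; it assigns samples to leave-out groups after a random permutation and reports Y-randomization as the mean ± standard deviation of R² over shuffled responses. These choices, the function names, and the synthetic data in the usage example are assumptions, not the settings used to produce Table 2. An SVM version would replace the OLS fit with an SVM regressor.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + X @ b."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]


def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def q2_leave_group_out(X, y, group_size=1, seed=0):
    """Q^2 = 1 - PRESS / SS_tot; group_size=1 is LOO, group_size=4 is leave-four-out."""
    n = len(y)
    order = np.random.default_rng(seed).permutation(n)
    y_cv = np.empty(n)
    for start in range(0, n, group_size):
        held_out = order[start:start + group_size]
        train = np.setdiff1d(order, held_out)
        # Refit without the held-out group, then predict only that group
        y_cv[held_out] = ols_predict(ols_fit(X[train], y[train]), X[held_out])
    press = np.sum((y - y_cv) ** 2)
    return 1.0 - press / np.sum((y - np.mean(y)) ** 2)


def y_randomization(X, y, n_rounds=100, seed=0):
    """Mean and standard deviation of R^2 after refitting on shuffled responses."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_rounds):
        y_shuffled = rng.permutation(y)
        scores.append(r_squared(y_shuffled, ols_predict(ols_fit(X, y_shuffled), X)))
    return float(np.mean(scores)), float(np.std(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))  # synthetic two-descriptor data set
    y = 1.0 + X @ np.array([0.8, -0.5]) + 0.1 * rng.normal(size=30)
    print("R2      =", round(float(r_squared(y, ols_predict(ols_fit(X, y), X))), 3))
    print("Q2_LOO  =", round(float(q2_leave_group_out(X, y, 1)), 3))
    print("Q2_LGO4 =", round(float(q2_leave_group_out(X, y, 4)), 3))
    print("Y-random R2 = %.3f +/- %.3f" % y_randomization(X, y))
```

For ordinary least squares, Q²_LOO can never exceed the fitted R², because each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual; a large gap between the two is the usual warning sign of over-fitting.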

Even a validated and robust QSPR model cannot be expected to predict the modeled property of every compound reliably; hence, the applicability domain (AD) of the final model should be determined. The AD is a theoretical region of chemical space [51], defined by the model descriptors and the modeled response, within which a given QSPR model makes reliable predictions. The reliability of the model predictions and the AD of the ERM-MLR model were checked with the leverage approach, in which the Williams plot (standardized residuals versus leverage, h) gives a simple graphical way to detect response outliers and structurally influential chemicals. In this plot, the two horizontal dotted lines mark the limits for Y outliers (samples with standardized residuals beyond ±3.0 standard deviation units), and the vertical dashed line marks the limit for X outliers (samples with leverage above the critical value, h > h^*). The Williams plot of the two-variable MLR model (Fig. 2) identified no compound in the data set as an outlier.

Fig. 2

Plot of standardized residuals versus leverage. Dotted lines mark ±3.0 standardized residuals and the dashed line marks the warning leverage (h^* ≈ 0.35).
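The leverage values behind a Williams plot follow from the hat matrix. The sketch below assumes that leverages are computed from the descriptor matrix augmented with an intercept column and that standardized residuals are the residuals divided by the residual standard deviation; the function names are hypothetical. The warning leverage is often written h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds; the value h* ≈ 0.35 in the Fig. 2 caption is numerically consistent with the variant 3p/n for p = 2 and n = 17 (3 × 2/17 ≈ 0.353), so both forms are offered.

```python
import numpy as np


def leverages(X):
    """Diagonal of the hat matrix H = A (A^T A)^-1 A^T, with A = [1, X]."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    # h_i = a_i^T (A^T A)^-1 a_i for each row a_i of A
    return np.einsum("ij,jk,ik->i", A, np.linalg.pinv(A.T @ A), A)


def warning_leverage(n, p, count_intercept=True):
    """h* = 3(p + 1)/n, or 3p/n when the intercept is not counted."""
    return 3.0 * (p + 1) / n if count_intercept else 3.0 * p / n


def williams_plot_flags(X, y, y_hat, count_intercept=True, z_limit=3.0):
    """Return leverages, standardized residuals, h*, and Y-/X-outlier masks."""
    n, p = X.shape
    h = leverages(X)
    res = y - y_hat
    z = res / np.sqrt(np.sum(res ** 2) / (n - p - 1))  # residual / RSD
    h_star = warning_leverage(n, p, count_intercept)
    return h, z, h_star, np.abs(z) > z_limit, h > h_star


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(17, 2))  # synthetic descriptors for 17 compounds
    y = 5.0 + X @ np.array([0.3, -0.4]) + 0.1 * rng.normal(size=17)
    A = np.column_stack([np.ones(17), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    h, z, h_star, y_out, x_out = williams_plot_flags(X, y, A @ coef)
    print("h* =", round(h_star, 3), "Y outliers:", int(y_out.sum()), "X outliers:", int(x_out.sum()))
```

Because the leverages of a model with an intercept always sum to p + 1, their average is (p + 1)/n, and h* = 3(p + 1)/n flags compounds whose leverage is three times that average.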

4 Conclusion

In this work, QSPR models were developed to predict the stability constants of Hg^II–triazene complexes using MLR and SVM. The main conclusion of this study is that Hg^II–triazene complexation is mainly affected by two factors: the distribution of the intrinsic state along the topological structure and the charge-related topological properties of the ligand. These descriptors may be useful in designing and synthesizing triazene ligands that are highly efficient in Hg^II selective extraction and/or separation procedures. After their robustness and accuracy were evaluated, the developed MLR and SVM models were successfully used to predict Hg^II–triazene complexation stability constants. Finally, the information captured by these models is expected to lead to more cost-effective, faster, and environmentally friendly ways of obtaining the stability constants of Hg^II with novel triazene ligands.

Conflict of interest

The authors declare there is no conflict of interest.

Funding

No funding was received for this work.

Authors’ contributions

Ahmadreza Hajihosseinloo and Mohammad Kazem Rofouei performed the experiments; Ahmadreza Hajihosseinloo, Maryam Salahinejad and Jahan B. Ghasemi carried out the modeling and developed the theoretical framework; Ahmadreza Hajihosseinloo and Maryam Salahinejad wrote the article. We confirm that the manuscript has been read and approved by all named authors.

Availability of data and material

The authors confirm that the data supporting the findings of this study are available within the article.

References

Haritash

and Kaushik

, Biodegradation aspects of polycyclic aromatic hydrocarbons (PAHs): a review, Journal of Hazardous Materials 169(1–3) (2009), 1–15.

Noyes

, Hamdy

and Muse

, Control of mercury pollution, Journal of Toxicology and Environmental Health, Part A Current Issues 1(3) (1976), 409–420.

Casas

and Jones

M.M.

, Mercury (II) complexes with sulfhydryl containing chelating agents: Stability constant inconsistencies and their resolution, Journal of Inorganic and Nuclear Chemistry 42(1) (1980), 99–102.

Basinger

M.A.

, Casas

, Jones

M.M.

, Weaver

A.D.

and Weinstein

N.H.

, Structural requirements for Hg (II) antidotes, Journal of Inorganic and Nuclear Chemistry 43(6) (1981), 1419–1425.

, Quan

, Tan

, Fu

J.H.

, Liang

Y.J.

and Li

J.X.

, Synthesis and Properties of Dimercury(I) Crystal Network Constructed with Functionalized Pyrazine Sulfonate and Nitrate Linkers, , Russian Journal of General Chemistry 91(5) (2021), 910–914. doi: 10.1134/S1070363221050224.

Haitzer

, Aiken

G.R.

and Ryan

J.N.

, Binding of mercury (II) to dissolved organic matter: the role of the mercury-to-DOM concentration ratio, Environmental Science & Technology 36(16) (2002), 3564–v.

Ravichandran

, Interactions between mercury and dissolved organic matter––a review, Chemosphere 55(3) (2004), 319–331.

Rekhis

, Labat

, Ouamerali

, Ciofini

and Adamo

, Theoretical analysis of the electronic properties of N3 derivatives, The Journal of Physical Chemistry A 111(50) (1310), 13106–13111.

Nuyken

, Stebani

, Wokaun

and Lippert

, Polymers with Triazene Units in the Main Chain. In: Macromolecular Engineering. Springer (1995), 303–318.

10.

Mousavi

, Predicting mercury (II) binding by organic ligands: a chemical model of therapeutic and environmental interests, Environmental Forensics 12(4) (2011), 327–332.

11.

Brambila

, Liu

, Morgan

D.L.

, Beliles

R.P.

and Waalkes

M.P.

, Effect of mercury vapor exposure on metallothionein and glutathione s-transferase gene expression in the kidney of nonpregnant, pregnant, and neonatal rats, Journal of Toxicology and Environmental Health, Part A 65(17) (2002), 1273–1288.

12.

Onyido

, Norris

A.R.

and Buncel

, Biomolecule–mercury interactions: Modalities of DNA base–mercury binding mechanisms. remediation strategies, Chemical Reviews 104(12) (2004), 5911–5930.

13.

Eley

and Cox

, The release, absorption and possible health effects of mercury from dental amalgam: a review of recent findings, British Dental Journal 175(10) (1993), 355–362.

14.

Jacquemin

, Perpete

E.A.

, Ciofini

and Adamo

, Assessment of the ωB97 family for excited-state calculations, Theoretical Chemistry Accounts 128(1) (2011), 127–136.

15.

Jacquemin

, Perpete

E.A.

, Scuseria

G.E.

, Ciofini

and Adamo

, TD-DFT performance for the visible absorption spectra of organic dyes: conventional versus long-range hybrids, Journal of Chemical Theory and Computation 4(1) (2008), 123–135.

16.

Fayet

, Rotureau

, Joubert

and Adamo

, QSPR modeling of thermal stability of nitroaromatic compounds: DFT vs. AM1 calculated descriptors, Journal of Molecular Modeling 16(4) (2010), 805–812.

17.

Fayet

, Jacquemin

, Wathelet

, Perpete

E.A.

, Rotureau

and Adamo

, Excited-state properties from ground-state DFT descriptors: a QSPR approach for dyes, Journal of Molecular Graphics and Modelling 28(6) (2010), 465–471.

18.

Thakur

, Pathak

and Choppin

, Complexation thermodynamics and the formation of the binary and the ternary complexes of tetravalent plutonium with carboxylate and aminocarboxylate ligands in aqueous solution of high ionic strength, Inorganica Chimica Acta 362(1) (2009), 179–184.

19.

Choppin

, Thakur

and Mathur

, Complexation thermodynamics and the structure of the binary and the ternary complexes of Am3+, Cm3+and Eu3+with IDA and EDTA+IDA, Inorganica Chimica acta 360(6) (2007), 1859–1869.

20.

Daniele

P.G.

, Foti

, Gianguzza

, Prenesti

and Sammartano

, Weak alkali and alkaline earth metal complexes of low molecular weight ligands in aqueous solution, Coordination Chemistry Reviews 252(10–11) (2008), 1093–1107.

21.

Bruce

E.D.

, Autenrieth

R.L.

, Burghardt

R.C.

, Donnelly

and McDonald

T.J.

, Using quantitative structure–activity relationships (QSAR) to predict toxic endpoints for polycyclic aromatic hydrocarbons (PAH), Journal of Toxicology and Environmental Health, Part A 71(16) (2008), 1073–1084.

22.

van Rijt

S.H.

and Sadler

P.J.

, Current applications and future potential for bioinorganic chemistry in the development of anticancer drugs, Drug Discovery Today 14(23–24) (2009), 1089–1097.

23.

Di Bernardo

, Melchior

, Portanova

, Tolazzi

and Zanonato

, Complex formation of N-donor ligands with group 11 monovalent ions, Coordination Chemistry Reviews 252(10–11) (2008), 1270–1285.

24.

Ronconi

and Sadler

P.J.

, Applications of heteronuclear NMR spectroscopy in biological and medicinal inorganic chemistry, Coordination Chemistry Reviews 252(21–22) (2008), 2239–2277.

25.

Bruijnincx

P.C.

and Sadler

P.J.

, New trends for metal complexes with anticancer activity, Current Opinion in Chemical Biology 12(2) (2008), 197–206.

26.

Pletnev

I.V.

and Zernov

V.V.

, Classification of metal ions according to their complexing properties: a data-driven approach, Analytica Chimica Acta 455(1) (2002), 131–142.

27.

Dimmock

, Warwick

and Robbins

, Tutorial review. Approaches to predicting stability constants, Analyst 120(8) (1995), 2159–2170.

28.

Fayet

, Joubert

, Rotureau

and Adamo

, On the use of descriptors arising from the conceptual density functional theory for the prediction of chemicals explosibility, Chemical Physics Letters 467(4–6) (2009), 407–411.

29.

Campetella

, Maschietto

, Frisch

M.J.

, Scalmani

, Ciofini

and Adamo

, Charge transfer excitations in TDDFT: A ghost-hunter index, Journal of Computational Chemistry 38(25) (2017), 2151–2156.

30.

Solov’ev

, Sukhno

, Buzko

, Polushin

, Marcou

, Tsivadze

and Varnek

, Stability constants of complexes of Zn^2 +, Cd^2 +, and Hg^2 + with organic ligands: QSPR consensus modeling and design of new metal binders, Journal of Inclusion Phenomena and Macrocyclic Chemistry 72(3–4) (2012), 309–321.

31.

Casasnovas Perera

, Ortega Castro

, Donoso Pardo

J.L.

, FrauMunar

and Muñoz Izquierdo

, Theoretical calculations ofstability constants and pKa values of metal complexes in solution:application to pyridoxamine-copper (II) complexes and theirbiological implications in AGE inhibition, Physical ChemistryChemical Physics, 2013 15 (2018), 16303–16313.

32.

Salahinejad

, Quantitative structure property relationships on formation constants of radiometals for radiopharmaceuticals applications, Journal of Radioanalytical and Nuclear Chemistry 303(1) (2015), 671–680.

33.

Frank

R.A.

, Sanderson

, Kavanagh

, Burnison

B.K.

, Headley

J.V.

and Solomon

K.R.

, Use of a (quantitative) structure–activity relationship [(Q) Sar] model to predict the toxicity of naphthenic acids, Journal of Toxicology and Environmental Health, Part A 73(4) (2009), 319–329.

34.

Benigni

, Cotta-Ramusino

and Andreoli

, Relationship between chlorofluorocarbon chemical structure and their Salmonella mutagenicity, Journal of Toxicology and Environmental Health 34(3) (1991), 397–407.

35. Carbonaro R.F. and Di Toro D.M., Linear free energy relationships for metal–ligand complexation: monodentate binding to negatively-charged oxygen donor atoms, Geochimica et Cosmochimica Acta 71(16) (2007), 3958–3968.

36. Salahinejad and Zolfonoun, 3D-QSAR studies of polyazaheterocyclic ligands used in lanthanide and actinide extraction processes, Solvent Extraction and Ion Exchange 32(1) (2014), 59–77.

37. Ahmadi, Application of GA-MLR method in QSPR modeling of stability constants of diverse 15-crown-5 complexes with sodium cation, Journal of Inclusion Phenomena and Macrocyclic Chemistry 74(1–4) (2012), 57–66.

38. and Zhang W.-J., Comparison of different methods for variable selection, Analytica Chimica Acta 446(1–2) (2001), 475–481.

39. Parker C.N. and Bajorath, Towards unified compound screening strategies: A critical evaluation of error sources in experimental and virtual high-throughput screening, QSAR & Combinatorial Science 25(12) (2006), 1153–1161.

40. Ghandadi, Shayanfar, Hamzeh-Mivehroud and Jouyban, Quantitative structure activity relationship and docking studies of imidazole-based derivatives as P-glycoprotein inhibitors, Medicinal Chemistry Research 23(11) (2014), 4700–4712.

41. Duchowicz P.R., Castro E.A., Fernández F.M. and Gonzalez M.P., A new search algorithm for QSPR/QSAR theories: Normal boiling points of some organic molecules, Chemical Physics Letters 412(4–6) (2005), 376–380.

42. Mercader A.G., Duchowicz P.R., Fernández F.M. and Castro E.A., Replacement method and enhanced replacement method versus the genetic algorithm approach for the selection of molecular descriptors in QSPR/QSAR theories, Journal of Chemical Information and Modeling 50(9) (2010), 1542–1548.

43. Levet, Bordes, Clément, Mignon, Chermette, Marote, Cren-Olivé and Lantéri, Quantitative structure–activity relationship to predict acute fish toxicity of organic solvents, Chemosphere 93(6) (2013), 1094–1103.

44. Cortes and Vapnik, Support-vector networks, Machine Learning 20(3) (1995), 273–297.

45. Wang, Xuan, Yan and Yu, Classification models of HCV NS3 protease inhibitors based on support vector machine (SVM), Combinatorial Chemistry & High Throughput Screening 18(1) (2015), 24–32.

46. Zhu, Li, Yu, Zhang, Mao and Hou, Insight into the structural requirements of narlaprevir-type inhibitors of NS3/NS4A protease based on HQSAR and molecular field analyses, Combinatorial Chemistry & High Throughput Screening 15(6) (2012), 439–450.

47. Llinas-Brunet, Bailey M.D., Goudreau, Bhardwaj P.K., Bordeleau, Bös, Bousquet, Cordingley M.G., Duan and Forgione, Discovery of a potent and selective noncovalent linear inhibitor of the hepatitis C virus NS3 protease (BI 35), Journal of Medicinal Chemistry 53(17) (2010), 6466–6476.

48. Zhao, Zhang, Deng and Zhang, Prediction of viscosity of imidazolium-based ionic liquids using MLR and SVM algorithms, Computers & Chemical Engineering 92 (2016), 37–42.

49. Liu, Zhang, Yao, Liu, Hu and Fan B.T., Prediction of the isoelectric point of an amino acid based on GA-PLS and SVMs, Journal of Chemical Information and Computer Sciences 44(1) (2004), 161–167.

50. Burbidge, Trotter, Buxton and Holden, Drug design by machine learning: support vector machines for pharmaceutical data analysis, Computers & Chemistry 26(1) (2001), 5–14.

51. Sharma and Yap C.W., Consensus QSAR model for identifying novel H5N1 inhibitors, Molecular Diversity 16(3) (2012), 513–524.

Exploratory and machine learning analysis of the stability constants of Hg^II–triazene ligands complexes

Table 2. Summary of statistical results of the MLR and SVM models

| Validation parameter | MLR model | SVM model |
|---|---|---|
| R²_c | 0.804 | 0.917 |
| Q²_LOO | 0.757 | 0.756 |
| Q²_LGO (n = 4) | 0.721 | 0.705 |
| SEC (log K units) | 0.215 | 0.141 |
| SEP (log K units) | 0.251 | 0.263 |
| Y²_random | 0.265 ± 0.102 | 0.325 ± 0.132 |
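The exact formulas behind the statistics in Table 2 are not restated here, so the following Python sketch shows one common way to compute them for an MLR model. It is not the authors' code. It assumes that SEC is the residual standard error with n − p − 1 degrees of freedom (p = number of descriptors), that SEP is the root-mean-square error on the external test set, that Q² = 1 − PRESS/SS_tot for both leave-one-out and leave-group-out (n = 4) validation, and that the Y-randomization entry is the mean and standard deviation of R² over models refitted to permuted responses. All function names and the synthetic data are hypothetical; an SVM column would need a support vector regressor in place of `fit_mlr`.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def r2(y, y_hat):
    """Squared correlation coefficient computed as 1 - RSS / SS_tot."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def sec(y, y_hat, n_desc):
    # Assumed definition: residual standard error with n - p - 1 degrees of freedom.
    return np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - n_desc - 1))

def sep(y_test, y_test_hat):
    # Assumed definition: root-mean-square error on the external test set.
    return np.sqrt(np.mean((y_test - y_test_hat) ** 2))

def q2_leave_group_out(X, y, group_size=1, seed=0):
    """Q2 = 1 - PRESS / SS_tot; group_size=1 gives leave-one-out."""
    n = len(y)
    # Random groups for leave-group-out; fixed order suffices for leave-one-out.
    idx = np.random.default_rng(seed).permutation(n) if group_size > 1 else np.arange(n)
    press = 0.0
    for start in range(0, n, group_size):
        out = idx[start:start + group_size]
        keep = np.setdiff1d(np.arange(n), out)
        coef = fit_mlr(X[keep], y[keep])  # refit without the held-out group
        press += np.sum((y[out] - predict_mlr(coef, X[out])) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_rounds=100, seed=0):
    """Mean and standard deviation of R2 for models fitted to shuffled responses."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_rounds):
        y_perm = rng.permutation(y)
        coef = fit_mlr(X, y_perm)
        scores.append(r2(y_perm, predict_mlr(coef, X)))
    return float(np.mean(scores)), float(np.std(scores))

# Usage on synthetic (hypothetical) data: 30 "ligands", 3 descriptors, log K response.
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 3))
y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.2, size=30)
coef = fit_mlr(X, y)
y_hat = predict_mlr(coef, X)
print("R2_c:", r2(y, y_hat), "SEC:", sec(y, y_hat, X.shape[1]))
print("Q2_LOO:", q2_leave_group_out(X, y), "Q2_LGO(n=4):", q2_leave_group_out(X, y, group_size=4))
print("Y-random R2 (mean, SD):", y_randomization(X, y))
```

Because each left-out prediction comes from a model that never saw that compound, Q² is at most R²_c for least-squares MLR, and a robust model should show a Y-randomization R² well below R²_c, which is the pattern reported for both models in Table 2.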