Abstract
The greatest challenge for healthcare in drug repositioning and discovery is identifying interactions between known drugs and targets. Experimental methods can reveal some drug-target interactions (DTIs), but identifying all of them is expensive and time-consuming. Machine learning-based algorithms currently treat DTI prediction as a binary classification problem. However, their performance suffers from the lack of experimentally validated negative samples, which results in an imbalanced class distribution. Recasting DTI prediction as a regression problem is one way to address this issue. This paper proposes a new hybrid deep learning model for predicting drug-target binding affinities: a convolutional neural network with attention-based bidirectional long short-term memory (CNN-AttBiLSTM). In addition, tuning the hyperparameters of a hybrid CNN-AttBiLSTM model to improve its performance can be arduous and time-consuming. To tackle this issue, we propose a Memetic Particle Swarm Optimization Algorithm (MPSOA) to determine the best settings for the proposed model. Experimental results show that the MPSOA-based CNN-AttBiLSTM model outperforms baseline techniques, achieving a concordance index of 0.90 and a mean squared error of 0.228 on the DAVIS dataset, and a concordance index of 0.97 and a mean squared error of 0.010 on the KIBA dataset.
Keywords
Introduction
Repositioning (or repurposing) drugs is a growing trend in the pharmaceutical industry. Compared with conventional drug development, it can save both money and time. Drug repositioning aims to identify and develop novel therapeutic applications for previously approved or withdrawn pharmaceutical products [1]. Predicting drug-target interactions (DTIs) is crucial when searching for new drug candidates based on known drug targets, such as proteins. Identifying DTIs experimentally is difficult, time-consuming, and costly. Over the past decade, several algorithms have been developed to predict DTIs [2, 3]. These strategies fall into three categories: ligand-based [4], target-based [5], and chemogenomic-based approaches [6, 7]. Ligand-based methods rest on the idea that similar compounds frequently bind to related proteins. They use machine learning to predict whether a candidate ligand interacts with a target protein by comparing it to a database of known active ligands for that target [8]. However, when few active ligands are known, the effectiveness of these techniques decreases.
Target-based techniques such as molecular docking can predict many protein-ligand binding affinities. However, they are time-consuming and infeasible for several targets whose 3D structure is not known, such as ion channel proteins and G-protein coupled receptors (GPCRs) [9, 10]. Because of these limitations of the strategies above, chemogenomic-based techniques are becoming increasingly attractive for predicting drug-target interactions in drug discovery and repositioning [6, 10]. Chemogenomic-based techniques combine the chemical space of compounds with the genomic space of target proteins: to learn how drugs interact with their targets, they map all possible DT interaction networks into a single Euclidean space [6, 11]. Several computational methods have been developed in chemogenomics to find novel DTIs, and they can be further divided into subsets, such as methods based on graph theory, networks, or machine learning. Machine learning-based approaches have grown in significance in the chemogenomic landscape in recent years because of their greater capacity to predict novel DT interactions. Supervised and semi-supervised learning are the two main paradigms used for DTI prediction.
To solve the DTI problem, semi-supervised algorithms combine a large amount of unlabeled data with a small amount of labeled data. Supervised learning algorithms need positive and negative labels to classify known drug-target interactions and the remaining potential combinations [9, 12]. Beyond predicting whether a drug and a target interact, researchers are often more interested in the drug-target binding affinity (DTA) [13, 14], which reveals how strongly a DT pair interacts. Predicting DTA can therefore significantly improve the success of drug discovery by eliminating ineffective DT pairs with low binding affinity scores, thereby shrinking the search space. Depending on the type of output they predict, these techniques can be divided into binary classification and regression approaches.
Over the past three decades, a wide range of strategies has been developed to predict DTIs, including ligand-based and receptor-based methods [15, 16], text mining [17], gene ontologies [18], and reverse virtual screening (reverse docking) [19, 20, 21]. All of these methods have flaws, so they are continuously being improved. Receptor-based approaches cannot always use docking simulation, because it requires the 3D structures of the target proteins, which are not always available; the procedure is also expensive. Ligand-based methods base their DTI predictions on the similarity between candidate ligands and the known ligands of the target proteins, so they perform worse when few ligands are known for a target. The main drawback of techniques based on gene ontologies and text mining is that they are limited to the information contained in the text itself.
Furthermore, it is not easy to find new insights with text mining, because the approach only uses knowledge that has already been gathered (i.e., published material). Deep learning (DL), machine learning (ML), and artificial intelligence (AI) in general can overcome these restrictions by using models that learn the characteristics of current drugs and their targets to predict novel DTIs. Although machine learning methods are a subset of AI methods, it is not always clear what counts as strictly an ML approach and what counts as a broader AI method. This is especially evident when conventional machine learning methods are combined with search, network, and graph analysis methods. DL methods, by contrast, are easy to distinguish from shallow ML approaches: they are the subset of ML approaches that transform the representation of the original input data across several information processing layers.
Most studies [13, 14, 22, 23] have yet to frame DTI prediction as a regression problem. Recently, deep learning has revolutionized machine learning by outperforming conventional techniques on numerous benchmarks [24], for example in language and speech tasks such as speech recognition [25], sentiment analysis [26], and text categorization [27]. Standard deep learning models such as Convolutional Neural Networks (CNNs) [28] and Recurrent Neural Networks (RNNs) have driven significant advances in drug development, helped by the recent growth of drug-target interaction datasets and protein structural data [29, 30]. Predicting from amino acid sequences adds a further challenge to the already difficult task of modeling protein structures and functions. Deep learning is increasingly used in bioinformatics for tasks such as analyzing protein-related data, with promising results. To predict drug-target binding affinities accurately, this paper proposes a novel intelligent deep learning model, the attention-based bidirectional long short-term memory network with a convolution layer (CNN-AttBiLSTM), which combines an attention-based BiLSTM (Att-BiLSTM) with convolution layers. The proposed model uses label encoding and benefits from the high-level local feature representations learned by the convolutional neural network (CNN), while the Att-BiLSTM network captures semantic relationships through long-range dependencies. The drug and protein (DP) inputs are provided as text sequences, label-encoded, and passed to two separate CNN blocks, which learn the drug and protein representations from which sequence features are extracted. The Att-BiLSTM then gathers information from both the forward and backward directions of each sequence and learns the dependencies between tokens.
The output of the Att-BiLSTM layer is then passed to a dense layer, where a sigmoid function produces the predicted affinity value.
Finding the optimal settings for a DL model in the DTI prediction task typically requires manual experimentation or reusing previously successful models, and developing a reliable CNN-AttBiLSTM model for predicting drug-target affinities can be challenging. We therefore propose a bio-inspired hyperparameter optimization strategy to find the best configuration for the CNN-AttBiLSTM model. Specifically, to optimize the performance of our model, we use the Memetic Particle Swarm Optimization Algorithm (MPSOA), one of the newest bio-inspired optimization methods, which is used for a variety of optimization tasks. The results section compares our approach with benchmark procedures and with a purpose-built CNN-AttBiLSTM model. The main contributions of this paper are as follows:
- A novel bio-inspired CNN-AttBiLSTM model is developed to predict drug-target binding affinity scores.
- The modified CNN-AttBiLSTM architecture performs well against the state-of-the-art methods it is compared with.
- The MPSOA-based deep learning model outperforms current state-of-the-art techniques, with a CI score of 0.90 and an MSE of 0.228 on the DAVIS dataset and a CI score of 0.97 and an MSE of 0.010 on the KIBA dataset.
- Identifying the optimal model configuration can help in discovering new drugs for a given target.
The rest of the paper is organized as follows. Section 2 reviews related work on DTI prediction and the models used to evaluate how strongly a drug binds to its target. Section 3 describes the proposed framework. Section 4 presents the datasets, the experimental setup including machine parameters, and the results. Section 5 concludes the paper and outlines future work.
This section reviews DTI classification and regression methods based on bio-inspired learning, machine learning, and deep learning, as well as the hyperparameter optimization techniques used for DTA prediction.
The related studies on DTI prediction
In many prior studies [31, 32], DTI prediction was treated as a binary classification problem, in which the proposed models predict whether a drug and a target interact. Such strategies simplify the DTI problem by applying predetermined cutoff values to binding affinities [22, 23, 33]. However, ignoring the binding affinity value, which measures how strongly a drug binds to its target, is a significant oversight. DTI prediction is therefore better framed as a regression problem whose model predicts the binding affinity value directly. In recent years, substantial effort has been put into developing regression-based methods for estimating DTI. Strategies based on the random forest algorithm have successfully predicted drug-target binding affinities [33, 34]. Similarity-based regression approaches are also available, such as KronRLS [13] and SimBoost [14], which use the similarity information of drugs and targets.
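To illustrate why fixed cutoffs discard information, the minimal sketch below binarizes a few affinity values. The pKd values and the cutoff of 7.0 are hypothetical choices for illustration, not values taken from the DAVIS or KIBA datasets.

```python
# Hypothetical pKd affinities for four drug-target pairs (illustrative values only).
affinities = {"pair_a": 5.0, "pair_b": 7.1, "pair_c": 9.5, "pair_d": 6.9}

# Assumed cutoff: pairs with pKd >= 7.0 are labeled as interacting (1), others as non-interacting (0).
CUTOFF = 7.0


def binarize(values, cutoff=CUTOFF):
    """Convert continuous affinities into binary interaction labels."""
    return {pair: int(value >= cutoff) for pair, value in values.items()}


labels = binarize(affinities)
print(labels)  # {'pair_a': 0, 'pair_b': 1, 'pair_c': 1, 'pair_d': 0}
```

After thresholding, pair_b and pair_c share the same label even though their pKd values differ by 2.4 log units (roughly a 250-fold difference in Kd), while pair_b and pair_d receive different labels despite differing by only 0.2. A regression model keeps these distinctions.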
Because deep neural networks are widely used in computer simulations, speech recognition, and natural language processing (NLP), many deep learning-based regression models for predicting DTIs have been developed. Recently, deep models for DTI prediction have been built both with sequence representation-based approaches, which take the sequence information of drugs and targets into account, and with graph representation-based approaches, which use structural information as inputs [35, 36]. The branching diagram in Fig. 1 illustrates how computational strategies for DTI prediction can be grouped.
Methods for predicting DTI based on classification [37].
The recurrent neural network (RNN) is a type of artificial neural network whose recurrent structure makes it suited to sequential data such as time series, natural language, and speech. RNNs have a feedback loop that passes information from one step of the sequence to the next, which lets them use context and previous inputs to inform current predictions. This makes them particularly useful for language modeling, speech recognition, and machine translation [38]. For sentiment analysis [39], speech recognition [40], and machine translation [41], a long short-term memory (LSTM) network performs better than a standard RNN. The LSTM, introduced by Hochreiter and Schmidhuber, replaces the self-connected hidden units with memory blocks that can learn new information and discard old information, thereby overcoming the vanishing gradient problem of conventional RNNs [38]. Bidirectional LSTM (BiLSTM) networks add a second layer to unidirectional LSTM networks so that a sequence is represented in both the forward and backward directions (see Fig. 2) [27]. By connecting the hidden states of the forward and backward LSTMs, this model can use information from both the past and the future, which enables it to extract useful high-level semantic features from word embeddings. Unlike conventional feedforward networks, a BiLSTM must unfold both its forward and backward hidden states across all time steps, and it is trained with backpropagation through time.
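The NumPy sketch below shows how a BiLSTM combines the two directions: one LSTM reads the sequence left to right, another reads it right to left, and their hidden states are concatenated at each time step. The gate layout, random weights, and sizes are assumptions for illustration; only the forward computation is shown, not training with backpropagation through time.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(xs, W, U, b, hidden):
    """Run one LSTM direction over xs (T x D) and return the hidden states (T x hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x in xs:
        z = W @ x + U @ h + b                      # pre-activations of the four gates, stacked
        i = sigmoid(z[:hidden])                    # input gate
        f = sigmoid(z[hidden:2 * hidden])          # forget gate
        o = sigmoid(z[2 * hidden:3 * hidden])      # output gate
        g = np.tanh(z[3 * hidden:])                # candidate memory content
        c = f * c + i * g                          # keep part of the old memory, add new information
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)


def bilstm(xs, params_fwd, params_bwd, hidden):
    """Concatenate forward states with backward states re-aligned to the original time order."""
    fwd = lstm_pass(xs, *params_fwd, hidden)
    bwd = lstm_pass(xs[::-1], *params_bwd, hidden)[::-1]
    return np.concatenate([fwd, bwd], axis=1)


def init_params(rng, input_dim, hidden):
    W = rng.normal(scale=0.1, size=(4 * hidden, input_dim))
    U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    return W, U, b


rng = np.random.default_rng(0)
T, D, H = 6, 8, 4                       # sequence length, embedding size, hidden units (arbitrary)
xs = rng.normal(size=(T, D))            # stand-in for an embedded drug or protein sequence
p_fwd, p_bwd = init_params(rng, D, H), init_params(rng, D, H)
out = bilstm(xs, p_fwd, p_bwd, H)
print(out.shape)  # (6, 8): forward and backward states side by side
```

Because the backward half of `out[0]` has already read the whole sequence, changing the last input changes `out[0, H:]` but leaves the forward half `out[0, :H]` untouched.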

Recent studies have shown that coupling a BiLSTM with an attention mechanism improves performance on many NLP tasks, including sentiment analysis [43], machine translation [44], and document categorization [45]. To apply the attention mechanism intuitively to a DTI prediction task, it is crucial to understand how to assign a weight to each word (token) in a sentence, select the most significant words, and combine their representations into a sentence vector that influences the classification or regression result. The outputs of the BiLSTM units are not passed straight on; instead, the additional attention layer takes a top-down view of the complete sequence of BiLSTM outputs and assigns each one an attention weight according to its relative significance [46].
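One common way to assign these weights is additive (tanh-and-softmax) attention, sketched below in NumPy. This particular scoring function and the random parameters are assumptions; the sketch is not a reproduction of the exact attention layer used in the proposed model.

```python
import numpy as np


def attention_pool(states, W_a, b_a, v):
    """Additive attention over time steps: returns the weighted summary vector and the weights."""
    u = np.tanh(states @ W_a + b_a)          # (T, A) projected representation of each step
    scores = u @ v                           # (T,) unnormalized relevance of each step
    scores = scores - scores.max()           # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax: positive, sums to 1
    context = weights @ states               # (F,) weighted sum of the BiLSTM outputs
    return context, weights


rng = np.random.default_rng(1)
T, F, A = 5, 8, 6                  # time steps, BiLSTM output size, attention size (arbitrary)
states = rng.normal(size=(T, F))   # stand-in for the BiLSTM outputs of one sequence
context, weights = attention_pool(states, rng.normal(size=(F, A)), np.zeros(A), rng.normal(size=A))
print(weights.round(3), context.shape)
```

Steps with larger weights contribute more to `context`, which is how the layer emphasizes the most significant tokens.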
Like any machine learning model, deep neural networks have hyperparameters such as the dropout rate, the number of hidden layers, and the number of filters, which must be configured to train models effectively. The hyperparameters of neural network models can be tuned with various methods, including grid search, random search, Bayesian optimization, and evolutionary algorithms [46]. Grid search determines the optimal hyperparameter values for a learning algorithm by exhaustively searching a specified subset of the hyperparameter space. It can be used to optimize deep neural networks, but only when the number of hyperparameters is small enough for the grid of settings to remain tractable, because the number of combinations grows exponentially with the number of hyperparameters. Motivated by these drawbacks, [47] proposed random search and showed, by randomly sampling hyperparameters on many datasets, that it is more efficient than grid search. In random search, each hyperparameter value is drawn from a predetermined distribution, which makes the method especially beneficial for vast search spaces. Bayesian optimization has also been the subject of several recent studies because of its potential for hyperparameter optimization, and there is evidence that it outperforms grid and random search while requiring fewer trials. Bayesian optimization can improve a neural network without fully training every candidate configuration. The approach has been used in various fields, including robotics, recommendation engines, the autonomous synthesis of chemical compounds, and the analysis of drug-target interactions. However, Bayesian optimization is challenging to use and expensive because of the high dimensionality of the hyperparameter space. Evolutionary algorithms have further advanced hyperparameter optimization.
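The cost difference between grid and random search can be seen in a few lines of Python. The search space below is hypothetical and is not the one used in this paper.

```python
import itertools
import random

# Hypothetical search space (illustrative values, not those used in this paper).
space = {
    "filters": [32, 64, 96, 128],
    "kernel_size": [4, 6, 8, 10, 12],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
}

# Grid search evaluates every combination: the cost is the product of the list lengths.
grid = [dict(zip(space, combo)) for combo in itertools.product(*space.values())]
print(len(grid))  # 100 configurations (4 x 5 x 5)

# Random search draws a fixed budget of configurations from the same space.
rng = random.Random(42)
budget = 10
random_configs = [{name: rng.choice(values) for name, values in space.items()} for _ in range(budget)]
print(len(random_configs))  # 10 configurations, regardless of how many hyperparameters there are
```

Adding one more hyperparameter with five candidate values would grow the grid to 500 configurations, while the random-search budget stays at 10.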
Compared with other computational methods, evolutionary algorithms excel at optimization tasks with many candidate solutions. Table 1 compares the drug-target binding affinity approaches that have been applied so far to the standard DAVIS and KIBA datasets. In this study, we therefore apply a bio-inspired algorithm to the DTI prediction problem in the hope of discovering a good hybrid architecture.
Existing approaches for the standard DAVIS and KIBA dataset
This section presents the proposed architecture, followed by the datasets and the methods used to evaluate it against different state-of-the-art methods.
Proposed architecture
In this study, we propose CNN-AttBiLSTM to predict drug-target interactions. This hybrid approach combines convolutional neural networks (CNNs) with attention-based bidirectional long short-term memory (Att-BiLSTM). The model consists of two parallel CNN networks followed by the Att-BiLSTM network. For training, we supply sequences describing drugs and proteins. First, a character-based embedding technique records the semantic information of drug and protein structures as sequences of characters. For example, the SMILES string “C1=CC2=C(C=C1C3=NC(=NC=C3)N)NN=C2N” is split into the one-character words “C”, “1”, “=”, “C”, “C”, “2”, …, “2”, “N”. The encoded sequence is then fed to the convolutional neural network (CNN) layer, which generates sequence features. The pooling layer makes these features more general and reduces the number of samples required. The BiLSTM network receives the concatenated outputs of the CNN and attention learning layers, gathers information across each sequence in both directions, and learns the links between tokens in the sequence. The output of the BiLSTM layer is passed to the attention learning layer and the fully connected layer, where a sigmoid function predicts the affinity value. Finally, the loss function of the network is optimized by backpropagation with the Adam optimizer. The architecture thus combines the CNN’s ability to extract encoded local features with the RNN’s ability to capture long-term interdependencies between drugs and proteins. The proposed CNN-AttBiLSTM model is depicted in Fig. 3.
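Character-level tokenization of the SMILES example can be sketched in plain Python; the string is written here without spaces. This is a minimal illustration of the idea, not the paper’s preprocessing code.

```python
# SMILES example from the text, written without spaces.
smiles = "C1=CC2=C(C=C1C3=NC(=NC=C3)N)NN=C2N"

# Character-level tokenization: atoms, bond symbols, brackets and ring-closure digits each become one token.
tokens = list(smiles)
print(tokens[:6], tokens[-2:])  # ['C', '1', '=', 'C', 'C', '2'] ['2', 'N']

# Caveat of this scheme: two-letter atoms such as chlorine ("Cl") are split into two tokens.
print(list("ClCC")[:2])  # ['C', 'l']
```

Splitting two-letter element symbols is a known trade-off of purely character-level encoding; it keeps the vocabulary small at the cost of some chemical meaning per token.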
Proposed CNN-AttBiLSTM architecture.
As shown in Fig. 3, our model accepts two inputs: the protein sequence and the drug sequence in the form of a Simplified Molecular Input Line Entry Specification (SMILES) string. Our model uses a character-based embedding strategy that represents inputs with integer (label) encoding, similar to the earlier techniques listed in Table 1. Unlike word-level tokenization, which splits each sequence into words, character-level encoding splits each sequence into individual characters. Specifically, we took the compounds’ canonical SMILES from the PubChem Compound database and converted each SMILES sequence into one-character chemical words. Protein sequences, which are strings over the 20 standard amino acids, are likewise label-encoded: we first retrieved each dataset’s protein sequences from UniProt and then split them into one-residue words. With this notation, the sequence “MEPAAGFLSPRPFQRAALLRDRSPA” of the protein kinase PLK3 is written as [“M”, “E”, “P”, “A”, “A”, “G”, “F”, “L”, “S”, …, “S”, “P”, “A”].
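A minimal sketch of label encoding for protein sequences follows. It assumes the 20 standard amino-acid letters in alphabetical order, integer codes starting at 1 with 0 reserved for padding, and a maximum length of 30; these choices are illustrative assumptions, not the paper’s settings.

```python
# Assumed vocabulary: the 20 standard amino acids; 0 is reserved for padding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CHAR_TO_INT = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def label_encode(sequence, max_len):
    """Map each residue to its integer code, truncating or zero-padding to max_len."""
    codes = [CHAR_TO_INT[residue] for residue in sequence[:max_len]]
    return codes + [0] * (max_len - len(codes))


seq = "MEPAAGFLSPRPFQRAALLRDRSPA"        # PLK3 fragment quoted in the text
encoded = label_encode(seq, max_len=30)   # hypothetical fixed input length
print(encoded[:6])   # [11, 4, 13, 1, 1, 6]  ->  M, E, P, A, A, G
print(encoded[-5:])  # [0, 0, 0, 0, 0] padding, since the fragment has 25 residues
```

Non-standard residue letters (for example "X") are not in this vocabulary and would raise a `KeyError`; a real pipeline would need an explicit policy for them.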
Embedding layers are commonly used in artificial neural networks for NLP tasks. An embedding layer maps high-dimensional, sparse input data, such as words or word-like tokens, into a lower-dimensional dense vector space while preserving their semantic relationships. Embeddings let the model represent words compactly and meaningfully, with words of similar meaning receiving similar embeddings. In the DTI task, the embedding layer captures semantic and syntactic information from the sequences. We use an embedding matrix to decode the semantic information contained in the drug and protein encodings, as illustrated in Eqs (1) and (2). Word embeddings are thus dense representations of words and their meanings.
Semantic information for both drug and protein structures is obtained by passing both sequences through label encoding. Label encoding converts each character label into an integer so that the sequences become machine-readable and can be processed by machine learning techniques; it is an essential step in the initial processing of the dataset. The output of the label encoding is then passed to the embedding layer. Unlike sparse one-hot representations, an embedding projects words onto a continuous vector space in which each word is represented by a dense vector. Where a word is placed in the vector space depends on how it is used in the text, and its embedding describes its position in the learned vector space. The proposed model uses the Keras Embedding layer, whose weights are learned jointly with the rest of the network so that the embeddings capture semantic relationships between tokens.
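An embedding lookup is simply row indexing into a matrix, as the NumPy sketch below shows. The vocabulary size, embedding width, and random matrix are assumptions; in the proposed model the matrix would be the trainable weight of a Keras Embedding layer.

```python
import numpy as np

vocab_size = 21   # 20 amino acids plus padding index 0 (assumed vocabulary)
embed_dim = 4     # small, arbitrary embedding size
rng = np.random.default_rng(0)
# In Keras, this matrix would be the trainable weight of an Embedding layer; here it is random.
E = rng.normal(size=(vocab_size, embed_dim))

ids = np.array([11, 4, 13, 1, 1, 6, 0, 0])   # label-encoded, zero-padded tokens
vectors = E[ids]                              # lookup: one dense row of E per token
print(vectors.shape)  # (8, 4)

# The lookup equals multiplying sparse one-hot vectors by E, without building them.
one_hot = np.eye(vocab_size)[ids]
print(np.allclose(one_hot @ E, vectors))  # True
```

The one-hot comparison makes the sparse-to-dense mapping explicit: each 21-dimensional one-hot vector becomes a 4-dimensional dense vector.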
Convolutional Neural Networks (CNNs), a type of deep neural network, are widely used to analyse images. They are applied to image and video recognition, image classification, medical image analysis, computer vision, and natural language understanding. In our model, the outputs of both embedding layers are sent to CNN networks, which reduce the dimensionality of the input data and produce a set of feature maps that help train the model effectively. In image analysis, a feature map encodes information such as the locations of corners and edges, and later layers use it to learn more about the input; on drug and protein sequences, the feature maps analogously encode local patterns in the embedded tokens. Each feature map is passed through a non-linear activation function, which adds non-linearity and improves the model’s representational capacity. A pooling operation usually follows the convolution to reduce the spatial dimensions of the feature map and produce a more abstract representation of the data. Max pooling is a popular pooling operation that keeps the highest value in each small region of the feature map and discards the rest.
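The convolution, activation, and pooling steps can be sketched in NumPy for a single embedded sequence. The filter count, window width, and pooling size are arbitrary assumptions; a framework such as Keras provides these layers ready-made.

```python
import numpy as np


def conv1d_valid(x, kernels, bias):
    """x: (T, D) embedded sequence; kernels: (F, K, D). Returns (T - K + 1, F) feature maps."""
    n_filters, k, _ = kernels.shape
    steps = x.shape[0] - k + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        window = x[t:t + k]                                           # (K, D) local window
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return out


def max_pool1d(x, size):
    """Non-overlapping max pooling along time; a trailing partial window is dropped."""
    usable = (x.shape[0] // size) * size
    return x[:usable].reshape(-1, size, x.shape[1]).max(axis=1)


rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))            # 10 embedded tokens of size 4 (arbitrary)
kernels = rng.normal(size=(3, 4, 4))    # 3 filters spanning 4 tokens each (arbitrary)
maps = np.maximum(conv1d_valid(x, kernels, np.zeros(3)), 0.0)   # convolution + ReLU
pooled = max_pool1d(maps, 2)            # local max pooling
global_max = maps.max(axis=0)           # global max pooling: one value per filter
print(maps.shape, pooled.shape, global_max.shape)  # (7, 3) (3, 3) (3,)
```

Global max pooling reduces each filter’s feature map to its strongest response, which yields a fixed-size vector regardless of the sequence length.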
According to Eqs (3) and (4), to produce the mapping feature
In Eqs (3) and (4),
The final drug and protein features from the global max pooling (GMP) layers are then fed to the attention layer separately.
Although LSTM performs well on sequences of varying lengths, it cannot use contextual information from future tokens or extract local contextual details. In addition, an LSTM cannot distinguish the relative importance of different parts of a document. These issues limit the effectiveness of LSTM for text classification. By combining BiLSTM with attention and convolution layers, this work improves on LSTM’s text classification performance while mitigating these problems. The outputs of the CNN layers (for drugs and proteins) are fed to the attention mechanism. An attention layer lets the model focus selectively on the parts of the input that are relevant to the prediction; in drug-target prediction, it can highlight essential substructures in the drug or important amino acid residues in the target that contribute to the interaction. The attention output is flattened before it is fed into the BiLSTM layer. Bidirectional LSTM networks, a subtype of RNN, extend unidirectional LSTM networks with a second layer that is trained to learn the backward representation of a sequence alongside the forward one. By joining the hidden states of the forward and backward LSTMs, the model can use both preceding and following context at each position, and, applied to word embeddings, it can also derive high-level semantic features. The hidden state of the BiLSTM is thus described by Eq. (7):
Assume that the BiLSTM layer generates an output vector. Given that
Where,
For each drug-protein pair, the output of the BiLSTM layer is fed into the attention layer. The attention output is passed to the fully connected layers, and a final regression layer estimates the drug-target affinity value. Each dense layer (FC1 and FC2) uses the rectified linear unit (ReLU) activation function, shown in Eq. (12):
$$\mathrm{ReLU}(x) = \max(0, x) \quad (12)$$
As Eq. (13) shows, the regression network applies the sigmoid activation function to the final output vector $z$:
$$\sigma(z) = \frac{1}{1 + e^{-z}} \quad (13)$$
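The regression head can be sketched in NumPy as two ReLU layers followed by one sigmoid unit. The layer widths and the He-style random initialization are hypothetical choices for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)              # Eq. (12)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # Eq. (13)


def regression_head(features, params):
    """FC1 and FC2 with ReLU, then one sigmoid output unit giving the scaled affinity."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(features @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)


rng = np.random.default_rng(0)
sizes = [16, 8, 4, 1]                    # hypothetical layer widths
# He-style initialization (an assumption) keeps the pre-activations in a moderate range.
params = [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
features = rng.normal(size=(3, 16))      # stand-in for attention/BiLSTM features of 3 drug-target pairs
pred = regression_head(features, params)
print(pred.shape)  # (3, 1)
```

Because the sigmoid output lies between 0 and 1, the affinity targets presumably have to be rescaled into that range before training and mapped back afterwards; the text does not describe this step, so it is an assumption here.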
The Memetic Particle Swarm Optimization Algorithm (MPSOA) is a metaheuristic optimization algorithm that can be used to tune the hyperparameters of machine learning models, including hybrid deep learning models. A hybrid deep learning model combines different types of neural networks to exploit their strengths and mitigate their weaknesses; for example, a convolutional neural network (CNN) might extract features from an image, and its output could then be fed into a recurrent neural network (RNN) for sequence modeling. Developing a reliable hybrid CNN-AttBiLSTM model for predicting drug-target affinities can be challenging. Here, we use a bio-inspired hyperparameter optimization approach to determine the best parameters for constructing and tuning a hybrid CNN-AttBiLSTM model. To improve the effectiveness of our model, we apply MPSOA, a bio-inspired algorithm that has proven efficient on numerous optimization problems. Memetic PSO (MPSO), depicted in Fig. 4, is a hybrid algorithm that combines the global search of PSO with the refinement ability of local search.
In MPSO, the global component scans the whole search space, while the local component concentrates on refining promising solutions. Optimizing the CNN-AttBiLSTM architecture with MPSOA involves four essential processes, the first of which is forming the initial population. Each individual is a random configuration drawn from the range of possible values for each parameter. Each individual is then evaluated by building the corresponding CNN-AttBiLSTM model and computing its fitness value. Next, the MPSOA operators are used to update each individual. In our work, the search terminates after a maximum number of generations and returns the best member of the current generation. Memetic algorithms (MAs) are a class of population-based heuristic search algorithms for global optimization. They were primarily inspired by Dawkins’ notion of a “meme”, a unit of cultural evolution that can undergo refinement, and by models of adaptation in natural systems that combine the evolutionary adaptation of individuals with individual learning over a lifetime. MAs therefore include a stage of individual improvement, or learning, during the search process, typically a local search.
The architecture of the proposed framework. 
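The pseudocode of the MPSO algorithm is as follows:
Input: population size, maximum number of generations, and the candidate values of each gene (Table 2)
Initialize a random population P; evaluate each particle and set its pbest and the swarm’s gbest
Repeat
    For each particle in P: apply crossover with its pbest and with gbest, apply mutation, and refine the result by local search (tabu search)
    Evaluate each particle, then update its pbest and the gbest
Until termination criterion (maximum number of generations reached)
Output: Decode the best particle in P into a solution and return it.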
In MPSO, local search can be performed with hill climbing, simulated annealing, or tabu search; it helps the algorithm escape local optima and converge to a better global solution. Each of the main processes outlined in the pseudocode is discussed below.
The Initialization step in the Memetic Particle Swarm Optimization Algorithm (MPSOA) involves setting up the initial conditions for the optimization process with a random discrete population
where,
The number of convolution filters for drugs (NCFD), the number of convolution filters for proteins (NCFP), the convolution filter size for drugs (CFDS), the convolution filter size for proteins (CFDP), the LSTM dimension (LSTMdim), the number of neurons in the fully connected (FC) layers (NNFCL), and the dropout rate of the FC layers are the seven elements that make up each candidate solution (chromosome).
Parameter value for chromosome structure
Chromosome representation of a potential solution.
Table 2 lists the possible values of the genes that make up the chromosome; these are the CNN-AttBiLSTM hyperparameters tuned by MPSOA. Other hyperparameters, namely the batch size, the number of epochs, the optimizer, and the activation function, were set manually to obtain good performance. We trained the network for 200 iterations using the Adam optimizer with a learning rate of 0.001, and we chose ReLU as the initial activation function.
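Random initialization with the seven-gene encoding can be sketched as follows. The candidate values are placeholders chosen for illustration, since the contents of Table 2 are not reproduced here; the manually fixed settings (Adam, learning rate 0.001, ReLU) are not part of the chromosome.

```python
import random

# Hypothetical candidate values for the seven genes; Table 2 lists the ranges actually used.
GENE_VALUES = {
    "NCFD": [16, 32, 64, 128],
    "NCFP": [16, 32, 64, 128],
    "CFDS": [2, 4, 6, 8],
    "CFDP": [4, 8, 12, 16],
    "LSTMdim": [32, 64, 128],
    "NNFCL": [128, 256, 512, 1024],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
}


def random_particle(rng):
    """One particle (chromosome): an independently drawn value for every gene."""
    return {gene: rng.choice(values) for gene, values in GENE_VALUES.items()}


def init_population(size, seed=0):
    """Random discrete initial population; a fixed seed makes runs reproducible."""
    rng = random.Random(seed)
    return [random_particle(rng) for _ in range(size)]


population = init_population(size=5)
for particle in population:
    print(particle)
```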
In this step, a CNN-AttBiLSTM model is built with the hyperparameters encoded by the current solution and assessed with five rounds of cross-validation (CV). In each round, one fold is used for testing the model, one serves as the validation set, and the remaining folds are used for training. The fitness of a solution is its test concordance index (C-index), averaged over the five rounds (five tests of the same solution). The mathematical definition of a solution’s fitness is given in Eq. (15) below.
Where,
The current model that is being built is represented by the parameter The
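The evaluation step can be sketched as below. It assumes an interleaved five-fold split in which the validation fold is the one after the test fold; the text does not say how folds are assigned, and `dummy_evaluate` is a hypothetical stand-in for training and scoring a CNN-AttBiLSTM model.

```python
def cv_rounds(n_samples, k=5):
    """Yield (train, val, test) index lists. In round i, fold i is the test set,
    fold (i + 1) % k is the validation set, and the remaining folds are used for training."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]   # interleaved split, no shuffling
    for i in range(k):
        val_fold = (i + 1) % k
        train = [idx for j, fold in enumerate(folds) if j not in (i, val_fold) for idx in fold]
        yield train, folds[val_fold], folds[i]


def fitness(config, evaluate, n_samples, k=5):
    """Mean test C-index over the k rounds. `evaluate` is a hypothetical callback that trains a
    model with `config` on `train`, uses `val` for validation, and returns the C-index on `test`."""
    scores = [evaluate(config, train, val, test) for train, val, test in cv_rounds(n_samples, k)]
    return sum(scores) / len(scores)


# Dummy evaluator standing in for model training: returns 0.80 + 0.01 * (index of the test fold).
def dummy_evaluate(config, train, val, test):
    return 0.80 + 0.01 * test[0]


print(round(fitness({"LSTMdim": 64}, dummy_evaluate, n_samples=20), 4))  # 0.82
```

Averaging over the five rounds gives each candidate solution a less noisy fitness value than a single train/test split would.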
Hybrid Particle Swarm Optimization (HPSO) combines the principles of Particle Swarm Optimization (PSO) with other optimization algorithms, with the aim of improving the global search ability and convergence speed of the standard PSO algorithm. Algorithm 1 depicts the HPSO process:
Crossover creates a new solution by merging parts of existing ones. In this method, the pbest or gbest particle is used as the target, while another particle is selected randomly from the population to act as the source. A vertex is randomly selected from the source particle to constrain the size of the source particle’s cluster, and the target particle’s vertices are made to belong to the same cluster. We apply this method as the crossover operator in this study. In most iterations, each particle performs a crossover with its own historical best (pbest) and with the population’s global best (gbest). This lets particles apply their cognitive component and share information collaboratively, so that they keep evolving from one iteration to the next.
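The vertex-and-cluster operator above is specific to partition-encoded particles. For hyperparameter chromosomes, a simpler gene-wise (uniform) crossover conveys the same idea of moving a particle towards its pbest and the gbest. The sketch below is such a substitute, not the operator described in the text, and the gene values are hypothetical.

```python
import random


def uniform_crossover(target, source, rng, rate=0.5):
    """Copy each gene from `source` into a copy of `target` with probability `rate`."""
    return {gene: (source[gene] if rng.random() < rate else target[gene]) for gene in target}


def pso_crossover(particle, pbest, gbest, rng):
    """Cross a particle with its own best (cognitive step), then with the swarm best (social step)."""
    child = uniform_crossover(particle, pbest, rng)
    return uniform_crossover(child, gbest, rng)


rng = random.Random(0)
particle = {"NCFD": 32, "LSTMdim": 64, "dropout": 0.3}
pbest = {"NCFD": 64, "LSTMdim": 64, "dropout": 0.2}
gbest = {"NCFD": 128, "LSTMdim": 128, "dropout": 0.1}
print(pso_crossover(particle, pbest, gbest, rng))
```

Every gene of the child comes from the particle, its pbest, or the gbest, so repeated crossovers pull the swarm towards good regions while keeping some of each particle’s own history.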
To promote genetic diversity, our technique applies a probabilistic one-point mutation operation to the chosen particle. The mutation operator works as follows: vertices are selected at random from the particle and moved to the cluster of a neighboring vertex. The mutation process repeats itself
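For hyperparameter chromosomes, an analogous one-point mutation shifts a single randomly chosen gene to an adjacent value in its ordered candidate list. The sketch below is an assumed adaptation of the vertex-moving operator, with hypothetical gene values and mutation probability.

```python
import random

# Hypothetical ordered candidate values for three genes.
GENE_VALUES = {
    "NCFD": [16, 32, 64, 128],
    "LSTMdim": [32, 64, 128],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
}


def mutate(particle, rng, p_mut=0.2):
    """With probability p_mut, move one random gene to an adjacent value in its candidate list."""
    child = dict(particle)
    if rng.random() >= p_mut:
        return child                       # no mutation this time
    gene = rng.choice(list(child))
    values = GENE_VALUES[gene]
    idx = values.index(child[gene])
    # Valid neighboring positions; the first and last values have only one neighbor.
    neighbors = [j for j in (idx - 1, idx + 1) if 0 <= j < len(values)]
    child[gene] = values[rng.choice(neighbors)]
    return child


rng = random.Random(3)
particle = {"NCFD": 32, "LSTMdim": 64, "dropout": 0.3}
print(mutate(particle, rng, p_mut=1.0))   # exactly one gene shifted to a neighboring value
```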
Local search (Tabu Search-TS)
The local search strategy in MPSOA is tabu search (TS). In the Memetic Particle Swarm Optimization (MPSO) algorithm used here to predict how well a drug binds to a target, TS refines the solutions found by the particle swarm optimization component, which makes the predicted binding affinities more accurate. To avoid returning to solutions it has already visited, TS keeps a list of forbidden moves, called the tabu list. As a local search method within MPSO, TS explores the neighborhood of each candidate solution to find the configuration that best predicts how drug molecules bind to a given protein target.
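A minimal tabu search over a discrete hyperparameter space is sketched below. The two-gene space, the tabu-list length, and `toy_score` (a stand-in for the validation C-index) are hypothetical; the sketch omits refinements such as aspiration criteria.

```python
import math

# Hypothetical two-gene search space with ordered candidate values.
GENE_VALUES = {
    "LSTMdim": [32, 64, 128, 256],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
}


def neighbors(solution):
    """(solution, gene, new_value) for every move of one gene to an adjacent candidate value."""
    result = []
    for gene, values in GENE_VALUES.items():
        idx = values.index(solution[gene])
        for j in (idx - 1, idx + 1):
            if 0 <= j < len(values):
                result.append(({**solution, gene: values[j]}, gene, values[j]))
    return result


def tabu_search(start, score, iterations=20, tabu_size=3):
    """Maximize `score`, always taking the best non-tabu neighbor even if it is worse."""
    current = dict(start)
    best, best_score = dict(start), score(start)
    tabu = []                                   # (gene, value) assignments that are forbidden
    for _ in range(iterations):
        candidates = [c for c in neighbors(current) if (c[1], c[2]) not in tabu]
        if not candidates:
            break
        choice = max(candidates, key=lambda c: score(c[0]))
        nxt, gene = choice[0], choice[1]
        tabu.append((gene, current[gene]))      # forbid moving this gene straight back
        if len(tabu) > tabu_size:
            tabu.pop(0)                         # the oldest entry leaves the tabu list
        current = nxt
        if score(current) > best_score:
            best, best_score = dict(current), score(current)
    return best, best_score


# Toy objective standing in for the validation C-index; it peaks at LSTMdim=128, dropout=0.2.
def toy_score(s):
    return -(math.log2(s["LSTMdim"]) - 7) ** 2 - 10 * (s["dropout"] - 0.2) ** 2


print(tabu_search({"LSTMdim": 32, "dropout": 0.5}, toy_score))
```

Accepting the best non-tabu neighbor even when it is worse is what lets tabu search walk out of local optima, while the tabu list stops it from immediately undoing its last moves.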
Discussion and results
This section presents the datasets, our experimental setup, the evaluation metrics, and the results of the proposed MPSOA-based CNN-AttBiLSTM model.
Experimental setup
All experiments in this paper were implemented in Python (version 3.7) with the Keras framework, and the proposed bio-inspired algorithm MPSOA was also implemented in Python. CNN-AttBiLSTM architectures were evolved and trained on a personal computer with a Ryzen 7 3700X CPU and an NVIDIA GeForce RTX 2070 GPU. The best model was then retrained and tested on Google Colab.
Datasets
This study uses the KIBA [55] and Davis [56, 57] drug-target interaction datasets, summarized in Table 3. As Table 1 shows, both datasets have been used in a wide range of research studies. Every drug-target pair in each dataset is annotated with a continuous binding affinity value. Summary statistics for the two datasets are as follows:
Data set
KIBA: The first release of KIBA contained 467 targets and 52,498 drugs. The 2017 version, filtered to remove drugs and targets with fewer than 10 interactions each, contains 229 unique proteins and 2,111 unique drugs [14]. Integrating data from several scales, including IC
Davis: The interactions summarized in Table 3 involve 450 proteins and 68 ligands. Selectivity assays with
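The interaction-count filter applied to the 2017 KIBA release can be sketched as below. The text does not say whether the threshold was applied once or repeatedly. This sketch assumes it is repeated until no drug or target falls below the threshold, because removing a drug can push a target under the limit. `filter_min_interactions` is a hypothetical helper.

```python
from collections import Counter


def filter_min_interactions(pairs, min_count=10):
    """Drop drugs and targets with fewer than min_count measured pairs.

    pairs: list of (drug_id, target_id, affinity) tuples.
    The filter is re-applied until the set of pairs stops changing.
    """
    current = list(pairs)
    while True:
        drug_counts = Counter(d for d, _, _ in current)
        target_counts = Counter(t for _, t, _ in current)
        kept = [(d, t, y) for d, t, y in current
                if drug_counts[d] >= min_count and target_counts[t] >= min_count]
        if len(kept) == len(current):   # fixed point reached
            return kept
        current = kept
```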
Following [22] and [50], we evaluate the proposed model with two metrics: the concordance index and the mean square error.
Concordance index (C-index): For a binary classifier, the C-index is the proportion of all positive-negative sample pairs that the classifier ranks correctly. In the regression setting used here, it is the proportion of drug-target pairs with different true affinities whose predicted affinities are ordered in the same way as the true ones. A C-index of 1 indicates perfect ranking and 0.5 indicates random performance, as defined in Eq. (17) [13, 58].
Equation (18) illustrates the step function
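Assuming the usual definition from the drug-target affinity literature, Eq. (17) reads CI = (1/Z) Σ over pairs with δ_i > δ_j of h(b_i - b_j), where δ are true affinities, b are predicted affinities, Z is the number of such pairs, and the step function h(x) is 1 for x > 0, 0.5 for x = 0 and 0 for x < 0. The symbols are assumptions, since Eqs. (17) and (18) are not reproduced in this section. A direct O(n²) sketch of that form:

```python
def concordance_index(y_true, y_pred):
    """C-index for affinity regression: pairwise ranking agreement."""
    def step(x):
        # assumed step function h(x): 1 if x > 0, 0.5 if x == 0, else 0
        if x > 0:
            return 1.0
        if x == 0:
            return 0.5
        return 0.0

    total, n_pairs = 0.0, 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:          # only pairs with a true ordering
                total += step(y_pred[i] - y_pred[j])
                n_pairs += 1
    if n_pairs == 0:
        raise ValueError("C-index undefined: all true affinities are equal")
    return total / n_pairs
```

For true values `[1, 2, 3]` and predictions `[0.1, 0.3, 0.2]`, two of the three ordered pairs agree, giving a C-index of 2/3.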
Mean Squared Error (MSE): MSE is a commonly used loss function and performance metric in regression analysis. As seen in Eq. (19), the MSE is the average of the squared differences between the predicted value
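A minimal sketch of the MSE, written as the mean of (p_i - a_i)² over the n test pairs, where p_i is a predicted and a_i an actual binding affinity. This notation is assumed here because Eq. (19) is not reproduced in this section.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between predicted and actual affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)
```

For example, actual values `[1, 2, 3]` and predictions `[1, 2, 5]` give (0 + 0 + 4) / 3 ≈ 1.333.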
Parameter settings of MPSOA
Optimal configurations and metrics generated by the MPSOA-based CNN-AttBiLSTM model for both datasets
C-index and MSE comparison of the proposed bio-inspired hybrid deep learning model with the benchmark models
One contribution of this work is a bio-inspired algorithm that optimizes the architecture of a CNN-AttBiLSTM to improve its C-index. Table 4 shows the empirically selected MPSOA parameters, such as the swarm population size (
(a) C-Index values after optimizing hyperparameters on the Davis dataset, (b) Mean squared error values after optimizing hyperparameters on the Davis dataset.
Here, we select a generalized model, that is, a single optimal parameterization that works across all available data. A CNN-AttBiLSTM model is built, trained, and evaluated on the datasets listed in Table 3, and the optimal configurations are selected using five-fold cross-validation, as shown in Table 5. The tested models are compared using the C-index and MSE, and the average C-index and MSE of the optimal models are compared with those of the baseline methods.
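One way to score a candidate configuration inside MPSOA with five-fold cross-validation is sketched below. It uses a plain shuffled k-fold split, which is not necessarily the exact split protocol of [22]. `train_and_score` is a hypothetical callback that would build and train the CNN-AttBiLSTM and return a validation score such as the C-index.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most 1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validated_score(config, n_samples, train_and_score, k=5, seed=0):
    """Average validation score of one hyperparameter configuration.

    train_and_score(config, train_idx, val_idx) is a hypothetical callback.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for f, val_idx in enumerate(folds):
        # train on every fold except the held-out one
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(train_and_score(config, train_idx, val_idx))
    return sum(scores) / k
```

In a swarm-based search, this averaged score would serve as the fitness of each particle's configuration.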
In this section, we compare the generic MPSOA-based CNN-AttBiLSTM model with the baseline methods and with CNN-AttBiLSTM structures optimized for the Davis and KIBA datasets. The datasets were split into training and test sets using the same 5-fold cross-validation method described in [22]. The baseline methods describe proteins and compounds using chemical and biological textual sequences and similarity measures, such as LMCS, PS, LS and PDM, S-W, and compound-compound similarity. Table 6 compares the mean square error and concordance index of several models on both datasets. On each dataset, the manually configured CNN-AttBiLSTM model outperforms the state-of-the-art models on both metrics; on the Davis dataset, it outperforms SimBoost, KronRLS, DeepDTA, and WideDTA.
Comparative analysis between the proposed bio-inspired hybrid deep learning model and the benchmark models in terms of regression toward the mean and AUPR average scores
(a) C-Index values after optimizing hyperparameters on the KIBA dataset, (b) Mean squared error values after optimizing hyperparameters on the KIBA dataset.
(a) Comparison of predicted binding affinities for the Davis dataset using MPSOA-based CNN-AttBiLSTM with the observed affinity values. (b) Comparison of predicted binding affinities for the KIBA dataset using MPSOA-based CNN-AttBiLSTM with the observed affinity values.
The results show that the proposed MPSOA-based CNN-AttBiLSTM model outperforms manually parameterized CNN-AttBiLSTM structures on both the KIBA and Davis datasets. Compared with the most recent baseline techniques, our model shows statistically significant gains in both MSE and C-index. It achieves a C-index of 0.90 on Davis and 0.97 on KIBA, outperforming all baseline approaches on both datasets. Its MSE values are also significantly lower than those of every baseline model: 0.228 on the Davis dataset and 0.010 on the KIBA dataset. Figures 6a, 6b, 7a, and 7b show the C-index and MSE learning curves of the best CNN-AttBiLSTM model produced for each dataset. The generalized MPSOA-based CNN-AttBiLSTM model performs well and does not overfit the training and validation data. These findings suggest that MPSOA can globally optimize the design of the deep CNN-AttBiLSTM network.
Two additional metrics, the regression averages toward the mean square
The average
The results show that the proposed method achieves the highest mean regression and area under the precision-recall curve (AUPR) scores on both datasets. To further illustrate the prediction performance of the MPSOA-based CNN-AttBiLSTM, Fig. 8a and b plot the predicted (p) against the actual (a) binding affinity values for both datasets.
Traditional experimental assays for measuring binding affinity typically require substantial time, materials, and labor. Models based on biological systems can reduce the number of experiments researchers need to run: as a first screening step, computational predictions can narrow down the compounds that must be tested in the lab. Bio-inspired models can also support personalized medicine, in which treatments are tailored to each patient's genes and disease profile. By predicting how strongly different drugs bind to a patient's target proteins, researchers can choose the best treatment and dose and improve the chances of success.
Although some studies have treated DTI prediction as a regression problem, most current machine learning-based approaches frame it as a binary classification task. This paper introduced a novel hybrid model that predicts drug-target binding affinity by stacking a CNN on top of an attention-based BiLSTM with an additional attention mechanism. The CNN extracts complex features, and the combined architecture addresses the time-dependency problem of the LSTM model. The proposed method takes sequence representations of the input proteins and drugs: SMILES strings represent drugs and amino acid sequences represent proteins. However, the hyperparameters must be tuned carefully to obtain a powerful hybrid CNN-AttBiLSTM model for predicting drug-target affinities.
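Character-level integer encoding is one common way to feed SMILES and amino-acid strings to a CNN. The sketch below assumes that scheme, with index 0 reserved for padding. The helper names and `max_len` are hypothetical, and the exact encoding and sequence lengths used by the proposed model are not stated in this section.

```python
def build_vocab(sequences):
    """Map each character seen in the sequences to an integer >= 1 (0 = padding)."""
    chars = sorted({c for s in sequences for c in s})
    return {c: i + 1 for i, c in enumerate(chars)}


def encode(seq, vocab, max_len):
    """Integer-encode a SMILES or amino-acid string, truncating or padding to max_len.

    Characters missing from the vocabulary also map to 0 in this sketch.
    """
    ids = [vocab.get(c, 0) for c in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))
```

For example, `build_vocab(["CCO", "C=O"])` yields `{"=": 1, "C": 2, "O": 3}`, and `encode("CCO", vocab, 5)` yields `[2, 2, 3, 0, 0]`. The same helpers apply unchanged to protein strings built from the amino-acid alphabet.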
A bio-inspired hyperparameter optimization algorithm was used to find the best parameter settings for building and training the CNN-AttBiLSTM model. On both evaluation measures, the MPSOA-based CNN-AttBiLSTM model performed much better than the benchmark CNN-AttBiLSTM model and the manually configured CNN-AttBiLSTM models. These results on drug-target binding affinity prediction indicate that MPSOA is an effective way to strengthen deep learning models. The current experiments were limited to two benchmark kinase datasets. In future work, we will propose a more bio-inspired CNN-AttBiLSTM variant that uses different representations of proteins, drugs, and drug targets. We also plan to improve MPSOA by tuning additional hyperparameters, which should make it more competitive with other hyperparameter optimization methods.
Abbreviations

| Abbreviation | Definition |
|---|---|
| DTBA | Drug-target binding affinity |
| ML | Machine learning |
| DTIs | Drug-target interactions |
| SW | Smith-Waterman |
| DL | Deep learning |
| KIBA | Kinase Inhibitor BioActivity |
| SMILES | Simplified Molecular-Input Line-Entry System |
| 3D | Three-dimensional |
| CNN | Convolutional Neural Network |
| PS | Protein Sequence |
| LS | Ligand SMILES |
| PDM | Protein Domains and Motifs |
| LMCS | Ligand Maximum Common Substructures |
| ESPF | Explainable Substructure Partition Fingerprint |
| CPI | Compound Protein Interaction |
| BiLSTM | Bidirectional Long Short-Term Memory Network |
| Att-BiLSTM | Attention-based bidirectional Long Short-Term Memory network |
| KronRLS | Kronecker Regularized Least Squares |
| SimBoost | Similarity-Based Gradient Boosting Machine |
| GNN | Graph Neural Network |
| DTA | Drug Target Affinity |
| DeepCDA | Deep Cross Domain Compound-Protein Affinity |
| CI/C-index | Concordance Index |
| MSE | Mean Square Error |
| DT | Drug Target |
