Abstract
Protein Remote Homology and fold Recognition (PRHR) is a crucial task in predicting protein patterns. For this task, Sequence-Order Frequency Matrix-Sampling and Deep learning with Smith-Waterman (SOFM-SDSW) was designed for large-scale Protein Sequences (PSs), but it takes considerable time to determine the high-dimensional attributes. It was also ineffective because SW was applied only for local alignment, which cannot find the maximum match between the PSs. Hence, in this manuscript, a rapid semi-global alignment algorithm called SOFM-SD-GlobalSW (SOFM-SDGSW) is proposed that supports affine-gap scoring and uses sequence similarity to align the PSs. The major aim of this paper is to enhance the alignment of the SW algorithm both locally and globally for PRHR. In this algorithm, the Maximal Exact Matches (MEMs) are initially obtained by bit-level parallelism rather than by aligning individual characters. After that, a subgroup of MEMs is obtained to determine the global Alignment Score (AS) using the new adaptive programming scheme. Also, the SW local alignment scheme is used to determine the local AS. Then, the local and global ASs are combined to produce a final AS. Further, this resultant AS is used to train the Support Vector Machine (SVM) classifier to recognize the PRH and folds. Finally, the test results reveal that the SOFM-SDGSW algorithm attains an ROC of 0.97, 0.941 and 0.938 and an ROC50 of 0.819, 0.846 and 0.86 on the SCOP 1.53, SCOP 1.67 and Superfamily databases, respectively, outperforming the conventional PRHR algorithms.
Keywords
Introduction
PRHR is a primary challenge in characterizing PSs that share little similarity with already identified proteins [1]. Several computational algorithms have been created for it, which may be classified as alignment procedures, discriminative approaches, and scoring strategies [2]. Feature-based methods outperform string-based methods in all categories. The string-based alignment methods rely on the relationship between a pair of PSs revealed via the adaptive learning framework, for example, global and local alignment [3], BLAST [4] and FASTA [5].
However, these methods are unable to generate meaningful matches when the sequence similarity is less than 35%. Some feature-based alignment strategies were invented to enhance the adaptability of these systems [6-10]. On the other hand, alignment strategies based on Hidden Markov Models (HMMs) transform a Multiple Sequence Alignment (MSA) into a location-specific scoring strategy [11] that does not provide a precise maximum-ranking sequence but rather a collection of possible combinations. To exploit both the positive and negative classes of PSs, some discriminative algorithms based on characteristics derived from the main protein properties, such as motif analysis, were developed [12-14]. Similarly, a few unique kernel procedures were also created, such as SVM-Fisher, SVM-pairwise, SVM-ensemble, and so on [15].
When compared to sequence-based attributes, feature-based attributes are more accurate. In this regard, feature-based methods gain higher precision for recognizing the PRH and folds. Unsupervised searching of the sample sequence against a comparable database yields a feature based on the MSAs obtained. The MSA is made up of relevant strand properties as well as the sample string. The spot-specific spectrum vector [16] and spot-specific scoring vector [17] are frequently considered features. However, these features assume that the amino acid residues in an MSA are independent of each other and that the matrix values are decided only by the residues found in the respective columns of the MSA.
In this context, Liu et al. [18] created a unique feature termed the SOFM for merging the SO impacts of amino acids in MSAs. In this algorithm, Top-N-grams (SOFM-Top) and SOFM-SW are incorporated. The SVM was then provided the local alignment similarity as input to find the PRH and fold. However, the computational complexity of SOFM is high due to the analysis of all PSs. So, the Proportional Volume Sampling (PVS) approach was suggested to identify PRH and folds by considering only specific proteins [19]. Also, the SVM's loss rate was reduced by considering the protein structural and local alignments. In the SOFM-SW, the substitution score was computed by using the k-Nearest Neighbor (kNN) algorithm to analyze the PS alignment. In addition, the MSA was used on the sequence alignment to obtain the refined SOFM matrix and AS. To achieve the PRHR, this AS was used to train the SVM.
However, it was incapable of handling large-scale PS to retrieve high-dimensional attribute maps. With the support of a DCNN-based decision-making system, a high-dimensional attribute map extraction method was developed [20]. In this approach, DCNN was used to generate the attribute vector for a variable-length protein string. DCNN can retrieve hidden properties from any dimension PS via convolutional adaptation and provides the attributes for alignments. This was the first method that converts all PSs in the input space into a structural attribute. In addition, the hidden fold-related properties created by sequences are taken into account while aligning the PSs. Further, the AS was learned in SVM to find PRH and folds. Although this alignment of PSs using affine-gap scoring was a familiar method, it takes more time and suffers from less accuracy. Also, the SW was only utilized for local alignment which does not obtain the maximum match between the sequences.
Therefore, in this article, the SOFM-SDGSW algorithm is proposed for PS alignment, which employs sequence similarity and supports affine-gap scoring. The major contributions of this article are as follows. Rather than aligning individual characters, this algorithm first obtains the MEMs using bit-level parallelism. Following that, a subgroup of MEMs is collected to calculate the AS using the new adaptive programming approach. Because this algorithm aims to replicate the SW algorithm's alignment, the determined AS is analogous to the AS obtained by the SW local alignment. The local and global ASs are then added together to generate the final AS. Moreover, this AS is fed to the SVM to recognize PRH and folds accurately. Thus, this algorithm creates a semi-global alignment wherein the initial or last bases of the alignment are removed without altering the AS.
The remainder of this paper is organized as follows: The recent work related to PRHR is presented in Section 2. Section 3 describes the SOFM-SDGSW algorithm, while Section 4 demonstrates its efficacy. Section 5 summarizes this paper and suggests possible improvements.
Related works
Chen et al. [21] developed an ensemble classifier termed SVM-Ensemble with the weighted voting scheme to identify protein remote homology. It integrated different base classifiers depending on multiple attribute spaces which include the string composition and the string-level data along with the PS. But, it needs to extract more effective attributes for enhancing PRHR.
Li et al. [22] developed a novel semi-supervised alignment technique that merges nearby linear restorations in all manifolds. The distributed inherent pattern was found by solving an optimization problem that concurrently matches the respective data and maintains the local geometry of all manifolds. But, the mean alignment accuracy was not effective. Sudha et al. [23] suggested an enhanced artificial neural network to identify the protein folds and predict the structural class. Nonetheless, its complexity was high when the number of neurons was large, and its accuracy was low since it needs more features.
Petegrosso et al. [24] designed tag dissemination on low-rank kernel estimation to search extremely large protein networks. The primary protein relationships in a low-rank table were propagated using Nyström estimation exclusive of determining each pairwise similarity. But, it has a high computational cost. Also, the interval required to create a protein model was slightly higher than the HMM. Liu and Zhu [25] developed an improved ProtDec-LTR1.0 and ProtDec-LTR2.0 estimators by merging 3 feature-based attributes into the training model to refine the predictive efficiency. Also, the ProtDec-LTR3.0 estimator was designed to recognize the PRH. On the other hand, its computation burden was high.
Akdel et al. [26] presented an adaptive programming scheme in all transitional pairwise alignments to create a primary superposition of patterns. The progressive alignment scheme was used to merge aligned patterns into an alignment transitional. But, its runtime was high while increasing the protein length. Makigaki & Ishida [27] developed an algorithm for appropriate alignment creation using a transitional string search with template-based modeling. The transitional string search was applied to identify remote homology and align them by creating the transitional strings. But, the homology recognition rate was not highly effective due to the use of a simple transitional string search.
Baharav et al. [28] introduced a min-hash method to determine the spectral Jaccard relationship by applying the singular value decomposition on a min-hash collision matrix. But, its computational complexity was high and only suitable for a smaller database. Delibaş et al. [29] developed an alignment-free DNA string correlation technique depending on top-k n-gram matches with the estimation that usual recurrent DNA substrings define greater correlation among DNA strings. But, its accuracy was not highly efficient. Gao et al. [30] designed a CONVERT method to recognize the protein homology via discovering the multiple-to-single correlation between proteins and agent proteins using the seq2seq model. Additionally, scoring was performed to discover the sorted list and align the PS. On the other hand, the runtime was high if the number of proteins was high.
Rashed et al. [31] developed an FPGA and a modified CNN to accelerate DNA pairwise sequence alignment. It was based on the creation of a truth table of a look-up table of each possible combination of the DNA strings after transforming the DNA strings from alphabets to binary representations. But, the CNN performance was degraded due to the larger number of labels, and its hyper-parameters needed to be adjusted. Jin et al. [32] proposed a Supervised-Manner-based Iterative BLAST (SMI-BLAST) based on PSI-BLAST for PRHR. But, its complexity was high while increasing the number of PSs.
Proposed methodology
In this section, the SOFM-SDGSW algorithm for the PRHR model is described in brief. Figure 1 depicts the complete flow of the presented model.

Entire flow of proposed PRHR model.
Initially, the SOFM is generated for the PSs according to the MSA. Next, the PVS is performed on the SOFM of each PS to get highly significant target PSs [19]. These target PSs are provided to the DCNN to estimate the substitution (alignment) score used in SW for optimal local alignment [20]. Besides, the bit-level parallel scheme is applied to retrieve the MEMs, and a subgroup of MEMs is utilized for computing the global AS based on the new adaptive programming scheme. Then, the local and global ASs are added to create the final AS. Moreover, this resultant AS is learned by the SVM for PRHR.
In the presented algorithm, the primary task toward PS alignment is to obtain MEMs between PSs by directly comparing them. An example of comparing a target and a query sequence is illustrated in Fig. 2(a), where TCA and CGG are the 2 MEMs recognized. Every run of consecutive identical characters in the comparison outcome is a MEM, even if it consists of a single identical character.

(a) Recognize MEMs by simple comparison of PS.
To retrieve every MEM between the PSs, the query string (Q) should be shifted all the way to the right and to the left, one character at a time (as depicted in Fig. 2(b)). After every shift, the comparison process is repeated to find fresh MEMs.

(b) Extraction of MEM using move function.
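The shift-and-compare idea can be illustrated with a plain character-level sketch. This is not the bit-level procedure of the paper; the function name `find_mems`, the 0-based coordinates and the `min_len` parameter are assumptions introduced only for illustration.

```python
def find_mems(target: str, query: str, min_len: int = 1):
    """Return MEMs as (start_in_target, start_in_query, length), 0-based.

    Hypothetical helper: each value of `shift` places query[q] under
    target[q + shift], which corresponds to one row of a shift-and-compare table.
    """
    mems = []
    for shift in range(-(len(query) - 1), len(target)):
        run = 0  # length of the current run of identical characters
        t_lo = max(0, shift)
        t_hi = min(len(target), shift + len(query))
        # Iterate one step past the overlap so a run touching the end is closed.
        for t in range(t_lo, t_hi + 1):
            q = t - shift
            if t < t_hi and target[t] == query[q]:
                run += 1
                continue
            if run >= min_len:
                mems.append((t - run, q - run, run))
            run = 0
    return mems


if __name__ == "__main__":
    print(find_mems("ACGT", "CG"))
```

Every reported run is bounded on both sides by a mismatch or by the end of a sequence, so it cannot be extended and is therefore maximal for its shift.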
For instance, the 3rd row in Fig. 2(b) denotes the scenario in which Q is shifted to the right by one character and compared with the target string. The outcome of the comparison recognizes CCCCGT as a fresh MEM. All other MEMs retrieved by the shift and compare processes are highlighted in Fig. 2(b). Three of the MEMs (M_a, M_b, M_c) are outlined in different colors. In the affine-gap scoring scheme, the AS (AlignScore) is determined as:
AlignScore = (N_x × R_x) - (N_y × P_y) - (N_o × P_o) - (N_g × P_g)    (1)
In Equation (1), N_x denotes the number of matches, each receiving an equal reward R_x; N_y denotes the number of mismatches, each receiving a mismatch penalty P_y; N_o denotes the number of gap openings, each receiving a gap open penalty P_o; and N_g denotes the overall length of all gaps, each gap position receiving a gap extension penalty P_g. Every set of continuous gaps has one gap opening.
Partial alignments are used to determine the alignment for a specific collection of MEMs. For instance, consider the MEMs M_a, M_b and M_c. The partial alignments are prepared by considering various arrangements of M_a, M_b and M_c together with the number of matches, mismatches and gaps and the resulting ASs. The alignment that contains only M_a and M_c provides the maximum AS. Observe that M_b and M_c overlap each other; if both are taken in the same alignment, the overlap is removed from M_c.
Thus, the MEMs can be combined in several arrangements, and taking every MEM does not necessarily attain the best score. However, evaluating every possible combination of MEMs is difficult. So, a new adaptive programming scheme is described in this study that efficiently discovers the optimal arrangement without considering each scenario. It only needs to know which portions of the PSs are identical, not the actual characters in the PSs. The input to the adaptive programming scheme is the arrangement of MEMs in the target and query strings acquired in the MEM retrieval task. The MEM M_i is defined as a triplet of integers: its starting locations in the target string T and in Q (ST_i and SQ_i, respectively) and its size L_i. Then, the closing locations in T and Q are determined as ET_i = ST_i + L_i - 1 and EQ_i = SQ_i + L_i - 1.
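As a small worked illustration of the affine-gap score described above, the following sketch assumes that all penalties are positive magnitudes subtracted from the match reward. The function name and the default parameter values are hypothetical and are not taken from the paper.

```python
def align_score(n_match: int, n_mismatch: int, n_gap_open: int, n_gap_len: int,
                r_match: int = 1, p_mismatch: int = 4,
                p_open: int = 6, p_ext: int = 1) -> int:
    """Affine-gap alignment score.

    Assumption: penalties are given as positive numbers and are subtracted.
    n_gap_len is the total length of all gaps; n_gap_open counts gap runs.
    """
    return (n_match * r_match
            - n_mismatch * p_mismatch
            - n_gap_open * p_open
            - n_gap_len * p_ext)


if __name__ == "__main__":
    # 10 matches, 1 mismatch and one gap of length 2 under the example values.
    print(align_score(10, 1, 1, 2))
```

Because the gap open penalty is charged once per run of gaps, one long gap is cheaper than several short gaps of the same total length.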
The number of mismatches and gaps between neighboring MEMs M_i and M_j (M_i is on the left of M_j) is determined from their arrangement in the strings. Initially, the gaps between M_i and M_j in T and Q, represented by LT_j and LQ_j respectively, are determined from the starting locations of M_j and the closing locations of M_i. If there are both mismatches and gaps between M_i and M_j, then all gaps are treated as one continuous gap to reduce the gap open penalty. So, a single gap open penalty is applied for each pair of neighboring MEMs that has gaps between them. The arrangement of the mismatches and the single continuous gap is not essential since it does not influence the AS. Note that the mismatch penalty is assumed to be fixed for PSs. When there is an overlap between M_i and M_j in either T or Q, the overlap must be removed from M_j, so the length of the overlap is determined first.
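The bookkeeping for two neighboring MEMs can be sketched as follows. The class `Mem`, the helper names and the default penalty values are hypothetical. Counting min(LT, LQ) mismatches and one gap of length |LT - LQ| is an assumption that follows the single-continuous-gap convention described above.

```python
from typing import NamedTuple


class Mem(NamedTuple):
    st: int      # start in target T (1-based)
    sq: int      # start in query Q (1-based)
    length: int  # number of identical characters

    @property
    def et(self) -> int:  # closing location in T
        return self.st + self.length - 1

    @property
    def eq(self) -> int:  # closing location in Q
        return self.sq + self.length - 1


def gaps_between(mi: Mem, mj: Mem):
    """Characters between M_i and M_j in T and Q; negative values mean overlap."""
    return mj.st - mi.et - 1, mj.sq - mi.eq - 1


def junction_penalty(lt: int, lq: int, p_mismatch: int = 4,
                     p_open: int = 6, p_ext: int = 1) -> int:
    """Penalty between two non-overlapping MEMs (requires lt, lq >= 0)."""
    mismatches = min(lt, lq)   # positions paired up as mismatches
    gap_len = abs(lt - lq)     # the remainder forms one continuous gap
    gap_open = p_open if gap_len else 0
    return mismatches * p_mismatch + gap_open + gap_len * p_ext


if __name__ == "__main__":
    lt, lq = gaps_between(Mem(1, 1, 3), Mem(6, 5, 2))
    print(lt, lq, junction_penalty(lt, lq))
```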
To extract the MEMs, rapid bit-level parallelism is introduced based on the shift and compare functions. The initial task is to represent PSs as bit-vectors, where the bases A, C, T and G are encoded as 00, 01, 10 and 11, respectively. With the bit-vector representation of PSs, shifting a PS by one character is equivalent to shifting the bit-vector by 2 bits, and comparing PSs is performed with the XOR operation (32 characters at a time). In the XOR outcome, a 2-bit group of 00 indicates identical characters, whereas any other value indicates a mismatch.
Algorithm 1: Determination of E for PS in bit-vector form
E ← E ∨ ((E ∧ 0101 … 0101) ⪡ 1);
E ← E ⊕ (E ⪢ 1);
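The 2-bit encoding and XOR comparison can be sketched in Python, whose integers are arbitrarily long and stand in here for machine words. This is a simplified illustration rather than a transcription of Algorithm 1: the helper names are hypothetical, both sequences are assumed to have equal length with no shift applied, and the final scan over the flags is per character.

```python
CODE = {"A": 0b00, "C": 0b01, "T": 0b10, "G": 0b11}  # encoding given in the text


def pack(seq: str) -> int:
    """Pack a sequence into an integer, character i occupying bits 2i and 2i+1."""
    value = 0
    for i, ch in enumerate(seq):
        value |= CODE[ch] << (2 * i)
    return value


def match_runs(target: str, query: str):
    """Return (start, length) of runs of identical characters, 0-based."""
    n = len(target)
    assert len(query) == n, "sketch assumes equal-length, unshifted sequences"
    low_bits = (4 ** n - 1) // 3          # mask 0101...01 with n groups
    x = pack(target) ^ pack(query)        # group 00 <=> identical characters
    mismatch = (x | (x >> 1)) & low_bits  # bit 2i set iff characters i differ
    runs, start = [], None
    for i in range(n + 1):
        is_match = i < n and not (mismatch >> (2 * i)) & 1
        if is_match and start is None:
            start = i
        elif not is_match and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


if __name__ == "__main__":
    print(match_runs("TCAGG", "TCATG"))
```

Shifting the query by one character would correspond to shifting its packed integer by 2 bits before the XOR.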
In this system, the adaptive programming scheme is used to find the alignment efficiently. This scheme finds the solution to a challenge by defining and resolving smaller sub-challenges. The results of the sub-challenges are applied to resolve a larger challenge at each stage, and this procedure is continued until every sub-challenge is resolved. Finally, the solution to one of the sub-challenges may be the solution to the major challenge. Once every sub-challenge is resolved, a backtracking task recognizes the sequence of results that leads to the final result. In this scheme, the incoming records must be ranked before the looping process begins.
Every MEM is ranked by the location of its end in Q (EQ). MEMs ending at the same location are ranked arbitrarily. The j-th sub-challenge is to discover the alignment of the substrings of T and Q that ends at the j-th MEM (M_j), i.e. T[1 … ET_j] and Q[1 … EQ_j], respectively. This ranking of MEMs is sufficient to support proper looping.
In the ranked collection of MEMs, EQ_i = EQ_j implies that one of M_i or M_j completely overlaps the other MEM in Q. Since the overlap area is removed, M_i and M_j cannot be in the same alignment. So, the i-th and j-th sub-challenges are resolved independently of each other, and the order of i and j in the ranked collection may be arbitrary. When EQ_k > EQ_j (k > j in the ranked collection), M_k is not a part of the alignment that ends at M_j. So, the j-th sub-challenge is resolved independently of the solution to the k-th sub-challenge. Note that it is equally possible to rank MEMs by their ending location in T (ET), by a comparable argument.
The input to the adaptive programming scheme is the collection of MEMs, where each MEM (M_j) is a triplet of integers [L_j, SQ_j, ST_j]. The second input n denotes the number of MEMs in the collection, and the output A defines the global alignment (substitution) score for the PSs. The processes in this scheme are the following.
Initialization. Every MEM is scored for all of its identical characters; note that there are L_j identical characters in M_j. A_j is the maximum AS of the alignment ending at M_j. Initializing A_j is equivalent to determining the partial AS if only M_j is included in the alignment. W[j] is used for backtracking; here, -1 represents that the present A_j is obtained by taking only M_j in the alignment.
Recursion. A_j is determined for every M_j. For every M_i that precedes M_j in the collection, the algorithm appends M_j to the alignment ending at M_i and looks for the extension that maximizes A_j. An extension is omitted if it is not feasible. When ET_i > ET_j, M_i holds a portion of T that lies beyond the alignment ending at M_j, and the extension is not feasible. When EQ_i = EQ_j or ET_i ≥ ET_j or SQ_i ≥ SQ_j or ST_i ≥ ST_j, one of the MEMs completely overlaps the other MEM; in this situation, M_i and M_j cannot be aligned jointly. Otherwise, the overlap length between M_i and M_j is determined. A copy of M_j is preserved before the overlap is excluded and, when an overlap exists, it is removed from M_j. Then the ending location of M_j in T and Q, the distance (number of characters) between M_i and M_j, and the penalty for the mismatches and gaps between M_i and M_j are determined, followed by the AS obtained when M_j extends the alignment ending at M_i. When this extension provides a higher A_j, A_j is updated and i is stored in W[j]. Finally, the value of M_j before the exclusion is restored so that M_j can be used in another alignment extension.
Selection. The MEM having the maximum A_j is searched for; it is the final MEM in the alignment (M_e). The maximum score (A_e) is returned as A, which is the maximum AS for the considered PSs. The index of the MEM that maximizes A_j is stored in e to initiate backtracking from M_e.
Backtracking. In the alignment, the MEM directly preceding M_e is the one that maximizes the AS for M_e; its index is stored in W[e]. So, the iteration f ← W[f] visits the index of every MEM in the alignment. If W[f] equals -1, M_f is the first MEM in the alignment and the iteration ends.
W[j] maintains only a single index for M_j. However, there may be situations in which several preceding MEMs yield the same maximum A_j.
Algorithm 2: Determination of Global Alignment Score using Adaptive Programming Scheme with MEM Extraction
//Initialize the variables
A_j ← L_j × R_x;
W[j] ← -1;
//Recursion
TEMP ← M_j;
ET_j ← (ST_j + L_j) - 1;
EQ_j ← (SQ_j + L_j) - 1;
LT_j ← (ST_j - ET_i) - 1;
LQ_j ← (SQ_j - EQ_i) - 1;
W[j] ← i;
M_j ← TEMP;
//Recognize the alignment from every determined alignment
A ← A_1; e ← 1;
A ← A_j;
e ← j;
//Backtracking the alignment
f ← e;
f ← W[f];
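The listing above shows only the core assignments, so a runnable sketch of the whole scheme may help. It is a minimal reconstruction under stated assumptions, not the authors' implementation: the function name, the default penalties, the overlap length max(0, -LT, -LQ) and the mismatch and gap counting between MEMs are all assumptions. Instead of modifying and restoring M_j, the sketch works on local copies.

```python
def global_score(mems, r_match=1, p_mismatch=4, p_open=6, p_ext=1):
    """Chain MEMs given as (L, SQ, ST) triplets with 1-based start locations.

    Returns (best score, MEMs of the best chain in left-to-right order).
    """
    if not mems:
        return 0, []
    # Rank the MEMs by their ending location in Q (EQ).
    mems = sorted(mems, key=lambda m: m[1] + m[0] - 1)
    n = len(mems)
    best = [m[0] * r_match for m in mems]  # A_j when M_j stands alone
    prev = [-1] * n                        # W[j], -1 marks the first MEM
    for j in range(n):
        lj, sqj, stj = mems[j]
        etj, eqj = stj + lj - 1, sqj + lj - 1
        for i in range(j):
            li, sqi, sti = mems[i]
            eti, eqi = sti + li - 1, sqi + li - 1
            # Skip pairs where one MEM would completely overlap the other.
            if eqi == eqj or eti >= etj or sqi >= sqj or sti >= stj:
                continue
            lt, lq = stj - eti - 1, sqj - eqi - 1
            overlap = max(0, -lt, -lq)      # assumed overlap length
            length_j = lj - overlap         # trimmed copy of M_j
            lt, lq = lt + overlap, lq + overlap
            gap = abs(lt - lq)              # one continuous gap
            penalty = min(lt, lq) * p_mismatch
            if gap:
                penalty += p_open + gap * p_ext
            candidate = best[i] + length_j * r_match - penalty
            if candidate > best[j]:
                best[j], prev[j] = candidate, i
    e = max(range(n), key=best.__getitem__)
    chain, f = [], e
    while f != -1:                          # backtracking via W
        chain.append(mems[f])
        f = prev[f]
    return best[e], chain[::-1]


if __name__ == "__main__":
    print(global_score([(3, 1, 1), (2, 5, 5)], p_mismatch=1))
```

With the larger default mismatch penalty, the same two MEMs are no longer worth joining, and the first MEM alone gives the best score.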
After completing the adaptive programming scheme, the determined global AS is combined with the local AS obtained by the SOFM-SDSW algorithm to get the final AS. This resultant score is then used to train the SVM classifier to predict the PRH and recognize folds properly.
Assume X = x_1 x_2 … x_n and Y = y_1 y_2 … y_m are the PSs to be aligned, where n and m are the lengths of X and Y, respectively;
Create a SOFM matrix from adjacent amino acid substrings;
Using PVS, decide on a set of columns in SOFM;
Get the local arrangement by constructing SOFM_X and SOFM_Y and their substitution matrix S_XY;
Get the feature vector V_xy(P), which is the concatenation of SOFM columns of the window subsequence;
Execute the DCNN-based decision-making framework to get the local AS;
Apply the adaptive programming scheme to extract MEMs and determine the global AS;
Combine the local and global ASs to create a final scoring matrix;
Refine the PS alignment based on the resultant AS matrix;
Apply MSA to the best PS alignment to obtain a refined SOFM and AS;
Train the SVM using the final scoring matrix for predicting the PRH and recognizing folds.
This part analyzes the efficiency of the SOFM-SDGSW algorithm and compares it with the existing algorithms. The evaluation is performed in terms of the Receiver Operating Characteristics (ROC) and ROC50. For this evaluation, the Structural Classification of Proteins (SCOP) 1.53 [33], SCOP 1.67 [34] and Superfamily [35] databases are taken. The SCOP 1.53 contains 4532 proteins from 54 families and the SCOP 1.67 contains 4019 proteins from 102 families. Similarly, the Superfamily database encloses 1195 folds comprising 1962 superfamilies. Superfamily is a database of structural and functional annotations for PSs. It is built on a set of HMMs that describe structural protein domains at the SCOP superfamily level. The annotation is created by comparing PSs from over 2478 completely sequenced genomes against the HMMs. From these databases, 65% of the data are considered for training and the remainder is taken for testing. For comparative analysis, different existing state-of-the-art methods such as SOFM-Top [18], SOFM-SW [18], SOFM-SMSW [19], SOFM-SDSW [20], SVM-Ensemble [21], CONVERT [30] and SMI-BLAST [32] are considered.
Accuracy, Precision, Recall, ROC and ROC50 are the performance measures used to compare the existing and proposed algorithms.
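For readers unfamiliar with ROC50, the following sketch computes the ROC score and its truncated variant from a ranked list of predictions. It assumes the usual definition, in which ROC50 is the normalized area under the ROC curve up to the first 50 false positives. The function name is hypothetical and tied scores are not treated specially.

```python
def roc_n(scores, labels, max_fp=None):
    """Area under the ROC curve up to `max_fp` false positives, in [0, 1].

    max_fp=None gives the ordinary ROC score; max_fp=50 gives ROC50.
    labels are truthy for positive (homologous) samples.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(1 for _, y in ranked if y)
    total_neg = len(ranked) - total_pos
    limit = total_neg if max_fp is None else min(max_fp, total_neg)
    if total_pos == 0 or limit == 0:
        return 0.0
    tp = fp = area = 0
    for _, y in ranked:
        if y:
            tp += 1
        else:
            fp += 1
            area += tp  # true positives ranked above this false positive
            if fp == limit:
                break
    return area / (limit * total_pos)


if __name__ == "__main__":
    example_scores = [0.9, 0.8, 0.7, 0.6]  # hypothetical classifier outputs
    example_labels = [1, 0, 1, 0]
    print(roc_n(example_scores, example_labels))
    print(roc_n(example_scores, example_labels, max_fp=1))
```

Truncating at a small number of false positives rewards methods that rank true homologs at the very top of the list, which is why ROC50 is often lower than ROC for the same predictions.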
Table 1 provides the accuracy results obtained by the proposed and existing PS alignment algorithms for PRHR executed on SCOP 1.53, SCOP 1.67 and superfamily databases.
Accuracy (%) of proposed and existing PS alignment algorithms for PRHR
Figure 3 displays the accuracy of different PS alignment algorithms for PRHR using the 3 databases. For the SCOP 1.53 database, the accuracy of the proposed algorithm is 14.63% greater than the SOFM-Top, 9.3% greater than the SOFM-SW, 5.62% greater than the SOFM-SMSW and 2.17% greater than the SOFM-SMSW-DCNN algorithms. For the SCOP 1.67 database, the accuracy of the proposed algorithm is 16.25% greater than the SOFM-Top, 10.71% greater than the SOFM-SW, 5.68% greater than the SOFM-SMSW and 2.2% greater than the SOFM-SMSW-DCNN algorithms. Additionally, for the Superfamily database, the accuracy of the proposed algorithm is 8.24% greater than the SOFM-Top, 5.75% greater than the SOFM-SW, 4.55% greater than the SOFM-SMSW and 2.22% greater than the SOFM-SMSW-DCNN algorithms.

Accuracy vs. databases.
Table 2 provides the results of precision determined by the proposed and existing PS alignment algorithms tested using 3 distinct databases.
Precision (%) of proposed and existing PS alignment algorithms for PRHR
Figure 4 portrays the precision of various PS alignment algorithms executed on 3 different databases for PRHR. For the SCOP 1.53 database, the precision of the proposed algorithm is 14.46% larger than the SOFM-Top, 11.76% larger than the SOFM-SW, 5.56% larger than the SOFM-SMSW and 2.15% larger than the SOFM-SMSW-DCNN algorithms. For the SCOP 1.67 database, the precision of the proposed algorithm is 18.99% larger than the SOFM-Top, 13.25% larger than the SOFM-SW, 6.82% larger than the SOFM-SMSW and 2.17% larger than the SOFM-SMSW-DCNN algorithms. For the Superfamily database, the precision of the proposed algorithm is 13.25% larger than the SOFM-Top, 10.59% larger than the SOFM-SW, 9.3% larger than the SOFM-SMSW and 6.82% larger than the SOFM-SMSW-DCNN algorithms.

Precision vs. databases.
Table 3 provides the recall values obtained by the proposed and existing PS alignment algorithms executed on SCOP 1.53, SCOP 1.67 and superfamily databases for PRHR.
Recall (%) of proposed and existing PS alignment algorithms for PRHR
Figure 5 depicts the recall values of various PS alignment algorithms executed on 3 different databases for PRHR. For the SCOP 1.53 database, the recall of the proposed algorithm is 14.29% higher than the SOFM-Top, 10.34% higher than the SOFM-SW, 5.49% higher than the SOFM-SMSW and 2.13% higher than the SOFM-SMSW-DCNN algorithms. For the SCOP 1.67 database, the recall of the proposed algorithm is 18.75% higher than the SOFM-Top, 13.1% higher than the SOFM-SW, 6.74% higher than the SOFM-SMSW and 2.15% higher than the SOFM-SMSW-DCNN algorithms. For the Superfamily database, the recall of the proposed algorithm is 9.64% higher than the SOFM-Top, 8.33% higher than the SOFM-SW, 4.6% higher than the SOFM-SMSW and 2.25% higher than the SOFM-SMSW-DCNN algorithms.

Recall vs. databases.
Table 4 lists the ROC and ROC50 values obtained by the proposed and existing PS alignment algorithms executed on SCOP 1.53, SCOP 1.67 and superfamily databases for PRHR.
ROC and ROC50 for proposed & existing PS alignment algorithms for PRHR
Figure 6 exhibits the ROC & ROC50 values for the proposed and existing PS alignment algorithms tested on the SCOP 1.53 database. For SCOP 1.53 database, the ROC of SOFM-SDGSW is 22.94% greater than the SVM-Ensemble, 20.5% greater than the CONVERT, 19.31% greater than the SMI-BLAST, 18.15% greater than the SOFM-Top, 5.32% greater than the SOFM-SW, 3.08% greater than the SOFM-SMSW and 1.15% greater than the SOFM-SDSW. Also, the ROC50 of SOFM-SDGSW is 40.96% greater than the SVM-Ensemble, 37.42% greater than the CONVERT, 33.39% greater than the SMI-BLAST, 31.88% greater than the SOFM-Top, 14.55% greater than the SOFM-SW, 4.07% greater than the SOFM-SMSW and 1.61% greater than the SOFM-SDSW.

Evaluation of ROC & ROC50 for proposed and existing PS alignment algorithms on SCOP 1.53 database.
Figure 7 exhibits the ROC & ROC50 values for the proposed and existing PS alignment algorithms tested on the SCOP 1.67 database. For the SCOP 1.67 database, the ROC of SOFM-SDGSW is 17.48% greater than the SVM-Ensemble, 15.6% greater than the CONVERT, 14.48% greater than the SMI-BLAST, 12.83% greater than the SOFM-Top, 10.45% greater than the SOFM-SW, 3.07% greater than the SOFM-SMSW and 1.4% greater than the SOFM-SDSW. Also, the ROC50 of SOFM-SDGSW is 17.34% greater than the SVM-Ensemble, 13.1% greater than the CONVERT, 11.02% greater than the SMI-BLAST, 9.3% greater than the SOFM-Top, 8.05% greater than the SOFM-SW, 5.49% greater than the SOFM-SMSW and 1.44% greater than the SOFM-SDSW. These outcomes confirm that the proposed algorithm attains better ROC & ROC50 than all other algorithms executed on the SCOP 1.53 & SCOP 1.67 databases for PRHR. This is achieved by combining local and global alignment with the help of the adaptive programming scheme.

Evaluation of ROC & ROC50 for proposed and existing PS alignment algorithms on SCOP 1.67 database.
Figure 8 illustrates the ROC & ROC50 values for the proposed and existing PS alignment algorithms executed on the Superfamily database. It is observed that the ROC of SOFM-SDGSW is 15.09% greater than the SVM-Ensemble, 13.97% greater than the CONVERT, 11.8% greater than the SMI-BLAST, 11.4% greater than the SOFM-Top, 8.19% greater than the SOFM-SW, 7.69% greater than the SOFM-SMSW and 1.52% greater than the SOFM-SDSW. Similarly, the ROC50 of SOFM-SDGSW is 16.37% greater than the SVM-Ensemble, 13.76% greater than the CONVERT, 10.54% greater than the SMI-BLAST, 8.18% greater than the SOFM-Top, 3.99% greater than the SOFM-SW, 2.5% greater than the SOFM-SMSW and 1.53% greater than the SOFM-SDSW.

Evaluation of ROC & ROC50 for proposed and existing PS alignment algorithms on superfamily database.
Thus, in summary, the SOFM-SDGSW algorithm achieves better performance in predicting protein homology and recognizing folds than the other existing algorithms. This is because it considers the structural alignments of PSs and finds MEMs adaptively. Also, it determines the alignment score both locally and globally to improve the efficiency of PRHR.
In this article, the SOFM-SDGSW algorithm was developed for PRHR. This algorithm uses both local and global ASs based on the SW scheme. These scores can be effectively learned by the SVM for PRHR. To analyze the efficiency of this algorithm, 3 different benchmark databases were considered. The experimental outcomes of this algorithm applied to the SCOP 1.53, SCOP 1.67 and Superfamily databases reveal its efficiency for PRHR. The results show that the SOFM-SDGSW algorithm attains an ROC of 0.97, 0.941 and 0.938 for the SCOP 1.53, SCOP 1.67 and Superfamily databases, respectively, which is higher than that of the other PRHR models. Similarly, the SOFM-SDGSW algorithm attains an ROC50 of 0.819, 0.846 and 0.86 for the SCOP 1.53, SCOP 1.67 and Superfamily databases, respectively, compared to the other PRHR models. In the future, an advanced deep learning algorithm will be employed to learn the alignment scores efficiently by choosing the most relevant PSs and MEMs.
