In Silico Tools and Approaches for the Prediction of Functional and Structural Effects of Single-Nucleotide Polymorphisms on Proteins: An Expert Review

Abstract

Single-nucleotide polymorphisms (SNPs) are single-base variants that contribute to human biological variation and pathogenesis of many human diseases. Among all SNP types, nonsynonymous single-nucleotide polymorphisms (nsSNPs) can alter many structural, biochemical, and functional features of a protein such as folding characteristics, charge distribution, stability, dynamics, and interactions with other proteins/nucleotides. These modifications in the protein structure can lead nsSNPs to be closely associated with many multifactorial diseases such as cancer, diabetes, and neurodegenerative diseases. Predicting structural and functional effects of nsSNPs with experimental approaches can be time-consuming and costly; hence, computational prediction tools and algorithms are being widely and increasingly utilized in biology and medical research. This expert review examines the in silico tools and algorithms for the prediction of functional or structural effects of SNP variants, in addition to the description of the phenotypic effects of nsSNPs on protein structure, association between pathogenicity of variants, and functional or structural features of disease-associated variants. Finally, case studies investigating the functional and structural effects of nsSNPs on selected protein structures are highlighted. We conclude that creating a consistent workflow with a combination of in silico approaches or tools should be considered to increase the performance, accuracy, and precision of the biological and clinical predictions made in silico.

Introduction

The genomes of two different human beings are almost 99% similar, yet the differences are the cause of observed diseases most of the time (Cooper et al., 1985; Karczewski et al., 2020; Kwok and Chen, 2003). According to deep sequencing analysis, a typical genome differs from the reference genome at 4.1 million to 5.0 million sites (1000 Genomes Project Consortium, 2015). These differences include variations and mutations that alter protein structures (Lek et al., 2016). Analysis of protein-coding variants is important for medicine and biology; thus, clinical and functional interpretations of variants are critical for understanding human diseases (Cassa et al., 2017).

There exist many variation types, including single-nucleotide polymorphisms (SNPs) or single-nucleotide variants (SNVs), insertions and deletions, structural variants, repeat variations, and copy number variations (Alzu'bi et al., 2019). The terms point mutation and SNP are sometimes used interchangeably, which leads to confusion in the field. In modern human genetics, mutations can be defined as “any heritable change to the DNA sequence,” while variations can be defined as “the differences from the reference genome or sequence” (Jackson et al., 2018).

From another point of view, a mutation is referred to as a rarer change in the nucleotide sequence than an SNP, whereas an SNP is referred to as a variation in the DNA sequence that is observed in a population at a frequency of 1% or more (1000 Genomes Project Consortium, 2010, 2015; Brookes, 1999; Condit et al., 2002; Telenti et al., 2016).

SNPs are the main cause of the above-mentioned 1% difference among the human population, and most of them contribute to human diversity without any biological effect (Collins et al., 1997; Guo et al., 2019). On the other hand, some SNPs have crucial biological, functional, and structural effects on processes such as gene expression, drug response, and disease susceptibility (Chakravarti, 2001; Zou et al., 2020). In general, SNPs are categorized as synonymous single-nucleotide polymorphisms (sSNPs) and nonsynonymous single-nucleotide polymorphisms (nsSNPs).

In an nsSNP, a single base substitution alters the encoded amino acid, whereas in an sSNP, the single base substitution does not change the encoded amino acid. Both nsSNPs and sSNPs can exhibit neutral or negative effects on phenotype, which would be interpreted as a neutral or a damaging mutation in the protein structure, respectively (Wang and Moult, 2001). Missense point mutations, which like nsSNPs change the encoded amino acid, can occasionally alter general protein characteristics, especially biochemical properties such as stability, interaction, and dynamics (Kucukkal et al., 2015). Disease-causing mutations or nsSNPs can also cause crucial alterations in the physicochemical features of amino acids such as hydrophobicity, charge, and geometry (Chaturvedi and Mahalakshmi, 2013; Lori et al., 2013).
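The distinction between sSNPs and nsSNPs can be illustrated with a minimal Python sketch that translates a codon before and after a single-base substitution using the standard genetic code. The helper name `classify_substitution` and the example codons are illustrative assumptions and are not taken from any tool discussed in this review.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base substitution within one codon (hypothetical helper).

    `position` is 0-based within the codon. Returns a tuple of
    (label, reference_residue, variant_residue); "*" denotes a stop codon.
    """
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    label = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return label, ref_aa, alt_aa


print(classify_substitution("GAG", 1, "T"))  # Glu codon changed to a Val codon
print(classify_substitution("GAA", 2, "G"))  # Glu codon changed to another Glu codon
```

The sketch works on a single codon only; locating the codon within a transcript and handling strand orientation are left out.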

Another important concept for prediction of functional and structural effects of variations on proteins is the private mutation. A private mutation is a rare gene mutation that usually occurs in a single family or a small population (Gao and Keinan, 2014). Rare genetic variants such as private mutations are the result of rapid population growth and increase the burden of spontaneous mutations (Gazave et al., 2014).

Functional and molecular studies have been performed to assess the functional and structural effects of private mutations, but difficulties in mapping, increased heterogeneity, and the abundance of mutations have driven improvements in computational methods and in silico prediction tools (Berger et al., 2016; Kim et al., 2016; Pangallo et al., 2020; Starita et al., 2018). Therefore, functional and structural effect prediction with in silico tools is one of the best strategies for assessing the phenotypic effects of private mutations.

This expert review consists of an examination and synthesis of the literature on the functional and structural effects of nsSNPs and pathogenicity of disease-related variants, followed by computational tools and algorithms for predicting these effects of variants (Fig. 1). Finally, we summarized and classified some case studies related to the investigation of nsSNPs on protein structures.

FIG. 1.

A graphical representation for the assessment and prediction of functional and structural effects of SNPs using in silico tools. SNP, single-nucleotide polymorphism.

Structural and Functional Effects of nsSNPs on Proteins

nsSNPs exhibit structural impacts on proteins due to the alteration of amino acids with smaller or larger ones that lead to the formation of voids and clashes (Yue et al., 2005), resulting in possible structural and thermodynamic destabilizations in the structure (Stitziel et al., 2003). These disturbances in the buried and core regions of a protein can cause harmful effects on the residue packing level (Vitkup et al., 2003).

In one study, researchers found that substitution of hydrophobic residues with charged amino acids in the protein core cannot be tolerated and can lead to destabilization of protein structures, while mutating small residues to larger ones can cause steric clashes (Yue et al., 2005). In another work, nsSNPs altered the charge distributions of proteins, causing crucial changes in pH dependence and catalysis, especially in enzymes (Stefl et al., 2013).

To understand the genotype-phenotype relationship in terms of the effect of SNPs, computational and experimental approaches are widely used (Chasman and Adams, 2001; Yates and Sternberg, 2013). Computational studies are gaining more popularity as experimental studies are laborious, expensive, and time-consuming (Shen et al., 2006). However, even in the presence of high-quality 3D protein structures, predicting the effects and phenotypes of nsSNPs can be challenging for computational biophysicists and bioinformaticians too (Ittisoponpisan et al., 2019; Kucukkal et al., 2014).

Categorizing SNPs and/or mutations as harmless or disease causing on the basis of protein dynamics is not straightforward (Nussinov and Tsai, 2013). On the other hand, consequences of these mutations are a reflection of the changes in protein dynamics that affect the function as well as the level of alterations in protein motion (Motlagh et al., 2014). Nevertheless, disease-causing or functional variants can exhibit harmful or neutral effects on the structure of a protein (Capriotti et al., 2009) such as protein structure destabilization, gene regulation, and alteration (Barroso et al., 1999), which influence protein charge (Petukh et al., 2015), geometry (Petukh et al., 2015), hydrophobicity (Petukh et al., 2015), stability (Chasman and Adams, 2001), dynamics (Kucukkal et al., 2015), and interprotein/intraprotein interactions (Zhao et al., 2014).

One indicator of disease-causing SNPs is the type of amino acid being substituted. In their studies, Vitkup et al. (2003) and David and Sternberg (2015) indicated that variations from tryptophan and cysteine (Cys) residues have a higher chance of leading to a disease, while variations from arginine and glycine increase the genetic disease tendency by 30%. The reason for these observations lies in the nature of the amino acids. These amino acids have critical roles in various structural features of a protein, such as protein flexibility and the formation of disulfide bonds, hydrogen bonds, salt bridges, and the hydrophobic core. Thus, the structural integrity and biological functions of a protein are mainly disrupted upon a change in these critical residues.

Among amino acid variations, Cys variation is one of the most disease-causing cases since this amino acid is considered to be both hydrophilic and hydrophobic at the same time (Betts and Russell, 2003). In addition, disulfide bond formation ability and metal binding capacity (mainly Zn²⁺) of Cys emphasize its importance in protein folding, stability, and function (Pace and Weerapana, 2013, 2014).

Another important aspect of the effect of nsSNPs is reflected in the difference in binding free energy (ΔΔG) between wild-type and mutant protein structures (Yates and Sternberg, 2013). SKEMPI (Structural Database of Kinetics and Energetics of Mutant Protein Interactions) is a web-based database consisting of binding free energy difference values of 85 wild-type and mutant protein–protein complexes (Moal and Fernández-Recio, 2012). When binding energy values of the variants are missing, FoldX can also be used to evaluate the free energy of an interaction (Schymkowitz et al., 2005). This technique can also be utilized for a quantitative estimation of SNP effect in terms of protein–protein interactions (Guerois et al., 2002).
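As a worked illustration of ΔΔG, the following Python sketch converts dissociation constants into binding free energies with the standard relation ΔG = RT ln(Kd) and takes the difference between mutant and wild type. The sign convention (mutant minus wild type, so that a positive value indicates weaker binding), the temperature, and the example Kd values are assumptions chosen for illustration; they are not values from SKEMPI or FoldX.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


def ddg_binding(kd_wild_type, kd_mutant, temperature=298.15):
    """ΔΔG = ΔG(mutant) - ΔG(wild type); positive means weaker binding (assumed convention)."""
    return (binding_free_energy(kd_mutant, temperature)
            - binding_free_energy(kd_wild_type, temperature))


# Hypothetical example: a variant that weakens binding from 1 nM to 100 nM.
ddg = ddg_binding(kd_wild_type=1e-9, kd_mutant=1e-7)
print(f"ddG = {ddg:.2f} kcal/mol")
```

Under this convention, a 100-fold loss of affinity at room temperature corresponds to a ΔΔG of roughly 2.7 kcal/mol.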

The structural integrity of a protein has also been evaluated by measuring stability changes upon variation (Kucukkal et al., 2015). The folding free energy quantifies the thermodynamic stability of a protein, which results from the cumulative contributions of many structural parameters, including H-bonds and salt bridges. Variations such as nsSNPs generally alter the energy landscape and the number of conformations in both the folded and unfolded states of a protein (Bartlett and Radford, 2009).

Besides SNPs, structural variations of base triplets, including deletions, insertions, and duplications, can affect protein structure and function through the addition or deletion of amino acids. The term structural variant describes a region of DNA that shows alterations in copy number (deletions, insertions, and duplications), orientation (inversions), or chromosomal location (translocations) (Escaramís et al., 2015). In many human diseases, structural variants have important functional consequences; thus, human disease studies have focused on these variants to gain valuable insights about diseases (Weischenfeldt et al., 2013; Yokoyama and Kasahara, 2020).

Human Variation Databases

Databases are one of the important aspects of bioinformatics for the analysis of variations. They are utilized for the collection, curation, organization, and analysis of biological data and are available as online sources (Ganesan et al., 2019; Savas, 2010). Initially, mutations and variations were reported only in the published literature; however, it was realized that online variation databases improve accessibility and reduce the ambiguity and complexity of biological variation data (Higasa et al., 2016; Küntzer et al., 2010).

In particular, an enormous amount of sequencing data has been produced through next-generation sequencing (NGS); thus, many variation databases have been developed to collect and organize biological data from NGS (Brown and Tastan Bishop, 2017). Many human variation databases are available online, and they are summarized in Table 1.

Table 1.

Human Variation Databases Accessible Online

Database	Website	Description	Reference article
ClinVar	https://www.ncbi.nlm.nih.gov/clinvar/	Clinically significant variations	Landrum et al. (2018)
COSMIC	https://cancer.sanger.ac.uk/cosmic	Cancer-associated somatic mutations	Tate et al. (2019)
dbSNP	https://www.ncbi.nlm.nih.gov/snp/	Known short variations	Sherry et al. (2001)
dbNSFP	https://sites.google.com/site/jpopgen/dbNSFP	Human nonsynonymous SNVs	Liu et al. (2016)
dbVar	https://www.ncbi.nlm.nih.gov/dbvar/	Structural variation collection	Lappalainen et al. (2013)
EVA	https://www.ebi.ac.uk/eva/	Human variation archive	Cook et al. (2016)
HGMD	www.hgmd.cf.ac.uk/ac/index.php	Disease-related germline mutations	Stenson et al. (2020)
HGVD	www.hgvd.genome.med.kyoto-u.ac.jp/index.html	Japanese human variation database	Higasa et al. (2016)
NHGRI-EBI Catalog (GWAS)	https://www.ebi.ac.uk/gwas/home	Catalog of published genome-wide association studies	Buniello et al. (2019)
OMIM	https://www.omim.org/	Human genes and genetic disorders along with human variations	Scott et al. (2014)
Ensembl	https://www.ensembl.org/index.html	Comprehensive biological platform and database, including human variations	Hubbard et al. (2002)
dbGaP	https://www.ncbi.nlm.nih.gov/gap/	Collections of associations between genotype and phenotype in diseases	Mailman et al. (2007)
TCGA	https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga	Cancer genomics platform and human mutations	McLendon et al. (2008)
LOVD	https://www.lovd.nl/	Locus-specific variation database	Fokkema et al. (2011)
The 1000 Genomes Project	www.internationalgenome.org/	Online platform that consists of human variation datasets	1000 Genomes Project Consortium (2015)
HuVarBase	https://www.iitm.ac.in/bioinfo/huvarbase	Human variations with information at gene and protein levels	Ganesan et al. (2019)

COSMIC, Catalog of Somatic Mutations In Cancer; EVA, European Variation Archive; HGMD, Human Gene Mutation Database; HGVD, Human Genetic Variation Database; LOVD, Leiden Open-source Variation Database; NHGRI-EBI GWAS, National Human Genome Research Institute-European Bioinformatics Institute Genome-Wide Association Studies; OMIM, Online Mendelian Inheritance in Man; SNV, single-nucleotide variant; TCGA, The Cancer Genome Atlas.

Among all human variation databases, ClinVar (Landrum et al., 2018) is the most well-known; it is created and curated by the National Center for Biotechnology Information (NCBI). ClinVar is a freely accessible human variation database that records the clinical significance of variations and mainly focuses on the association between disease and genotype. Another important human variation database is the Catalog Of Somatic Mutations In Cancer (COSMIC) (Tate et al., 2019). COSMIC is a free online database that consists of somatic mutations and their effects on human cancer. dbSNP (Sherry et al., 2001) is also an NCBI-curated free online database and the main collection of all known short variations.

In addition to these, there are also specialized databases for specific areas. dbNSFP (Liu et al., 2016) is a database that contains human nonsynonymous single-nucleotide variants (nsSNVs) and their functional predictions and annotations. dbVar (Lappalainen et al., 2013) is a specialized NCBI-curated database that consists of structural variations, including deletions and insertions. dbGaP (Mailman et al., 2007) is another NCBI-curated database that collects the associations between genotype and phenotype in diseases.

The European Bioinformatics Institute (EBI) also had various variation databases in the past, including the Database of Genomic Variants archive (DGVa) (Lappalainen et al., 2013) and the European Genome-phenome Archive (EGA) (Lappalainen et al., 2015). These databases have since been integrated into one database known as the European Variation Archive (EVA) (Cook et al., 2016). EVA is an open-access human variation archive that also collaborates with different variation databases and platforms such as dbSNP, dbVar, and Ensembl. EBI, along with the National Human Genome Research Institute (NHGRI), maintains a manually curated catalog of published genome-wide association studies known as the NHGRI-EBI GWAS Catalog (Buniello et al., 2019).

HGMD (Human Gene Mutation Database) (Stenson et al., 2020) is an online collection of germline mutations in nuclear genes that are closely associated with human inherited diseases in published cases. For specific populations, HGVD (Human Genetic Variation Database) (Higasa et al., 2016) is an online Japanese human variation database and a collection of associations between transcriptomics and variations. Some databases or platforms contain human variations alongside other content, such as Online Mendelian Inheritance in Man (OMIM) (Scott et al., 2014) and Ensembl. OMIM is an online database and platform that links variations to distinct phenotypes. Ensembl (Hubbard et al., 2002) is an online platform and database that stores human variation data and incorporates various human variation databases, including dbSNP, ClinVar, COSMIC, and OMIM.

Another important platform and database is The Cancer Genome Atlas (TCGA) (McLendon et al., 2008). TCGA mainly focuses on variations in different cancer types and is a cancer genomics platform that contains genomic, epigenomic, transcriptomic, and proteomic data. Leiden Open-source Variation Database (LOVD) (Fokkema et al., 2011) is a web-based platform and locus-specific variation database, which contains gene variation sequence data from patients. The 1000 Genomes Project (1000 Genomes Project Consortium, 2015) is also an online platform that consists of human variation datasets derived from whole-genome sequencing methods. HuVarBase (Ganesan et al., 2019) is an online comprehensive human variant database that integrates gene- and protein-level information with sequence and structure properties of the variations.

Tools for Investigating Functional Effects of nsSNPs

Four main bioinformatics methodologies have been used to evaluate and understand the functional effects of nsSNPs: sequence homology-based, supervised learning-based, sequence-structure-based, and consensus-based tools (Table 2). However, each technique has some limitations in terms of defining the effect of the variants on protein dynamics. Hence, molecular dynamics (MD) simulations, which enable a much more detailed structural investigation, have gained attention for evaluating the effects of these changes in terms of motion, protein flexibility, and secondary structure elements (Marcolino et al., 2016).

Table 2.

In Silico Tools for the Prediction and Evaluation of Functional Effects of Nonsynonymous Single Nucleotide Polymorphisms on Proteins

Tool name	Website	Input	Method name	Availability	Reference article
PolyPhen-2	http://genetics.bwh.harvard.edu/pph2/	RS ids or UniProt protein number, amino acid change	Sequence and structure based	Active	Adzhubei et al. (2010)
MuD	http://mud.tau.ac.il/	Unknown	Sequence and structure based	Inactive	Wainreb et al. (2010)
FATHMM	http://fathmm.biocompute.org.uk/	Rs ids or protein substitution	Sequence and structure based	Active	Shihab et al. (2013)
SNPs3D	www.snps3d.org/	Rs ids	Sequence and structure based	Active	Yue et al. (2006)
Align GVGD	http://agvgd.hci.utah.edu/	Fasta sequences, amino acid substitutions	Sequence and structure based	Active	Tavtigian et al. (2008)
CADD	https://cadd.gs.washington.edu/	Vcf file	Sequence and structure based	Active	Rentzsch et al. (2019)
SNPeffect	https://snpeffect.switchlab.org/	Fasta sequence, PDB file or PDB ID, UniProt ID	Sequence and structure based	Active	De Baets et al. (2012)
VAPOR	https://huma.rubi.ru.ac.za/#vapor	FASTA sequence and Mutation ID	Consensus based	Active	Brown et al. (2018)
Meta-SNP	http://snps.biofold.org/meta-snp/	FASTA sequence, Mutations	Consensus based	Active	Capriotti et al. (2013)
PredictSNP	https://loschmidt.chemi.muni.cz/predictsnp/	Chromosome number, genomic position, reference allele, mutation allele	Consensus based	Active	Bendl et al. (2014)
SIFT	https://sift.bii.a-star.edu.sg/	Rs ids or chromosome, coordinate, orientation and alleles	Sequence homology based	Active	Vaser et al. (2016) and Sim et al. (2012)
Provean	http://provean.jcvi.org/index.php	Rs ids or chromosome, position, reference allele, variant allele	Sequence homology based	Active	Choi and Chan (2015)
Mutation Assessor	http://mutationassessor.org/r3/	Genome build, chromosome, position, reference allele, substituted allele or protein ID variant, text	Sequence homology based	Active	Reva et al. (2007)
Panther	www.pantherdb.org/	FASTA sequence and substitutions	Sequence homology based	Active	Mi et al. (2019)
SNAP	https://www.rostlab.org/services/SNAP/	FASTA sequence and substitutions	Supervised learning based	Active	Bromberg and Rost (2007)
PhD-SNP	http://snps.biofold.org/phd-snp/phd-snp.html	FASTA sequence, swiss-prot code, residue position, new residue	Supervised learning based	Active	Capriotti et al. (2006)
SuSPect	www.sbg.bio.ic.ac.uk/suspect/	UniProt ID, amino acid change	Supervised learning based	Active	Yates et al. (2014)
MutPred2	http://mutpred.mutdb.org/	FASTA sequence	Supervised learning based	Active	Pejaver et al. (2017)
EFIN	http://paed.hku.hk/efin/	Rs ids	Supervised learning based	Active	Zeng et al. (2014)
ParePro	www.mobioinfor.cn/parepro/	Unknown	Supervised learning based	Inactive	Tian et al. (2007)
SNPs&GO	http://snps.biofold.org/snps-and-go/snps-and-go.html	Fasta sequence, Swiss-Prot Code, GO terms, Mutations	Supervised learning based	Active	Calabrese et al. (2009)
PON-P2	http://structure.bmc.lu.se/PON-P2/	Protein/Gene identifier(s) and variation(s)	Supervised learning based	Active	Niroula et al. (2015)
REVEL	https://sites.google.com/site/revelgenomics/	Chromosome number, genomic position, reference allele, mutation allele	Supervised learning based	Active	Ioannidis et al. (2016)
ClinPred	https://sites.google.com/site/clinpred/	Chromosome number, genomic position, reference allele, mutation allele	Supervised learning based	Active	Alirezaie et al. (2018)
CRAVAT	www.cravat.us/CRAVAT/	Vcf file or chromosome number, genomic position, reference allele, mutation allele	Supervised learning based	Active	Masica et al. (2017)
Rhapsody	http://rhapsody.csb.pitt.edu/	Uniprot coordinates or PDB files	Supervised learning based	Active	Ponzoni et al. (2020b)

GO term, gene ontology term; PDB, Protein Data Bank; rs ID, reference SNP cluster ID; VAPOR, Variant Analysis Portal; VCF, variant call format.

Structural analysis of disease-causing variants most often reveals alterations in salt bridge formation and the hydrogen bonding network (Petukh et al., 2015). Thermodynamic analysis from computational and experimental studies has also demonstrated that nsSNPs may destabilize protein structure, function, and interactions (Brock et al., 2007). A decrease in the stability of mutant proteins can be analyzed by searching energy alteration databases to predict and detect the probable effects of disease-related mutations.

Sequence-structure-based tools

Sequence-structure-based tools employ sequence-based properties as well as structural information to determine functional pathogenicities of nsSNPs. Sequence features such as evolutionary conservation score and homologous sequence score are combined mainly with structural properties, secondary structure information, accessible surface area of mutated residue, and protein stability (Kulshreshtha et al., 2016). There are many sequence-structure-based tools such as PolyPhen-2 (Adzhubei et al., 2010), MuD (Wainreb et al., 2010), FATHMM (Shihab et al., 2013), SNPs3D (Yue et al., 2006), CADD (Rentzsch et al., 2019), and SNPeffect (De Baets et al., 2012).

PolyPhen-2 (Adzhubei et al., 2010) was developed for the classification of disease-related variations according to three structural features and eight sequence-based predictive properties using a naive Bayes classifier. MuD (Wainreb et al., 2010) is a web-based sequence-structure prediction tool that can be used for separating functionally neutral and non-neutral variants and has a total of 14 novel or traditional sequence-structure features such as secondary structure assignment, number of sequences in the alignment, predicted stability change, solvent accessibility, and oligomerization interface.
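To make the naive Bayes idea concrete, the sketch below fits a Bernoulli naive Bayes model with Laplace smoothing to a handful of invented variants described by three binary features. The feature names, training examples, and labels are hypothetical and chosen only for illustration; the actual features and training data of PolyPhen-2 are not reproduced here.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, labels):
    """Fit a Bernoulli naive Bayes model with Laplace (add-one) smoothing."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for features, label in zip(samples, labels):
        class_counts[label] += 1
        for name, present in features.items():
            if present:
                feature_counts[label][name] += 1
    feature_names = sorted({name for s in samples for name in s})
    model = {}
    for label, count in class_counts.items():
        prior = count / len(labels)
        # P(feature present | class), smoothed so no probability is 0 or 1.
        likelihoods = {
            name: (feature_counts[label][name] + 1) / (count + 2)
            for name in feature_names
        }
        model[label] = (prior, likelihoods)
    return model


def predict(model, features):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for label, (prior, likelihoods) in model.items():
        log_score = math.log(prior)
        for name, p in likelihoods.items():
            log_score += math.log(p if features.get(name, False) else 1.0 - p)
        scores[label] = log_score
    return max(scores, key=scores.get)


# Invented toy training set (not real variant data).
samples = [
    {"conserved": True, "buried": True, "charge_change": True},
    {"conserved": True, "buried": True, "charge_change": False},
    {"conserved": True, "buried": False, "charge_change": True},
    {"conserved": False, "buried": False, "charge_change": False},
    {"conserved": False, "buried": True, "charge_change": False},
    {"conserved": False, "buried": False, "charge_change": True},
]
labels = ["damaging"] * 3 + ["benign"] * 3

model = train_naive_bayes(samples, labels)
print(predict(model, {"conserved": True, "buried": True, "charge_change": True}))
```

The "naive" assumption is that features are conditionally independent given the class, which is why the per-feature log-probabilities are simply summed.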

FATHMM (Shihab et al., 2013) is a hidden Markov model-based prediction software and server that utilizes parameters, including conservation score (sequence) along with conserved protein domain families (structure), to measure the pathogenicity weight score of SNPs. SNPs3D (Yue et al., 2006) modules were developed for the prediction of functional effects of SNPs and utilize protein folding state changes upon amino acid variations and amino acid sequence conservation scores.

CADD (Rentzsch et al., 2019) is another prediction tool that creates a scoring algorithm from evolutionary constraints, gene model annotations, epigenetic measurements, surrounding sequence context, and functional predictions. SNPeffect (De Baets et al., 2012) database consists of sequence- and structure-based prediction tools that use amyloid prediction (WALTZ) (Maurer-Stroh et al., 2010), aggregation prediction (TANGO) (Fernandez-Escamilla et al., 2004), chaperone-binding prediction (LIMBO) (Van Durme et al., 2009), and protein stability analysis (FoldX) (Schymkowitz et al., 2005).

Sequence homology-based tools

The underlying idea of sequence homology-based tools is the usage of sequence homology in terms of sequence conservation scores to define the deleterious effects of nsSNPs or mutations. This phenomenon depends on the concept that highly conserved parts of genomes are more crucial in protein function (Reva et al., 2011). SIFT (Sim et al., 2012; Vaser et al., 2016), PROVEAN (Choi and Chan, 2015), Mutation Assessor (Reva et al., 2007), and PANTHER (Mi et al., 2019; Tang and Thomas, 2016) are among the widely used sequence homology-based tools.

The SIFT algorithm (Sim et al., 2012) specifically defines the relationship between variations and protein function by predicting the effect of SNPs through a sequence-homology methodology. In this algorithm, the sequence of a query protein is used to search for homologous sequences (Vaser et al., 2016). Conservation scores are calculated and normalized according to amino acid composition, and variations are then predicted for their functional effects on the protein. SIFT and PROVEAN have similar principles based on computing conservation scores of homologous amino acid sequences. PROVEAN (Choi and Chan, 2015) can also be used for the prediction of in-frame insertion and deletion effects, in addition to single amino acid substitutions.
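The conservation-based reasoning described above can be sketched in Python as follows. This is a deliberately simplified stand-in, not the SIFT or PROVEAN implementation: it scores a substitution by the frequency of the variant residue in one alignment column relative to the most common residue, with no pseudocounts or sequence weighting. The toy alignment and function names are assumptions; the 0.05 cut-off mirrors the threshold conventionally applied to SIFT scores.

```python
from collections import Counter


def column_probabilities(alignment, column):
    """Relative amino acid frequencies at one alignment column, ignoring gaps."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def normalized_score(alignment, column, variant_residue):
    """Frequency of the variant residue divided by that of the most common residue."""
    probs = column_probabilities(alignment, column)
    return probs.get(variant_residue, 0.0) / max(probs.values())


def predict_effect(alignment, column, variant_residue, threshold=0.05):
    """Label a substitution as 'deleterious' when its score falls below the threshold."""
    score = normalized_score(alignment, column, variant_residue)
    return "deleterious" if score < threshold else "tolerated"


# Hypothetical toy alignment of homologous sequences (0-based columns).
alignment = ["MKVLA", "MKVIA", "MRVLA", "MKVLS", "MKILA"]

print(predict_effect(alignment, 0, "L"))  # column 0 is fully conserved (M)
print(predict_effect(alignment, 3, "I"))  # I is already observed at column 3
```

A substitution at an invariant column scores zero and is flagged, whereas a residue already seen among homologs at that position is treated as tolerated.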

In PANTHER (Mi et al., 2019; Tang and Thomas, 2016), evolutionary preservation metric is used for the prediction of deleterious effects instead of evolutionary conservation score. Evolutionary preservation metric is based on the manifestation of negatively selected variants in avoiding evolutionary change at a particular region of a protein (Tang and Thomas, 2016). Mutation Assessor (Reva et al., 2007) is another tool in this category that utilizes information-based assessment of evolutionary conservation motifs in multiple sequence alignments instead of conservation score measurements.

Supervised learning-based tools

Supervised learning methodology or machine learning techniques, such as neural networks (NNs), random forests (RF), and support vector machines (SVM), can be used for the prediction of functional effects of SNPs (Zhao et al., 2014). These methods are very practical as they can be employed in the analysis of large datasets. Supervised learning prediction methods generally utilize training datasets that consist of known effects to train the algorithm before prediction (Mishra et al., 2019).

In training datasets, labels associated with the input data are created, and these labels are used to identify the predictive patterns present in the data (Camacho et al., 2018). Because the model is fitted to these training datasets, the prediction results depend directly on their composition and quality. There are many supervised learning-based tools such as SNAP (Bromberg and Rost, 2007), PhD-SNP (Capriotti et al., 2006), SuSPect (Yates et al., 2014), MutPred2 (Pejaver et al., 2017), ParePro (Tian et al., 2007), EFIN (Zeng et al., 2014), SNPs&GO (Calabrese et al., 2009), PON-P2 (Niroula et al., 2015), REVEL (Ioannidis et al., 2016), ClinPred (Alirezaie et al., 2018), and CRAVAT (Masica et al., 2017).

NNs are a combination of algorithms inspired by the human nervous system that aim to recognize specific patterns. In NNs, units or neurons are interconnected, and each neuron transmits information to the next connected one (Chen and Siu, 2020; Nicholls et al., 2020). Each node, neuron, or unit receives many input signals and produces a response by applying a nonlinear activation function to the weighted sum of its inputs. Each neuron or unit then conveys its output signal to the next connected neuron or unit (Lo et al., 2018). Units can be organized into different levels, forming multilayer network structures. This organization of nonlinear units allows NNs to learn from complex input data (Baskin et al., 2016).
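The weighted-sum-plus-activation behavior of a unit, and the stacking of units into layers, can be written as a short Python sketch. The sigmoid activation, the layer sizes, and all weights below are hypothetical values chosen for illustration; they are not parameters of SNAP, MutPred2, or any other tool.

```python
import math


def sigmoid(z):
    """Nonlinear activation that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One unit: weighted sum of the inputs followed by the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


def forward(inputs, hidden_layer, output_unit):
    """Forward pass through one hidden layer and a single output unit."""
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_unit
    return neuron(hidden, w_out, b_out)


# Hypothetical network: 3 input features -> 2 hidden units -> 1 output.
hidden_layer = [([0.8, -0.4, 0.3], 0.1), ([-0.5, 0.9, 0.2], -0.2)]
output_unit = ([1.2, -0.7], 0.05)

score = forward([0.9, 0.1, 0.5], hidden_layer, output_unit)
print(f"output score: {score:.3f}")
```

In a trained network the weights and biases would be learned from labeled data rather than set by hand.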

NNs have wide applications, including recognizing handwritten numbers (Cohen et al., 2017), quantum chemistry (Balabin and Lomakina, 2009), 3D object reconstruction (Choy et al., 2016), and medical diagnosis (Lyons et al., 2016). Among supervised learning-based tools, SNAP (Bromberg and Rost, 2007) and MutPred2 (Pejaver et al., 2017) are NN-derived tools. SNAP (Bromberg and Rost, 2007) can predict functional consequences of SNPs through a set of NN algorithms by integrating information from residue conservation score, protein structure elements, and other significant parameters. On the other hand, MutPred2 (Pejaver et al., 2017) combines genetic and molecular information parameters for the prediction of functional pathogenicity of SNPs.

Another supervised learning method that is used for functional effect prediction of nsSNPs on proteins is RF, or random decision forests. RF is a nonparametric machine learning methodology that builds an ensemble of decision trees on random subsets of the data and features and aggregates their votes (Breiman, 2019). EFIN (Zeng et al., 2014), PON-P2 (Niroula et al., 2015), REVEL (Ioannidis et al., 2016), ClinPred (Alirezaie et al., 2018), Rhapsody (Ponzoni et al., 2020b), and CRAVAT (Masica et al., 2017) are RF-based supervised learning tools for pathogenicity prediction of nsSNPs. Among these tools, EFIN (Zeng et al., 2014) and PON-P2 (Niroula et al., 2015) both use evolutionary conservation scores.

The EFIN algorithm covers many homologous protein sequence clusters defined according to evolutionary gap, whereas PON-P2 utilizes biochemical and physical features of amino acids and gene ontology (GO) terms alongside the conservation score (Niroula et al., 2015; Zeng et al., 2014). ClinPred (Alirezaie et al., 2018) uses not only RF but also a gradient boosting model, integrating the scores of individual prediction tools with the population allele frequencies of SNPs and mutations from the gnomAD database (Karczewski et al., 2020).

CRAVAT (Masica et al., 2017) is a combination of two methodologies, CHASM (Cancer-specific High-throughput Annotation of Somatic Mutations) (Carter et al., 2009) and VEST (Variant Effect Scoring Tool) (Carter et al., 2013). The CHASM algorithm uses an RF classifier trained on cancer driver variants and a set of passenger mutations, whereas the RF classifier of VEST is trained on disease-related germline mutations. Rhapsody (Ponzoni et al., 2020b) is a newer web-based RF classifier that predicts and evaluates pathogenicity from sequence conservation scores and structural properties.
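
The general RF idea behind these tools can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the implementation of any tool named above: each tree is reduced to a single split (a decision stump), and the two features (a conservation score and a stability change), the training values, and the function names are all hypothetical.

```python
import random
from collections import Counter

def gini(group):
    """Gini impurity of a list of 0/1 labels."""
    if not group:
        return 0.0
    p = sum(group) / len(group)
    return 2.0 * p * (1.0 - p)

def majority(values, fallback):
    """Most common label, falling back to another list if this one is empty."""
    return Counter(values or fallback).most_common(1)[0][0]

def fit_stump(rows, labels, n_features, rng):
    """Fit a depth-one tree using only a random subset of the features."""
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            cost = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or cost < best[0]:
                best = (cost, f, t, majority(left, labels), majority(right, labels))
    return best[1:]  # (feature index, threshold, left label, right label)

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train each stump on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], n_features, rng))
    return forest

def predict(forest, row):
    """Aggregate the trees by majority vote."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical training variants: [conservation score, |stability change|]
rows = [[0.95, 2.1], [0.90, 1.8], [0.85, 2.5], [0.88, 1.6],
        [0.20, 0.1], [0.15, 0.3], [0.30, 0.2], [0.25, 0.4]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = pathogenic, 0 = benign

forest = fit_forest(rows, labels)
print(predict(forest, [0.92, 2.0]), predict(forest, [0.10, 0.1]))
```

Bootstrap sampling and random feature selection make the individual trees differ from one another, and the vote across trees is what gives the ensemble its robustness.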

SVM is a supervised machine learning methodology used for classifying and analyzing datasets. Data points in an SVM are separated by a decision boundary, the hyperplane. Classification is performed by positioning this hyperplane so that the margin between it and the nearest data points of each class is as large as possible (Ozer et al., 2020). PhD-SNP (Capriotti et al., 2006), SuSPect (Yates et al., 2014), and SNPs&GO (Calabrese et al., 2009) are SVM-based supervised learning tools.

PhD-SNP (Capriotti et al., 2006) can predict the new phenotype of disease-related SNPs starting from the protein sequence. SuSPect (Yates et al., 2014) is a prediction tool that consists of a trained SVM integrating sequence and structure properties of the protein to determine disease-associated variants. SNPs&GO (Calabrese et al., 2009) uses GO terms and protein sequence for prediction. ParePro (Tian et al., 2007) integrates evolutionary features with amino acid properties to identify the differences between the wild-type and mutant residues.
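
The role of the hyperplane and its margin can be made concrete with a short Python sketch. It does not train an SVM; it only compares two hand-picked candidate hyperplanes on hypothetical two-feature data and keeps the one with the larger margin, which is the quantity an SVM optimizes. All points, labels, and hyperplane parameters are assumptions made for illustration.

```python
import math

def signed_distance(point, w, b):
    """Signed distance of a point from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

def margin(points, labels, w, b):
    """Smallest label-signed distance; positive only if all points are on their correct side."""
    return min(y * signed_distance(p, w, b) for p, y in zip(points, labels))

def classify(point, w, b):
    """Assign +1 or -1 according to the side of the hyperplane."""
    return 1 if signed_distance(point, w, b) >= 0 else -1

# Hypothetical variants described by two features; +1 = disease-related, -1 = neutral
points = [(0.9, 0.8), (0.8, 0.9), (0.2, 0.1), (0.1, 0.3)]
labels = [1, 1, -1, -1]

# Two hand-picked candidate hyperplanes as (w, b)
candidates = {"A": ((1.0, 1.0), -1.0), "B": ((1.0, 0.0), -0.75)}

margins = {name: margin(points, labels, w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(best, {name: round(m, 3) for name, m in margins.items()})
```

Both candidates separate the two classes, but candidate A leaves more room between the boundary and the nearest points, so it would be preferred.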

Consensus-based tools

Consensus-based prediction tools or consensus classifier tools are generally a combination of different tools that analyze homologous sequences through multiple sequence alignments. After analysis, consensus sequences are created and compared with actual protein sequences to make predictions. The most common consensus-based prediction tools are Variant Analysis Portal (VAPOR) (Brown and Tastan Bishop, 2018), Meta-SNP (Capriotti et al., 2013) and PredictSNP (Bendl et al., 2014).

VAPOR (Brown and Tastan Bishop, 2018) is integrated in the HUMA webserver and draws on eight different tools, including PROVEAN, PolyPhen-2, PhD-SNP, PANTHER-PSEP, and FATHMM. Meta-SNP (Capriotti et al., 2013) is a meta-predictor that gathers predictions for disease-related nsSNPs from four different tools: PANTHER, PhD-SNP, SIFT, and SNAP. PredictSNP (Bendl et al., 2014) is another meta-predictor that draws on eight prediction tools and integrates their results into consensus classifier scores.
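
A simple way to picture a meta-predictor is a weighted vote over the calls of the individual tools. The Python sketch below is an assumption-laden illustration of that idea and does not reproduce the actual combination scheme of VAPOR, Meta-SNP, or PredictSNP; the per-tool calls and the equal default weights are hypothetical.

```python
def consensus(predictions, weights=None):
    """Combine per-tool calls into one call plus the share of weight behind it.

    predictions: {tool name: "deleterious" or "neutral"}
    weights:     optional {tool name: weight}; equal weights if omitted
    """
    weights = weights or {tool: 1.0 for tool in predictions}
    total = sum(weights[tool] for tool in predictions)
    deleterious = sum(weights[tool] for tool, call in predictions.items()
                      if call == "deleterious")
    share = deleterious / total
    # Ties are resolved conservatively as "neutral" in this sketch
    call = "deleterious" if share > 0.5 else "neutral"
    return call, max(share, 1.0 - share)

# Hypothetical calls for a single variant
calls = {"PANTHER": "deleterious", "PhD-SNP": "deleterious",
         "SIFT": "deleterious", "SNAP": "neutral"}

print(consensus(calls))
```

Passing a `weights` dictionary would let better-performing tools count for more, which is closer in spirit to how trained consensus classifiers behave.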

Tools for Investigating Stability and Structural Effects of nsSNPs

Tools that predict the structural impacts of nsSNPs on proteins form a distinct area of structural bioinformatics. An nsSNP changes the internal energy of a protein, leading to alterations in the protein structure. The free energy difference between the wild-type and mutated forms of a protein is a crucial parameter for protein stability (Kulshreshtha et al., 2016; Yue et al., 2005). Furthermore, long-range order, residue hydrophobicity, the contact map matrix, and the stabilization centers of residues are specific parameters used to determine the structural effects of SNPs. To this end, many tools have been developed for predicting the effects of nsSNPs on protein stability and structure (Table 3).

Table 3.

In Silico Tools for the Prediction and Evaluation of Stability and Structural Effects of Nonsynonymous Single Nucleotide Polymorphisms on Proteins

Tool name	Website	Input	Availability	Reference article
ENCoM	http://bcb.med.usherbrooke.ca/encom	Unknown	Inactive	Frappier et al. (2015)
DynaMut	http://biosig.unimelb.edu.au/dynamut/	PDB file or PDB ID, mutated amino acid and chain ID	Active	Rodrigues et al. (2018)
MuPro	http://mupro.proteomics.ics.uci.edu/	Mutated amino acid and position, protein sequence	Active	Cheng et al. (2006)
Eris	https://dokhlab.med.psu.edu/eris/login.php	No information	Inactive	Yin et al. (2007)
PoPMuSiC	https://soft.dezyme.com/	No information	Active	Dehouck et al. (2011)
CUPSAT	http://cupsat.tu-bs.de/	PDB ID, amino acid residue number	Active	Parthiban et al. (2006)
NeEMO	http://protein.bio.unipd.it/neemo/help.html	PDB file or PDB ID, amino acid position and alleles	Active	Giollo et al. (2014)
I-Mutant 2.0	http://folding.biofold.org/i-mutant/imutant2.0.html	PDB file or protein sequence	Active	Capriotti et al. (2005)
PMut	http://mmb.irbbarcelona.org/PMut/	UniProt ID and mutation list	Active	López-Ferrando et al. (2017)
ProTherm	https://www.iitm.ac.in/bioinfo/ProTherm/	Unknown	Inactive	Gromiha et al. (1999)
Auto-Mute 2.0	http://binf.gmu.edu/automute/	PDB ID, chain ID, mutation list	Active	Masso and Vaisman (2014)
SDM	http://marid.bioc.cam.ac.uk/sdm2	PDB number, mutation file list (Chain number, amino acid change)	Active	Pandurangan et al. (2017)
mCSM	http://biosig.unimelb.edu.au/mcsm/	PDB ID, chain ID, mutation list	Active	Pires et al. (2014)

mCSM, mutation Cutoff Scanning Matrix; SDM, Site-directed mutator.

ENCoM (Frappier et al., 2015) is a structural effect predictor web-server that can make predictions on protein dynamics features and thermostability through coarse-grained normal mode analysis. Furthermore, DynaMut (Rodrigues et al., 2018) is another web-based server that uses normal mode approach. DynaMut algorithm combines normal mode analysis with graph-based signature and utilizes them as a consensus predictor for protein stability.

MuPro (Cheng et al., 2006) is an SVM-based machine learning tool that predicts the folding free energy change (ΔΔG) between the wild-type and mutated forms of a protein. I-Mutant2.0 (Capriotti et al., 2005) is also a web-based SVM tool that predicts protein stability changes directly as ΔΔG values. Eris (Yin et al., 2007) also estimates ΔΔG for structural effect prediction of SNPs, using side-chain packing and backbone relaxation algorithms.

CUPSAT (Cologne University Protein Stability Analysis Tool) (Parthiban et al., 2006) is a web-based server that predicts ΔΔG through structural condition-specific atom and torsion angle potentials. Another tool in this category is PoPMuSiC (Prediction of Protein Mutant Stability Changes) (Dehouck et al., 2011). This tool mainly focuses on the stability of a mutant protein with the help of sequence-based approaches.
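
The ΔΔG quantity reported by these tools can be illustrated with a small Python sketch. It assumes the convention ΔΔG = ΔG(mutant) - ΔG(wild type) for folding free energies, so that a positive value is destabilizing; some tools report the opposite sign, so the documentation of each tool should be checked. The energies and the 0.5 kcal/mol neutral band are hypothetical values chosen for illustration.

```python
def ddg(dg_fold_wild_type, dg_fold_mutant):
    """Change in folding free energy upon mutation, in kcal/mol.

    Assumed convention: folding free energies are negative for stable
    proteins, so a positive result means the mutant is less stable.
    """
    return dg_fold_mutant - dg_fold_wild_type

def classify_stability(ddg_value, neutral_band=0.5):
    """Label a variant; neutral_band is an assumed tolerance, not a standard."""
    if ddg_value > neutral_band:
        return "destabilizing"
    if ddg_value < -neutral_band:
        return "stabilizing"
    return "neutral"

# Hypothetical folding free energies (kcal/mol)
wild_type, mutant = -8.0, -6.2

change = ddg(wild_type, mutant)
print(f"ddG = {change:+.1f} kcal/mol -> {classify_stability(change)}")
```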

Another approach for the prediction of structural effects of SNPs is based on residue interaction networks (RINs). RINs are graph-based representations of protein structures, where nodes represent amino acids and edges represent physicochemical interactions. Interactions among residues are important for internal folding energy and hence protein stability; therefore, mutant effects on protein structures can alternatively be analyzed through RINs (Cheng et al., 2008). NeEMO (Giollo et al., 2014) is a web-based tool for the evaluation of stability changes and structural effects using RINs. Another tool using graph-based signatures for structural effect predictions is mutation Cutoff Scanning Matrix (mCSM) (Pires et al., 2014).
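
A RIN can be approximated, for illustration, by connecting residues whose Cα atoms lie within a distance cutoff. The Python sketch below builds such a graph and reports the degree of each residue. The distance criterion is a simplification of the physicochemical interactions used in real RINs, and the residue labels, coordinates, and the 7 Å cutoff are hypothetical assumptions.

```python
import math
from itertools import combinations

def build_rin(ca_coords, cutoff=7.0):
    """Build an undirected residue graph from C-alpha coordinates (in angstroms).

    ca_coords: {residue label: (x, y, z)}
    Returns {residue label: set of neighboring residue labels}.
    """
    graph = {res: set() for res in ca_coords}
    for a, b in combinations(ca_coords, 2):
        if math.dist(ca_coords[a], ca_coords[b]) <= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree(graph):
    """Number of contacts per residue, a basic network centrality."""
    return {res: len(neighbors) for res, neighbors in graph.items()}

# Hypothetical C-alpha coordinates for five residues
ca_coords = {
    "A10": (0.0, 0.0, 0.0),
    "L11": (3.8, 0.0, 0.0),
    "K12": (7.6, 0.0, 0.0),
    "D45": (3.8, 5.0, 0.0),
    "G80": (30.0, 30.0, 30.0),
}

rin = build_rin(ca_coords)
print(degree(rin))
```

Comparing the degree (or other centralities) of a residue in the wild-type and mutant networks is one simple way to flag positions whose contacts are disturbed by a variant.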

PMut (López-Ferrando et al., 2017) is an open-source web portal that combines several predictive features, such as protein domain families, amino acid conservation scores, protein interactome information, and physicochemical features of the protein. ProTherm (Gromiha et al., 1999) is a thermodynamic database that records differences in thermodynamic parameters between wild-type and mutant proteins. Auto-Mute 2.0 (Masso and Vaisman, 2014) is a stand-alone software package for predicting structural effects of nsSNPs using structure-based features with trained statistical learning models. Site-directed mutator (SDM) (Pandurangan et al., 2017) is a knowledge-based approach that predicts the structural stability of a mutant protein through a statistical potential energy function.

Case Studies Investigating the Functional and Structural Effects of nsSNPs

To date, many studies have been conducted (Supplementary Table 1) on the prediction of functional and structural effects of disease-related nsSNPs on proteins. Most of these studies focused on variants of one or more genes belonging to several genetic disorders.

While some studies aimed to investigate variants of a single gene in multiple disorders (Arshad et al., 2018; Doss et al., 2012; Islam et al., 2019; Porto et al., 2015; Shen et al., 2006) or of multiple genes in a single disorder (Masoodi et al., 2013; Pandey et al., 2019), others aimed to analyze variants in a single gene and a single disorder (Abdul Samad et al., 2016; Chitrala and Yeguvapalli, 2014; Kandakatla et al., 2014; Khan et al., 2013; Kumar et al., 2013; Nagarajan et al., 2020; Naveed et al., 2016, 2017; Owji et al., 2020; Ponzoni et al., 2020a; Sang et al., 2017; Yadegari and Majidzadeh, 2019). Interpretations from these studies generally relate to individual genes, although some general conclusions can also be derived from them. In this section, recent case studies related to the investigation of structural and functional effects of missense variations on proteins are summarized.

Predicting the effect of nsSNPs is critical for the determination of genetic characterization of a disorder, discovery of molecular therapeutic targets, and understanding evolutionary susceptibility to disease(s). Furthermore, comparing the predictions and experimental findings in terms of functional or structural effects provides valuable insights on pathogenicity and genetic basis of a disease.

Alterations in thermal activity, thermodynamic stability, and structural dynamics must be confirmed with experimental evaluations of variants, while traces of these effects should be detected through computational prediction methods. In an experimental study, Lori et al. (2013) studied Pim-1 kinases and the structural effects of natural variants on these proteins' stabilities and activities. They expressed and purified recombinant soluble mutant proteins and characterized the thermal and thermodynamic stability together with enzyme activity. As expected, their results indicated that mutant proteins display a significant decrease in thermodynamic and thermal stability and activation energy for kinase activity.

For the evaluation of the effects of nsSNPs, structural parameters can also be combined with sequence and evolutionary information and this combination can be a valuable methodology for novel variants' determination. One recent study on Cys loop receptor gene investigated the GABRA2 gene's pathogenic variants in combination with structure, sequence, and evolutionary information and compared the variants to the formerly reported pathogenic variants' positions in other Cys loop receptors (Sanchis-Juan et al., 2020). This study revealed that one of seven variants in GABRA2 gene results in a decreased score in structural, evolutionary, and sequence parameters.

Structural analysis of the variants can also be utilized as a diagnostic classifier when it is associated with genetic information. There is an integrative machine learning approach that produces an in silico model of CACNA1F gene to separate disease-related SNPs from benign variations (Sallah et al., 2020). This approach specifically contains sequence and homology modeling data along with structural parameters of amino acids such as charge, hydrophobicity, position, and size.

Comparison of the predictions of nsSNP effects using in silico tools and clinical methodology is another option to characterize disease-related variants. Cohort studies with in silico methodologies provide strong insights on disease-related variants. In their recent study, Cheng et al. (2020) investigated a large group of TAF1/MRXS33 intellectual disability syndrome cases by combination of clinical and in silico approaches. They performed computational analysis with modeling approaches to identify variants' pathogenicity scores and compared them with clinical phenotypes.

Clinically, multifactorial diseases such as cancer, neurodegenerative diseases, or neurodevelopmental diseases are difficult to investigate since many genes are involved and variations in those genes result in a rather complex picture. Therefore, computational tools are useful for supporting the results of clinical methodology. Post et al. (2020) investigated the PTEN gene and its variants in Autism Spectrum Disorder. They created a deep phenotypic profiling approach to evaluate the effects of missense variants. This model resulted in a strong validation for the effects of variants in such a diverse multifactorial disorder.

An example study on experimental approaches for the determination of functional effects of nsSNPs was carried out by Marín-Martín et al. (2014). They investigated the effects of nsSNPs that can change ATP-binding cassette transporter (ABCA1) gene expression in Tangier disease and the allelic disorder familial hypoalphalipoproteinemia. Their results showed that most of the nsSNPs were correctly predicted by the MutPred and PolyPhen2 tools and that the predictions correlated with experimental studies.

In their recent study comparing gene expression with functional effect prediction of nsSNPs, Russell et al. (2020) assessed the Na⁺-taurocholate co-transporting polypeptide (NTCP, SLC10A1) gene together with the uptake, transport, and cellular localization of the substrate taurocholic acid in patients carrying mutations. They also compared computational scores of in silico prediction tools with observed in vitro functional effects of nsSNPs to assess the efficiency of seven different algorithms. Decreased NTCP gene expression and reduced substrate uptake were observed in some rare variants. Interestingly, this comparison indicated that in silico prediction tools are not as powerful as experimental in vitro studies.

Decreased ambiguities in computational prediction tools and improved consensus of in silico algorithms are another aspect of case studies for the prediction of structural and functional effects of nsSNPs. Proper variant classification and determination of variant pathogenicity depend on reliability, accuracy, and precision of a prediction algorithm. Especially in machine learning-based prediction tools, quality of trained datasets is also important for reproducibility, reliability, and improved consensus prediction scores of variants.

In the literature, there exist three recent studies that compare different in silico prediction tools and algorithms. First of all, Orioli and Vihinen (2019) computationally assessed the performance of 22 variant pathogenicity predictor tools along with 7 subcellular localization predictors on membrane proteins. PON-P2 (Niroula et al., 2015) was demonstrated to have the best performance, followed by REVEL (Ioannidis et al., 2016) and VEST3 (Carter et al., 2013). They also concluded that in silico predictors are more successful in the prediction of multipass proteins than single-pass proteins.

Second, Accetturo et al. (2020) evaluated the performance of three in silico meta-predictor algorithms: VEST3 (Carter et al., 2013), REVEL (Ioannidis et al., 2016), and ClinPred (Alirezaie et al., 2018) in NF1 gene variants from ClinVar (Chitipiralla et al., 2015). Among all three meta-predictors, there was no significant difference in the scores of variants in “benign,” “likely benign,” “likely pathogenic,” and “pathogenic” categories of ClinVar. Finally, Gyulkhandanyan et al. (2020) also assessed the performance and reliability of 22 in silico pathogenicity prediction tools and algorithms in missense variants. Even though some conflicting results were found from some variants in this study, they concluded that a combination of several tools can be used to define potential effects of variants.
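
Benchmarking studies of this kind typically compare the calls of a predictor with reference labels and summarize the agreement with standard measures. The Python sketch below computes accuracy, precision, sensitivity, specificity, and the Matthews correlation coefficient (MCC) from such a comparison. The labels are hypothetical, and the choice of measures is an illustrative assumption rather than the protocol of any of the studies cited above.

```python
import math

def confusion(predicted, truth):
    """Count outcomes; 1 = pathogenic (positive class), 0 = benign."""
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    return tp, tn, fp, fn

def metrics(predicted, truth):
    """Standard binary classification measures; undefined ratios are returned as 0.0."""
    tp, tn, fp, fn = confusion(predicted, truth)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(truth),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Hypothetical reference labels and predictor calls for eight variants
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]

results = metrics(predicted, truth)
print(results)
```

MCC is often preferred when the numbers of pathogenic and benign variants are unbalanced, because accuracy alone can look favorable for a predictor that simply favors the larger class.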

Case studies investigating effects of nsSNPs by using MD simulations

MD simulations are an important methodology for assessing protein dynamic motions by predicting how atoms of a protein or a biomolecular system move over time. By the development of recent computational methodologies, MD simulations have been involved in many studies that investigate stability and specificity of a protein (Sneha and Priya Doss, 2016).

MD simulations are particularly important for investigating stability and structural effects of nsSNPs on proteins. nsSNPs and missense mutations seem to alter protein dynamics as global perturbations (Haliloglu and Bahar, 2015). MD simulations have also been used for investigating the structural properties of nsSNPs in protein interaction interfaces (Kamburov et al., 2015). Variations are also enriched at dynamically and functionally important regions such as cofactor binding region, DNA-binding region, and hinge region; thus, MD simulations have been useful in detecting the effect of the variations (Stehr et al., 2011).

In literature, there exist many MD studies (as shown in Supplementary Table 1) for the identification of nsSNPs' effects. Some of these studies mainly focus on the MD simulation analysis methods for phenotypic consequences of nsSNPs, such as root mean square deviation (RMSD), root mean square fluctuation (RMSF), hydrogen-bonds (H-bonds), and solvent-accessible surface area (SASA) analysis. Variations in the binding region of a protein can inhibit protein–protein, protein–ligand, or protein-DNA interactions; therefore, such deleterious variations can alter protein stability and dynamics.
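
Two of the measures named above, RMSD and RMSF, can be written down compactly. The Python sketch below assumes that every frame has already been superimposed on the reference structure (real analyses perform this alignment first) and uses a hypothetical two-atom, two-frame trajectory; the function names are illustrative and do not belong to any MD package.

```python
import math

def rmsd(frame, reference):
    """Root mean square deviation between one frame and a reference structure.

    Both arguments are lists of (x, y, z) tuples in the same atom order.
    """
    squared = [math.dist(a, b) ** 2 for a, b in zip(frame, reference)]
    return math.sqrt(sum(squared) / len(squared))

def rmsf(trajectory):
    """Root mean square fluctuation of each atom around its mean position.

    trajectory: list of frames, each a list of (x, y, z) tuples.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = tuple(sum(frame[i][k] for frame in trajectory) / n_frames
                     for k in range(3))
        msf = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result

# Hypothetical trajectory: atom 0 stays fixed, atom 1 moves along x
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]

print(rmsd(trajectory[1], trajectory[0]), rmsf(trajectory))
```

RMSD summarizes how far a whole structure has drifted from the reference at a given time, whereas RMSF is reported per residue and highlights flexible regions, which is why the two are usually compared between wild-type and mutant simulations.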

In their study, Doss and NagaSundaram (2012) aimed to investigate the pathogenic variations in A-purinic endonuclease-1 (APE1) gene that disturb the binding surface and protein-DNA interactions. A practical methodology, which overlaps the scores from two different in silico prediction tools and analysis from MD simulations, including RMSF, RMSD, H-bonding, salt bridge, and SASA, was developed to assess the APE1 gene variants.

In another study, MD simulations were utilized as supporting data for functional and structural pathogenicity prediction of nsSNPs (Kumar and Purohit, 2014). MD simulations were conducted, and their results were combined with in silico prediction tools to interpret cancer-related mutations in the Aurora-A kinase gene. As a result, atomic rearrangements and structural conformational changes were observed in a mutant protein, and these observations correlated with the computational predictions.

Since extended MD simulations provide more insights on protein structure, Marcolino et al. (2016) performed 1 μs MD simulations to evaluate structural impacts of variations in uroguanylin gene. Also, dynamic cross-correlation (DCC) and dynamic residue network (DRN) analysis have been used for variant protein structure and are supportive techniques for MD simulations in variant effect prediction. Sanyanga and Tastan Bishop (2020) have investigated different pathogenic variants in Carbonic Anhydrase VIII gene and have performed DCC and DRN analysis. According to DRN analysis, change in binding surface structure and its accessibility could be a discriminating factor for benign and malign variants.
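
The DCC analysis mentioned above is commonly defined as the normalized covariance of atomic displacements, C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²><Δr_j²>), where the averages run over trajectory frames. The Python sketch below implements this definition for a hypothetical three-atom, two-frame trajectory; it assumes pre-aligned frames, and returning 0.0 for atoms that never move is a choice made for this sketch only.

```python
import math

def dcc_matrix(trajectory):
    """Dynamic cross-correlation matrix from a list of frames.

    trajectory: list of frames, each a list of (x, y, z) tuples.
    Values range from -1 (anticorrelated motion) to +1 (correlated motion).
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    means = [tuple(sum(frame[i][k] for frame in trajectory) / n_frames
                   for k in range(3)) for i in range(n_atoms)]
    # Displacement of every atom from its mean position in every frame
    disp = [[tuple(frame[i][k] - means[i][k] for k in range(3))
             for i in range(n_atoms)] for frame in trajectory]

    def avg_dot(i, j):
        return sum(sum(a * b for a, b in zip(d[i], d[j])) for d in disp) / n_frames

    matrix = []
    for i in range(n_atoms):
        row = []
        for j in range(n_atoms):
            denom = math.sqrt(avg_dot(i, i) * avg_dot(j, j))
            row.append(avg_dot(i, j) / denom if denom else 0.0)
        matrix.append(row)
    return matrix

# Hypothetical trajectory: atoms 0 and 1 move together, atom 2 moves the opposite way
trajectory = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (7.0, 0.0, 0.0), (8.0, 0.0, 0.0)],
]

dcc = dcc_matrix(trajectory)
print(dcc[0][1], dcc[0][2])
```

Differences between the wild-type and mutant DCC matrices indicate residue pairs whose coupled motion is altered by the variant.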

Conclusions and Outlook

This expert review offers a synthesis of the in silico tools and algorithms for the prediction of functional or structural effects of SNP variants, in addition to the description of the phenotypic effects of nsSNPs on protein structure, association between pathogenicity of variants, and functional or structural features of disease-associated variants. Finally, case studies investigating the functional and structural effects of nsSNPs on selected protein structures are highlighted.

Through recent developments in computational technologies, a diversity of approaches and tools has been produced to assess the functional and structural effects of nsSNPs or missense variants. In addition, structural variants, including deletions, inversions, and duplications, have major roles in protein function and structure since these variants can lead to addition or deletion of amino acids and cause perturbations in the system. Since SNPs have exclusive functional and structural effects involving drug response, gene expression, and disease susceptibility, computational predictions of these effects provide enormous insights, especially in the field of medical science.

Phenotypic effects of nsSNPs on protein structure can be neutral or negative; these variations can therefore alter functional features, especially biochemical parameters such as stability, interaction, and dynamics. Disease-related nsSNPs can also result in changes in physicochemical features of amino acids, energy landscapes, free energy differences between folded and unfolded states of proteins, and the number of conformations in different folding states. For the proper assessment of these structural parameters, MD simulations should be utilized.

Furthermore, a consistent workflow should be established for the reliability and reproducibility of in silico approaches, while a combination of several in silico prediction algorithms or tools must be considered to increase the performance, accuracy, and precision. Combination of in silico tools should comprise two or more of different methodologies such as sequence homology-based, supervised learning-based, sequence-structure based, and consensus-based tools. Therefore, artifacts or disadvantages from each can be eliminated to increase the performance of the assessment. It should also be noted that there does not exist a single dataset that matches the input requirements of all available tools/programs. Hence, a major improvement to the field can be achieved when such a dataset is created.

Another important aspect for improving the assessment of the effects of SNPs is experimental validation. Experimental validation of the effects of variations should focus on genotype and phenotype associations to increase performance. Indeed, adding an experimental verification step to the workflow for assessing the effects of nsSNPs on proteins provides precision, reliability, and reproducibility.

We conclude that creating a consistent workflow with a combination of in silico approaches or tools should be considered to increase the performance, accuracy, and precision of the biological and clinical predictions made in silico.

Footnotes

Acknowledgment

TUSEB project number 3454 is kindly acknowledged.

Author Disclosure Statement

The authors declare they have no conflicting financial interests.

Funding Information

No funding was received for this article.

References

1000 Genomes Project Consortium. (2010). A map of human genome variation from population-scale sequencing. Nature, 467, 1061–1073.

1000 Genomes Project Consortium. (2015). A global reference for human genetic variation. Nature, 526, 68–74.

Abdul Samad, Suliman, Basha, Manivasagam, and Essa. (2016). A comprehensive In Silico analysis on the structural and functional impact of SNPs in the congenital heart defects associated with NKX2-5 gene—A molecular dynamic simulation approach. PLoS One, 11, e0153999.

Accetturo, Bartolomeo, and Stella. (2020). In-silico analysis of NF1 missense variants in clinvar: Translating variant predictions into variant interpretation and classification. Int J Mol Sci, 21, 1–19.

Adzhubei, Schmidt, Peshkin, et al. (2010). A method and server for predicting damaging missense mutations. Nat Methods, 7, 248–249.

Alirezaie, Kernohan, Hartley, Majewski, and Hocking. (2018). ClinPred: Prediction tool to identify disease-relevant nonsynonymous single-nucleotide variants. Am J Hum Genet, 103, 474–483.

Alzu'bi, Zhou, and Watzlaf VJM. (2019). Genetic variations and precision medicine. Perspect Health Inf Manag, 16, 1a.

Arshad, Bhatti, and John. (2018). Identification and in silico analysis of functional SNPs of human TAGAP protein: A comprehensive study. PLoS One, 13, e0188143.

Balabin and Lomakina. (2009). Neural network approach to quantum-chemistry data: Accurate prediction of density functional theory energies. J Chem Phys, 131, 074104.

Barroso, Gurnell, Crowley VEF, et al. (1999). Dominant negative mutations in human PPARγ associated with severe insulin resistance, diabetes mellitus and hypertension. Nature, 402, 880–883.

Bartlett and Radford. (2009). An expanding arsenal of experimental methods yields an explosion of insights into protein folding mechanisms. Nat Struct Mol Biol, 16, 582–588.

Baskin, Winkler, and Tetko. (2016). A renaissance of neural networks in drug discovery. Expert Opin Drug Discov, 11, 785–795.

Bendl, Stourac, Salanda, et al. (2014). PredictSNP: Robust and accurate consensus classifier for prediction of disease-related mutations. PLoS Comput Biol, 10, e1003440.

Berger, Brooks, Wu, et al. (2016). High-throughput phenotyping of lung cancer somatic mutations. Cancer Cell, 30, 214–228.

Betts and Russell. (2003). Amino Acid Properties and Consequences of Substitutions. Bioinformatics for Geneticists. Chichester, United Kingdom: John Wiley & Sons, Ltd., 289–316.

Breiman. (2019). Random forests. Random Forests, the Netherlands, 1–122.

Brock, Talley, Coley, Kundrotas, and Alexov. (2007). Optimization of electrostatic interactions in protein-protein complexes. Biophys J, 93, 3340–3352.

Bromberg and Rost. (2007). SNAP: Predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res, 35, 3823–3835.

Brookes. (1999). The essence of SNPs. Gene, 234, 177–186.

Brown and Tastan Bishop Ö. (2017). The role of structural bioinformatics in drug discovery via computational SNP analysis – a proposed protocol for analyzing variation at the protein level. Glob Heart, 12, 151–161.

Brown and Tastan Bishop Ö. (2018). HUMA: A platform for the analysis of genetic variation in humans. Hum Mutat, 39, 40–51.

Buniello, Macarthur JAL, Cerezo, et al. (2019). The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res, 47, D1005–D1012.

Calabrese, Capriotti, Fariselli, Martelli, and Casadio. (2009). Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum Mutat, 30, 1237–1244.

Camacho, Collins, Powers, Costello, and Collins. (2018). Next-generation machine learning for biological networks. Cell, 173, 1581–1592.

Capriotti, Altman, and Bromberg. (2013). Collective judgment predicts disease-associated single nucleotide variants. BMC Genomics, 14, S2.

Capriotti, Calabrese, and Casadio. (2006). Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics, 22, 2729–2734.

Capriotti, Fariselli, and Casadio. (2005). I-Mutant2.0: Predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res, 33, 306–310.

Capriotti, Fornasari, Juritz, Martelli, and Fariselli. (2009). Improving the prediction of disease-related variants using protein dynamism. BMC Bioinformatics, 12, 2010–2012.

Carter, Chen, Isik, et al. (2009). Cancer-specific high-throughput annotation of somatic mutations: Computational prediction of driver missense mutations. Cancer Res, 69, 6660–6667.

Carter, Douville, Stenson, Cooper, and Karchin. (2013). Identifying Mendelian disease genes with the variant effect scoring tool. BMC Genomics, 14(Suppl 3), S3.

Cassa, Weghorn, Balick, et al. (2017). Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat Genet, 49, 806–810.

Chakravarti. (2001). To a future of genetic medicine. Nature, 409, 822–823.

Chasman and Adams. (2001). Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: Structure-based assessment of amino acid variation. J Mol Biol, 307, 683–706.

Chaturvedi and Mahalakshmi. (2013). Methionine mutations of outer membrane protein X influence structural stability and beta-barrel unfolding. PLoS One, 8, e79351.

Chen and Siu SWI. (2020). Machine learning approaches for quality assessment of protein structures. Biomolecules, 10, 626.

Cheng, Capponi, Wakeling, et al. (2020). Missense variants in TAF1 and developmental phenotypes: Challenges of determining pathogenicity. Hum Mutat, 41, 449–464.

Cheng, Randall, and Baldi. (2006). Prediction of protein stability changes for single-site mutations using support vector machines. Proteins, 62, 1125–1132.

Cheng TMK, Lu, Vendruscolo, Lio’, and Blundell. (2008). Prediction by graph theoretic measures of structural effects in proteins arising from non-synonymous single nucleotide polymorphisms. PLoS Comput Biol, 4, e1000135.

Chitipiralla, Jang, Brown, et al. (2015). ClinVar: Public archive of interpretations of clinically relevant variants. Nucleic Acids Res, 44, D862–D868.

Chitrala and Yeguvapalli. (2014). Computational screening and molecular dynamic simulation of breast cancer associated deleterious non-synonymous single nucleotide polymorphisms in TP53 gene. PLoS One, 9, e104242.

Choi and Chan. (2015). PROVEAN web server: A tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics, 31, 2745–2747.

Choy, Xu, Gwak, Chen, and Savarese. (2016). 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9912 LNCS, arXiv.org, the United States of America, 628–644.

Cohen, Afshar, Tapson, and Van Schaik. (2017). EMNIST: Extending MNIST to handwritten letters. Proceedings of the International Joint Conference on Neural Networks, the United States of America, May, 2017, 2921–2926.

Collins, Guyer, and Chakravarti. (1997). Variations on a theme: Cataloging human DNA sequence variation. Science, 278, 1580–1581.

Condit, Achter, Lauer, and Sefcovic. (2002). The changing meanings of “mutation”: A contextualized study of public discourse. Hum Mutat, 19, 69–75.

Cook, Bergman, Finn, Cochrane, Birney, and Apweiler. (2016). The European Bioinformatics Institute in 2016: Data growth and integration. Nucleic Acids Res, 44, D20–D26.

Cooper, Smith, Cooke, Niemann, and Schmidtke. (1985). An estimate of unique DNA sequence heterozygosity in the human genome. Hum Genet, 69, 201–205.

David and Sternberg MJE. (2015). The contribution of missense mutations in core and rim residues of protein–protein interfaces to human disease. J Mol Biol, 427, 2886–2898.

De Baets, Van Durme J, Reumers, et al. (2012). SNPeffect 4.0: On-line prediction of molecular and structural effects of protein-coding variants. Nucleic Acids Res, 40, 935–939.

Dehouck, Kwasigroch, Gilis, and Rooman. (2011). PoPMuSiC 2.1: A web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics, 12, 151.

Doss CGP and NagaSundaram. (2012). Investigating the structural impacts of I64T and P311S mutations in APE1-DNA complex: A molecular dynamics approach. PLoS One, 7, e31677.

Doss CGP, Rajith, Garwasis, et al. (2012). Screening of mutations affecting protein stability and dynamics of FGFR1—A simulation analysis. Appl Transl Genom, 1, 37–43.

Van Durme J, Maurer-Stroh, Gallardo, Wilkinson, Rousseau, and Schymkowitz. (2009). Accurate prediction of DnaK-peptide binding via homology modelling and experimental data. PLoS Comput Biol, 5, e1000475.

Escaramís, Docampo, and Rabionet. (2015). A decade of structural variants: Description, history and methods to detect structural variation. Brief Funct Genomics, 14, 305–314.

Fernandez-Escamilla, Rousseau, Schymkowitz, and Serrano. (2004). Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol, 22, 1302–1306.

Fokkema IFAC, Taschner PEM, Schaafsma GCP, Celli, Laros JFJ, and den Dunnen JT. (2011). LOVD v.2.0: The next generation in gene variant databases. Hum Mutat, 32, 557–563.

Frappier, Chartier, and Najmanovich. (2015). ENCoM server: Exploring protein conformational space and the effect of mutations on protein function and stability. Nucleic Acids Res, 43, W395–W400.

Ganesan, Kulandaisamy, Binny Priya, and Michael Gromiha. (2019). HuVarbase: A human variant database with comprehensive information at gene and protein levels. PLoS One, 14, e0210475.

Gao and Keinan. (2014). High burden of private mutations due to explosive human population growth and purifying selection. BMC Genomics, 15, S3.

Gazave, Ma, Chang, et al. (2014). Neutral genomic regions refine models of recent rapid human population growth. Proc Natl Acad Sci USA, 111, 757–762.

Giollo, Martin AJM, Walsh, Ferrari, and Tosatto SCE. (2014). NeEMO: A method using residue interaction networks to improve prediction of protein stability upon mutation. BMC Genomics, 15, 1–11.

Gromiha, An, Kono, Oobatake, Uedaira, and Sarai. (1999). ProTherm: Thermodynamic database for proteins and mutants. Nucleic Acids Res, 27, 286–288.

Guerois, Nielsen, and Serrano. (2002). Predicting changes in the stability of proteins and protein complexes: A study of more than 1000 mutations. J Mol Biol, 320, 369–387.

64.

Guo

, Yu

, Samuels

, Yue

, Ness

, and Zhao

. (2019). Single-nucleotide variants in human RNA: RNA editing and beyond. Brief Funct Genomics, 18, 30–39.

65.

Gyulkhandanyan

, Rezaie

, Roumenina

, et al. (2020). Analysis of protein missense alterations by combining sequence- and structure-based methods. Mol Genet Genomic Med, 8, e1166.

66.

Haliloglu

, and Bahar

. (2015). Adaptability of protein structures to enable functional interactions and evolutionary implications. Curr Opin Struct Biol, 35, 17–23.

67.

Higasa

, Miyake

, Yoshimura

, et al. (2016). Human genetic variation database, a reference database of genetic variations in the Japanese population. J Hum Genet, 61, 547–553.

68.

Hubbard

, Barker

, Birney

, et al. (2002). The Ensembl genome database project. Nucleic Acids Res, 30, 38–41.

69.

Ioannidis

, Rothstein

, Pejaver

, et al. (2016). REVEL: An ensemble method for predicting the pathogenicity of rare missense variants. Am J Hum Genet, 99, 877–885.

70.

Islam

, Parves

, Mahmud

, Tithi

, and Reza

. (2019). Assessment of structurally and functionally high-risk nsSNPs impacts on human bone morphogenetic protein receptor type IA (BMPR1A) by computational approach. Comput Biol Chem, 80, 31–45.

71.

Ittisoponpisan

, Islam

, Khanna

, Alhuzimi

, David

, and Sternberg

MJE

. (2019). Can predicted protein 3D structures provide reliable insights into whether missense variants are disease associated?. J Mol Biol, 431, 2197–2212.

72.

Jackson

, Marks

, May

GHW

, and Wilson

. (2018). The genetic basis of disease. Essays Biochem, 62, 643–723.

73.

Kamburov

, Lawrence

, Polak

, et al. (2015). Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc Natl Acad Sci USA, 112, E5486–E5495.

74.

Kandakatla

, Ramakrishnan

, Chekkara

, and Balakrishnan

. (2014). Computational screening of disease associated mutations on NPC1 gene and its structural consequence in Niemann-Pick type-C1. Front Biol, 9, 410–421.

75.

Karczewski

, Francioli

, Tiao

, et al. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. bioRxiv, 581, 531210.

76.

Khan

, Abduljaleel

, Alanazi

, and Elrobh

. (2013). Evidence of colorectal cancer risk associated variant Lys25Ser in the proximity of human bone morphogenetic protein 2. Gene, 522, 75–83.

77.

Kim

, Ilic

, Shrestha

, et al. (2016). Systematic functional interrogation of rare cancer variants identifies oncogenic alleles. Cancer Discov, 6, 714–726.

78.

Kucukkal

, Petukh

, Li

, and Alexov

. (2015). Structural and physico-chemical effects of disease and non-disease nsSNPs on proteins. Curr Opin Struct Biol, 32, 18–24.

79.

Kucukkal

, Yang

, Chapman

, Cao

, and Alexov

. (2014). Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci, 15, 9670–9717.

80.

Kulshreshtha

, Chaudhary

, Goswami

, and Mathur

. (2016). Computational approaches for predicting mutant protein stability. J Comput Aided Mol Des, 30, 401–412.

81.

Kumar

, and Purohit

. (2014). Use of long term molecular dynamics simulation in predicting cancer associated SNPs. PLoS Comput Biol, 10, e1003318.

82.

Kumar

, Rajendran

, Sethumadhavan

, and Purohit

. (2013). Evidence of colorectal cancer-associated mutation in MCAK: A computational report. Cell Biochem Biophys, 67, 837–851.

83.

Küntzer

, Eggle

, Klostermann

, and Burtscher

. (2010). Human variation databases. Database, 2010, 1–13.

84.

Kwok

, and Chen

. (2003). Detection of single nucleotide polymorphisms. Curr Issues Mol Biol, 5, 43–60.

85.

Landrum

, Lee

, Benson

, et al. (2018). ClinVar: Improving access to variant interpretations and supporting evidence. Nucleic Acids Res, 46, D1062–D1067.

86.

Lappalainen

, Almeida-King

, Kumanduri

, et al. (2015). The European Genome-phenome Archive of human data consented for biomedical research. Nat Genet, 47, 692–695.

87.

Lappalainen

, Lopez

, Skipper

, et al. (2013). DbVar and DGVa: Public archives for genomic structural variation. Nucleic Acids Res, 41, 936–941.

88.

Lek

, Karczewski

, Minikel

, et al. (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536, 285–291.

89.

Liu

, Wu

, Li

, and Boerwinkle

. (2016). dbNSFP v3.0: A one-stop database of functional predictions and annotations for human nonsynonymous and splice-site SNVs. Hum Mutat, 37, 235–241.

90.

Y-C

, Rensi

, Torng

, and Altman

. (2018). Machine learning in chemoinformatics and drug discovery. Drug Discov Today, 23, 1538–1546.

91.

López-Ferrando

, Gazzo

, La Cruz X

, Orozco

, and Gelpí

. (2017). PMut: A web-based tool for the annotation of pathological variants on proteins, 2017 update. Nucleic Acids Res, 45, W222–W228.

92.

Lori

, Lantella

, Pasquo

, et al. (2013). Effect of single amino acid substitution observed in cancer on Pim-1 kinase thermodynamic stability and structure. PLoS One, 8, e64824.

93.

Lyons

, Alizadeh

, Mannheimer

, et al. (2016). Changes in cell shape are correlated with metastatic potential in murine and human osteosarcomas. Biol Open, 5, 289–299.

94.

Mailman

, Feolo

, Jin

, et al. (2007). The NCBI dbGaP database of genotypes and phenotypes. Nat Genet, 39, 1181–1186.

95.

Marcolino

ACS

, Porto

, Pires ÁS, Franco

, and Alencar

. (2016). Structural impact analysis of missense SNPs present in the uroguanylin gene by long-term molecular dynamics simulations. J Theor Biol, 410, 9–17.

96.

Marín-Martín

, Soler-Rivas

, Martín-Hernández

, and Rodriguez-Casado

. (2014). A comprehensive in silico analysis of the functional and structural impact of nonsynonymous sNPs in the ABCA1 transporter gene. Cholesterol, 2014, 1–19.

97.

Masica

, Douville

, Tokheim

, et al. (2017). CRAVAT 4: cancer-related analysis of variants toolkit. Cancer Res, 77, e35–e38.

98.

Masoodi

, Shammari SA

, Al-Muammar

, Alhamdan

, and Talluri

. (2013). Exploration of deleterious single nucleotide polymorphisms in late-onset Alzheimer disease susceptibility genes. Gene, 512, 429–437.

99.

Masso

, and Vaisman

. (2014). AUTO-MUTE 2.0: A portable framework with enhanced capabilities for predicting protein functional consequences upon mutation. Adv Bioinformatics, 2014, 278385.

100.

Maurer-Stroh

, Debulpaep

, Kuemmerer

, et al. (2010). Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nat Methods, 7, 237–242.

101.

McLendon

, Friedman

, Bigner

, et al. (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455, 1061–1068.

102.

, Muruganujan

, Ebert

, Huang

, and Thomas

. (2019). PANTHER version 14: More genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res, 47, D419–D426.

103.

Mishra

, Kumar

, and Mukhtar

. (2019). Systems biology and machine learning in plant–pathogen interactions. Mol Plant Microbe Interact, 32, 45–55.

104.

Moal

, and Fernández-Recio

. (2012). SKEMPI: A Structural Kinetic and Energetic database of Mutant Protein Interactions and its use in empirical models. Bioinformatics, 28, 2600–2607.

105.

Motlagh

, Wrabl

, Li

, and Hilser

. (2014). The ensemble nature of allostery. Nature, 508, 331–339.

106.

Nagarajan

, Narayanaswamy

, and Vetrivel

. (2020). Mutational landscape screening of methylene tetrahydrofolate reductase to predict homocystinuria associated variants: An integrative computational approach. Mutat Res. 819–820, 111687.

107.

Naveed

, Anwar

, Tariq

, and Abbas

. (2017). In silico screening and pathway analysis of disease-associated nsSNPs of MITF gene: A study on melanoma. Int J Comput Sci Inform Secur, 15, 31–54.

108.

Naveed

, Tehreem

, Mubeen

, Nadeem

, Zafar

, and Irshad

. (2016). In-silico analysis of non-synonymous-SNPs of STEAP2: To provoke the progression of prostate cancer. Open Life Sci, 11, 402–416.

109.

Nicholls

, John

, Watson

, Munroe

, Barnes

, and Cabrera

. (2020). Reaching the end-game for GWAS: Machine learning approaches for the prioritization of complex disease loci. Front Genet, 11, 350.

110.

Niroula

, Urolagin

, and Vihinen

. (2015). PON-P2: Prediction method for fast and reliable identification of harmful variants. PLoS One, 10, e0117380.

111.

Nussinov

, and Tsai

. (2013). Allostery in disease and in drug discovery. Cell, 153, 293–305.

112.

Orioli

, Vihinen

. (2019). Benchmarking subcellular localization and variant tolerance predictors on membrane proteins. BMC Genomics, 20, 547.

113.

Owji

, Eslami

, Nezafat

, and Ghasemi

. (2020). In silico elucidation of deleterious non-synonymous SNPs in SHANK3, the autism spectrum disorder gene. J Mol Neurosci, 70, 1649–1667.

114.

Ozer

, Sarica

, and Arga

. (2020). New machine learning applications to accelerate personalized medicine in breast cancer: Rise of the support vector machines. OMICS, 24, 241–246.

115.

Pace

, and Weerapana

. (2013). Diverse functional roles of reactive cysteines. ACS Chem Biol, 8, 283–296.

116.

Pace

, and Weerapana

. (2014). Zinc-binding cysteines: Diverse functions and structural motifs. Biomolecules, 4, 419–434.

117.

Pandey

, Dhusia

, Katara

, Singh

, and Gautam

. (2019). An in silico analysis of deleterious single nucleotide polymorphisms and molecular dynamics simulation of disease linked mutations in genes responsible for neurodegenerative disorder. J Biomol Struct Dyn, 38, 4259–4272.

118.

Pandurangan

, Ochoa-Montaño

, Ascher

, and Blundell

. (2017). SDM: A server for predicting effects of mutations on protein stability. Nucleic Acids Res, 45, W229–W235.

119.

Pangallo

, Kiladjian

, Cassinat

, et al. (2020). Rare and private spliceosomal gene mutations drive partial, complete, and dual phenocopies of hotspot alterations. Blood, 135, 1032–1043.

120.

Parthiban

, Gromiha

, and Schomburg

. (2006). CUPSAT: Prediction of protein stability upon point mutations. Nucleic Acids Res, 34, 239–242.

121.

Pejaver

, Urresti

, Lugo-Martinez

, et al. (2017). MutPred2: inferring the molecular and phenotypic impact of amino acid variants. bioRxiv. DOI: https://dx-doi-org.web.bisu.edu.cn/10.1101/134981.

122.

Petukh

, Kucukkal

, and Alexov

. (2015). On human disease-causing amino acid variants: Statistical study of sequence and structural patterns. Hum Mutat, 36, 524–534.

123.

Pires

DEV

, Ascher

, and Blundell

. (2014). MCSM: Predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics, 30, 335–342.

124.

Ponzoni

, Nguyen

, Bahar

, and Brodsky

. (2020a). Complementary computational and experimental evaluation of missense variants in the ROMK potassium channel. PLoS Comput Biol, 16, e1007749.

125.

Ponzoni

, Peñaherrera

, Oltvai

, and Bahar

. (2020b). Rhapsody: Predicting the pathogenicity of human missense variants. Bioinformatics, 36, 3084–3092.

126.

Porto

, Franco

, and Alencar

. (2015). Computational analyses and prediction of guanylin deleterious SNPs. Peptides, 69, 92–102.

127.

Post

, Belmadani

, Ganguly

, et al. (2020). Multi-model functionalization of disease-associated PTEN missense mutations identifies multiple molecular mechanisms underlying protein dysfunction. Nat Commun, 11, 2073.

128.

Rentzsch

, Witten

, Cooper

, Shendure

, and Kircher

. (2019). CADD: Predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res, 47, D886–D894.

129.

Reva

, Antipin

, and Sander

. (2007). Determinants of protein function revealed by combinatorial entropy optimization. Genome Biol, 8, R232.

130.

Reva

, Antipin

, and Sander

. (2011). Predicting the functional impact of protein mutations: Application to cancer genomics. Nucleic Acids Res, 39, 37–43.

131.

Rodrigues

CHM

, Pires

DEV

, and Ascher

. (2018). DynaMut: Predicting the impact of mutations on protein conformation, flexibility and stability. Nucleic Acids Res, 46, W350–W355.

132.

Russell

, Zhou

, Lauschke

, and Kim

. (2020). In vitro functional characterization and in silico prediction of rare genetic variation in the bile acid and drug transporter, Na+-taurocholate cotransporting polypeptide (NTCP, SLC10A1). Mol Pharm, 17, 1170–1181.

133.

Sallah

, Sergouniotis

, Barton

, et al. (2020). Using an integrative machine learning approach utilising homology modelling to clinically interpret genetic variants: CACNA1F as an exemplar. Eur J Hum Genet, 28, 1274–1282.

134.

Sanchis-Juan

, Hasenahuer

, Baker

, et al. (2020). Structural analysis of pathogenic missense mutations in GABRA2 and identification of a novel de novo variant in the desensitization gate. Mol Genet Genomic Med, 8, e1106.

135.

Sang

, Hu

, Ye

Y-J

, Li

L-H

, Zhang

, Xie

Y-H

, and Meng

Z-H

. (2017). In silico screening, molecular docking, and molecular dynamics studies of SNP-derived human P5CR mutants. J Biomol Struct Dyn, 35, 2441–2453.

136.

Sanyanga

, and Tastan

Bishop Ö

. (2020). Structural characterization of carbonic anhydrase VIII and effects of missense single nucleotide variations to protein structure and function. Int J Mol Sci, 21, 1–20.

137.

Savas

. (2010). Useful genetic variation databases for oncologists investigating the genetic basis of variable treatment response and survival in cancer. Acta Oncol, 49, 1217–1226.

138.

Schymkowitz

, Borg

, Stricher

, Nys

, Rousseau

, and Serrano

. (2005). The FoldX web server: An online force field. Nucleic Acids Res, 33, 382–388.

139.

Scott

, Schiettecatte

, Bocchini

, Amberger

, and Hamosh

. (2014). OMIM.org: Online Mendelian Inheritance in Man (OMIM^®), an online catalog of human genes and genetic disorders. Nucleic Acids Res 43, D789–D798.

140.

Shen

, Deininger

, and Zhao

. (2006). Applications of computational algorithm tools to identify functional SNPs in cytokine genes. Cytokine, 35, 62–66.

141.

Sherry

, Ward

, Kholodov

, et al. (2001). DbSNP: The NCBI database of genetic variation. Nucleic Acids Res, 29, 308–311.

142.

Shihab

, Gough

, Cooper

, et al. (2013). Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat, 34, 57–65.

143.

Sim

, Kumar

, Hu

, Henikoff

, Schneider

, and Ng

. (2012). SIFT web server: Predicting effects of amino acid substitutions on proteins. Nucleic Acids Res, 40, 452–457.

144.

Sneha

, and Priya Doss

. (2016). Molecular Dynamics: New Frontier in Personalized Medicine. Adv Protein Chem Struct Biol, 102, 181–224.

145.

Starita

, Islam

, Banerjee

, et al. (2018). A multiplex homology-directed DNA repair assay reveals the impact of more than 1,000 BRCA1 missense substitution variants on protein function. Am J Hum Genet, 103, 498–508.

146.

Stefl

, Nishi

, Petukh

, Panchenko

, and Alexov

. (2013). Molecular mechanisms of disease-causing missense mutations. J Mol Biol, 425, 3919–3936.

147.

Stehr

, Jang

SHJ

, Duarte

, et al. (2011). The structural impact of cancer-associated missense mutations in oncogenes and tumor suppressors. Mol Cancer, 10, 54.

148.

Stenson

, Mort

, Ball E

, et al. (2020). The Human Gene Mutation Database (HGMD^®): Optimizing its use in a clinical diagnostic or research setting. Hum Genet, 139, 1197–1207.

149.

Stitziel

, Tseng

, Pervouchine

, Goddeau

, Kasif

, and Liang

. (2003). Structural location of disease-associated single-nucleotide polymorphisms. J Mol Biol, 327, 1021–1030.

150.

Tang

, and Thomas

. (2016). PANTHER-PSEP: Predicting disease-causing genetic variants using position-specific evolutionary preservation. Bioinformatics, 32, 2230–2232.

151.

Tate

, Bamford

, Jubb

, et al. (2019). COSMIC: The Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res, 47, D941–D947.

152.

Tavtigian

, Byrnes

, Thomas

. (2008). Classification of rare missense substitutions, using risk surfaces, with genetic-and molecular-epidemiology applications. Hum Mutat, 29, 1342–1354.

153.

Telenti

, Pierce

LCT

, Biggs

, et al. (2016). Deep sequencing of 10,000 human genomes. Proc Natl Acad Sci USA, 113, 11901–11906.

154.

Tian

, Wu

, Guo

, Zhang

, and Fan

. (2007). Predicting the phenotypic effects of non-synonymous single nucleotide polymorphisms based on support vector machines. BMC Bioinformatics, 8, 5–8.

155.

Vaser

, Adusumalli

, Leng

, Sikic

, and Ng

. (2016). SIFT missense predictions for genomes. Nat Protoc, 11, 1–9.

156.

Vitkup

, Sander

, and Church

. (2003). The amino-acid mutational spectrum of human genetic disease. Genome Biol, 4, R72.

157.

Wainreb

, Ashkenazy

, Bromberg

, et al. (2010). MuD: An interactive web server for the prediction of non-neutral substitutions using protein structural data. Nucleic Acids Res, 38, 523–528.

158.

Wang

, and Moult

. (2001). SNPs, protein structure and disease. Hum Mutat, 17, 263–270.

159.

Weischenfeldt

, Symmons

, Spitz

, and Korbel

. (2013). Phenotypic impact of genomic structural variation: Insights from and for human disease. Nat Rev Genet, 14, 125–138.

160.

Yadegari

, and Majidzadeh

. (2019). In silico analysis for determining the deleterious nonsynonymous single nucleotide polymorphisms of BRCA genes. Mol Biol Res Commun, 8, 141–150.

161.

Yates

, Filippis

, Kelley

, and Sternberg

MJE

. (2014). SuSPect: Enhanced prediction of single amino acid variant (SAV) phenotype using network features. J Mol Biol, 426, 2692–2701.

162.

Yates

, and Sternberg

MJE

. (2013). The effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein-protein interactions. J Mol Biol, 425, 3949–3963.

163.

Yin

, Ding

, and Dokholyan

. (2007). Eris: An automated estimator of protein stability. Nat Methods, 4, 466–467.

164.

Yokoyama

, and Kasahara

. (2020). Visualization tools for human structural variations identified by whole-genome sequencing. J Hum Genet, 65, 49–60.

165.

Yue

, Li

, and Moult

. (2005). Loss of protein structure stability as a major causative factor in monogenic disease. J Mol Biol, 353, 459–473.

166.

Yue

, Melamud

, and Moult

. (2006). SNPs3D: Candidate gene and SNP selection for association studies. BMC Bioinformatics, 7, 1–15.

167.

Zeng

, Yang

, Chung

BHY

, Lau

, and Yang

. (2014). EFIN: Predicting the functional impact of nonsynonymous single nucleotide polymorphisms in human genome. BMC Genomics, 15, 1–9.

168.

Zhao

, Han

, Shyu

, and Korkin

. (2014). Determining effects of non-synonymous SNPs on Protein-protein interactions using supervised and semi-supervised learning. PLoS Comput Biol, 10, e1003592.

169.

Zou

, Wu

, Tan

, Shang

, and Zhou

. (2020). Significance of single-nucleotide variants in long intergenic non-protein coding RNAs. Front Cell Dev Biol, 8, 1–14.
