Abstract
Stroke is a major cause of mortality and morbidity in both the developed and developing world. Next generation sequencing (NGS) and multi-omics integrative biology research offer new opportunities in the way we research and understand stroke. These biotechnologies also signal a shift from genetics to genomics of stroke, which is highlighted in this review. Stroke is a focal neurological deficit resulting from disruption of the cerebral blood supply. There are two main types of common stroke, ischemic stroke (IS), which comprises 80% of cases, and hemorrhagic stroke (HS) that accounts for about 20% of cases. IS is a complex multi-factorial disease with multiple environmental and genomic determinants. We discuss here IS from genomics and bioinformatics perspectives, including the highlights of the genome wide association studies (GWAS), NGS progress to date, and exome studies. While both ‘common variant, common disease’ and ‘rare variant, common disease’ approaches need to be assessed in tandem, future studies into IS omics should also consider pedigree and/or community based sampling to take account of the complex diversity of IS genetics. We conclude by presenting an example of such community genomics research from China in an extended pedigree sample, and the ways in which the intersection of genomics and global society can usefully inform our understanding of IS pathophysiology and potential preventive medicine interventions in the future.
Stroke and Public Health
Selected national statistics further highlight the growing problem that stroke is becoming for public health. In the USA, the National Center for Chronic Disease Prevention and Health Promotion reported that every year stroke is the cause of death of 130,000 Americans, with 610,000 Americans suffering a new stroke (The National Center for Chronic Disease Prevention and Health Promotion, 2013). In the UK, the Stroke Association reports that there are approximately 152,000 strokes every year, or about one every 5 minutes, with 1.1 million stroke survivors living in the UK (Stroke Association, 2013). In Australia, a Deloitte Access Economics report on the economic impact of stroke reported that more than 1000 Australians sustain a stroke every week, of whom 40% die within 12 months (Deloitte Access Economics, 2013) (Table 1).
Multi-omics data-intensive life sciences offer new vistas for neurological and mental health disorders (Goldenberg et al., 2014; Longuespee et al., 2014; Podder and Latha, 2014). Among these, stroke has been considered a disease caused by lifestyle and dietary behaviors such as increased energy intake, fat intake, alcohol consumption, and cigarette smoking, together with decreased physical activity. Constitutive factors such as genomics have recently gained increasing conceptual importance.
One striking example of this is highlighted in a World Health Organization (WHO)-funded project called ‘Incidence and trends of stroke and its subtypes in China’ (Jiang et al., 2006). In this project, researchers collected medical diagnoses, genetic information, and blood samples from 3015 ischemic stroke patients in Beijing, Shanghai, and Changsha, including multiple individuals within 12 pedigrees of Han Chinese (Jiang et al., 2006). It was also observed that Chinese populations had proportionally more hemorrhagic strokes and fewer ischemic strokes than Western populations, but that this pattern was changing, with the incidence of IS rising and that of HS falling.
A key finding was that the age-adjusted incidence of overall stroke in individuals over 55 years in these populations was generally higher than in Western populations and that the incidence of IS rose 50% in Beijing during the study period (1991–2000). The researchers concluded that the main cause of this rapid increase in IS incidence was the economic boom in China during the study period. They observed that Chinese populations were rapidly adopting Western lifestyle and dietary habits, increasing the incidence of obesity and hypercholesterolemia in China and thus the incidence of IS.
The Genetics/Genomics of Ischemic Stroke
Another recognized risk for stroke is a genetic/genomic predisposition. Although the study and knowledge of the impact of environmental and modifiable risk factors are well advanced, the identification of genetic variants associated with this predisposition is still a work in progress (Markus, 2011). The major reason for the relative paucity of data on stroke predisposition is the complexity of stroke genetics itself. Stroke is defined as a focal neurological deficit resulting from disruption of the cerebral blood supply. There are two main types of common stroke, ischemic stroke (IS), which comprises ∼80% of cases, and hemorrhagic stroke (HS), which accounts for the remaining ∼20% of cases (Markus, 2011; Bevan and Markus, 2011). According to the TOAST (Trial of Org 10172 in Acute Stroke Treatment) system (Adams et al., 1993), IS has been classified into five subtypes: 1) large artery atherosclerosis, 2) cardioembolism, 3) small vessel occlusion, 4) stroke of other determined etiology, and 5) stroke of undetermined etiology.
Within these five subtypes there are some rare Mendelian stroke disorders and syndromes, for example, CADASIL and CARASIL (Markus, 2011) and MELAS (Pavlakis et al., 1985; Sproule et al., 2013). CADASIL is the most common genetic small vessel IS and is a dominantly inherited small artery disease caused by >190 known mutations of the NOTCH3 gene (Federico et al., 2012). CARASIL is a much rarer genetic form of ischemic, nonhypertensive, cerebral small vessel disease directly affecting the cerebral small blood vessels and caused by mutations in the HTRA1 gene encoding HtrA serine peptidase/protease (Fukutake, 2011). CARASIL is an early onset disease, usually occurring between the ages of 20–30 years, and most cases are of East Asian origin (Yamamoto et al., 2011). MELAS is an early onset syndrome in which IS is one symptom. One of the first IS-related genetic syndromes to be characterized, it is now known to be caused by mutations in at least 30 mitochondrial genes (Sproule et al., 2013). Further investigation into ischemic stroke and mitochondrial variation has led to several correlations between mtDNA haplogroups and IS (Cai et al., 2015), as well as between common OXPHOS gene variations and IS (Anderson et al., 2013). These associations are still putative and require further investigation.
The vast majority of ischemic stroke cases represent a multi-factorial complex disease involving multiple genetic and environmental factors. While environmental factors were overwhelmingly the prime suspect for the increasing IS incidence in the previously discussed Chinese study, the differing patterns of stroke subtypes observed between Chinese and Western populations could not rule out a role for genetic effects. In fact, it has been previously established that conventional environmental risk factors do not explain all IS risk (Sacco et al., 1989).
Evidence from studies of twins and family history suggests that genetic predisposition is important (Traylor et al., 2012). In twin studies in particular, while it is theoretically possible to estimate heritability and to differentiate environmental and genetic risk, performing such studies in stroke has proven challenging (Bevan et al., 2012). In twin cohorts, the number of stroke cases is small, and in studies performed to date, subtyping is not available; thus, no estimates of heritability for the specific stroke subtypes are available (Bevan et al., 2012).
Genome Wide Association Studies
The exploration of possible genetic factors in the development of complex multigenic IS was first made feasible by the advent of genome wide association studies (GWAS). Causative genetic variations have traditionally been detected using classical Mendelian gene mapping and linkage studies. While very successful for rare genetic disorders, these studies are far less successful when applied to complex diseases. As the name implies, complex diseases, such as IS, are not caused by single mutations in a single causal gene, but are instead influenced by multiple subtle genetic factors. The hunt for these small effect causative genetic variants in complex disease started with GWAS-based investigations. This approach was first theoretically developed from the concept that comparing the allele frequencies at variants across the genome between thousands of cases and controls would be well-powered to detect common alleles of small effect (Risch and Merikangas, 1996).
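The comparison of allele frequencies between cases and controls can be illustrated for a single variant with a simple 2×2 allelic chi-square test. This is a minimal sketch under stated assumptions, not the analysis pipeline of any study cited here: the function name and the example allele counts are hypothetical, and real GWAS analyses additionally correct for population structure, covariates, and multiple testing.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Allelic association test for one biallelic variant.

    Each argument is a (minor_allele_count, major_allele_count) tuple.
    Returns (chi-square statistic, p-value with 1 degree of freedom, odds ratio).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_cases, row_controls = a + b, c + d
    col_minor, col_major = a + c, b + d
    if 0 in (row_cases, row_controls, col_minor, col_major):
        raise ValueError("every row and column total must be non-zero")

    # Pearson chi-square for a 2x2 table (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / (
        row_cases * row_controls * col_minor * col_major
    )
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c) if b and c else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1000 cases and 1000 controls (2000 alleles each).
chi2, p_value, odds_ratio = allelic_chi_square((600, 1400), (500, 1500))
```

With these illustrative counts, the minor allele frequency is 30% in cases and 25% in controls. In a genome-wide setting the same test would be repeated at every variant, which is why very stringent significance thresholds are applied.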
GWAS-based projects genotype a large number of variants (in the hundreds of thousands or even millions) in a large number of cases and controls, usually in the thousands, in order to minimize false positive findings. This approach does not depend on a prior hypothesis, and novel associations can therefore be detected (Bevan et al., 2012). The ability to conduct such studies was dependent on the development of the microarray, initially developed by Affymetrix™. Their first commercial single nucleotide polymorphism (SNP) array was released in 1996 and targeted 1500 human SNPs. As of 2015, two predominant vendors provide technology for the collection of GWAS data, Affymetrix and Illumina, which use differing methodologies to assay over 1 million variants per sample.
In an Affymetrix chip, thousands of copies of oligonucleotide probes for each SNP and copy number variation (CNV), usually 25mers, are directly synthesized onto a silicon chip. Fluorescently labeled target DNA is then hybridized to the probes, with successful hybridization (dependent on the target DNA oligo containing the SNP contained in the probe) resulting in a fluorescent signal. Illumina, on the other hand, developed the BeadChip approach, in which the oligo probes are attached to beads (1000s of copies of the same probe on each bead). These beads are then deposited in wells on a glass slide, or BeadChip. Again, hybridization of the target oligo to the probe is detected as a fluorescent signal. The latest Affymetrix genotyping array, the Genome-Wide Human SNP Array, contains 1.8 million genetic markers, including more than 906,600 SNPs and more than 946,000 probes for the detection of copy number variation. Illumina BeadChip technology has developed to such an extent that the latest genotyping chip, the HumanOmni5-Quad, contains ∼4.3 million markers with the ability to add 500,000 more. This rapid development of high-throughput genotyping technologies has enabled accurate and reproducible genotyping in combination with a progressive drop in genotyping costs.
Furthermore, genotypes at variants not directly assayed can be inferred through genotype imputation. Genotype imputation is a statistical methodology that uses haplotype patterns in a reference panel to predict unobserved genotypes in a study dataset (Marchini and Howie, 2008; 2010; Howie et al., 2009; 2011). In effect, it can be argued that imputation further increases the number of variants assessed to such a level that potentially all common variants involved in disease risk have been investigated.
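The idea of borrowing haplotype patterns from a reference panel can be sketched with a deliberately simplified toy example. It copies alleles from the reference haplotypes that best match the study haplotype at the typed sites; this nearest-haplotype matching is an assumption made for illustration and is not the statistical model used by the imputation methods cited above. The function name and the small panel are hypothetical.

```python
def impute_dosages(study_haplotype, reference_panel):
    """Toy imputation of untyped sites on one haplotype.

    study_haplotype: list of alleles coded 0/1, with None at untyped sites.
    reference_panel: list of fully typed haplotypes (lists of 0/1) covering
                     the same sites in the same order.
    Returns a list of allele dosages (floats between 0 and 1).
    """
    typed_sites = [i for i, allele in enumerate(study_haplotype)
                   if allele is not None]

    def mismatches(reference):
        # Number of typed sites at which the reference haplotype disagrees.
        return sum(reference[i] != study_haplotype[i] for i in typed_sites)

    best = min(mismatches(reference) for reference in reference_panel)
    closest = [reference for reference in reference_panel
               if mismatches(reference) == best]

    dosages = []
    for i, allele in enumerate(study_haplotype):
        if allele is not None:
            dosages.append(float(allele))  # observed allele is kept as-is
        else:
            # Average allele among the best-matching reference haplotypes.
            dosages.append(sum(reference[i] for reference in closest)
                           / len(closest))
    return dosages


# Hypothetical four-site panel; sites 1 and 3 are untyped in the study sample.
panel = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
dosages = impute_dosages([0, None, 1, None], panel)
```

Returning a dosage rather than a hard allele call keeps the uncertainty of the prediction visible, which matters when imputed variants are later tested for association.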
With the rapid progress of such high-density SNP microarrays and the development of genotype imputation, GWAS have been successfully undertaken for many common human diseases, revealing multiple SNPs with strong associations (minor allele frequency >5%, p < 1 × 10−8) (Cirulli and Goldstein, 2010). This rapid development has resulted in an exponential growth in association data. For example, The Catalog of Published GWAS Studies, now hosted at European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) (http://www.ebi.ac.uk/gwas), has, as of June 2015, listed more than 15,000 SNPs involved in more than 8,000 associations with complex traits extracted from more than 2000 GWAS publications.
GWAS and Ischemic Stroke
Although initially developing at a much slower pace than GWAS of other traits, IS GWAS projects have revealed a growing number of associated SNPs (Lupski et al., 2011; Bevan et al., 2012; Foo et al., 2012; Hacke and Grond-Ginsbach, 2012). While an initial major international GWAS meta-analysis of both the Ischemic Stroke Genetic Study (ISGS) and the Bio-Repository of DNA in Stroke (BRAINS) dataset resulted in no SNP of genome-wide significance associated with IS (Meschia et al., 2011), many other IS GWAS meta-analysis projects have been successful in revealing many significant IS-associated SNPs (Yamada et al., 2009; Meschia et al., 2011; Sun et al., 2011; Arregui et al., 2012; Bellenguez et al., 2012; Holliday et al., 2012).
The International Stroke Genetics Consortium GWAS has revealed four variants associated with IS in European patients that exhibit evidence of heterogeneity of effect across different TOAST stroke subtypes (Bellenguez et al., 2012), suggesting multiple distinct stroke subtypes and their related conditions rather than a single ‘common stroke’ phenotype. This hypothesis is supported by studies showing SNP associations exclusive to specific ethnicities, such as the Japanese and Han Chinese (Yamada et al., 2009; Sun et al., 2011), as well as some SNP associations with conditions such as coronary heart disease and atrial fibrillation that are also significantly associated with stroke (Markus, 2011; Lemmens et al., 2011; Arregui et al., 2012).
With the growth in the number of international GWAS projects, the next stage in the exploration of IS genetics is the combination of these studies through meta-analysis. Consortia such as Meta-Stroke (Traylor et al., 2012) and SiGN (Meschia et al., 2013) use a rationale of combining a massive amount of data gleaned from all the major international GWAS projects and sib-pair studies via meta-analysis in an effort to further define significant IS genetic markers. The amount of data available to these consortia is significant. For example, SiGN contains 14,549 cases from 24 genetic research centers: 13 from the United States and 11 from Europe. With such a large bank of population-based data, these consortia can theoretically establish subgroups of data for different stroke subtypes, considered crucial in uncovering IS genetic factors (Table 2).
CE, cardioembolism; LVD, large vessel disease; SNP, single nucleotide polymorphism; SVD, small vessel disease; TOAST, Trial of Org 10172 in Acute Stroke Treatment.
Limits of GWAS
While GWAS has allowed researchers to make major advances in understanding the complex genetic architecture of IS, these discoveries have in turn shown that most IS heritability remains unidentified. In fact, GWAS was specifically designed to address what is referred to as the Common Variant, Common Disease (CVCD) hypothesis (Bevan and Markus, 2011). The ‘missing heritability’ is therefore most likely to involve rare or low frequency variants within pedigrees and individuals. Under the Rare Variant, Common Disease (RVCD) hypothesis (Cirulli and Goldstein, 2010; Foo et al., 2012), multiple rare variants, sometimes specific to the individual, combine to confer a higher genetic risk of developing common diseases (Cirulli and Goldstein, 2010; Manry and Quintana-Murci, 2013). Family studies support the RVCD hypothesis, with different subtypes of stroke demonstrated within different families, suggesting alternative underlying rare variant genetic risk factors (Polychronopoulos et al., 2002; Jerrard-Dunne et al., 2003). It is doubtful whether many such risk factors would be detected via GWAS, which mainly detects more common SNPs (Markus, 2011).
Next Generation Sequencing Studies
The development of next generation sequencing (NGS) has enabled researchers to identify multiple causal variants, including rare variants, for disease that might otherwise be missed by SNP-chip technology (Kim et al., 2012). These target variants are either low frequency (typically defined as those with minor allele frequency (MAF) between 1% and 5%) or rare variants (those with MAF <1%). The most common use of NGS for detecting such low frequency and rare variants is whole exome sequencing, since the whole exome approach has been the most feasible both in economic cost and available bioinformatic resources.
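The frequency classes defined above can be written down directly. The sketch below is only an illustration of those thresholds; the function names are hypothetical, and treating a variant at exactly 5% as low frequency is an assumption, since the text does not specify how boundary values are handled.

```python
def minor_allele_frequency(alt_allele_count, total_alleles):
    """Frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("allele counts are inconsistent")
    frequency = alt_allele_count / total_alleles
    return min(frequency, 1 - frequency)


def classify_variant(maf):
    """Map a minor allele frequency onto the classes used in the text."""
    if maf < 0.01:
        return "rare"           # MAF < 1%
    if maf <= 0.05:
        return "low frequency"  # MAF between 1% and 5%
    return "common"             # MAF > 5%, the range targeted by GWAS


# Hypothetical example: 3 alternate alleles among 1000 sequenced chromosomes.
label = classify_variant(minor_allele_frequency(3, 1000))
```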
The first target of exome sequencing was the identification of rare causative mutations that are responsible for Mendelian genetic disease. For example, new causative mutations for Kabuki syndrome (Ng et al., 2010), Joubert syndrome (Srour et al., 2012), and postaxial polydactyly type A (Kalsoom et al., 2013), to name just three, have been found via exome sequencing. Whole exome sequencing is now proving useful in pinpointing rare variations associated with complex diseases such as Alzheimer's disease (Cruchaga et al., 2014), autism (Yu et al., 2013), and Crohn's disease (Ellinghaus et al., 2013).
As with GWAS, exome studies into IS are not as advanced as those into other complex diseases. A pilot exome study of previously surveyed GWAS samples failed to identify excess rare variation in any of the IS candidate exomes evaluated, but did confirm a common variant discovered in previous GWAS results (Cole et al., 2012). In this study, 10 individuals were selected for exome sequencing from the GEOS and SWISS GWAS cohorts. These samples were limited to males with an early age of onset and a family history of stroke in order to maximize the genetic contribution to stroke risk.
In the analysis of these exomes for IS-associated SNPs, two broad approaches were taken: genotype-specific analysis and compound heterozygote analysis. The first approach drew on the top genetic variant hits for lacunar stroke reported in various major studies from the literature on all IS subtypes. These genetic variants were then assayed in the 10 exomes. The compound heterozygote approach involved screening for genes in which every sample had at least one rare variant. From the first analysis, one gene previously identified in the GWAS study, CSN3, was identified as having a significant coding polymorphism and an excess of rare variants. From the second analysis, 48 genes were identified as having at least one rare variant.
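The second screen, for genes in which every sample carries at least one rare variant, can be expressed as a short filter over annotated variant calls. This is a sketch under stated assumptions rather than the procedure used by Cole et al. (2012): the record layout, the 1% threshold default, and the gene and sample names are hypothetical.

```python
from collections import defaultdict


def genes_with_rare_variant_in_every_sample(variant_calls, sample_ids,
                                            maf_threshold=0.01):
    """Return genes in which each listed sample has >= 1 rare variant.

    variant_calls: iterable of dicts with the keys 'sample', 'gene', 'maf'
                   (one record per variant observed in one sample).
    sample_ids:    the samples that must all carry a rare variant.
    """
    carriers_by_gene = defaultdict(set)
    for call in variant_calls:
        if call["maf"] < maf_threshold:
            carriers_by_gene[call["gene"]].add(call["sample"])

    required = set(sample_ids)
    return sorted(gene for gene, carriers in carriers_by_gene.items()
                  if required <= carriers)


# Hypothetical calls for two samples and two placeholder genes.
calls = [
    {"sample": "S1", "gene": "GENE_A", "maf": 0.002},
    {"sample": "S2", "gene": "GENE_A", "maf": 0.004},
    {"sample": "S1", "gene": "GENE_B", "maf": 0.003},
    {"sample": "S2", "gene": "GENE_B", "maf": 0.200},  # common, so ignored
]
candidates = genes_with_rare_variant_in_every_sample(calls, ["S1", "S2"])
```

With only a handful of exomes, such a screen can return many genes by chance alone, which is one reason the results of small pilot studies need replication.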
While Cole et al. (2012) took an approach that was effectively a re-evaluation of GWAS samples with whole exome sequencing, a later study applied an exome-only matched case-control approach to uncovering IS-associated variants (Zhang et al., 2013). Zhang et al. completed an ambitious study involving three stages: 1) exome sequencing and candidate SNP imputation from 100 cases and 100 matched controls from Shenzhen, China; 2) verification by Sequenom™ genotyping of candidate SNPs in a further 500 cases and 500 controls from Shenzhen; and 3) replication of these genotyping results using Taqman™ in a further 1277 cases and 1277 controls from Beijing and Shenzhen. Despite the large sample size and multiple verification studies, only two candidate novel SNPs were found that were associated with an increased risk of IS in the Han Chinese population (see Table 2).
One of the biggest exome studies is that of the National Heart, Lung, and Blood Institute (NHLBI) of the United States, entitled the NHLBI GO Exome Sequencing Project (ESP). In an effort to find variants contributing to heart, lung, and blood disorders, this project has found several significantly associated IS variants in the PON1 gene (Kim et al., 2014). This study involved detecting IS-associated variants in the exomes of 496 IS patients from a wider exome sample of 4204 unrelated participants. In total, 7 SNPs in PON1 were found to be associated with IS in participants of European and African ancestry (Table 2). Of these 7 SNPs, 2 were found only among patients of European ancestry and one was exclusive to patients of African ancestry.
All three studies discussed relied on unrelated samples and on a case-control inference of association, and none was IS-subtype specific, a crucial factor established by several GWAS investigations as necessary for assessing rare variant association and effect. Furthermore, as outlined by Cole et al., next generation sequencing studies have an inherent problem of producing a large number of false positives due to the nature of the read alignment and variation imputation process (Cole et al., 2012). This problem was underlined in a major article that reported significant discrepancies in SNP and Indel calling between many of the currently available variant-calling pipelines when applied to the same set of Illumina sequence data with near-default software parameterizations (O'Rawe et al., 2013). It is also expected that many rare variants will have a very restricted geographic distribution, so that careful matching of case and control ancestries is likely to be extremely important due to the potential for false signals to be introduced by small differences in ancestry (Do et al., 2012).
Another problem is that exome studies are naturally limited to the coding regions of the genome and thus cannot be used to assess non-coding regions or many structural variations. This could theoretically result in missing potential targets for IS genetic mapping. Recommendations for resolving issues in complex disease NGS strategies include international collaborations with thousands of samples in order to increase statistical power (much like current major GWAS projects) (Kilpinen and Barrett, 2013) and a renewed collection of exome and whole-genome sequencing of multi-generational families, to increase the overall accuracy of NGS studies (O'Rawe et al., 2013).
Therefore, the next logical step in exploring the genetic landscape of ischemic stroke is whole-genome sequencing. With whole-genome sequencing, not only SNPs and Indels could be defined, but also larger genetic variant types such as CNVs and structural variants (SVs). In the past, compared to exome sequencing, whole-genome sequencing had a prohibitively large cost. However, the costs have been rapidly decreasing. For example, the introduction of the Illumina HiSeq X Ten promises $US 1000 whole-genome sequencing, a similar price to current whole-exome sequencing protocols (Illumina, 2014). The rapid development of NGS technology has resulted in international projects proposing to generate and store whole genome sequencing data from thousands of individuals, for example, the Genomics England 100,000 genomes project (Branco, 2013). The availability of such resources for IS research would allow researchers to better pinpoint multiple variants that have a small but significant impact on stroke pathology.
Bioinformatics and IS
The types and amount of next generation sequencing data have been increasing rapidly. To be able to store and analyze this increasing amount of data, extremely high-performance computing and intensive bioinformatics support must be available (Zhao et al., 2012). It is argued that the development of bioinformatic infrastructure has not been as rapid as that of NGS and has created a bottleneck (Scholz et al., 2012; Schrijver et al., 2012). It can then be concluded that the full benefit of NGS will not be achieved until bioinformatics is able to maximally interpret and utilize these short-read sequences, including alignment, assembly, and annotation.
While sufficient computing power is becoming more available to researchers, the major bottleneck is still storage capacity. With the generation of hundreds of exomes and genomes for adequate IS genomic studies, there will be immense data storage requirements. The raw sequence data for each sample usually amount to hundreds of gigabytes and, as well as being used for immediate analysis, need to be stored for potential future analysis and interpretation when analytical algorithms improve. With such large amounts of data, transfer of data between research and clinical centers is a problem. It has become routine for data to be shipped on portable hard drives with a capacity of at least 2 TB. However, this makes ready access to the data problematic, and the cost of buying and shipping hard drives can add significantly to the cost of the sequencing project.
The current solution to compute and storage bottlenecks is the utilization of cloud computing services. Services such as DNAnexus, Illumina BaseSpace, and Seven Bridges Genomics allow scientists to sequence, analyze, and collaborate on data via Amazon Web Services (AWS) cloud infrastructure with constantly decreasing costs. Using such services, the researcher can sequence and filter raw read data, then align, annotate, and finally visualize the annotation results. One advantage of this system is the ability to align and annotate sequence read data in web browser interfaces, allowing researchers with no command line Linux experience to perform sophisticated genomic analysis. Another advantage is flexible data storage and exchange. Results from these pipeline services are available in spreadsheet or text file (Variant Call Format) form, or even pre-formatted for GenBank and ENA submissions.
The use of private cloud services also raises problems with data privacy. With the advent of cloud computing infrastructure, the data privacy of human subjects is increasingly difficult to control. This raises public concerns about genome data ownership that may inhibit consent by individuals to participate in genetic studies (Strom et al., 2012; Regola and Chawla, 2013).
Finally, to be able to collect and process these data and perform the analysis, adequate numbers of appropriately trained personnel are required, something that is much harder to address. The development of the previously mentioned integrated cloud services does allow analysis by researchers with only a rudimentary knowledge of bioinformatic processes. But as more researchers become able to develop bigger genomic datasets through rapidly improving NGS technologies that are expanding beyond genomics to transcriptomics, metagenomics, and proteogenomics, more bioinformaticians with the ability to develop bespoke bioinformatic solutions will become essential.
Community Genomics: An Accelerator for Omics Research on Stroke
Evidence presented in this review shows that, while the factors causing IS are largely environmental, there are enough data to suggest a role for heritability and genetics in the prevalence of IS. While not as advanced as in other complex diseases such as autism and schizophrenia, GWAS and NGS studies into genetic IS biomarkers are beginning to reveal some possible candidate variants. With the push for ever bigger genomics studies that are theoretically able to detect a larger range and type of disease-associated variants (Kilpinen and Barrett, 2013), more and more bioinformatics resources are required, resources that have not kept up with the rapid improvement and volume of genomic sequencing. In order to address these challenges in the near future, alternative genomic study designs may be required.
In the realm of Mendelian genetic disorders, it has long been shown that different deleterious alleles causing the same specific genetic disorder can coexist and be expressed in families within highly endogamous communities (Bittles, 2012). The effect of such restricted gene flow is also important in the realm of complex disease genetics, which theoretically involves multiple rare variants with minor effects. The effects of rapidly increasing population migration and community-based endogamy are proving to be important factors in tracing the genetics of complex disease (Campbell et al., 2009). However, current studies of complex disease genetics have mainly concentrated on large international studies with combined multi-population big data sets in the range of thousands of samples. This has been somewhat successful for the detection of more common variants associated with various complex diseases, but still has not addressed the effect of different IS subtypes and the population genetic structure that must be considered in the search for rarer variants.
There have been some previous population-genetic approaches to the genetic factors of complex disease, predominantly featuring runs, or regions, of homozygosity (ROH) (Ku et al., 2011). An ROH is a continuous or uninterrupted stretch of DNA sequence without heterozygosity in the diploid state, that is, a stretch that is identical in both copies of the homologous DNA segment. Typically, reliable ROH are >500 kb or even >1 Mb in length. ROH mapping utilizes the same SNP arrays developed for GWAS analysis, though with the increase in NGS data, this may change.
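The definition above translates into a simple scan along one chromosome for uninterrupted stretches of homozygous genotype calls that exceed a minimum length. The sketch below assumes genotypes coded as 0, 1, or 2 copies of an allele (1 being heterozygous), positions sorted in ascending order, and no missing calls; the function name, the minimum SNP count, and the example data are hypothetical. Dedicated tools typically also tolerate occasional heterozygous or missing calls, which this sketch does not.

```python
def find_roh(positions, genotypes, min_length_bp=500_000, min_snps=3):
    """Find runs of homozygosity on one chromosome.

    positions: SNP coordinates in base pairs, sorted ascending.
    genotypes: allele counts per SNP (0 or 2 = homozygous, 1 = heterozygous).
    Returns a list of (start_bp, end_bp) tuples, one per qualifying run.
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have equal length")

    runs = []
    run_start = None  # index of the first SNP in the run currently open
    # A sentinel heterozygous call at the end closes any run still open.
    for i, genotype in enumerate(list(genotypes) + [1]):
        if genotype != 1:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            run_end = i - 1
            length_bp = positions[run_end] - positions[run_start]
            snp_count = run_end - run_start + 1
            if length_bp >= min_length_bp and snp_count >= min_snps:
                runs.append((positions[run_start], positions[run_end]))
            run_start = None
    return runs


# Hypothetical genotypes at seven SNPs; the fifth SNP is heterozygous.
positions = [100_000, 300_000, 500_000, 700_000, 900_000, 1_000_000, 1_100_000]
genotypes = [0, 2, 0, 2, 1, 0, 0]
runs = find_roh(positions, genotypes)
```

In this example the first four SNPs form a 600 kb homozygous stretch that passes the 500 kb threshold, whereas the two SNPs after the heterozygous call span only 100 kb and are discarded.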
Applying ROH detection in GWAS-based studies has already led to the reporting of significant differences in ROH content between cases and controls for schizophrenia (Lencz et al., 2007) and late-onset Alzheimer's disease (Nalls et al., 2009), with the underlying idea of using this homozygosity association approach to uncover recessive variants contributing to complex phenotypes (Ku et al., 2011). In both the schizophrenia and Alzheimer's studies, SNPs in genes that are plausible biological candidates were found in or near ROH. Therefore, to take into account the effects of restricted gene flow and improve the detection rates of IS-associated variants, we propose that a global sampling and genotyping strategy be combined with a subpopulation-based sampling and genotyping strategy that includes ROH analysis. Implementation of such a strategy requires the use of population genetic approaches, and not just case-control approaches, that take into account population genetic concepts such as population bottlenecks, founder effects, and endogamy to uncover both rare and common IS-associated variants.
One setting in which such an approach could be taken is the biobank created as part of the study by Jiang et al. (2006) in their exploration of IS prevalence in China. As well as sporadic samples, the researchers also sampled from 12 pedigrees with evidence of IS heritability. Figure 1 shows one of these pedigrees, in which samples are available for several trios and sib-pairs with multiple occurrences of ischemic stroke. This pedigree included several cousins, both male and female, who have suffered IS events. The eleven other pedigrees show similar patterns, all suggesting strong genetic predisposition to IS. More importantly, this pedigree has an affected sibling, theoretically allowing an affected sib-pair approach. This affected sib-pair approach has already been developed by the “Siblings With Ischemic Stroke Study” (SWISS) (Meschia et al., 2002). Samples from the available biobank could be added to these data.

An example of community genomics. Pedigree of a Han Chinese family with strong genetic predisposition to ischemic stroke (IS). Solid symbols indicate individuals known to have suffered at least one ischemic event. Diagonal lines indicate deceased individuals.
The ability to process pedigree data could also enable the use of classical linkage techniques for the analysis of next generation sequencing data in complex diseases such as IS. One such technique, called linkdatagen, was developed for Mendelian disease variant detection (Smith et al., 2011). It combines classical linkage analysis (LOD score calculation) with next generation sequencing data to narrow down the field of candidate variants. This technique has been successful in the discovery of a 3 bp deletion in the gene NR5A1 that is causative for a newly described disorder of sexual development (Eggers et al., 2015). One major pitfall of a family-based approach would be the limited availability of parent–child samples due to the late onset of IS, unlike in early onset Mendelian diseases. Another is the effect of familial factors on stroke incidence within a family, for example, shared environment and shared diet, which would overwhelm the detectable signal from any heritable factors.
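The LOD score calculation at the heart of classical linkage analysis can be illustrated in its simplest two-point form, where the likelihood of the observed recombinant and non-recombinant meioses at a recombination fraction θ is compared with the likelihood under no linkage (θ = 0.5). This is a textbook sketch assuming phase-known, fully informative meioses; it is not the implementation used by linkdatagen, and the function names and counts are hypothetical.

```python
import math


def lod_score(recombinants, non_recombinants, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[ theta^R * (1 - theta)^NR / 0.5^(R + NR) ]
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    total = recombinants + non_recombinants
    log_likelihood_linked = (recombinants * math.log10(theta)
                             + non_recombinants * math.log10(1 - theta))
    log_likelihood_unlinked = total * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


def max_lod(recombinants, non_recombinants, grid_steps=50):
    """Maximize the LOD score over an evenly spaced grid of theta values."""
    thetas = [0.5 * k / grid_steps for k in range(1, grid_steps + 1)]
    best_theta = max(
        thetas, key=lambda t: lod_score(recombinants, non_recombinants, t)
    )
    return best_theta, lod_score(recombinants, non_recombinants, best_theta)


# Hypothetical pedigree data: 1 recombinant among 10 informative meioses.
best_theta, best_lod = max_lod(1, 9)
```

By convention a LOD score of 3 or more is taken as evidence for linkage, so a single small pedigree such as this hypothetical one would rarely be conclusive on its own; scores are usually summed across pedigrees.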
Conclusion
We recommend that future studies into IS genetics should focus on community-based sampling as well as case-control based sporadic patient samples in order to define and detect both ‘common’ and ‘rare’ ischemic stroke associated variants. Such targeted experimental design and sampling approaches also allow more efficient utilization of bioinformatics resources in processing data from GWAS and the rapidly improving NGS-based methodologies. IS omics research should consider community genomics, and the ways in which the intersection of genomics and global society can usefully inform our understanding of IS pathophysiology and potential preventive medicine interventions in the future.
Acknowledgments
This work was supported by grants from the National “12th Five-Year” Plan for Science and Technology Support, China (2012BAI37B03), National Natural Science Foundation of China (81001281, 81273170, 31070727), and the Australia-China Sciences Research Foundation (ACSRF06444). The authors would like to acknowledge the reviewers and editors of OMICS for their helpful advice and input. We would especially like to thank Editor-in-Chief Dr. Vural Özdemir for his helpful advice and suggestions in the completion of this review.
Author Disclosure Statement
The authors declare that there are no conflicting interests.
