Abstract
Background:
Autism spectrum disorder (ASD) is a clinically and genetically heterogeneous group of pervasive neurodevelopmental disorders with a strong hereditary component. Although genome-wide linkage scans (GWLS) and genome-wide association studies (GWAS) have previously identified hundreds of ASD risk gene loci, the results remain inconclusive.
Method:
We performed a heterogeneity-based genome search meta-analysis (HEGESMA) of 15 genome scans of autism and ASD.
Results:
For strictly defined autism, data were analyzed across six separate genome scans. Region 7q22-q34 reached statistical significance in both weighted and unweighted analyses, with evidence of significantly low between-scan heterogeneity. For ASDs (data from 12 separate scans), chromosomal regions 5p15.33-5p15.1 and 15q22.32-15q26.1 reached significance in both weighted and unweighted analyses but did not reach significance for either low or high heterogeneity. Region 1q23.2-1q31.1 was significant in unweighted analyses with low between-scan heterogeneity. Finally, region 8p21.1-8q13.2 was significant in all our meta-analyses. When we combined all 15 available genome scans, the same results were produced.
Conclusions:
This meta-analysis suggests that these regions should be further investigated for autism susceptibility genes, with the caveat that autism spectrum disorders have different linkage signals across genome scans, possibly because of the high genetic heterogeneity of the disease.
Introduction
Autism spectrum disorder (ASD) is a clinically and genetically heterogeneous group of pervasive neurodevelopmental disorders characterized by a significant impairment in social communication and interaction combined with repetitive, restricted, and stereotyped interests and behaviors. Autism typically presents in the early developmental period, before the age of 3 years (American Psychiatric Association, 2013). Epidemiological studies have estimated that more than 1% of the population could be diagnosed with ASD. ASD affects 1-2% of children in the United States and several studies have indicated the importance of genetic factors with etiological heterogeneity in ASD development (Tordjman et al., 2018).
There is significant evidence to suggest that ASDs have a strong hereditary component. The etiology of ASD is far from completely understood, but early twin and family studies strongly support a genetic predisposition to the disease. By the late 1980s, twin studies comparing concordance for autism in twins showed that concordance in monozygotic twins was 70-90% versus 0-25% among dizygotic twins (Folstein and Rutter, 1977; Steffenburg et al., 1989; Bailey et al., 1995).
There are three approaches to identify genetic hotspots or chromosomal regions that are likely to contain relevant genes: (1) cytogenetic studies, (2) whole-genome scans, and (3) evaluation of a priori selected candidate genes known to affect brain development or possibly involved in the pathogenesis of autism (Tordjman et al., 2018).
Genome-wide linkage scans (GWLS) and association studies (GWAS) have identified hundreds of ASD risk gene loci in all human chromosomes. A large number of genome scans (6-9) have been published and many of them have been updated, expanded, or elaborated in two-stage designs in an effort to provide more information for specific chromosomal regions or specific endophenotypes of ASDs. Such genome scans have produced inconclusive results, as the linkage signals tended to be rather low and the individual genome scans identified linkages in different chromosomal regions.
Early efforts were made by Badner and Gershon (2002) to synthesize data from four genome scans (Bailey et al., 1998; Barrett et al., 1999; Philippe et al., 1999; Risch et al., 1999) using a meta-analytic methodology (Badner and Gershon, 2002). This early meta-analysis combined the reported p-values across these four scans and demonstrated evidence for a susceptibility locus in the 7q area.
A new methodology, termed heterogeneity-based genome search meta-analysis (HEGESMA), was applied to the available GWLS on autism spectrum disorders by Zintzaras and Trikalinos in 2006 (Trikalinos et al., 2006). Region 7q22-q32 reached genome-wide significance for autism, and for autism spectrum disorders the results suggested that the chromosomal regions 17p11.2-q12 and 10p12-q11.1 were significant (Trikalinos et al., 2006). HEGESMA has become the best established method for meta-analysis of genome scans and has subsequently been applied to genome scans of several complex diseases (Zintzaras and Ioannidis, 2005a, 2005b, 2008; Trikalinos et al., 2006; Zintzaras and Kitsios, 2006; Zintzaras et al., 2006, 2007; Bouzigon et al., 2010; Rao et al., 2012; Tziastoudi et al., 2019).
Thus, to provide more conclusive evidence on regions linked to ASDs, we analyzed the results of all available genome scans and evaluated the consistency and heterogeneity of the results of these scans. The current study aimed to replicate and expand the results of the first meta-analysis of GWLS data for ASDs using the HEGESMA method, by combining the results of all GWLS available to date.
Materials and Methods
Bibliographic search and eligible whole-genome scans
To identify all the available genome scans for inclusion, we searched the PubMed database using the search terms (“genome search” OR “genome scan” OR “genome screen” OR “genomic search” OR “genomic scan” OR “genomic screen” OR “LOD score” OR “NPL score” OR “susceptibility loci” OR “linkage” OR “genome-wide” OR “genome-wide linkage analysis”) AND (“autism” OR “Asperger syndrome” OR “autism-spectrum disorder”) for articles published up to September 2020. Additionally, the meta-analyses, reviews, and references of the eligible articles were also screened. Any unpublished data were requested from the author.
The included studies fulfilled the following criteria: (1) studies involved probands with idiopathic, nonsyndromic ASD, and probands with a known genetic, chromosomal, metabolic, neurological, or psychiatric disorder were excluded; (2) ASD had to be defined by the use of accepted diagnostic criteria according to the Diagnostic and Statistical Manual of Mental Disorders, 3rd edition (DSM-III), Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV), Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5), the International Classification of Diseases-10 (ICD-10), the Autism Diagnostic Interview, the Autism Diagnostic Interview-Revised (ADI-R), or the Autism Diagnostic Observation Schedule (ADOS); (3) studies performed whole-genome scans either for strictly defined autism or for autism spectrum disorders and linkage score data for all chromosomes were either available or extractable from published graphs; (4) studies used a minimum of 100 microsatellite markers or SNP markers, regardless of the statistical analytical method or software employed; (5) they included human subjects; and (6) they were published in English.
We excluded studies or subsets of subjects in which data were derived only for specific chromosomes or chromosomal regions. Moreover, we excluded studies in which the available data were derived following stratification of the patients according to qualitative or quantitative autistic traits and there were no available linkage data for the whole ASD sample. Among studies with overlapping cases, the largest study or the most recent study with extractable data was included.
The eligibility of the articles was assessed independently by two investigators (I.M. and E.Z.), the results were compared, and any disagreements were resolved by reaching consensus.
Data extraction
For each eligible study, the following information was extracted: name of first author, year of publication, country of recruitment or ethnicity of study population, type of ASD and the diagnostic criteria used for ASD, number of families, number of markers and intermarker interval, linkage statistic, type of statistical analysis and software of linkage analysis, and finally, the chromosomal regions corresponding to the maximum linkage score. Whenever multiple statistical analyses were available in a study, we generally preferred the results obtained by multipoint MLS.
Genome scan data, if not available, were derived from published figures after digitization using Engauge Digitizer (version 2.12, Mark Mitchell, 2002). Regarding the included scans, the corresponding authors were asked to provide whole-genome results when results were presented only for regions showing some evidence of linkage.
Genome scan meta-analysis and heterogeneity testing
Genome search (genome scan) meta-analysis (GSMA) was used, as this technique combines linkage data from studies with different marker sets and analyzed by different methods.
Briefly, the genome was divided into 120 bins of ∼30 cM (Zintzaras and Ioannidis, 2005a, 2005b). Each bin was identified by its chromosome number and its position on that chromosome; for example, “bin 7.5” is the fifth bin of the seventh chromosome. For each genome scan, the most significant result was recorded for each bin, and the bins were then ranked by significance, with the highest rank given to the highest value; that is, the most significant bin received rank 120. Equal test statistics for several bins within a study were assigned tied ranks and negative linkage scores were ranked as 0. The ranks of each bin were summed across the studies and the summed rank (SR) formed the basis of the test statistic. The significance of the average rank of each bin was assessed empirically against the distribution of the average ranks. When a bin had a high SR, this was considered as evidence of linkage.
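The ranking step can be illustrated with a minimal Python sketch. It is not the HEGESMA implementation: the function names are hypothetical, the toy scans use four bins instead of 120, and treating negative scores as zero before ranking is one assumed reading of "ranked as 0".

```python
def rank_bins(scores):
    """Rank the bins of one scan; the highest score gets the highest rank.

    Tied scores share the average of the ranks they span (midranks).
    Assumption: negative linkage scores are set to 0 before ranking.
    """
    clamped = [max(s, 0.0) for s in scores]
    order = sorted(range(len(clamped)), key=lambda i: clamped[i])
    ranks = [0.0] * len(clamped)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next bin has the same score (a tie).
        while end + 1 < len(order) and clamped[order[end + 1]] == clamped[order[pos]]:
            end += 1
        midrank = (pos + end) / 2 + 1  # 1-based average rank of the tied run
        for k in range(pos, end + 1):
            ranks[order[k]] = midrank
        pos = end + 1
    return ranks


def summed_ranks(studies):
    """Sum the within-study ranks of each bin across all scans (SR)."""
    per_study = [rank_bins(scores) for scores in studies]
    return [sum(column) for column in zip(*per_study)]


# Two toy scans over four bins (hypothetical maximum linkage scores per bin).
toy_scans = [[0.5, 2.0, -1.0, 0.5], [1.0, 3.0, 0.2, 0.1]]
print(summed_ranks(toy_scans))
```

Dividing each summed rank by the number of scans gives the average rank used in the significance tests.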
Heterogeneity between studies for each bin was assessed using the Q statistic. Heterogeneity was defined by the sum of the squared deviations of each study's bin rank from the mean of the ranks. In GSMA, low between-study heterogeneity indicates the consistency of the study results for the same bin. Thus, the presence of low heterogeneity for a specific bin with high ranks could be interpreted as further supporting evidence for the significance of this bin. In addition to the Q statistic, two other heterogeneity metrics (B and Ha) were proposed, but inferences with these metrics were similar to the Q metric.
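As a rough illustration of the Q metric as defined above (sum of squared deviations of each study's rank from the mean rank of the bin), consider the following sketch. The function name and the example ranks are hypothetical.

```python
def q_statistic(bin_ranks):
    """Sum of squared deviations of each study's rank from the bin's mean rank."""
    mean_rank = sum(bin_ranks) / len(bin_ranks)
    return sum((rank - mean_rank) ** 2 for rank in bin_ranks)


# Hypothetical ranks of one bin in three scans.
consistent = [118, 119, 120]   # scans agree: high ranks, small Q
discordant = [10, 60, 120]     # scans disagree: large Q
print(q_statistic(consistent), q_statistic(discordant))
```

A bin that ranks high in every scan yields a small Q, which is the pattern the low-heterogeneity test is designed to detect.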
The statistical significance of the average rank and the Q metric was assessed using the Monte Carlo method (Zintzaras and Ioannidis, 2005a, 2005b). In this method, in a run, the ranks of each study were randomly permuted and the simulated average rank and Q metric were calculated. The procedure was repeated for 50,000 runs and a null distribution for the average rank, and for the Q metric, was constructed. The significance level (p-rank) of the average rank of bins against the null distribution of average ranks was the percentage of simulated average ranks greater than or equal to the observed (right-sided p-value). The statistical significance level (PQ) for low heterogeneity was the percentage of simulated metrics less than or equal to the observed (left-sided p-value) (Zintzaras and Ioannidis, 2005a, 2005b).
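The permutation procedure can be sketched as follows. This is a simplified illustration, not the HEGESMA software: names are hypothetical, the default number of runs is reduced from 50,000 to keep the example fast, and the sketch records the statistics at a single reference bin per run, which assumes that all bins are exchangeable under the null hypothesis.

```python
import random


def monte_carlo_null(rank_matrix, runs=2000, seed=0):
    """Build null distributions of the average rank and of Q.

    rank_matrix holds one list of bin ranks per study. In each run the
    ranks of every study are permuted and the statistics are computed
    for the ranks that land in a fixed reference bin (index 0).
    """
    rng = random.Random(seed)
    sim_avg, sim_q = [], []
    for _ in range(runs):
        drawn = []
        for study_ranks in rank_matrix:
            shuffled = study_ranks[:]
            rng.shuffle(shuffled)
            drawn.append(shuffled[0])
        mean_rank = sum(drawn) / len(drawn)
        sim_avg.append(mean_rank)
        sim_q.append(sum((r - mean_rank) ** 2 for r in drawn))
    return sim_avg, sim_q


def right_sided_p(observed, null_values):
    """Fraction of simulated values >= observed (used for p-rank)."""
    return sum(v >= observed for v in null_values) / len(null_values)


def left_sided_p(observed, null_values):
    """Fraction of simulated values <= observed (used for low heterogeneity)."""
    return sum(v <= observed for v in null_values) / len(null_values)


# Four hypothetical scans, each ranking 120 bins from 1 to 120.
ranks = [list(range(1, 121)) for _ in range(4)]
null_avg, null_q = monte_carlo_null(ranks)
print(right_sided_p(100.0, null_avg), left_sided_p(50.0, null_q))
```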
Moreover, the Monte Carlo test was performed, separately generating null distributions for each bin, considering only the simulated distributions of the Q metric (Q-adjusted) for bins with the neighboring simulated average rank (±2) as the bin being considered each time. This analysis considers that the Q statistic may be influenced by the average rank (Zintzaras and Ioannidis, 2005a).
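One possible reading of the rank-adjusted test is sketched below: the null distribution of Q is restricted to simulated runs whose average rank lies within ±2 of the observed average rank. The function name, the `window` parameter, and the toy inputs are assumptions made for illustration only.

```python
def q_adjusted_p(obs_avg, obs_q, sim_avg, sim_q, window=2.0):
    """Left-sided p-value for Q, conditional on a similar average rank.

    Only simulated runs with an average rank within +/- window of the
    observed average rank contribute to the null distribution of Q.
    Returns None when no simulated run falls inside the window.
    """
    neighbours = [q for a, q in zip(sim_avg, sim_q) if abs(a - obs_avg) <= window]
    if not neighbours:
        return None
    return sum(q <= obs_q for q in neighbours) / len(neighbours)


# Toy simulated values (hypothetical): five runs of (average rank, Q).
sim_avg = [50.0, 51.0, 53.0, 90.0, 49.0]
sim_q = [10.0, 30.0, 5.0, 1.0, 20.0]
print(q_adjusted_p(50.0, 15.0, sim_avg, sim_q))
```

Conditioning on the average rank matters because bins with extreme average ranks can only arise from ranks that are close together, which by itself lowers Q.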
An ordered rank (p-order) statistic was also generated, which provides a genome-wide interpretation of significance by comparing the n-th highest SR with the distribution of the n-th highest SR obtained through simulation (Levinson et al., 2003).
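The ordered rank idea can be sketched as below: the observed n-th highest summed rank is compared with the n-th highest summed rank from each permuted genome. This is an illustrative approximation with hypothetical names and toy data, not the procedure as coded in any published software.

```python
import random


def ordered_rank_p(rank_matrix, runs=500, seed=0):
    """Return p-order for each ordered position (1st highest SR, 2nd, ...).

    For each run, every study's ranks are permuted across bins, summed
    ranks are recomputed and sorted, and the simulated n-th highest SR
    is compared with the observed n-th highest SR.
    """
    rng = random.Random(seed)
    n_bins = len(rank_matrix[0])
    observed = sorted((sum(col) for col in zip(*rank_matrix)), reverse=True)
    exceed = [0] * n_bins
    for _ in range(runs):
        permuted = []
        for study_ranks in rank_matrix:
            shuffled = study_ranks[:]
            rng.shuffle(shuffled)
            permuted.append(shuffled)
        simulated = sorted((sum(col) for col in zip(*permuted)), reverse=True)
        for n in range(n_bins):
            if simulated[n] >= observed[n]:
                exceed[n] += 1
    return [count / runs for count in exceed]


# Three hypothetical scans over 10 bins that agree perfectly on the ranking.
concordant = [list(range(1, 11)) for _ in range(3)]
print(ordered_rank_p(concordant)[:3])
```

The returned list is indexed by ordered position, so mapping a p-order back to a bin requires knowing which bin holds the n-th highest observed SR.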
We performed unweighted and weighted analyses. In the weighted analysis, the ranks of the bins in each study were weighted by a study-specific weight w_i, where i denotes the study and the weights are scaled to sum to one.
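The scaling of the weights can be sketched as follows. The raw weights used here are placeholders, since the weighting formula is not reproduced in the text above; only the normalization and the weighted combination of ranks are illustrated, and the function names are hypothetical.

```python
def normalize_weights(raw_weights):
    """Scale study weights so that they sum to one."""
    total = sum(raw_weights)
    return [w / total for w in raw_weights]


def weighted_average_rank(bin_ranks, weights):
    """Weighted average of one bin's ranks across studies."""
    return sum(w * r for w, r in zip(weights, bin_ranks))


# Placeholder raw weights for three scans (assumed values, not from the paper).
weights = normalize_weights([1.0, 1.0, 2.0])
print(weighted_average_rank([100, 20, 60], weights))
```

With equal raw weights the weighted average reduces to the unweighted average rank, so the two analyses differ only when the scans carry unequal weight.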
Statistical significance was tested at p ≤ 0.05. Thus, genome-wide significance was tested at p = 0.00042, after accounting for 120 comparisons.
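The genome-wide threshold follows from dividing the nominal level by the number of bins, which the short sketch below reproduces (the function name is hypothetical).

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after correcting for n_tests comparisons."""
    return alpha / n_tests


# Nominal level 0.05 spread over the 120 bins of the genome.
threshold = bonferroni_threshold(0.05, 120)
print(f"{threshold:.5f}")
```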
The significance of the average ranks and the significance of heterogeneity were evaluated using the HEGESMA software https://biomath.med.uth.gr/ (Zintzaras and Ioannidis, 2005a, 2005b).
Results
Study characteristics
A total of 812 records were identified by literature search of PubMed and additional records from other sources until September 2020.
Of these, 34 were genome-wide scans for autism or ASDs. Fifteen scans were included in the meta-analysis (Philippe et al., 1999; Risch et al., 1999; CLSA, 2001; IMGSAC, 2001; Auranen et al., 2002; Yonan et al., 2003; Ylisaukko-Oja et al., 2004; Cantor et al., 2005; Schellenberg et al., 2006; Ma et al., 2007; Liu et al., 2008; Kilpinen et al., 2009; Allen-Brady et al., 2010; Werling et al., 2014; Woodbury-Smith et al., 2018), whereas 18 articles that were either subsets of larger studies or overlapped partially with other studies were excluded (Bailey et al., 1998; Bass et al., 1998; Buxbaum et al., 2001, 2004; Liu et al., 2001; Stone et al., 2004; Alarcón et al., 2005; Bartlett et al., 2005; McCauley et al., 2005; Molloy et al., 2005; Philippi et al., 2005; Spence et al., 2006; Szatmari et al., 2007; Wassink et al., 2008; Allen-Brady et al., 2009; Weiss et al., 2009; Fradin et al., 2010; Piven et al., 2013).
The study by Shao et al. (2002) did not have linkage scores for all chromosomes even though it was a whole-genome scan. We requested the data set from the author, but it was unavailable. A flowchart of the studies retrieved and excluded for meta-analysis, with the reasons for exclusion, is presented in Figure 1.

Flowchart of retrieved studies for meta-analysis and studies excluded, with justification of reasons.
The characteristics of the included studies are listed in Table 1. We included six studies in the meta-analysis of strictly defined autism and 12 studies in the meta-analysis of ASDs. Finally, in accordance with DSM-5, we combined all studies of either strictly defined autism or ASD and performed a meta-analysis of 15 studies (the populations of three studies from the strictly defined autism analysis were fully included in the ASD analysis). All studies used similar, but not identical, criteria for diagnosis of autism or ASDs, employing different combinations of the DSM-IV, DSM-5, and ICD-10 criteria, the ADI-R, and the ADOS. The majority of genome scans used the logarithm of odds (LOD) score as the linkage statistic. The chromosome regions with suggestive linkages identified from each individual genome scan are shown in Table 1.
Characteristics of Genome Scans Included in the Meta-Analysis
The first number refers to markers used in the total sample and the second number to those used in the sample recruited in the first stage of a two-stage scan.
ADI-R, Autism Diagnostic Interview-Revised; ADOS, Autism Diagnostic Observation Schedule; AGRE, autism genetic research exchange; ASD, autism spectrum disorder; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, 4th edition; DSM-5, Diagnostic and Statistical Manual of Mental Disorders, 5th edition; ICD-10, International Classification of Diseases-10; LOD, logarithm of odds; MLS, maximal LOD score; PPL, posterior probability of linkage statistic.
GSMA and HEGESMA—strictly defined autism
For autism, six genome scans were included in the meta-analysis. Seven different bins (3.6, 5.1, 7.4, 7.5, 7.6, 8.3, and 13.2) were found to be significant (p-rank ≤0.05) by either weighted or unweighted analyses; of these, four bins (5.1, 7.5, 7.6, and 8.3) were found to be statistically significant (p-rank ≤0.05) in both. Regarding the ordered analyses, no bin was significant (all bins had a p-order >0.05) (Table 2).
Autism and Autism Spectrum Disorders Genome Scan Meta-Analysis
Results in unweighted and weighted (in parenthesis) analyses. Significant results in bold.
In low heterogeneity testing, among the bins with a significantly high average rank, only chromosomal bin 7.5 was statistically significant with all metrics of heterogeneity (left-sided PQ/Ha/B <0.05), and only in the unweighted analysis. However, this was no longer formally significant when the average rank of the bin was considered. Thus, bin 7.5 showed evidence of linkage to autism in terms of both high average rank and significantly low heterogeneity between genome scans (Table 2).
Two other bins, 8.6 and 9.4, were statistically significant with all metrics of heterogeneity (left-sided PQ/Ha/B <0.05) in both unweighted and weighted analyses. Moreover, bins 6.5 and 9.3 were statistically significant with all metrics of heterogeneity (left-sided PQ/Ha/B <0.05) in weighted analysis only. However, these bins were not significant regarding p-rank (data not shown).
GSMA and HEGESMA—autism spectrum disorders
Twelve studies investigating ASDs were included in the meta-analysis. Here, 14 bins (1.7, 4.2, 4.5, 5.1, 6.6, 8.3, 10.2, 11.3, 11.4, 15.2, 15.3, 15.4, 20.1, and 21.1) reached significance regarding rank (p-rank ≤0.05) by either weighted or unweighted analyses. Of these, only two bins (5.1 and 15.3) were significant for both weighted and unweighted analyses. Regarding ordered analyses, only bin 11.3 was significant (p-order ≤0.05) in unweighted analysis (Table 2).
From these bins, only bin 1.7 was statistically significant under low heterogeneity testing, with all metrics of heterogeneity (left-sided PQ/Ha/B <0.05) in both weighted and unweighted analyses, and even when the average rank of the bin was taken into account. None of the other bins reached significance for either low or high heterogeneity (Table 2).
Regarding low heterogeneity testing alone, seven bins (1.2, 1.8, 10.6, 11.5, 13.4, 16.4, and 19.2) were statistically significant with more than one metric of heterogeneity (left-sided PQ/Ha/B <0.05) in the unweighted analysis. Additionally, 11 bins (1.6, 5.4, 5.6, 8.4, 9.5, 10.6, 13.4, 16.4, 17.4, 19.4, and 20.2) reached significance with more than one metric of heterogeneity (left-sided PQ/Ha/B <0.05) in the weighted analysis. Of these, three bins (10.6, 13.4, and 16.4) were common for both the unweighted and weighted analyses. None of these bins, however, was significant regarding p-rank (data not shown).
GSMA and HEGESMA—all studies
We combined all studies of either autism only or ASDs, and 15 studies were included in the meta-analysis. We found that 11 bins (1.7, 4.2, 4.5, 5.1, 6.6, 7.5, 8.3, 11.3, 15.2, 15.3, and 20.1) reached significance regarding p-rank (p-rank ≤0.05) by either weighted or unweighted analyses. Of these, only two bins (5.1 and 15.3) were significant for both weighted and unweighted analyses. Regarding ordered analyses, only bin 1.7 was significant (p-order ≤0.05) in unweighted analysis (Table 2).
From these bins, only bin 1.7 was statistically significant in low heterogeneity testing, with all metrics of heterogeneity (left-sided PQ/Ha/B <0.05) in the unweighted analysis and this significance remained even when the average rank of the bin was taken into account. None of the other bins reached significance for low heterogeneity (Table 2).
Regarding low heterogeneity testing only, seven bins (1.2, 1.8, 11.5, 12.2, 16.4, 19.3, and 19.4) were statistically significant with more than one metric of heterogeneity (left-sided PQ/Ha/B <0.05) in the unweighted analysis. Additionally, 12 bins (1.6, 1.7, 3.5, 5.4, 5.6, 8.4, 10.6, 11.5, 16.4, 17.4, 19.4, and 20.2) reached significance with more than one metric of heterogeneity (left-sided PQ/Ha/B <0.05) in the weighted analysis. Bins 16.4 and 19.4 showed statistical significance with more than one metric of heterogeneity (left-sided PQ/Ha/B <0.05) in both the unweighted and weighted analyses. From these bins, only bin 1.7 was significant regarding p-rank in the unweighted analysis.
It is worth noting that 10 out of the 11 bins that were significant in the meta-analyses of all studies were also significant in the meta-analyses that included only ASD studies. Chromosomal bins 5.1 and 8.3 remained significant across all our meta-analyses.
Discussion
In this meta-analysis, we analyzed data from six available genome scans of strictly defined autism and 12 genome scans of ASDs.
We found that a chromosomal region in the long arm of chromosome 7 (region q22.3-q34) was significant for strictly defined autism for both weighted and unweighted analyses, with significantly low heterogeneity between genome scans. This result was consistent with the results of Trikalinos et al. (2006) where the same bin reached genome-wide significance for strict autism definition. In addition, a prior meta-analysis identified the long arm of chromosome 7 as a susceptibility locus for autism (Badner and Gershon, 2002).
As mentioned in the results of our analyses, the flanking chromosomal regions 7q11.23-7q22.3 (bin 7.4) and 7q34-7q36.3 (bin 7.6) also reached significance. We hypothesize that flanking bins may have reached significance because linkage peaks for complex diseases may extend across the borders of two bins, which may reflect a carry-over effect from the neighboring bin. Bin 7.5 remained significant when we included all the available genome scans on either strict autism or ASDs in the meta-analysis.
The RELN gene, one of the best-studied ASD candidate genes, maps to region 7q22, which reached statistical significance in our meta-analysis. This gene encodes an extracellular serine protease that regulates neuronal migration during central nervous system development (Tissir and Goffinet, 2003). Common variants in the RELN gene have been investigated in various gene association studies and meta-analyses, with both positive and negative results (He et al., 2011; Wang et al., 2018; Hernández-García et al., 2020).
A number of studies have focused on the genetic association of the MET gene with autism, which is located at 7q31.2. MET encodes a hepatocyte growth factor receptor with tyrosine kinase activity, involved in pathways affecting the development of the cerebral cortex and cerebellum in manners relevant to patients with autism. Multiple studies have shown positive associations between MET gene and autism in Caucasian, Japanese, and Italian populations, as well as in AGRE family cohorts (Campbell et al., 2006, 2008; Sousa et al., 2009; Zhou et al., 2011).
The FOXP2 gene encodes a member of the forkhead/winged-helix (FOX) family of transcription factors required for the proper development of speech and language regions of the brain during embryogenesis (Lai et al., 2001). The gene is located in region 7q21-31.3, and many studies have found an association between rare variants of the FOXP2 gene and autism. For example, positive associations have been found in the Chinese Han population, and mutations have been identified in the Japanese population. However, several other studies observed no significant genetic association between the FOXP2 gene and autism (Wassink et al., 2002; Gauthier et al., 2003; Gong et al., 2004; Marui et al., 2005).
Similar to these genes, EN2 (homeodomain transcription factor ENGRAILED 2), located in 7q36.3 (bin 7.6), and CNTNAP2 located in 7q35-36, are two other candidate genes for ASDs that have been studied for common variants (Wang et al., 2008; Yang et al., 2008; Lia et al., 2010).
When we analyzed genome scans on autism spectrum disorders, 14 bins (p-rank ≤ 0.05) showed a high average rank by either weighted or unweighted analyses, but only two (5.1 and 15.3) were significant for both. A further two bins close to 15.3 (15.2 and 15.4) were found to be significant on weighted and unweighted analyses, respectively.
We did not replicate the significant linkage of the chromosomal region 17p11.2-q12 (bin 17.2) reported by Trikalinos et al. (2006), as our work included seven studies in addition to those of this prior meta-analysis, which included five genome scans. In our study, this region did not reach a high linkage score in any of the individual new genome scans. The majority of the new genome scans had more weight due to the increased number of markers and patients compared with the previous genome scans, and therefore contributed more to the final result. Bins 5.1 and 15.3 remained significant when we included all the available genome scans either on strict autism or on ASDs in the meta-analysis.
Bin 5.1 (region 5p15.33-5p15.1) reached statistical significance in all three analyses, even when only studies with strictly defined autism were meta-analyzed. This is not a novel finding, as this chromosomal region reached significance in two previous individual genome scans that we included in our meta-analyses (Philippe et al., 1999; Yonan et al., 2003).
Bin 5.1 (region 5p15.33-5p15.1) contains several genes encoding cell-adhesion molecules, such as CDH9 and CDH10. Genetic associations have been observed between CDH9 and autism in two large cohorts (AGRE and ACC) of European ancestry and were replicated in two other cohorts (CAP and CART) (Wang et al., 2009). Another gene related to ASDs, SLC6A3, is located at 5p15.33. Functional characterization of a de novo missense variant in the SLC6A3 gene, originally identified in a simplex ASD case as part of a whole-exome sequencing study in 175 ASD parent-child trios (Devlin et al., 2012), showed that this variant (T356M) resulted in anomalous dopamine transporter function.
A missense variant in the same gene (p.Ala559Val), previously identified in individuals with attention-deficit/hyperactivity disorder (Mazei-Robison et al., 2005) and bipolar disorder (Grünhage et al., 2000), was recently identified in two unrelated male ASD probands and has been shown to alter dopamine function and trafficking (Bowton et al., 2014). The genome-wide significant signal in a previous GWAS on the Autism Genetic Resource Exchange and National Institute for Mental Health (AGRE-NIMH) multiplex family sample (Weiss et al., 2009) included an intergenic SNP ∼80 kb upstream of the SEMA5A gene. The SEMA5A gene is located at 5p15.31 and encodes a member of the semaphorin family of membrane proteins, which play roles in axonal guidance during neural development.
In a novel GWAS follow-up approach, Cheng et al. (2013) mapped the expression regulatory pathway for this gene. This study provided additional information regarding the regulatory network of SEMA5A, along with support from data on genetic variations important in autism risk. In fact, these data suggest that SEMA5A is a common downstream effector for all the genes in this network and that autism copy number variants (CNVs) in this network act through modulation of SEMA5A expression.
Thus, based on our results we conclude that region 5p15.33-5p15.1 would be a region of interest for additional genetic studies in the field of autism.
The chromosomal region 15q22.32-15q26.1 (bin 15.3) was also significant in our meta-analysis for ASDs, as were the neighboring regions 15q14-15q22.32 (bin 15.2) and 15q26.1-15q26.3 (bin 15.4). These regions reached significant linkage scores in many of the studies included in the meta-analysis (Table 1). These regions lie on the same chromosome arm as the Prader-Willi and Angelman syndrome region (15q11-13), which is related to ASDs. In addition, GABRB3, which encodes a protein subunit of the GABA-A receptor essential for neurotransmission, is dysregulated in Rett syndrome, Angelman syndrome, and autism, and its association with autism has been found in multiple cohorts (Buxbaum et al., 2002; Warrier et al., 2013).
Another region with significantly high average rank and low heterogeneity in meta-analysis of the ASD genome scans is the region 1q23.2-1q31.1 (bin 1.7), which matches with the findings of some of the included individual genome scans (Ylisaukko-Oja et al., 2004; Ma et al., 2007; Kilpinen et al., 2009). Region 1q23-24 contains several plausible candidate genes of autism susceptibility, including aldehyde dehydrogenase 9 family, member A1 (ALDH9A1), regulator of G-protein signaling 4 and 5 (RGS4 and RGS5) (Bartlett et al., 2005) and an interesting candidate gene for schizophrenia, the carboxyl-terminal PDZ ligand of neuronal nitric oxide synthase (CAPON) (Brzustowicz et al., 2004).
The chromosomal region 8p21.1-8q13.2 (bin 8.3) was significant in all our meta-analyses. Among the individual scans, this region was significant only in the linkage study of Werling et al. (2014). The peak at 8p21.2 spans several candidate genes, including STC1, which encodes a glycoprotein regulated by calcium that may act to protect neurons from ischemia and hypoxia (Zhang, 2000), along with neurofilament genes NEFM and NEFL, whose products likely function in the transport of cargo to neuronal projections (Brownlees et al., 2002). This region may be a target for subsequent sequencing or association studies in the autism population.
The novel findings of this study are the chromosomal regions that were identified as significant by the meta-analysis, but not by the individual genome scans, such as bin 3.6 (3q21.2-3q25.32) for strictly defined autism, and bins 10.2 (10p14-10p11.23) and 21.1 (21p11.2-21q22.11) for ASDs.
SLC9A9 is located in 3q24, and rare variants in the SLC9A9 gene have been identified in autism in the HMCA cohort (Morrow et al., 2008). CNVs involving the gene UPF2, located in region 10p14, were statistically enriched in a cohort of 57,356 patients with neurodevelopmental disorders compared with a cohort of 20,474 controls (deletions, p = 0.034805; duplications, p = 0.018903). A de novo 2.04 Mb deletion encompassing the UPF2 gene was detected in a female patient from the Developmental Gene Discovery Project (DGDP151) with ASD, intellectual disability, trichotillomania, aggressive behavior, and neuroregression (Nguyen et al., 2013). Finally, the 21p11.2-21q22.11 region contains the SOD1 gene; two rare SNPs located in the noncoding, potentially regulatory regions surrounding the SOD1 gene were found to be associated with ASD in a Slovenian case-cohort analysis (Kovač et al., 2014).
It is worth noting that chromosomal regions found to be nonsignificant in this meta-analysis, but significant in individual genome scans, should not be automatically discarded as irrelevant to autism spectrum disorders. Meta-analysis of genome scans is only one, imperfect, method to identify linkage with a chromosomal region. GSMA and HEGESMA have both identified very promising regions by synthesizing the results of all available genome scans, in an effort to prioritize research targets. In case of genuine heterogeneity among genome scans (e.g., if linkage is present only in a specific population), a pooled, nonstratified analysis might not be able to detect linkage signals found in studies that are more limited.
Despite the strengths of this study, several potential limitations should be considered. First, the populations involved in the meta-analysis were of mixed origin, and different studies showed differences in the study design and conduct (number of families, sib pairs, and extended families). Another limitation of the analysis was the use of variable map density between studies. In addition, rare de novo or familial CNVs that confer risk for ASDs could be a source of heterogeneity that decreases sensitivity in linkage analyses.
Therefore, to improve sensitivity, the sequencing of functional genomic features in linked regions will be necessary to identify rare variants and evaluate their role in familial ASD risk. Moreover, larger family-based cohorts stratified for phenotypic traits, such as language development, developmental milestones or developmental regression, restricted repetitive and stereotyped behaviors, and denser linkage mapping in all samples, will be needed to improve power and decrease any heterogeneity.
Although HEGESMA is based on bins with a width of 30 cM, an analysis with a bin width of less than 30 cM could provide more detailed information. HEGESMA considers only autosomes; thus, it cannot be used to draw conclusions regarding possible linkages on the X and Y chromosomes. The statistical significance values that we report are nominal and should be interpreted with caution given the number of bins tested, even if these bin results are correlated. Nevertheless, the fact that HEGESMA calculates the heterogeneity between the included genome scans makes the results more reliable.
Moreover, even if the meta-analysis provides evidence of linkage for cytogenetic locations that showed statistical significance in more than one analysis, we should interpret the results with caution, as it is likely that some important regions have been missed, while others may result from type I errors. The regions that reached significance in this meta-analysis are candidate locations for follow-up studies.
Conclusions
We conclude that the use of linkage analyses in multiplex family cohorts has complementary utility to GWAS in the investigation of the familial, inherited contribution to ASD risk. Further work is needed to determine which gene(s) or genetic features within linked regions harbor the variants responsible for increasing familial genetic risk for ASD. Allowing for these limitations, autism susceptibility loci exist in the chromosomal regions that we have highlighted and these regions require further investigation.
