Abstract
Background:
C9orf72 repeat expansion (C9exp) is the most common genetic cause underlying frontotemporal lobar degeneration (FTLD) and amyotrophic lateral sclerosis (ALS). However, detection of the C9exp requires elaborate methods.
Objective:
Identification of C9exp carriers from genotyped cohorts could be facilitated by using single nucleotide polymorphisms (SNPs) as markers for the C9exp.
Methods:
We elucidated the potential of the previously described Finnish risk haplotype, defined by the SNP rs3849942, to identify potential C9exp carriers among 218,792 Finns using the FinnGen database. The haplotype approach was first tested in an idiopathic normal pressure hydrocephalus (iNPH) patient cohort (European Alzheimer’s Disease DNA BioBank) containing C9exp carriers by comparing intermediate (15–30) and full-length (> 60 repeats) C9exp carriers (n = 41) to C9exp negative patients (< 15 repeats, n = 801).
Results:
In this analysis, rs3849942 was associated with carriership of C9exp (OR 8.44, p < 2×10⁻¹⁵), while the strongest association was found with rs139185008 (OR 39.4, p < 5×10⁻¹⁸). Unbiased analysis of rs139185008 in FinnGen showed the strongest association with FTLD (OR 4.38, p = 3×10⁻¹⁵) and motor neuron disease ALS (OR 5.19, p = 3×10⁻²¹). rs139185008 was the top SNP in all diseases (iNPH, FTLD, ALS), and further showed a strong association with ALS in the UK Biobank (p = 9.0×10⁻⁸).
Conclusion:
Our findings suggest that rs139185008 is a useful marker to identify potential C9exp carriers in the genotyped cohorts and biobanks originating from Finland.
INTRODUCTION
Frontotemporal lobar degeneration (FTLD) and amyotrophic lateral sclerosis (ALS) are neurodegenerative disorders sharing genetic and neuropathological similarities [1]. C9orf72 hexanucleotide repeat expansion (C9exp), the most common genetic cause of FTLD and ALS [2, 3], is exceptionally prevalent in Finnish FTLD and ALS patients [4, 5]. Previous studies have suggested that more than 30 units of the C9exp are pathogenic [3, 6]. Recently, the C9exp was shown to be an important genetic etiology for idiopathic normal pressure hydrocephalus (iNPH) [7], which is the most common form of hydrocephalus and is characterized by progressive gait impairment, cognitive decline, and loss of bladder control [8]. Because it is not possible to sequence the expanded region using whole-genome sequencing, the presence and estimated length of the C9exp can only be determined using repeat-primed PCR, Southern blotting, or long-read sequencing [9, 10]. The discovery of specific single nucleotide polymorphisms (SNPs) and groups of SNPs (haplotypes) associating with the C9exp would enable identification of potential C9exp carriers from large genotyped cohorts from which the C9exp cannot be detected using current methods. Previously, a Finnish risk haplotype of 42 SNPs was reported to associate with ALS in Finland [11]. Moreover, another risk haplotype of 20 SNPs has been shown to associate with FTLD, ALS, and the C9exp in other European and U.S. cohorts [12]. Here, we aimed to identify a SNP that could be used as a genetic marker to identify C9exp carriers in Finnish cohorts. Our findings showed that the variant rs139185008 distinguishes C9exp carriers from non-carriers in the European Alzheimer’s Disease DNA BioBank (EADB) and associates with the clinical diagnoses of FTLD and ALS in the large population-based FinnGen database and UK Biobank. This suggests that rs139185008 might be a powerful genetic marker for the identification of C9exp carriers in other Finnish cohorts as well.
METHODS
Cohorts, genotyping, and imputations of EADB samples, and clinical endpoints
This study includes GWAS data from the EADB and the FinnGen database. EADB data were processed as previously described [13]. Finnish iNPH patients included in the EADB GWAS were diagnosed according to published guidelines and procedures [14, 15]. C9exp genotyping was performed using repeat-primed PCR and amplicon length analysis [3]. Previous studies have suggested a pathological threshold of > 30 units [3, 6] or > 45 units [16] for the C9exp. However, smaller repeats of < 30 units may also associate with disease [4, 17]. In these studies, the minimum lengths of the C9exp have been identified as 7 [4] and 17 [16, 17] repeats on the longer allele. Here, we chose a threshold of 15 repeats to define individuals positive for the C9exp. The iNPH cohort contains 41 C9exp carriers [7 full-length (> 60 repeats) and 34 intermediate C9exp carriers (15–30 repeats)] and 801 controls (< 15 repeats). Forty-eight percent of C9exp carriers and controls were male.
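The repeat-length categories above can be written out as a small classification rule. The following Python snippet is only an illustrative sketch of those thresholds, not the code used in the study; the function names are hypothetical, and because the text does not define a category for 31–60 repeats, such values are returned as "unclassified".

```python
# Thresholds taken from the text; everything else is illustrative.
NEGATIVE_BELOW = 15      # < 15 repeats: C9exp negative
INTERMEDIATE_MAX = 30    # 15-30 repeats: intermediate carrier
FULL_LENGTH_ABOVE = 60   # > 60 repeats: full-length carrier


def classify_c9exp(longer_allele_repeats: int) -> str:
    """Classify an individual by the repeat count of the longer allele."""
    if longer_allele_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if longer_allele_repeats < NEGATIVE_BELOW:
        return "negative"
    if longer_allele_repeats <= INTERMEDIATE_MAX:
        return "intermediate"
    if longer_allele_repeats > FULL_LENGTH_ABOVE:
        return "full-length"
    # 31-60 repeats: no category is defined in the text.
    return "unclassified"


def is_carrier(longer_allele_repeats: int) -> bool:
    """Carriers in the analysis are intermediate or full-length."""
    return classify_c9exp(longer_allele_repeats) in ("intermediate", "full-length")
```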
Detailed information on the FinnGen data is available at https://www.finngen.fi/fi. Genome and clinical data from 218,792 individuals were obtained from FinnGen study data release 5. Clinical diagnoses were derived from the International Statistical Classification of Diseases and Related Health Problems, version 10 (ICD-10) codes in Finnish national hospital registries and the cause-of-death registry as part of the FinnGen project. UK Biobank data were used for validation of the identified SNPs and haplotypes.
Generation of risk haplotypes associating with C9exp
Trans-Omics for Precision Medicine (TOPMed) imputed genotype data from the EADB consortium were used [13]. Genotypes were phased with Eagle v2.4105 and imputed with Minimac4 v4-1.0.2. Only SNPs having Hardy-Weinberg equilibrium p > 10⁻⁵ and imputation quality greater than 0.6 were considered. The imputation quality score for rs139185008 was 0.75. The previously published 20-SNP Finnish risk haplotype [12] was used to test for association with the C9exp (iNPH cohort) and clinical diagnoses (“motor neuron disease” for ALS, “circumscribed brain atrophy” for FTLD; FinnGen). Additional upstream and downstream SNPs were added to create longer haplotypes that distinguished C9exp carriers from non-carriers more effectively. The SNP selection was based on a side-by-side inspection of individual C9exp carrier and non-carrier haplotypes in the phased and imputed most probable genotype data. Minor and major alleles included in the haplotypes are presented in Supplementary Table 1.
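The two quality-control criteria stated above can be expressed as a simple filter. The sketch below is illustrative only and does not reproduce the actual pipeline; the `VariantQC` record, the Hardy-Weinberg p values, and the two extra variants are hypothetical, while the thresholds and the imputation quality of rs139185008 (0.75) come from the text.

```python
from dataclasses import dataclass

HWE_P_MIN = 1e-5              # keep SNPs with Hardy-Weinberg p > 10^-5
IMPUTATION_QUALITY_MIN = 0.6  # keep SNPs with imputation quality > 0.6


@dataclass(frozen=True)
class VariantQC:
    """Hypothetical per-variant QC record."""
    rsid: str
    hwe_p: float
    imputation_quality: float


def passes_qc(variant: VariantQC) -> bool:
    """Apply both filters; a variant must pass each one."""
    return (variant.hwe_p > HWE_P_MIN
            and variant.imputation_quality > IMPUTATION_QUALITY_MIN)


variants = [
    # Imputation quality from the text; the HWE p value is made up.
    VariantQC("rs139185008", hwe_p=0.5, imputation_quality=0.75),
    # Hypothetical variants that fail one filter each.
    VariantQC("hypothetical_low_quality", hwe_p=0.5, imputation_quality=0.4),
    VariantQC("hypothetical_hwe_fail", hwe_p=1e-8, imputation_quality=0.9),
]

kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)
```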
Analysis of SNP and haplotype association with C9exp and clinical endpoints
Both LD statistics (D’) and case vs. control logistic regression analysis with covariates were conducted on pre-processed imputed genotypes using PLINK software (version 1.9) [18]. Principal components (PCs) 1–2 were used as covariates for the iNPH cohort, and PCs 1–5 for FinnGen. UK Biobank data were extracted through http://big.stats.ox.ac.uk/variant/9:27491942-T-C and http://big.stats.ox.ac.uk/pheno/traits_011.
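For readers less familiar with D’, the following Python sketch shows how it and r² are computed from allele and haplotype frequencies. It is not the PLINK implementation. As an example it uses the minor allele frequencies reported in the Results (0.016 for rs139185008 and 0.17 for rs3849942) and assumes, purely for illustration, that every copy of the rarer minor allele sits on a haplotype carrying the other minor allele. Under that assumption |D’| equals 1 while r² stays low (about 0.08), which is why a rare marker can be in complete LD with a common one without being a close proxy for it.

```python
def ld_stats(p_a: float, p_b: float, p_ab: float) -> tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci.

    p_a, p_b: frequencies of the minor alleles at locus A and locus B.
    p_ab:     frequency of the haplotype carrying both minor alleles.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Illustrative assumption: the rare allele (0.016) occurs only on
# haplotypes that also carry the common SNP's minor allele (0.17).
d_prime, r2 = ld_stats(p_a=0.016, p_b=0.17, p_ab=0.016)
print(f"|D'| = {d_prime:.2f}, r^2 = {r2:.3f}")
```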
Data presentation
Manhattan and regional association plots were drawn using LocusZoom software (v0.12.0). For LD calculation, the European reference population was used. Images were modified using LibreOffice Draw (version: 6.0.2.1). Bar graphs and a geographical plot of minor allele frequencies (MAFs) were generated, and Pearson’s Chi-square test on minor versus major allele counts among Finnish regions was performed, using RStudio software (version: 1.1.463) with the ggplot2 [19] and geofi [20] packages.
Data availability
Data are available on reasonable request from the corresponding authors. Due to privacy policies, the data are not publicly available.
Ethics statement
All experimental procedures complied with the standards of the Declaration of Helsinki. The Ethics Committee of Hospital District of Northern Savo approved the iNPH study and all patients provided informed consent. Patients and controls in FinnGen provided informed consent for biobank research, based on the Finnish Biobank Act (https://www.finngen.fi/fi). All DNA samples and data were pseudonymized (iNPH cohort and FinnGen cohort).
RESULTS
C9exp associates with SNPs near MOB3B and C9orf72 genes
Based on genotype data obtained from a global screening array, SNP association analysis was performed in a well-characterized iNPH patient cohort, comprising intermediate (15–30) and full-length (> 60 repeats) C9exp carriers (n = 41) who were compared to non-carriers (< 15 repeats, n = 801). Except for two SNPs, all significantly C9exp-associated SNPs (p < 5×10⁻⁸) were located on chromosome 9 (Fig. 1A). Several of these were close to or within the MOB kinase activator 3B (MOB3B) or C9orf72 genes, spanning an approx. 94 kb region, and showed strong linkage disequilibrium (LD) (r² ≥ 0.8) with the reference SNP rs3849942 (Fig. 1B), a previously reported surrogate marker for the chromosome 9p risk haplotype [2, 12]. Interestingly, rs139185008 (odds ratio, OR = 39.4, 95% CI [17.2–90.5], p = 4.6×10⁻¹⁸), localizing within a recombination-poor region 81,541 bp upstream of the C9exp, showed the strongest single SNP association with C9exp carriership. Also, rs139185008 (MAF 0.016) was in complete LD (D’ = 1.00) with the reference SNP rs3849942 (MAF 0.17), which showed a weaker association with C9exp carriership (OR 8.44, 95% CI [4.99–14.29], p = 2.0×10⁻¹⁵). Importantly, rs139185008 was highly abundant in C9exp carriers (MAF for full-length and intermediate carriers = 0.21 and 0.19, respectively), but rare in non-carriers (< 15 repeats, MAF = 0.008). Several C9exp-associated haplotypes were significantly overrepresented in C9exp carriers as compared to non-carriers in the iNPH cohort (Table 1). rs139185008 was part of the haplotypes 2, 5, 6, 8, and 10 showing the most prominent risk effects (OR > 42.0). Moreover, as compared to the previously reported 20-SNP Finnish risk haplotype, including, e.g., rs868856, rs7046653, rs2814707, rs3849942, and rs774359 [12] (Fig. 1B), the inclusion of rs139185008 in haplotypes (“haplo”) 2, 5, 6, 8, and 10 markedly improved the specificity to identify C9exp carriers from non-carriers in the iNPH cohort (Table 1, Supplementary Table 1), e.g., the OR for haplotype 2 (OR = 11.33, 95% CI [6.38–20.14], p = 1.28×10⁻¹⁶) substantially increased after the inclusion of rs139185008 (OR = 42.74, 95% CI [18.35–99.53], p = 3.16×10⁻¹⁸).
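The reported odds ratios, confidence intervals, and p values can be checked against one another. The Python sketch below assumes that the intervals are Wald intervals from logistic regression, i.e., symmetric on the log-odds scale; this is an assumption for illustration and is not stated in the text. Under it, the geometric mean of the interval bounds should be close to the OR, and a two-sided p value recomputed from the interval width should land near the reported one. The function name and the dictionary are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_check(odds_ratio: float, ci_low: float, ci_high: float):
    """Return (geometric mean of CI bounds, SE of log OR, z, two-sided p)."""
    geo_mean = math.sqrt(ci_low * ci_high)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return geo_mean, se, z, p


# (OR, CI lower, CI upper) as reported in the text.
reported = {
    "rs139185008": (39.4, 17.2, 90.5),
    "haplotype 2 (20-SNP)": (11.33, 6.38, 20.14),
    "haplotype 2 + rs139185008": (42.74, 18.35, 99.53),
}

for name, (odds_ratio, low, high) in reported.items():
    geo_mean, se, z, p = wald_check(odds_ratio, low, high)
    print(f"{name}: OR={odds_ratio}, sqrt(low*high)={geo_mean:.2f}, "
          f"z={z:.2f}, p={p:.2e}")
```

A large gap between the OR and the geometric mean of its interval bounds would point to a transcription error in one of the three numbers.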

Fig. 1. SNPs associating with the C9orf72 repeat expansion in the iNPH cohort are located near the MOB3B and C9orf72 genes. Manhattan plot of genome-wide association (GWA) of SNPs associated with the C9orf72 expansion in a Finnish iNPH cohort. Chromosome numbers are indicated below the x-axis (A). Regional association plot of the chromosome 9 locus, which contained significant associations in the GWA study. SNPs of the previously described Finnish risk haplotype [12] and rs139185008 (arrow) are indicated above the plot. Significantly associated SNPs are indicated in bold. Linkage disequilibrium is indicated as color-coded r² values. Recombination rates are depicted by a continuous line. The reference variant rs3849942 is shown as a diamond (B). Gray lines indicate significance level (p < 5×10⁻⁸). iNPH, idiopathic normal pressure hydrocephalus; SNP, single nucleotide polymorphism.
Table 1. Haplotypes and individual SNPs significantly associating with C9orf72 repeat expansion (C9exp) in the iNPH cohort and ALS and FTLD in the FinnGen cohort
iNPH cohort Ncarriers/control = 41/801; Motor neuron disease ALS Ncases/control = 238/111,855; Frontotemporal lobar degeneration Ncases/control = 242/214,474; *denotes that variant chr9_27491944_T_C (rs139185008) was added to haplotype analysis; chromosomal positions of SNPs constituting haplotypes are listed in Supplementary Table 1. CI, confidence interval; FTLD, frontotemporal lobar degeneration; haplo, haplotype; iNPH, idiopathic normal pressure hydrocephalus; MAF, minor allele frequency; N, number of subjects; OR, odds ratio; SNP, single-nucleotide polymorphism.
rs139185008 strongly associates with FTLD and ALS in FinnGen
Next, we performed an unbiased screen of the FinnGen database for clinical diagnoses associated with the SNPs and haplotypes identified in the iNPH cohort. The FinnGen database contains comprehensive genome-wide genotype data and life-long medical history from > 200,000 Finns. However, FinnGen does not include genetic data on complex genomic alterations, such as C9exp. rs139185008 and haplotypes 2, 5, 6, 8, and 10 containing the minor allele of rs139185008 strongly associated with ALS and FTLD (Table 1). In comparison, the previously reported 20-SNP Finnish risk haplotype [12] showed a weaker association with ALS and FTLD (haplo 2, Table 1). In FinnGen, rs139185008 was the top SNP that associated with ALS and FTLD, confirming the result obtained in the iNPH GWAS (Fig. 1B). Importantly, rs139185008 also significantly associated with ALS in UK Biobank (p = 9.0×10⁻⁸), but it was not among the top SNPs associated with ALS (beta value for rs139185008 = –0.4; p values of the top SNPs ≤ 1.2×10⁻¹⁸; Supplementary Figure 1).
rs139185008 is regionally enriched to South-Eastern Finland
Finally, we used FinnGen data to calculate the MAF of rs139185008 according to the region of birth in Finland (Fig. 2). Geographically, the rs139185008 minor allele showed the highest prevalence in Southern Savonia (MAF = 0.025) and the lowest in Ostrobothnia (MAF = 0.008) (Fig. 2). Pearson’s Chi-square test of the frequency of the rs139185008 minor allele revealed significant differences in the geographic distribution of rs139185008 in Finland (p < 2.2×10⁻¹⁶, χ² = 282.43, df = 18).
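The regional comparison is a Pearson Chi-square test on a table of minor versus major allele counts with one row per region. The Python sketch below illustrates the computation of the test statistic and the degrees of freedom; the actual analysis was run in R, and the allele counts here are hypothetical, chosen only so that the two toy regions have the highest and lowest reported MAFs (0.025 and 0.008). For a table with R regions and two allele columns, df = R − 1, so df = 18 corresponds to 19 regions.

```python
def maf(minor_count: int, major_count: int) -> float:
    """Minor allele frequency from allele counts."""
    return minor_count / (minor_count + major_count)


def chi_square_allele_counts(table: list[tuple[int, int]]) -> tuple[float, int]:
    """Pearson Chi-square statistic and df for a regions x 2 table.

    table: one (minor_count, major_count) pair per region.
    """
    row_totals = [minor + major for minor, major in table]
    col_totals = [sum(row[0] for row in table), sum(row[1] for row in table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (2 - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts: 500 genotyped individuals (1,000 alleles) per region.
toy_table = [
    (25, 975),  # MAF 0.025, as reported for Southern Savonia
    (8, 992),   # MAF 0.008, as reported for Ostrobothnia
]
statistic, df = chi_square_allele_counts(toy_table)
print(f"chi2 = {statistic:.2f}, df = {df}")
```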

Fig. 2. Geographical distribution of rs139185008 minor allele frequencies in different regions in Finland. Minor allele frequencies (MAF) of rs139185008 in Finland. Pearson’s Chi-square test revealed statistically significant deviation (p < 2.2×10⁻¹⁶, χ² = 282.43, df = 18) of the geographical distribution of the minor allele counts of rs139185008. Genotyped population sizes are given for each region. Mean MAF for all regions is indicated as a black vertical line (A). MAF of rs139185008 within Finnish regions showed geographical clustering of high (dark magenta) and low (white) frequencies (B).
DISCUSSION
We report that rs139185008 strongly associates with C9exp in a cohort of iNPH cases, suggesting its potential as a surrogate marker for identifying C9exp carriers in large population-based cohorts and biobank databases. rs139185008 showed a stronger association with FTLD and ALS clinical diagnoses in FinnGen (OR 4.4 and 5.2, respectively) as compared to the previously reported C9exp proxy marker rs3849942 (OR 1.2 and 1.6, respectively). The top SNPs differed in FinnGen and UK Biobank, which indicates that there are differences in the C9exp haplotype structures among European populations. In Finland, the frequency of the rs139185008 minor allele was highest in South-Eastern Finland, and lowest in the west-coastal Ostrobothnia, which represent genetically different geographical regions. The regional distribution of rs139185008 in Finland is consistent with the most enriched areas of haplotypes of Finnish Heritage Diseases with regard to the low enrichment in the west coast area and high enrichment in Savonia regions [21], a phenomenon traceable back to the population migration history within Finland and the resulting genetic isolation due to bottleneck events and founder effects [22]. However, rs139185008 is also highly prevalent in Helsinki and surrounding areas and showed a link to ALS in the UK Biobank, consisting of a more heterogeneous population. In this context, however, it should be emphasized that the beta-value provided by UK Biobank for rs139185008 was negative, indicating an odds ratio below one for this SNP. Importantly, similar results (negative beta-values) were also observed with some other SNPs significantly associated with ALS in the UK Biobank in the MOB3B/C9orf72 region. Thus, further investigations on the prevalence of rs139185008 and its association with C9exp-linked diseases in other populations and cohorts are warranted to evaluate its importance beyond Finland and the UK Biobank.
Collectively, the present data suggest that specific haplotypes containing rs139185008 are useful proxy markers to identify potential C9exp carriers. Since gene-based therapies are emerging in C9exp-linked diseases, rs139185008 may be used to identify potential C9exp carriers in biobanks and population cohorts at an early phase, for confirmatory C9exp genotyping and subsequent clinical trials.
ACKNOWLEDGMENTS
HR is a PhD student in the GenomMed and Molecular Medicine (DPMM) Doctoral Programs of the University of Eastern Finland, Kuopio, Finland. This study is part of the research activities of the Finnish FTD Research Network (FinFTD).
This work was supported by the Academy of Finland, under grant numbers 315459 (AH), 315460 (AMR), and 307866 (MH); the Strategic Neuroscience Funding of the University of Eastern Finland; Finnish Brain Foundation (ES); Sigrid Juselius Foundation (ES, MH, PJT); Finnish Cultural Foundation (KK); Instrumentarium Science Foundation (ES); and Orion Research Foundation (ES). This work was supported by a grant (European Alzheimer DNA BioBank, EADB) from the EU Joint Programme – Neurodegenerative Disease Research (JPND). Inserm UMR1167 is also funded by Inserm, Institut Pasteur de Lille, the Lille Métropole Communauté Urbaine, and the French government’s LABEX DISTALZ program (development of innovative strategies for a transdisciplinary approach to Alzheimer’s disease). This publication is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 740264.
