Abstract

The completion of the first human genome sequence a decade ago led to gene-hunting studies focusing on the delineation of intergenomic variation, since this underpins the majority of both human phenotype and disease risk. Advances in low-cost, high-throughput genotyping and sequencing technologies were vital and have facilitated major projects such as the SNP consortium's HapMap project (
Despite the successful use of SNP markers in identifying susceptibility loci, estimates suggest that in the majority of common diseases, only a small proportion of genetic risk has been identified. To address this, a complementary approach investigating the entire spectrum of genetic variation is now underway. In the human genome, copy number variations (CNVs) represent a diverse group of polymorphisms that include insertions/deletions, segmental duplications, and complex rearrangements of DNA sequence that range in size from thousands to millions of base pairs. Such large structural variations could have significant effects on many facets of gene function and regulation. Based on this, exploratory studies have started to identify the extent of common CNVs genome-wide, with more than 10,000 CNVs now listed on public databases (24,25). The Wellcome Trust Case Control Consortium (WTCCC) recently performed a comprehensive CNV association screen, investigating 3432 polymorphic CNVs on a custom-designed array in 3000 shared controls and 2000 cases for each of eight common diseases: type 1 diabetes, rheumatoid arthritis, Crohn's disease, type 2 diabetes, hypertension, bipolar disorder, breast cancer, and coronary artery disease (26). The screen revealed that, although there are many CNVs throughout the genome, individual CNVs were generally less frequent than we had suspected. Of the autosomal CNVs investigated, 44% were rare, with minor allele frequencies (MAF) below 0.05, highlighting once again the need to use very large sample sizes, of many thousands, in CNV association studies (26). In total, the study detected only three susceptibility regions, all previously identified, across four diseases: the IRGM region in Crohn's disease; the HLA region in Crohn's disease, rheumatoid arthritis, and type 1 diabetes; and the TSPAN8 region in type 2 diabetes (26).
A meta-analysis of GWAS in Crohn's disease, type 1 diabetes, and type 2 diabetes revealed that there are 95 gene regions associated with the three diseases combined; however, only three regions were identified as associated in the CNV study (26). This suggests that common CNVs are unlikely to have a major role as a screening tool to identify novel susceptibility loci in complex genetic diseases. However, CNVs might be important in driving association signals in previously identified susceptibility loci and, therefore, require direct interrogation.
In this issue of Thyroid, Huber and colleagues (27) report the contribution of CNVs in three established Graves' disease (GD) susceptibility loci, CTLA4, CD40, and PTPN22. All three genes have previously demonstrated strong association with GD and encode molecules vital in antigen presentation and control of lymphocyte activation. Since the etiological variants have not been identified, it is possible that CNVs play a role in conferring GD susceptibility at these loci. Having first searched public databases for CNVs within the genes of interest, the authors identified one CNV that encompasses the entire CD40 gene and multiple CNVs spanning PTPN22. Because no CNVs were listed within or near CTLA4, the authors screened for CTLA4 CNVs in 56 GD subjects and 15 control subjects, but none were found. This suggests that common CNVs are unlikely to have a role in the GD susceptibility conferred by CTLA4. The CNVs identified within CD40 and PTPN22 were then investigated in 191 GD cases and 192 controls. This analysis demonstrated that the CD40 CNV was not polymorphic in either the GD cases or the control subjects, while the PTPN22 CNVs were extremely rare, with no CNVs identified in the GD subjects and only one duplication and one deletion identified in the control cohort. The study, therefore, was unable to identify association with any of the CNVs investigated but was able to rule out a GD effect of common CNVs (MAF > 0.05) in CTLA4, CD40, and PTPN22. Interestingly, because of concerns that DNA derived from immortalized B-cell lines may carry induced genetic changes, the authors investigated a possible effect of DNA source on the CNV assay by comparing cell line–derived and whole blood–derived DNA. They were able to demonstrate that DNA derived from cell lines possessed greater numbers of CNV deletions than DNA derived from whole blood.
These findings are important because they demonstrate that cell line–derived DNA may have undergone various aberrant changes, which could introduce artefacts into the association analysis and produce reproducible, false-positive associations. Notably, the WTCCC identified a similar phenomenon in which cell line–derived DNA displayed more variance in the raw data amplification plots (26). Based on these findings, Huber et al. (27) conducted their association analyses using DNA derived from whole blood to avoid potential inaccuracies in CNV genotyping. In the context of GD genetics, the study by Huber et al. (27) represents the first attempt to specifically investigate the role of CNVs in GD and highlights some of the key challenges of CNV association analysis.
So what role are CNVs likely to play in common diseases such as GD? At the moment we are unable to fully answer this question. In a previous study of 5000 individuals, we found that CNVs tended to be in strong linkage disequilibrium (LD) with common SNPs; for example, CNVs with two or three classes (alleles) and an MAF ≥ 10% were in strong LD (r² > 0.8) with at least one common SNP (26). This suggests that where detailed SNP association screens have been performed on genes such as CTLA4 and TSHR in GD (15,28), CNVs will have been indirectly “tagged.” However, in order to confidently identify etiological, disease-causing DNA variants, all variants, including CNVs, will need to be analyzed. It must also be remembered that certain types of CNVs, such as highly polymorphic tandem repeat sequences and large, high copy number repeats, cannot be typed by current genotyping methods; a role for these therefore remains unknown.
In summary, it would seem that the most effective strategy for the identification of novel GD susceptibility loci rests with the analysis of ever denser panels of SNP markers in large case–control cohorts. Common CNVs are unlikely to make a major contribution to the genetic basis of common diseases such as GD. It remains possible that, in some people, CNVs that are generally uncommon in the population contribute to GD onset, and their study will be important, particularly when fine mapping disease-risk loci.
