Copy Number Variation in Common Disease

Abstract

The search for genetic variants that confer an increased risk of developing common diseases such as Graves' disease (GD) has been an evolving process over the last 20 years or so. Initial association and linkage studies predominantly investigated a small number of select variants, such as di-allelic single nucleotide polymorphisms (SNPs) and microsatellite markers, in cohorts of modest size and were largely unsuccessful in identifying novel disease risk genes. The publication of a detailed linkage map of the human genome in 1994 (1) paved the way for large-scale, genome-wide linkage studies, first in type 1 diabetes (2,3) and subsequently in most common diseases, including autoimmune thyroid diseases (4–8). Unfortunately, despite much hope at the time, these studies largely failed to deliver novel disease-risk loci. A return to population-based candidate gene association studies proved more successful and, for GD, led to the identification of novel loci in the HLA class I and II regions and CTLA-4 (9–14).

The completion of the first human genome sequence a decade ago led gene-hunting studies to focus on the delineation of intergenomic variation, since this underpins the majority of both human phenotype and disease risk. Advances in low-cost, high-throughput genotyping and sequencing technologies were vital and have facilitated major projects such as the SNP consortium's HapMap project (www.hapmap.org, accessed November 25, 2010), which aimed to catalogue common SNP variation across the human genome in a number of well-characterized and distinct geographic populations. This shifted the emphasis from microsatellites and linkage studies back to SNP association studies. With greater knowledge of common SNP architecture across the genome, it has been possible to assess chromosomal structure and capture the majority of common genetic variation. In GD, this approach has provided new insights into susceptibility genes such as PTPN22, IL2Rα, TSHR, and FCRL3 (15–18). More recently, the first wave of genome-wide association studies (GWAS) has taken place, in which hundreds of thousands of SNPs have been screened in large case–control cohorts (19). This has enabled the identification of hundreds of genetic variants associated with many complex diseases and human phenotypes. A similar approach in GD employed a genome screen of nonsynonymous SNPs (20), although a full 500,000-SNP GWAS in GD is still eagerly awaited. Nonetheless, since GD and type 1 diabetes are known to share many susceptibility regions, the type 1 diabetes GWAS results have been used to identify several novel GD susceptibility loci, including AFF3, IL2, CAPSL, PTPN2, CD226, and IFIH1 (21–23).

Despite the successful use of SNP markers in identifying susceptibility loci, estimates suggest that in the majority of the common diseases, only a small proportion of genetic risk has been identified. To address this, a complementary approach investigating the entire spectrum of genetic variation is now underway. In the human genome, copy number variations (CNVs) represent a diverse group of polymorphisms that include insertions/deletions, segmental duplications, and complex rearrangements of DNA sequence, ranging in size from thousands to millions of base pairs. Such large structural variations could have significant effects on many facets of gene function and regulation. On this basis, exploratory studies have started to identify the extent of common CNVs genome-wide, with more than 10,000 CNVs now listed on public databases (24,25). The Wellcome Trust Case Control Consortium (WTCCC) recently performed a comprehensive CNV association screen, investigating 3432 polymorphic CNVs on a custom-designed array in 3000 shared controls and 2000 cases for each of eight common diseases: type 1 diabetes, rheumatoid arthritis, Crohn's disease, type 2 diabetes, hypertension, bipolar disorder, breast cancer, and coronary artery disease (26). The study revealed that, although there are many CNVs throughout the genome, individual CNVs were generally less frequent than had been suspected. Of the autosomal CNVs investigated, 44% were rare, with minor allele frequencies (MAF) below 0.05, highlighting once again the need to use very large sample sizes, of many thousands, in CNV association studies (26). In total, the study detected only three previously identified susceptibility regions in four diseases: the IRGM region in Crohn's disease; the HLA region in Crohn's disease, rheumatoid arthritis, and type 1 diabetes; and the TSPAN8 region in type 2 diabetes (26). A meta-analysis of GWAS in Crohn's disease, type 1 diabetes, and type 2 diabetes revealed that there are 95 gene regions associated with the three diseases combined; however, only three regions were identified as associated in the CNV study (26). This suggests that common CNVs are unlikely to have a major role as a screening tool to identify novel susceptibility loci in complex genetic diseases. However, CNVs might be important in driving association signals in previously identified susceptibility loci and, therefore, require direct interrogation.

In this issue of Thyroid, Huber and colleagues (27) report the contribution of CNVs in three established GD susceptibility loci, CTLA4, CD40, and PTPN22. All three genes have previously demonstrated strong association with GD and encode molecules vital in antigen presentation and control of lymphocyte activation. Since the etiological variants have not been identified, it is possible that CNVs play a role in conferring GD susceptibility at these loci. Having first searched public databases for CNVs within the genes of interest, the authors identified one CNV that encompasses the entire CD40 gene and multiple CNVs spanning PTPN22. Because no CNVs were listed within or near CTLA4, the authors screened for CTLA4 CNVs in 56 GD subjects and 15 control subjects, but none were found. This suggests that common CNVs are unlikely to have a role in the GD susceptibility conferred by CTLA4. The CNVs identified within CD40 and PTPN22 were also investigated in 191 GD cases and 192 controls. This analysis demonstrated that the CD40 CNV was not polymorphic in either the GD cases or the control subjects, while the PTPN22 CNVs were extremely rare, with no CNVs identified in the GD subjects and only one duplication and one deletion identified in the control cohort. This study, therefore, was unable to identify association with any of the CNVs investigated but is able to rule out a GD effect of common CNVs (MAF > 0.05) in CTLA4, CD40, and PTPN22. Interestingly, because of concerns that DNA derived from immortalized B-cell lines may carry induced genetic changes, the authors investigated a possible effect of DNA source on the CNV assay by comparing cell line–derived and whole blood–derived DNA. They were able to demonstrate that DNA derived from cell lines possessed greater numbers of CNV deletions than DNA derived from whole blood. These findings are important because they demonstrate that cell line–derived DNA may have undergone various aberrant changes, which could introduce artefacts into the association analysis and produce reproducible, false-positive associations. Notably, the WTCCC identified a similar phenomenon in which cell line–derived DNA displayed more variance in the raw data amplification plots (26). Based on their findings, the association analyses conducted by Huber et al. (27) used DNA derived from whole blood to avoid potential inaccuracies in CNV genotyping. In the context of GD genetics, the study by Huber et al. (27) represents the first attempt to specifically investigate the role of CNVs in GD and highlights some of the key challenges of CNV association analysis.
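
To illustrate why such sparse counts cannot support an association, the following sketch applies a two-sided Fisher's exact test to the PTPN22 carrier counts described above. It assumes that the duplication and the deletion in the control cohort occurred in two different individuals, which the text does not state; the function is an illustrative implementation and not the analysis used by Huber et al. (27).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cohorts (cases, controls); columns are CNV carriers and
    non-carriers. Returns the probability of all tables with the same
    margins that are as likely as, or less likely than, the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x carriers falling in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so ties with the observed table are included.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Assumed counts: 0 carriers among 191 GD cases, 2 carriers among 192 controls.
p_value = fisher_exact_two_sided(0, 191, 2, 190)
print(f"two-sided p = {p_value:.3f}")
```

Under these assumptions the p-value is far above conventional significance thresholds, consistent with the conclusion that variants this rare require much larger cohorts to assess.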

So what role are CNVs likely to play in common diseases such as GD? At the moment we are unable to fully answer this question. In a previous study of 5000 individuals, we found that CNVs tended to be in strong linkage disequilibrium (LD) with common SNPs; for example, CNVs with two or three classes (alleles) and an MAF ≥ 10% were in strong LD (r² > 0.8) with at least one common SNP (26). This suggests that where detailed SNP association screens have been performed on genes such as CTLA4 and TSHR in GD (15,28), CNVs will have been indirectly “tagged.” However, in order to confidently identify etiological, disease-causing DNA variants, all variants, including CNVs, will need to be analyzed. It must also be remembered that certain types of CNVs, such as highly polymorphic tandem repeat sequences and large, high copy number repeats, cannot be typed by current genotyping methods; therefore, a role for these remains unknown.
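
The LD threshold quoted above can be made concrete with a small sketch. Assuming each individual's CNV and SNP genotypes are coded as allele dosages (0, 1, or 2 copies of the minor allele), a common approximation estimates r² as the squared Pearson correlation between the two dosage vectors. The genotype lists below are hypothetical toy data, not values from (26), and the function name is illustrative.

```python
def ld_r_squared(dosage_a, dosage_b):
    """Estimate r^2 between two markers as the squared Pearson correlation
    of their per-individual allele dosages (0, 1, or 2)."""
    if len(dosage_a) != len(dosage_b) or not dosage_a:
        raise ValueError("dosage vectors must be non-empty and equal length")
    n = len(dosage_a)
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a monomorphic marker has no defined r^2")
    return cov * cov / (var_a * var_b)


# Hypothetical dosages for eight individuals (toy data only).
cnv_dosage = [0, 0, 1, 1, 2, 2, 0, 1]
snp_dosage = [0, 0, 1, 1, 2, 2, 0, 2]

r2 = ld_r_squared(cnv_dosage, snp_dosage)
# The SNP is treated as tagging the CNV when r^2 exceeds the 0.8 threshold.
is_tagged = r2 > 0.8
print(f"r^2 = {r2:.3f}, tagged = {is_tagged}")
```

A SNP that passes this threshold carries most of the association signal of the CNV, which is why dense SNP screens can detect common CNV effects indirectly.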

In summary, it would seem that the most effective strategy for the identification of novel GD susceptibility loci rests with the analysis of ever denser panels of SNP markers in large case–control cohorts. Common CNVs are unlikely to make a major contribution to the genetic basis of common diseases such as GD. It remains possible that, in some people, CNVs that are generally uncommon in the population contribute to GD onset, and their study will be important, particularly when fine mapping disease-risk loci.

References

1. Buetow, Ludwigsen, Scherpbier-Heddema, Quillen, Murray, Sheffield, Duyk, Weber, Weissenbach, Gyapay. 1994. Human genetic map. Genome maps V. Wall chart. Science, 265:2055–2070.

2. Davies, Kawaguchi, Bennett, Copeman, Cordell, Pritchard, Reed, Gough, Jenkins, Palmer. 1994. A genome-wide search for human type 1 diabetes susceptibility genes. Nature, 371:130–136.

3. Todd. 1995. Genetic analysis of type 1 diabetes using whole genome approaches. Proc Natl Acad Sci U S A, 92:8560–8565.

4. Tomer, Barbesino, Greenberg, Concepcion, Davies. 1999. Mapping the major susceptibility loci for familial Graves' and Hashimoto's diseases: evidence for genetic heterogeneity and gene interactions. J Clin Endocrinol Metab, 84:4656–4664.

5. Tomer, Ban, Concepcion, Barbesino, Villanueva, Greenberg, Davies. 2003. Common and unique susceptibility loci in Graves and Hashimoto diseases: results of whole-genome screening in a data set of 102 multiplex families. Am J Hum Genet, 73:736–747.

6. Taylor, Gough, Hunt, Brix, Chatterjee, Connell, Franklyn, Hegedus, Robinson, Wiersinga, Wass JAH, Zabaneh, Mackay, Weetman. 2006. A genome-wide screen in 1119 relative pairs with autoimmune thyroid disease. J Clin Endocrinol Metab, 91:646–653.

7. Sakai, Shirasawa, Ishikawa, Ito, Tamai, Kuma, Akamizu, Tanimura, Furugaki, Yamamoto, Sasazuki. 2001. Identification of susceptibility loci for autoimmune thyroid disease to 5q31-q33 and Hashimoto's thyroiditis to 8q23-q24 by multipoint affected sib-pair linkage analysis in Japanese. Hum Mol Genet, 10:1379–1386.

8. Jin, Teng, Ben, Xiong, Zhang, Xu, Shugart, Jin, Chen, Huang. 2003. Genome-wide scan of Graves' disease: evidence for linkage on chromosome 5q31 in Chinese Han pedigrees. J Clin Endocrinol Metab, 88:1798–1803.

9. Heward, Allahabadia, Daykin, Carr-Smith, Daly, Armitage, Dodson, Sheppard, Barnett, Franklyn, Gough. 1998. Linkage disequilibrium between the human leukocyte antigen class II region of the major histocompatibility complex and Graves' disease: replication using a population case control and family-based study. J Clin Endocrinol Metab, 83:3394–3397.

10. Simmonds, Howson, Heward, Cordell, Foxall, Carr-Smith, Gibson, Walker, Tomer, Franklyn, Todd, Gough. 2005. Regression mapping of association between the human leukocyte antigen region and Graves disease. Am J Hum Genet, 76:157–163.

11. Heward, Allahabadia, Armitage, Hattersley, Dodson, Macleod, Carr-Smith, Daykin, Daly, Sheppard, Holder, Barnett, Franklyn, Gough. 1999. The development of Graves' disease and the CTLA-4 gene on chromosome 2q33. J Clin Endocrinol Metab, 84:2398–2401.

12. Vaidya, Oakes, Imrie, Dickinson, Perros, Kendall-Taylor, Pearce. 2003. CTLA4 gene and Graves' disease: association of Graves' disease with the CTLA4 exon 1 and intron 1 polymorphisms, but not with the promoter polymorphism. Clin Endocrinol (Oxf), 58:732–735.

13. Yanagawa, Hidaka, Guimaraes, Soliman, DeGroot. 1995. CTLA-4 gene polymorphism associated with Graves' disease in a Caucasian population. J Clin Endocrinol Metab, 80:41–45.

14. Braun, Donner, Siegmund, Walfish, Usadel, Badenhoop. 1998. CTLA-4 promoter variants in patients with Graves' disease and Hashimoto's thyroiditis. Tissue Antigens, 51:563–566.

15. Brand, Barrett, Simmonds, Newby, McCabe, Bruce, Kysela, Carr-Smith, Brix, Hunt, Wiersinga, Hegedus, Connell, Wass, Franklyn, Weetman, Heward, Gough. 2009. Association of the thyroid stimulating hormone receptor gene (TSHR) with Graves' disease. Hum Mol Genet, 18:1704–1713.

16. Heward, Brand, Barrett, Carr-Smith, Franklyn, Gough. 2007. Association of PTPN22 haplotypes with Graves' disease. J Clin Endocrinol Metab, 92:685–690.

17. Brand, Lowe, Heward, Franklyn, Cooper, Todd, Gough. 2007. Association of the interleukin-2 receptor alpha (IL-2Ralpha)/CD25 gene region with Graves' disease using a multilocus test and tag SNPs. Clin Endocrinol (Oxf), 66:508–512.

18. Simmonds, Heward, Carr-Smith, Foxall, Franklyn, Gough. 2006. Contribution of single nucleotide polymorphisms within FCRL3 and MAP3K7IP2 to the pathogenesis of Graves' disease. J Clin Endocrinol Metab, 91:1056–1061.

19. WTCCC. 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447:661–678.

20. Wellcome Trust Case Control Consortium. Burton, Clayton, Cardon, Craddock, Deloukas, Duncanson, Kwiatkowski, McCarthy, Ouwehand, Samani, Todd, Donnelly, Barrett, Davison, Easton, Evans, Leung, Marchini, Morris, Spencer, Tobin, Attwood, Boorman, Cant, Everson, Hussey, Jolley, Knight, Koch, Meech, Nutland, Prowse, Stevens, Taylor, Walters, Walker, Watkins, Winzer, Jones, McArdle, Ring, Strachan, Pembrey, Breen, St Clair, Caesar, Gordon-Smith, Jones, Fraser, Green, Grozeva, Hamshere, Holmans, Jones, Kirov, Moskivina, Nikolov, O'Donovan, Owen, Collier, Elkin, Farmer, Williamson, McGuffin, Young, Ferrier, Ball, Balmforth, Barrett, Bishop, Iles, Maqbool, Yuldasheva, Hall, Braund, Dixon, Mangino, Stevens, Thompson, Bredin, Tremelling, Parkes, Drummond, Lees, Nimmo, Satsangi, Fisher, Forbes, Lewis, Onnie, Prescott, Sanderson, Matthew, Barbour, Mohiuddin, Todhunter, Mansfield, Ahmad, Cummings, Jewell, Webster, Brown, Lathrop, Connell, Dominiczak, Marcano, Burke, Dobson, Gungadoo, Lee, Munroe, Newhouse, Onipinla, Wallace, Xue, Caulfield, Farrall, Barton; Biologics in RA Genetics Genomics Study Syndicate (BRAGGS) Steering Committee; Bruce, Donovan, Eyre, Gilbert, Hilder, Hinks, John, Potter, Silman, Symmons, Thomson, Worthington, Dunger, Widmer, Frayling, Freathy, Lango, Perry, Shields, Weedon, Hattersley, Hitman, Walker, Elliott, Groves, Lindgren, Rayner, Timpson, Zeggini, Newport, Sirugo, Lyons, Vannberg, Hill, Bradbury, Farrar, Pointon, Wordsworth, Brown, Franklyn, Heward, Simmonds, Gough, Seal; Breast Cancer Susceptibility Collaboration (UK); Stratton, Rahman, Ban, Goris, Sawcer, Compston, Conway, Jallow, Newport, Sirugo, Rockett, Bumpstead, Chaney, Downes, Ghori, Gwilliam, Hunt, Inouye, Keniry, King, McGinnis, Potter, Ravindrarajah, Whittaker, Widden, Withers, Cardin, Davison, Ferreira, Pereira-Gale, Hallgrimsdottir, Howie, Su, Teo, Vukcevic, Bentley, Brown, Compston, Farrall, Hall, Hattersley, Hill, Parkes, Pembrey, Stratton, Mitchell, Newby, Brand, Carr-Smith, Pearce, McGinnis, Keniry, Deloukas, Reveille, Zhou, Sims, Dowling, Taylor, Doan, Davis, Savage, Ward, Learch, Weisman, Brown. 2007. Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nat Genet, 39:1329–1337.

21. Todd, Walker, Cooper, Smyth, Downes, Plagnol, Bailey, Nejentsev, Field, Payne, Lowe, Szeszko, Hafler, Zeitels, Yang, Vella, Nutland, Stevens, Schuilenburg, Coleman, Maisuria, Meadows, Smink, Healy, Burren, Lam, Ovington, Allen, Adlem, Leung, Wallace, Howson, Guja, Ionescu-Tîrgovişte; Genetics of Type 1 Diabetes in Finland; Simmonds, Heward, Gough; Wellcome Trust Case Control Consortium; Dunger, Wicker, Clayton. 2007. Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nat Genet, 39:857–864.

22. Hafler, Maier, Cooper, Plagnol, Hinks, Simmonds, Stevens, Walker, Healey, Howson, Maisuiria, Duley, Coleman, Gough; International Multiple Sclerosis Genetics Consortium (IMSGC); Worthington, Kuchroo, Wicker, Todd. 2009. CD226 Gly307Ser association with multiple autoimmune diseases. Genes Immun, 10:5–10.

23. Fanciulli, Norsworthy, Petretto, Dong, Harper, Kamesh, Heward, Gough, de Smith, Blakemore, Froguel, Owen, Pearce, Teixeira, Guillevin, Graham, Pusey, Cook, Vyse, Aitman. 2007. FCGR3B copy number variation is associated with susceptibility to systemic, but not organ-specific, autoimmunity. Nat Genet, 39:721–723.

24. Redon, Ishikawa, Fitch, Feuk, Perry, Andrews, Fiegler, Shapero, Carson, Chen, Cho, Dallaire, Freeman, Gonzalez, Gratacos, Huang, Kalaitzopoulos, Komura, MacDonald, Marshall, Mei, Montgomery, Nishimura, Okamura, Shen, Somerville, Tchinda, Valsesia, Woodwark, Yang, Zhang, Zerjal, Zhang, Armengol, Conrad, Estivill, Tyler-Smith, Carter, Aburatani, Lee, Jones, Scherer, Hurles. 2006. Global variation in copy number in the human genome. Nature, 444:428–429.

25. Conrad, Pinto, Redon, Feuk, Gokcumen, Zhang, Aerts, Andrews, Barnes, Campbell, Fitzgerald, Hu, Ihm, Kristiansson, Macarthur, Macdonald, Onyiah, Pang, Robson, Stirrups, Valsesia, Walter, Wei; Wellcome Trust Case Control Consortium; Tyler-Smith, Carter, Lee, Scherer, Hurles. 2010. Origins and functional impact of copy number variation in the human genome. Nature, 464:704–712.

26. Wellcome Trust Case Control Consortium. Craddock, Hurles, Cardin, Pearson, Plagnol, Robson, Vukcevic, Barnes, Conrad, Giannoulatou, Holmes, Marchini, Stirrups, Tobin, Wain, Yau, Aerts, Ahmad, Andrews, Arbury, Attwood, Auton, Ball, Balmforth, Barrett, Barroso, Barton, Bennett, Bhaskar, Blaszczyk, Bowes, Brand, Braund, Bredin, Breen, Brown, Bruce, Bull, Burren, Burton, Byrnes, Caesar, Clee, Coffey, Connell, Cooper, Dominiczak, Downes, Drummond, Dudakia, Dunham, Ebbs, Eccles, Edkins, Edwards, Elliot, Emery, Evans, Eyre, Farmer, Ferrier, Feuk, Fitzgerald, Flynn, Forbes, Forty, Franklyn, Freathy, Gibbs, Gilbert, Gokumen, Gordon-Smith, Gray, Green, Groves, Grozeva, Gwilliam, Hall, Hammond, Hardy, Harrison, Hassanali, Hebaishi, Hines, Hinks, Hitman, Hocking, Howard, Howson, Hughes, Hunt, Isaacs, Jain, Jewell, Johnson, Jolley, Jones, Kirov, Langford, Lango-Allen, Lathrop, Lee, Lees, Lewis, Lindgren, Maisuria-Armer, Maller, Mansfield, Martin, Massey, McArdle, McGuffin, McLay, Mentzer, Mimmack, Morgan, Morris, Mowat, Myers, Newman, Nimmo, O'Donovan, Onipinla, Onyiah, Ovington, Owen, Palin, Parnell, Pernet, Perry, Phillips, Pinto, Prescott, Prokopenko, Quail, Rafelt, Rayner, Redon, Reid, Renwick, Ring, Robertson, Russell, St Clair, Sambrook, Sanderson, Schuilenburg, Scott, Seal, Shaw-Hawkins, Shields, Simmonds, Smyth, Somaskantharajah, Spanova, Steer, Stephens, Stevens, Stone, Su, Symmons, Thompson, Thomson, Travers, Turnbull, Valsesia, Walker, Wallace, Warren-Perry, Watkins, Webster, Weedon, Wilson, Woodburn, Wordsworth, Young, Zeggini, Carter, Frayling, Lee, McVean, Munroe, Palotie, Sawcer, Scherer, Strachan, Tyler-Smith, Brown, Burton, Caulfield, Compston, Farrall, Gough, Hall, Hattersley, Hill, Mathew, Pembrey, Satsangi, Stratton, Worthington, Deloukas, Duncanson, Kwiatkowski, McCarthy, Ouwehand, Parkes, Rahman, Todd, Samani, Donnelly. 2010. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature, 464:713–720.

27. Huber, Concepcion, Gandhi, Menconi, Smith, Keddache, Tomer. 2011. Analysis of immune regulatory genes copy number variations (CNVs) in Graves' disease. Thyroid, 21 (this issue).

28. Ueda, Howson, Esposito, Heward, Snook, Chamberlain, Rainbow, Hunter, Smith, Di Genova, Herr, Dahlman, Payne, Smyth, Lowe, Twells, Howlett, Healy, Nutland, Rance, Everett, Smink, Lam, Cordell, Walker, Bordin, Hulme, Motzo, Cucca, Hess, Metzker, Rogers, Gregory, Allahabadia, Nithiyananthan, Tuomilehto-Wolf, Tuomilehto, Bingley, Gillespie, Undlien, Rønningen, Guja, Ionescu-Tirgoviste, Savage, Maxwell, Carson, Patterson, Franklyn, Clayton, Peterson, Wicker, Todd, Gough. 2003. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature, 423:506–511.