Characterization of Y-Chromosomal Short Tandem Repeat Markers in Pakistani Populations

Abstract

Y-chromosomal microsatellite markers have been used extensively in population genetic studies and, in forensic medicine, for individual identification and paternity testing. In the present study, we report data on five male-specific, polymorphic microsatellites in 740 unrelated male individuals from 12 different ethnic groups of Pakistan. The overall diversities of the individual loci in Pakistan ranged from 0.236 to 0.799. A total of 152 haplotypes were identified, of which 70 were each present in only a single individual. Two haplotypes were found more frequently, 9_8_17_11_24 (13.5%) and 9_8_17_11_25 (8.6%), showing population-specific clustering in the Mohanna and the Brahui, respectively. An overall haplotype diversity of 0.965 in Pakistan suggested a high power of discrimination for these loci. A few populations, particularly the Mohanna and the Balti, showed lower haplotype diversity values for these loci (0.662 and 0.758, respectively). The set of microsatellite loci reported in this study can be used for population genetic and forensic analysis. This study also demonstrates the importance of studying the haplotype distribution pattern in population genetics.

Introduction

Microsatellite loci have been extensively used to study genetic variation in humans (Weber and May, 1986; Tautz, 1989). These markers have proven to be both accurate and informative for the construction of evolutionary relationships among human populations (Ayub et al., 2003). The Y chromosome is responsible for male-specific functions and is inherited paternally. Because most of it does not recombine (the exception being recombination with the X chromosome at the pseudoautosomal regions), it is passed on to the next generation unchanged apart from mutations. This property of the Y chromosome is helpful in the identification of male lineages, just as mitochondrial DNA helps in the identification of female lineages (Tyler-Smith, 1999; Jobling and Tyler-Smith, 2000). The male-specific region of the Y chromosome contains numerous stable binary markers, including base substitutions and retroposons, as well as rapidly evolving short tandem repeats (STRs) of 2-6 base pairs, which are dispersed throughout the length of the Y chromosome (Roewer et al., 1992). Some of these markers are polymorphic, which makes them useful DNA markers (Kayser et al., 1997; Hurles and Jobling, 2001). Their widespread distribution and ease and accuracy of typing (Redd et al., 1997; Ayub et al., 2000), together with high levels of polymorphism, make these microsatellite loci potentially useful for evolutionary studies (Deka et al., 1996; de Knijff et al., 1997; Kayser et al., 2001), forensic investigations (Jobling et al., 1997; Kayser et al., 1997), genealogical restructuring (Jobling, 2001), and medical genetics (Jobling and Tyler-Smith, 2000).

To get further insight into population structure and also to increase the power of discrimination for forensic samples, we have studied five Y-chromosomal STRs, beyond the previously reported 16 Y markers (Mohyuddin et al., 2004). Variability of these STRs was studied in 12 different Pakistani ethnic groups living in different parts of the country. These populations included Baluch, Brahui, Makrani, Sindhi, Parsi, Mohanna, Myo, Pathan, Burusho, Kalash, Hazara, and Balti; a detailed description of these populations has been given earlier (Qamar et al., 2002). In addition to the populations described by Qamar et al. (2002), the Mohanna reported here are a group of fishermen who reside in the Sindh province. Very little is known about their history, but it is believed that they were the original inhabitants of the Indian subcontinent who were subsequently replaced by different invading ethnic groups. Today, little pockets of this ethnic group still exist in the Sindh province of Pakistan around the river Indus.

We report here the variation of five Y-chromosomal microsatellite markers in 12 different ethnic groups of Pakistan and also demonstrate the importance of obtaining the haplotype diversity and distribution pattern of any marker before its selection for population studies or forensic casework.

Materials and Methods

Subjects

The allelic frequencies of five microsatellite loci were studied in 740 unrelated male individuals belonging to the following 12 different ethnic groups of Pakistan: Baluch (n = 61), Brahui (n = 94), Makrani (n = 58), Sindhi (n = 117), Parsi (n = 83), Mohanna (n = 66), Myo (n = 18), Pathan (n = 88), Burusho (n = 72), Kalash (n = 42), Hazara (n = 27), Balti (n = 14). Details of the sampling region of the ethnic groups along with their map location have been reported earlier (Qamar et al., 2002). The ethnic origin of each individual was confirmed, and after obtaining informed written consent, their blood samples were collected and lymphoblastoid cell lines were established for each individual (Walls and Crawford, 1987). DNA was extracted from these lymphoblastoid cell lines using a standard organic extraction method (Sambrook et al., 1989). This study has been approved by the Departmental Review Board/Ethics Committee, and subsequently, the article submission was also approved by the Institutional Review Board and Ethics Committee of Shifa International Hospital. This study conforms to the principles expressed in the Declaration of Helsinki.

Marker selection

Some preliminary work on the markers used in this study has been described previously (Mohyuddin et al., 2004), in which they were only used to study the genetic instability of Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines. These markers were identified, according to the method described earlier (Ayub et al., 2000), from the Y-chromosomal DNA sequence data of the Whitehead Institute/MIT Genome Sequencing Project's database (www-seq.wi.mit.edu/public_release/humanY.shtm).

Marker structure

The loci studied consisted of four simple repeats and one complex repeat. Of the simple repeats, three were tetranucleotide repeats, CATT repeat (DYS648; modal unit number = 9), TTTC repeat (DYS649; modal unit number = 8), and AAGG repeat (DYS650; modal unit number = 17), and the fourth repeat was a hexanucleotide, AAAGGG repeat (DYS651; modal unit number = 11). The fifth repeat was a complex pentanucleotide repeat AATAT (DYS652; modal unit number = 24) and its GenBank structure is represented as follows: (AATAT)₆(AAAAT)₁(AATAT)₉(AAAAT)₁(AATAT)₆. In counting the repeat units for the DYS652 locus, the insertion AAAAT was taken to be part of the overall repeating unit, for example, the GenBank structure given above has two such insertions in an otherwise perfect repeating unit of 21 AATATs, and thus, for this structure, the repeat number assigned would be 23.
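The counting rule described above can be illustrated with a short sketch. It parses a bracketed structure string and sums the copy numbers of all blocks, so that each AAAAT insertion counts as one repeat unit. The function name `count_repeat_units` and the plain-digit string notation are assumptions made for illustration only.

```python
import re

def count_repeat_units(structure: str) -> int:
    """Sum the copy numbers of every (MOTIF)n block in a structure string."""
    blocks = re.findall(r"\(([ACGT]+)\)(\d+)", structure)
    return sum(int(copies) for _motif, copies in blocks)

# GenBank structure of DYS652 as given in the text, written with plain digits.
dys652 = "(AATAT)6(AAAAT)1(AATAT)9(AAAAT)1(AATAT)6"
total = count_repeat_units(dys652)  # 6 + 1 + 9 + 1 + 6 = 23
```

Counting the insertions as units keeps the repeat number proportional to the fragment length, since AAAAT and AATAT are both five bases long.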

Multiplex PCR and data collection

The five microsatellites were optimized for amplification in a single multiplex reaction as follows: Each sample was amplified in a multiplex PCR consisting of five primer pairs. One primer of each pair was fluorescently labeled with either TET, HEX, or FAM (Table 1). The PCR reactions were performed in a final volume of 10 μl consisting of 20 ng of DNA, 1 × PCR buffer (10 mM Tris-HCl, pH 9.0, 50 mM KCl, 0.1% Triton X-100, 0.01% (w/v) gelatin), 2.2 mM MgCl₂, 300 μM dNTPs, 0.13 U SuperTaq^® DNA polymerase enzyme (HT Biotechnology Ltd.), and 0.357 μg TaqStart^® Antibody (Clontech). The primer sequences and concentrations used in the multiplex PCR are given in Table 1. A touchdown PCR cycling method was used, which began with a preincubation step at 94°C for 10 min to denature the TaqStart^® Antibody. The first eight cycles consisted of denaturation at 94°C for 1 min, annealing at 60°C for 1 min, and extension at 72°C for 1 min; the annealing temperature was reduced by 0.5°C in each cycle. After these 8 cycles, 30 more cycles were performed, which consisted of denaturation at 94°C for 1 min, annealing at 56°C for 1 min, and extension at 72°C for 1 min. After the completion of the 38 cycles, a final extension step was carried out at 72°C for 5 min. The amplified product (0.3 μl) was mixed with TAMRA350 lane size standard as per the manufacturer's recommendations, and the samples were separated on 5% denaturing polyacrylamide gels using an ABI 377 DNA sequencer. The data were collected using the ABI Collection^® software supplied by the manufacturer, the fragment sizes were estimated using the GeneScan^® software (version 2.1), and the alleles were scored using the Genotyper^® software (version 2.0). Allelic sizes were verified by sequencing, and the correspondence between the number of repeat units and the allele size in base pairs was determined. The allele sizes of all the samples were calibrated based upon the number of repeat units.
A set of DNA samples was used as a size standard in each gel to correct for any gel-to-gel variation. The male specificity of the markers was checked using female DNA, which was then included as a negative control in all the amplifications.
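The touchdown cycling described above can be written out as a per-cycle list of annealing temperatures. The sketch below is one reading of the protocol, assuming that the first cycle anneals at 60°C and that each of the following seven cycles is 0.5°C lower; the function name and parameter names are hypothetical.

```python
def touchdown_annealing_schedule(start_c=60.0, step_c=0.5, touchdown_cycles=8,
                                 final_c=56.0, final_cycles=30):
    """Return the annealing temperature (in °C) for each PCR cycle."""
    # Touchdown phase: start at start_c and drop by step_c per cycle (assumed reading).
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    # Constant phase: the remaining cycles all anneal at final_c.
    return touchdown + [final_c] * final_cycles

schedule = touchdown_annealing_schedule()
```

Under this reading the eighth cycle anneals at 56.5°C, one further 0.5°C step above the 56°C used for the remaining 30 cycles, giving 38 cycles in total.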

Table 1.

Primer Sequences and Concentrations Used in the Multiplex PCR

Primer	Primer sequences	Dye label	Repeat unit	Size (bp)	Final concentration (μM)
DYS648 F	CGACCATAAGGCTGCAGTT	FAM	CATT	250	0.1
DYS648 R	GAAGCACAAGAGTTGCCTGA				0.1
DYS649 F	TGACATCTTTGTGGCAATCTG	TET	TTTC	116	0.2
DYS649 R	GGTGAGAGTGCAAG				0.2
DYS650 F	TCACATGCCTACAGTCACAGC	HEX	AAGG	288	0.1
DYS650 R	CTCTTTCTCCCTTCCCACCT				0.1
DYS651 F	GGCTAGAGAGGGGGAATCAC	FAM	AGGGAA	304	0.1
DYS651 R	AAAAATCAGGGGAGGCATTT				0.1
DYS652 F	CACAGCATGGCTTGGTTTTA	HEX	AATAT	204	0.1
DYS652 R	TTTGCGTTATCTCTGCCTTTC				0.1

The current data of five markers along with the previously reported 16 STR data of the ethnic groups under study were compared to assess the differences in different comparison groups. Gene frequencies of the alleles were obtained for each locus or haplotype by gene/haplotype counting, and the gene/haplotype diversity and standard errors were calculated as described earlier (Ayub et al., 2000). The G_ST values were calculated for all the loci as well as each individual locus, using the software Dispan^®. The network of the most common haplotypes (shared by more than 10 individuals) was constructed using the program Network 4.5 (www.fluxus-engineering.com).
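As an illustration of diversity estimation by allele counting, the sketch below computes a gene diversity value from a list of allele counts. It assumes Nei's unbiased estimator, h = n/(n - 1) × (1 - Σp²); the exact procedure used in this study is the one described by Ayub et al. (2000), so the output of this sketch need not agree with Table 3 to the last digit. The function name is hypothetical.

```python
def gene_diversity(counts):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)

# Allele counts for DYS648 from Table 2 (6, 7, 8, and 9 repeat units).
dys648_counts = [2, 2, 342, 394]
h = gene_diversity(dys648_counts)
```

The same function can be applied to haplotype counts instead of allele counts to obtain a haplotype diversity under the same assumption.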

SNP typing

The Y-chromosome Alu polymorphism (YAP) element was typed as described previously (Hammer, 1994; Qamar et al., 1999). Briefly, primers flanking the Alu insertion were used to amplify the region under the PCR conditions reported by Hammer and Horai (1995); after amplification, the samples were separated on 2% agarose gels. The gels were then stained with ethidium bromide and the bands were visualized by UV transillumination. The samples were typed according to the size of the band observed.

Results

Five markers were amplified in 740 samples collected from different Pakistani ethnic groups. Table 2 shows the repeat units, fragment sizes in base pairs, allele frequencies, and G_ST values for the five markers analyzed in the 12 ethnic groups. The number of alleles per marker ranged from 4 to 12 (Table 3). Two of the markers were highly polymorphic, with 11 (DYS650) and 12 (DYS652) alleles (Table 3).

Table 2.

Allele Size and Frequency Distributions and G_ST Values of Different Loci

Locus	GenBank	Repeat units	Size (bp)	n	Allele frequencies	G_ST values
DYS648	BV005731					0.076
		6	238	2	0.0027
		7	242	2	0.0027
		8	246	342	0.4622
		9	250	394	0.5324
DYS649	BV005734					0.083
		8	116	650	0.8784
		9	120	68	0.0919
		10	124	21	0.0284
		11	128	1	0.0014
DYS650	BV005732					0.077
		12	268	2	0.0027
		13	272	38	0.0514
		14	276	15	0.0203
		15	280	112	0.1514
		16	284	82	0.1108
		17	288	299	0.4041
		18	292	135	0.1824
		19	296	47	0.0635
		20	300	8	0.0108
		21	304	1	0.0014
		24	308	1	0.0014
DYS651	BV005733					0.094
		7	280	7	0.0095
		8	286	1	0.0014
		9	292	10	0.0135
		10	298	53	0.0716
		11	304	580	0.7838
		12	310	27	0.0365
		13	316	26	0.0351
		15	322	36	0.0486
DYS652	BV005730					0.143
		18	179	3	0.0041
		19	184	11	0.0149
		20	189	3	0.0041
		21	194	1	0.0014
		22	199	15	0.0203
		23	204	129	0.1743
		24	209	201	0.2716
		25	214	190	0.2568
		26	219	119	0.1608
		27	224	56	0.0757
		28	229	5	0.0068
		29	234	7	0.0095
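The correspondence between fragment size and repeat number in Table 2 can be sketched as a linear rule: one motif length in base pairs per repeat unit, anchored at a reference allele for each locus. The reference values below are read from Table 2 and the marker descriptions; the dictionary and function names are hypothetical, and any row of the table whose spacing departs from a fixed motif-length step would not be reproduced by this rule.

```python
# Reference allele per locus: (repeat units, fragment size in bp, motif length in bp).
REFERENCE = {
    "DYS648": (9, 250, 4),
    "DYS649": (8, 116, 4),
    "DYS650": (17, 288, 4),
    "DYS651": (11, 304, 6),
    "DYS652": (23, 204, 5),
}

def size_to_repeats(locus: str, size_bp: int) -> int:
    """Convert a fragment size to a repeat number, assuming a fixed step per repeat."""
    ref_repeats, ref_size, motif_len = REFERENCE[locus]
    offset, remainder = divmod(size_bp - ref_size, motif_len)
    if remainder:
        raise ValueError(f"{size_bp} bp is not a whole number of repeats from the reference")
    return ref_repeats + offset
```

For example, `size_to_repeats("DYS648", 246)` gives 8 and `size_to_repeats("DYS652", 179)` gives 18, matching the corresponding rows of Table 2.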

Table 3.

Locus Diversity in the Pakistani Ethnic Groups

STR	No. of alleles	Pakistan (SD) (n = 740)	Baluch (SD) (n = 61)	Balti (SD) (n = 14)	Brahui (SD) (n = 94)	Burusho (SD) (n = 72)	Hazara (SD) (n = 27)	Kalash (SD) (n = 42)	Makrani (SD) (n = 58)	Myo (SD) (n = 18)	Pathan (SD) (n = 88)	Parsi (SD) (n = 83)	Mohanna (SD) (n = 66)	Sindhi (SD) (n = 117)
DYS648	4	0.502	0.507	0.440	0.505	0.560	0.519	0.396	0.506	0.471	0.485	0.453	0.416	0.442
		(0.003)	(0.00)	(0.073)	(0.002)	(0.017)	(0.005)	(0.046)	(0.006)	(0.052)	(0.015)	(0.024)	(0.033)	(0.021)
DYS649	4	0.236	0.236	0.000	0.228	0.203	0.510	0.345	0.220	0.209	0.255	0.093	0.198	0.158
		(0.014)	(0.047)	(0.000)	(0.037)	(0.042)	(0.045)	(0.051)	(0.048)	(0.081)	(0.038)	(0.030)	(0.045)	(0.030)
DYS650	11	0.750	0.730	0.659	0.780	0.757	0.724	0.740	0.790	0.595	0.761	0.753	0.500	0.705
		(0.008)	(0.025)	(0.082)	(0.018)	(0.018)	(0.023)	(0.029)	(0.024)	(0.074)	(0.016)	(0.013)	(0.050)	(0.023)
DYS651	8	0.384	0.432	0.385	0.302	0.318	0.484	0.138	0.499	0.366	0.384	0.620	0.117	0.234
		(0.016)	(0.050)	(0.104)	(0.041)	(0.040)	(0.034)	(0.050)	(0.050)	(0.077)	(0.042)	(0.029)	(0.038)	(0.035)
DYS652	12	0.799	0.810	0.670	0.650	0.851	0.670	0.719	0.801	0.392	0.744	0.769	0.579	0.734
		(0.004)	(0.012)	(0.085)	(0.032)	(0.012)	(0.030)	(0.023)	(0.013)	(0.092)	(0.020)	(0.019)	(0.043)	(0.014)
Haplotype		0.965	0.969	0.758	0.904	0.945	0.835	0.859	0.970	0.863	0.944	0.942	0.662	0.942
		(0.002)	(0.006)	(0.032)	(0.017)	(0.010)	(0.032)	(0.022)	(0.009)	(0.045)	(0.011)	(0.010)	(0.046)	(0.008)

Locus diversity and allele distribution pattern in Pakistani ethnic groups

Two markers, DYS650 and DYS652, showed the highest locus diversity among the five markers studied here (0.750 and 0.799, respectively). The DYS649 locus was the least polymorphic (diversity = 0.236), with only four different alleles (Table 3), of which the allele containing eight repeat units was found at a high frequency both in Pakistan overall (87.8%) (Table 2) and in all the ethnic groups individually (data not shown). DYS651 also had one common allele at high frequency (11 repeat units at 78.4%), with the other alleles occurring at lower frequencies. This locus also had a comparatively low diversity value in Pakistan (0.384). DYS648 had four alleles, of which only two were common: the eight-repeat allele at 46.2% and the nine-repeat allele at 53.2% (Table 2). The overall G_ST value for all the loci was 9.9%; for the individual markers, it varied from 7.6% (DYS648) and 7.7% (DYS650) to 14.3% (DYS652) (Table 2).

Haplotype diversity in the Pakistani ethnic groups

Individual haplotypes were constructed by combining the data for all the five loci; this resulted in a total of 152 haplotypes in the 740 samples. Of these, 70 different haplotypes were each present in a single individual, that is, they were unique and not shared by other individuals in the study population. In addition, 26 different haplotypes were shared by two individuals each and 11 different haplotypes by three individuals each; the remaining haplotypes were shared by progressively larger numbers of individuals (data not shown). Two haplotypes were found more frequently, the haplotype 9_8_17_11_24 and haplotype 9_8_17_11_25 (allele repeat units represent DYS648, 649, 650, 651, and 652, respectively). The first one was shared by 100 individuals (13.5%), and of these, 38 individuals were from the Mohanna population, that is, 57.6% of the total Mohanna population. The second most frequent haplotype was shared by 64 individuals (8.6%), and of these, 27 individuals were from the Brahui population (29% of the total Brahui population) (Fig. 1). Another haplotype, 9_8_18_11_23, also showed a strongly population-specific trend: it was shared by 19 individuals, of whom 18 were Pathan, representing 20% of the total Pathan population (Fig. 1). The other haplotypes were less frequent. The overall haplotype diversity present in Pakistan was 0.965; the Mohanna (0.662) were least diverse, whereas the Makrani (0.970) were most diverse (Table 3). The phylogenetic network of the most common haplotypes reveals that the Y-chromosomes in the majority of the Mohanna are represented by a single haplotype (Fig. 2) and the rest of the chromosomes are separated from the most common haplotype by one or two mutation events.
Analysis of the network reveals that only the Mohanna are concentrated around the most common haplotype in their population. Other populations, for example, the Brahui, do form a central cluster around the second most common haplotype in the Pakistani population but are also found in other haplotypes that are several mutation events removed from the most common haplotype. Similarly, the Pathan form a cluster of 18 chromosomes around the most common haplotype in their population, but, like the Brahui, they are also found spread all over the network. The haplotype distribution in all the other populations studied here is similar to that of the Brahui and the Pathan, except for some smaller populations in which the sample size is too small for an effective analysis; thus, only the Mohanna are unique in their haplotype distribution pattern. The network analysis of all the 15 Mohanna haplotypes reveals grouping of most of the samples in a single haplotype (Fig. 2).
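The haplotype construction and counting described above can be sketched as follows. The four profiles are hypothetical and are not data from this study; the names `LOCI` and `make_haplotype` are likewise introduced only for illustration.

```python
from collections import Counter

# Fixed locus order used when writing a haplotype, as stated in the text.
LOCI = ("DYS648", "DYS649", "DYS650", "DYS651", "DYS652")

def make_haplotype(profile: dict) -> str:
    """Join repeat numbers in the fixed locus order, e.g. '9_8_17_11_24'."""
    return "_".join(str(profile[locus]) for locus in LOCI)

# Hypothetical profiles for four individuals (not data from the study).
profiles = [
    {"DYS648": 9, "DYS649": 8, "DYS650": 17, "DYS651": 11, "DYS652": 24},
    {"DYS648": 9, "DYS649": 8, "DYS650": 17, "DYS651": 11, "DYS652": 24},
    {"DYS648": 9, "DYS649": 8, "DYS650": 17, "DYS651": 11, "DYS652": 25},
    {"DYS648": 9, "DYS649": 8, "DYS650": 18, "DYS651": 11, "DYS652": 23},
]

counts = Counter(make_haplotype(p) for p in profiles)
# Haplotypes seen in exactly one individual ("unique" haplotypes).
singletons = sorted(h for h, c in counts.items() if c == 1)
# Relative frequency of each haplotype in the sample.
frequencies = {h: c / len(profiles) for h, c in counts.items()}
```

Grouping the same counts by ethnic group would give the population-specific shares quoted in the text, such as the fraction of Mohanna carrying the most common haplotype.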

FIG. 1.

Overall distribution pattern of the 22 most frequent haplotypes in the populations studied. Color images available online at www.liebertonline.com/gtmb.

FIG. 2.

Distribution pattern of 15 Mohanna haplotypes.

SNP analysis

The YAP polymorphism, in which YAP⁺ chromosomes are considered diagnostic of African ancestry, was studied in the Mohanna samples. Of the 66 Mohanna chromosomes studied, none was found to contain the Alu element insertion. In comparison, other groups with a reported African ancestry, for example, the Makrani, had at least some YAP⁺ chromosomes, which supports their oral history and their physical resemblance to present-day African populations (Qamar et al., 1999).

Discussion

We have previously demonstrated that the method of identifying new microsatellites using sequence database information is feasible and very efficient (Ayub et al., 2000). We have also demonstrated the utility of these markers in population genetic studies (Qamar et al., 2002). The present study validates our previous findings, as we show the utility of these microsatellite markers in population genetic studies and forensic analysis. We also demonstrate that the current set of markers is polymorphic and male specific. Diversity values of two of these loci, the tetranucleotide repeat (DYS650) and the pentanucleotide repeat (DYS652), were very high (Table 3), with 11 and 12 different alleles, respectively. It is worth mentioning that, of the 21 Y-STRs studied in these populations, the diversity of these two loci is amongst the highest, with DYS652 diversity being the maximum (Table 4). Such polymorphic markers have a high power of discrimination and are suitable for applications in forensic science, including paternity determination and individual identification. However, it should be noted that only DYS652 had a relatively high G_ST value (0.143); for DYS650 it was comparatively lower (0.077), indicating less genetic differentiation at the latter locus amongst the populations studied. Although DYS650 had a high degree of diversity and a high number of alleles (11), its lower G_ST value might preclude it from being a good marker for population studies. This combination of high diversity and low G_ST can be seen clearly from a review of the distribution pattern of the allele frequencies in the different populations. Raw data showed that, in the case of DYS650, eight of the ethnic groups have the same modal allele of 17 repeats, and only in the remaining four was the modal allele different.
In contrast, a review of the DYS652 locus showed a more even distribution of the alleles across the different ethnic groups, which was also supported by the high G_ST value of this locus across all the ethnic groups studied. Markers with high diversity and high G_ST values are ideal for population genetic studies; however, less polymorphic loci, when used in combination with other microsatellite markers, can provide a less-biased view of the evolutionary relationships between groups of Y-chromosomes from different ethnic backgrounds and are especially useful in comparing closely related populations.
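The contrast between diversity and G_ST discussed above can be made concrete with a simplified calculation, G_ST = (H_T - H_S)/H_T, where H_S is the mean within-population diversity and H_T is the diversity of the pooled (averaged) allele frequencies. The sketch below weights populations equally and applies no sample-size correction, so it is not the estimator implemented in Dispan; the allele frequencies are hypothetical.

```python
def heterozygosity(freqs):
    """Gene diversity from allele frequencies: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def g_st(populations):
    """populations: list of allele-frequency lists sharing one allele order."""
    k = len(populations)
    h_s = sum(heterozygosity(p) for p in populations) / k   # mean within-population diversity
    mean_freqs = [sum(col) / k for col in zip(*populations)]
    h_t = heterozygosity(mean_freqs)                         # diversity of the pooled frequencies
    return (h_t - h_s) / h_t if h_t > 0 else 0.0

# Hypothetical frequencies: two populations sharing the same modal allele...
shared_modal = [[0.5, 0.3, 0.2], [0.5, 0.2, 0.3]]
# ...versus two populations with different modal alleles.
different_modal = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2]]

low = g_st(shared_modal)
high = g_st(different_modal)
```

In this toy case both pairs of populations are diverse internally, yet the pair sharing a modal allele gives a G_ST of 0.008 against 0.125 for the pair with different modal alleles, which mirrors the pattern described for DYS650.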

Table 4.

Arrangement of the Data with Simple and Complex Repeats

Locus	Repeated units	Status	Diversity	G_ST
DYS435	4	Simple	0.064	0.026
DYS436	3	Simple	0.083	0.069
DYS434	4	Simple	0.177	0.117
DYS649	4	Simple	0.236	0.083
DYS651	6	Simple	0.384	0.094
DYS388	3	Simple	0.454	0.058
DYS391	3	Simple	0.467	0.082
DYS425	3	Simple	0.47	0.096
DYS648	4	Simple	0.502	0.076
DYS426	3	Simple	0.53	0.061
DYS392	3	Simple	0.601	0.097
DYS393	4	Simple	0.678	0.128
DYS438	5	Simple	0.684	0.13
DYS650	4	Simple	0.75	0.077
DYS19	4	Complex	0.714	0.073
DYS439	4	Complex	0.721	0.062
DYS390	4	Complex	0.766	0.07
DYS652	5	Complex	0.799	0.143
DYS389I	4	Complex	0.604	0.091
DYS437	4	Complex	0.618	0.129
DYS389B	4	Complex	0.678	0.131

Another locus (DYS651) showed an intermediate level of variation, with eight different alleles and a comparatively low G_ST value of 0.094. A qualitative review of the distribution pattern revealed that the 11-repeat allele was the modal allele in all the ethnic groups studied. Thus, the lower diversity of DYS651 is a result of the smaller number of alleles and their distribution pattern in the different ethnic groups. The low G_ST value of DYS651 is a result of the concentration of the distribution in all the ethnic groups around a single modal allele; thus, no significant genetic difference was observed between the ethnic groups. The loci DYS648 and DYS649 had the fewest alleles, with only four each. The former locus had relatively higher diversity due to the presence of two major alleles in all the ethnic groups, whereas the latter had a lower diversity due to clustering of samples around one allele, that is, the one containing eight repeat units.

The present study revealed an interesting aspect of the Hazara population: a high diversity value (0.510) was observed for the DYS649 locus. It has been previously shown that the Hazara are the least diverse amongst the Pakistani ethnic groups (Qamar et al., 2002). However, in light of that previous observation, the high diversity value of DYS649 is not as surprising as it seems. This locus is the least polymorphic in the rest of the ethnic groups, as all of them have the 8-repeat unit as the modal allele, with more than 80% of group members sharing this allele, but in the Hazara only 63% of the sampled individuals had the 8-repeat unit and about 33% had the 10-repeat unit. This agrees very well with the way this ethnic group is structured; our previous study has also shown that this group has mostly Y-chromosomal haplogroups R and C (previously labeled as 1 and 10), as well as Y-STRs that are usually distributed over two to five closely related modal alleles (separated by one to five mutational steps) (Qamar et al., 2002). We thus postulate that the Hazara have probably arisen from the Y-chromosomal contribution of more than one individual; our previous admixture analysis also points to a substantial contribution from Mongols (Qamar et al., 2002), but does not rule out multiple lineage contributions to this ethnic group. When one looks at the other four loci, a similar distribution pattern is observed for the Hazara: two to three alleles account for 93%-100% of the subjects, which could be either due to a particular pattern of repeat expansion in this ethnic group or due to contributions from more than one lineage.

Using all the five microsatellites, we constructed highly informative haplotypes that provided an insight into the ancestral relationships and male lineages of some of the populations. For example, the Hazara ethnic group had comparatively low haplotype diversity (0.835) (Table 3), with two haplotypes accounting for 80% of the ethnic group (9_8_17_11_23 and 8_10_15_10_26). This was also demonstrated in earlier studies of this ethnic group based upon variation at 16 Y-biallelic and 16 Y-STR loci (Ayub et al., 2000; Qamar et al., 2002).
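How far apart two haplotypes are can be expressed, under a strict single-step mutation assumption with all loci weighted equally, as the sum of absolute repeat-number differences across loci. The sketch below computes that distance; it is an illustrative simplification and does not reproduce the algorithm used by Network 4.5. The function name is hypothetical.

```python
import re

def step_distance(hap_a: str, hap_b: str) -> int:
    """Sum of absolute repeat differences between two haplotype strings.

    Accepts either '_' or '-' as the separator between loci.
    """
    a = [int(x) for x in re.split(r"[-_]", hap_a)]
    b = [int(x) for x in re.split(r"[-_]", hap_b)]
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same number of loci")
    return sum(abs(x - y) for x, y in zip(a, b))

# The two most frequent haplotypes overall differ by one step at DYS652.
d_common = step_distance("9_8_17_11_24", "9_8_17_11_25")
# The two main Hazara haplotypes quoted above differ at every locus.
d_hazara = step_distance("9_8_17_11_23", "8_10_15_10_26")
```

Under these assumptions the two main Hazara haplotypes are nine steps apart, whereas the two most frequent haplotypes in the whole sample are separated by a single step.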

In addition to the Hazara, another population of interest, which is being reported for the first time here, is the Mohanna. This ethnic group represents the ancient fishermen tribes who reside along the Indus river in southern Pakistan, and their oral history lays claim to them being the oldest population of the Indian subcontinent. This group has mostly married within itself and has lived in small groups in select areas of southern Pakistan. In addition, they have undergone multiple bottlenecks caused by different invaders, which resulted in a significant decrease in their population size. This is reflected in the haplotype network of this ethnic group (Fig. 2), in which all the chromosomes cluster around the most common haplotype, 9_8_17_11_24. On typing the DRD4 locus, we have previously observed that the Mohanna had considerable African and particularly Ethiopian ancestry (Mansoor et al., 2008), which was in agreement with the observations of Quintana-Murci et al. (2004), who proposed an Ethiopian link for the populations residing in the Indus valley, based upon the presence of haplogroup U9 in this area. In the present study, the absence of YAP⁺ signature chromosomes in the Mohanna could be a result either of a loss of these chromosomes from this population due to subsequent admixture with different invading populations or of the origin of this group from a YAP-negative subset of an African population.

The low haplotype diversity in the Balti (0.758) (Table 3) could be a result of the small sample size (n = 14) studied for this ethnic group. Previously, population-specific haplotypes were also observed in the Brahui, Kalash, and Parsi ethnic groups (Mohyuddin et al., 2001). The population-specific clustering of the haplotype 9_8_17_11_25 in 29% of the Brahui (Fig. 1) is in agreement with our previous data, in which this ethnic group had 16 individuals sharing a single haplotype across 16 STRs (Mohyuddin et al., 2001); this clustering is probably a result of population substructuring that is specific to the Brahui. Similarly, 20% of the Pathan shared the haplotype 9_8_18_11_23, which also seems to be very population specific (at least for these five Y-STRs), as only one other individual shared this haplotype amongst all the ethnic groups studied (Fig. 1). This sharing of the haplotype in the Pathan seems to be a phenomenon unique to these five Y-STRs, because our previous studies do not show such a sharing of haplotypes by the Pathan.

Conclusions

The detailed analysis of these polymorphic microsatellites on the Y-chromosome offers new possibilities for forensic casework and for studies of human population substructure in closely related populations. The selection of markers should be based not only upon the diversity values but also upon the haplotype distribution, which is an equally important criterion when choosing between different markers for forensic casework and human population studies. Of the five markers reported here, DYS652 is a strong candidate for use in forensic casework, and the other markers can provide insight into population substructuring and would thus be useful in human population studies.

Acknowledgments

The authors are grateful to all the blood donors for their help in this project. The authors are also thankful to a number of colleagues for their assistance in this work and especially to Chris Tyler-Smith for his valuable advice in setting up the multiplex PCR. This work was supported by a core grant to the Institute of Biomedical and Genetic Engineering from the Government of Pakistan.

Disclosure Statement

No competing financial interests exist.

References

Ayub, Mansoor, Ismail, et al. 2003. Reconstruction of human evolutionary tree using polymorphic autosomal microsatellites. Am J Phys Anthropol, 122:259-268.

Ayub, Mohyuddin, Qamar, et al. 2000. Identification and characterization of novel human Y-chromosomal microsatellites from sequence database information. Nucleic Acids Res, 28:e8i-v.

de Knijff, Kayser, Caglia, et al. 1997. Chromosome Y microsatellites: population genetic and evolutionary aspects. Int J Legal Med, 110:134-149.

Deka, Jin, Shriver, et al. 1996. Dispersion of human Y chromosome haplotypes based on five microsatellites in global populations. Genome Res, 6:1177-1184.

Hammer. 1994. A recent insertion of an Alu element on the Y chromosome is a useful marker for human population studies. Mol Biol Evol, 11:749-761.

Hammer, Horai. 1995. Y chromosomal DNA variation and the peopling of Japan. Am J Hum Genet, 56:951-962.

Hurles, Jobling. 2001. Haploid chromosomes in molecular ecology: lessons from the human Y. Mol Ecol, 10:1599-1613.

Jobling. 2001. In the name of the father: surnames and genetics. Trends Genet, 17:353-375.

Jobling, Pandya, Tyler-Smith. 1997. The Y chromosome in forensic analysis and paternity testing. Int J Legal Med, 110:118-124.

Jobling, Tyler-Smith. 2000. New uses for new haplotypes: the human Y chromosome, disease and selection. Trends Genet, 16:356-362.

Kayser, Caglia, Corach, et al. 1997. Evaluation of Y-chromosomal STRs: a multicenter study. Int J Legal Med, 110:125-133, 141-149.

Kayser, Krawczak, Excoffier, et al. 2001. An extensive analysis of Y-chromosomal microsatellite haplotypes in globally dispersed human populations. Am J Hum Genet, 68:990-1018.

Mansoor, Mazhar, Qamar. 2008. VNTR polymorphism of the DRD4 locus in different Pakistani ethnic groups. Genet Test, 12:299-304.

Mohyuddin, Ayub, Qamar, et al. 2001. Y-chromosomal STR haplotypes in Pakistani populations. For Sci Int, 118:141-146.

Mohyuddin, Ayub, Siddiqi, et al. 2004. Genetic instability in EBV-transformed lymphoblastoid cell lines. Biochim Biophys Acta, 1670:81-83.

Qamar, Ayub, Khaliq, et al. 1999. African and Levantine origins of Pakistani YAP+ Y chromosomes. Hum Biol, 71:745-755.

Qamar, Ayub, Mohyuddin, et al. 2002. Y-chromosomal DNA variation in Pakistan. Am J Hum Genet, 70:1107-1124.

Quintana-Murci, Chaix, Wells, et al. 2004. Where west meets east: the complex mtDNA landscape of the southwest and Central Asian corridor. Am J Hum Genet, 74:827-845.

Redd, Clifford, Stoneking. 1997. Multiplex DNA typing of short-tandem-repeat loci on the Y-chromosome. Biol Chem, 378:923-927.

Roewer, Arnemann, Spurr, et al. 1992. Simple repeat sequences on the human Y chromosome are equally polymorphic as their autosomal counterparts. Hum Genet, 89:389-394.

Sambrook, Fritsch, Maniatis. 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Press: New York.

Tautz. 1989. Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res, 17:6463-6471.

Tyler-Smith. 1999. Y-chromosomal DNA markers. In Papiha, Deka, Chakraborty (eds.). Genomic Diversity: Applications in Human Population Genetics. Kluwer Academic/Plenum Publishers: New York, 65-73.

Walls, Crawford. 1987. Generation of lymphoblastoid cell lines using Epstein-Barr virus. In Klaus GGB (ed.). Lymphocytes, a practical approach. IRL Press: Oxford, 157.

Weber, May. 1986. Abundant class of human DNA polymorphisms which can be typed using the Polymerase Chain Reaction. Am J Hum Genet, 44:388-396.