Abstract
Y-chromosome microsatellite markers have been used extensively for population genetic studies and for individual identification and paternity testing in forensic medicine. In the present study, we report data for five male-specific, polymorphic microsatellites in 740 unrelated male individuals from 12 different ethnic groups of Pakistan. The diversities of the individual loci in Pakistan ranged from 0.236 to 0.799. A total of 152 haplotypes were identified, of which 70 were each present in only a single individual. Two haplotypes were found more frequently, 9_8_17_11_24 (13.5%) and 9_8_17_11_25 (8.6%), showing population-specific clustering in the Mohanna and the Brahui, respectively. An overall haplotype diversity of 0.965 in Pakistan suggested a high power of discrimination for these loci. A few populations, particularly the Mohanna and the Balti, showed lower haplotype diversity values for these loci (0.662 and 0.758, respectively). The set of microsatellite loci reported in this study can be used for population genetic and forensic analyses. This study also demonstrates the importance of studying haplotype distribution patterns in population genetics.
Introduction
To gain further insight into population structure and to increase the power of discrimination for forensic samples, we studied five Y-chromosomal STRs in addition to the 16 Y markers reported previously (Mohyuddin et al., 2004). The variability of these STRs was studied in 12 Pakistani ethnic groups living in different parts of the country: the Baluch, Brahui, Makrani, Sindhi, Parsi, Mohanna, Myo, Pathan, Burusho, Kalash, Hazara, and Balti. Detailed descriptions of the other populations have been given earlier (Qamar et al., 2002); the Mohanna, who were not described by Qamar et al. (2002), are a group of fishermen who reside in the Sindh province. Very little is known about their history, but they are believed to be the original inhabitants of the Indian subcontinent, who were subsequently replaced by different invading ethnic groups. Today, small pockets of this ethnic group still live along the river Indus in the Sindh province of Pakistan.
We report here the variation at five Y-chromosomal microsatellite markers in 12 ethnic groups of Pakistan and also demonstrate the importance of determining the haplotype diversity and distribution pattern of any marker before selecting it for population studies or forensic casework.
Materials and Methods
Subjects
The allele frequencies of five microsatellite loci were studied in 740 unrelated male individuals belonging to the following 12 ethnic groups of Pakistan: Baluch (n = 61), Brahui (n = 94), Makrani (n = 58), Sindhi (n = 117), Parsi (n = 83), Mohanna (n = 66), Myo (n = 18), Pathan (n = 88), Burusho (n = 72), Kalash (n = 42), Hazara (n = 27), and Balti (n = 14). Details of the sampling regions of these ethnic groups, along with their map locations, have been reported earlier (Qamar et al., 2002). The ethnic origin of each individual was confirmed, and after informed written consent was obtained, blood samples were collected and a lymphoblastoid cell line was established for each individual (Walls and Crawford, 1987). DNA was extracted from these cell lines using a standard organic extraction method (Sambrook et al., 1989). The study was approved by the Departmental Review Board/Ethics Committee, and submission of the article was subsequently approved by the Institutional Review Board and Ethics Committee of Shifa International Hospital. This study conforms to the principles expressed in the Declaration of Helsinki.
Marker selection
Preliminary work on the markers used in this study has been described previously (Mohyuddin et al., 2004), in which they were used only to study the genetic instability of Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines. These markers were identified from the Y-chromosomal DNA sequence data in the Whitehead Institute/MIT Genome Sequencing Project database (www-seq.wi.mit.edu/public_release/humanY.shtm), following the method described earlier (Ayub et al., 2000).
Marker structure
The loci studied consisted of four simple repeats and one complex repeat. Of the simple repeats, three were tetranucleotide repeats, namely a CATT repeat (DYS648; modal unit number = 9), a TTTC repeat (DYS649; modal unit number = 8), and an AAGG repeat (DYS650; modal unit number = 17), and the fourth was a hexanucleotide AAAGGG repeat (DYS651; modal unit number = 11). The fifth locus was a complex pentanucleotide AATAT repeat (DYS652; modal unit number = 24), whose GenBank structure is (AATAT)6(AAAAT)1(AATAT)9(AAAAT)1(AATAT)6. In counting the repeat units at DYS652, each AAAAT insertion was counted as a repeat unit. For example, the GenBank structure above contains two such insertions within an otherwise perfect run of 21 AATAT units, so the repeat number assigned to it would be 23.
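The counting convention for DYS652 can be made explicit with a short Python sketch. It parses a repeat structure written in the bracket notation used above and sums the copy numbers of the units that are counted as repeats; the function names and the regular-expression parsing are illustrative assumptions, not part of the original typing protocol.

```python
import re

# One block of the bracket notation, e.g. "(AATAT)6" -> ("AATAT", "6").
BLOCK = re.compile(r"\(([ACGT]+)\)(\d+)")


def parse_structure(structure: str) -> list[tuple[str, int]]:
    """Split a bracketed repeat structure into (unit, copy number) blocks."""
    blocks = [(unit, int(n)) for unit, n in BLOCK.findall(structure)]
    # Reject input containing text that the pattern did not consume.
    if "".join(f"({unit}){n}" for unit, n in blocks) != structure:
        raise ValueError(f"unparsable repeat structure: {structure!r}")
    return blocks


def repeat_number(structure: str, counted_units: set[str]) -> int:
    """Sum the copy numbers of every block whose unit is counted as a repeat."""
    return sum(n for unit, n in parse_structure(structure) if unit in counted_units)


DYS652_GENBANK = "(AATAT)6(AAAAT)1(AATAT)9(AAAAT)1(AATAT)6"

# Convention described above: the AAAAT insertions count as repeat units.
print(repeat_number(DYS652_GENBANK, {"AATAT", "AAAAT"}))  # 23
# Counting only the perfect AATAT units would give 21 instead.
print(repeat_number(DYS652_GENBANK, {"AATAT"}))  # 21
```

Passing the set of counted units explicitly keeps the convention visible: the same sequence is called 23 or 21 depending on whether the insertions are counted, so the convention must be fixed before allele calls from different studies are compared.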
Multiplex PCR and data collection
The five microsatellites were optimized for co-amplification in a single multiplex PCR containing five primer pairs, one primer of each pair being fluorescently labeled with TET, HEX, or FAM (Table 1). Each PCR was performed in a final volume of 10 μl containing 20 ng of DNA, 1 × PCR buffer (10 mM Tris-HCl, pH 9.0, 50 mM KCl, 0.1% Triton X-100, 0.01% (w/v) gelatin), 2.2 mM MgCl2, 300 μM dNTPs, 0.13 U SuperTaq® DNA polymerase (HT Biotechnology Ltd.), and 0.357 μg TaqStart® Antibody (Clontech). The primer sequences and concentrations used in the multiplex PCR are given in Table 1. A touchdown PCR program was used, beginning with a preincubation step at 94°C for 10 min to denature the TaqStart® Antibody. The first eight cycles consisted of denaturation at 94°C for 1 min, annealing at 60°C for 1 min, and extension at 72°C for 1 min, with the annealing temperature reduced by 0.5°C in each cycle. These were followed by 30 cycles of denaturation at 94°C for 1 min, annealing at 56°C for 1 min, and extension at 72°C for 1 min. After the 38 cycles, a final extension step was carried out at 72°C for 5 min. The amplified product (0.3 μl) was mixed with the TAMRA350 lane size standard according to the manufacturer's recommendations, and the samples were separated on 5% denaturing polyacrylamide gels using an ABI 377 DNA sequencer. Data were collected using the ABI Collection® software supplied by the manufacturer, fragment sizes were estimated using GeneScan® software (version 2.1), and alleles were scored using Genotyper® software (version 2.0). Allele sizes were verified by sequencing, and the correspondence between the number of repeat units and the allele size in base pairs was determined. The allele sizes of all the samples were then calibrated in terms of the number of repeat units.
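The cycling program can be written out step by step to check the cycle count and the annealing temperatures. This Python sketch assumes that the first touchdown cycle anneals at 60°C and each later cycle 0.5°C lower, so that the eighth cycle anneals at 56.5°C, just above the 56°C plateau; the text could also be read as starting the decrease in the first cycle. The `Step` type and function names are hypothetical, and instrument ramp times are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: int


def cycle(anneal_c: float) -> list[Step]:
    """One three-step cycle: denaturation, annealing, extension (1 min each)."""
    return [Step("denature", 94.0, 60), Step("anneal", anneal_c, 60), Step("extend", 72.0, 60)]


def touchdown_program(start_c=60.0, step_c=0.5, touchdown_cycles=8,
                      plateau_c=56.0, plateau_cycles=30) -> list[Step]:
    # 10 min at 94 °C to denature the TaqStart antibody before cycling starts.
    program = [Step("preincubation", 94.0, 600)]
    # Touchdown phase (assumption: the first cycle anneals at start_c).
    for i in range(touchdown_cycles):
        program += cycle(start_c - i * step_c)
    # Plateau phase at a fixed annealing temperature.
    for _ in range(plateau_cycles):
        program += cycle(plateau_c)
    program.append(Step("final extension", 72.0, 300))
    return program


program = touchdown_program()
anneals = [s.temp_c for s in program if s.label == "anneal"]
print(len(anneals))                     # 38
print(anneals[:8])                      # [60.0, 59.5, 59.0, 58.5, 58.0, 57.5, 57.0, 56.5]
print(sum(s.seconds for s in program))  # 7740 s of hold time, ramping excluded
```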
A set of DNA samples was run on each gel as a size standard to correct for gel-to-gel variation. Male specificity of the markers was checked using female DNA, which was then included as a negative control in all amplifications.
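The conversion of fragment sizes into repeat numbers, together with the gel-to-gel correction, can be sketched as follows. This is a hypothetical illustration, not the Genotyper® procedure: it assumes a sequenced reference allele of known repeat number and size, a single additive shift estimated from a control sample run on the same gel, and a 1-bp tolerance for flagging fragments that fall between repeat positions. The sizes in the example are invented, not the values in Table 1 or Table 2.

```python
from typing import Optional


def call_repeats(observed_bp: float, ref_repeats: int, ref_bp: float, unit_bp: int,
                 control_observed_bp: Optional[float] = None,
                 control_expected_bp: Optional[float] = None,
                 tolerance_bp: float = 1.0) -> Optional[int]:
    """Convert an observed fragment size into a repeat number.

    ref_repeats/ref_bp describe a sequenced reference allele (hypothetical here);
    unit_bp is the repeat-unit length (4 for a tetranucleotide, 6 for a hexanucleotide).
    """
    size = observed_bp
    # Optional gel-to-gel correction: shift by the control sample's sizing error.
    if control_observed_bp is not None and control_expected_bp is not None:
        size += control_expected_bp - control_observed_bp
    units = (size - ref_bp) / unit_bp
    nearest = round(units)
    # Fragments far from any whole-repeat position are left uncalled.
    if abs(units - nearest) * unit_bp > tolerance_bp:
        return None
    return ref_repeats + nearest


# Hypothetical tetranucleotide locus: a 17-repeat reference allele sized at 250.0 bp.
print(call_repeats(254.2, 17, 250.0, 4))                # 18
print(call_repeats(251.1, 17, 250.0, 4, 101.0, 100.0))  # 17 after a -1.0 bp gel correction
print(call_repeats(252.0, 17, 250.0, 4))                # None: half a repeat off the grid
```

With repeat units of four to six bases and a 1-bp tolerance, a fragment lying exactly half a repeat from the grid is always rejected, so Python's round-half-to-even behavior never decides a call.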
The data for the five markers reported here, together with previously reported data for 16 Y-STRs in the same ethnic groups, were compared to assess differences between comparison groups. Allele and haplotype frequencies were obtained for each locus or haplotype by direct counting, and gene and haplotype diversities with their standard errors were calculated as described earlier (Ayub et al., 2000). GST values were calculated for all the loci together as well as for each individual locus using the software Dispan®. A network of the most common haplotypes (those shared by more than 10 individuals) was constructed using the program Network 4.5 (www.fluxus-engineering.com).
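The diversity calculations are referred to Ayub et al. (2000) rather than restated. As a minimal sketch, assuming Nei's unbiased estimator h = n/(n-1)(1 - Σp_i²) and its standard sampling variance, the computation could look as follows; the same function applies to allele counts at one locus or to haplotype counts. The function name is illustrative, and the example counts (17, 9, and 1 of 27 chromosomes) are an assumption reconstructed from the approximate Hazara DYS649 percentages given in the Discussion, not reported data.

```python
import math
from typing import Iterable


def diversity(counts: Iterable[int]) -> tuple[float, float]:
    """Nei's unbiased gene (or haplotype) diversity and its standard error."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    p = [c / n for c in counts]
    s2 = sum(x ** 2 for x in p)  # sum of squared frequencies
    s3 = sum(x ** 3 for x in p)  # sum of cubed frequencies
    h = n / (n - 1) * (1 - s2)
    # Sampling variance of h under Nei's formula.
    var = 2 / (n * (n - 1)) * (2 * (n - 2) * (s3 - s2 ** 2) + s2 - s2 ** 2)
    return h, math.sqrt(var)


# Assumed Hazara DYS649 counts: 8-repeat, 10-repeat, and one other allele (n = 27).
h, se = diversity([17, 9, 1])
print(round(h, 3), round(se, 3))  # 0.51 0.068
```

Under these assumed counts the estimator returns h ≈ 0.510, the value reported for the Hazara at DYS649, which suggests that the unbiased form is the appropriate one to use.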
SNP typing
The Y-chromosome Alu polymorphism (YAP) was typed as described previously (Hammer, 1994; Qamar et al., 1999). Briefly, primers flanking the Alu insertion site were used to amplify the region under the PCR conditions reported by Hammer and Horai (1995), and the products were separated on 2% agarose gels. The gels were stained with ethidium bromide, and the bands were visualized by UV transillumination. Samples were typed according to the size of the band observed.
Results
All five markers were amplified in the 740 samples collected from the Pakistani ethnic groups. Table 2 shows the repeat units, fragment sizes in base pairs, allele frequencies, and GST for all five markers analyzed in the 12 ethnic groups. The number of alleles per marker ranged from 4 to 12 (Table 3). Two of the markers were highly polymorphic, with 11 (DYS650) and 12 (DYS652) alleles (Table 3).
Locus diversity and allele distribution pattern in Pakistani ethnic groups
Two markers, DYS650 and DYS652, showed the highest locus diversities of the five markers studied here (0.750 and 0.799, respectively). The DYS649 locus was the least polymorphic (diversity = 0.236), with only four different alleles (Table 3), of which the eight-repeat allele was found at a high frequency both in Pakistan as a whole (87.8%) (Table 2) and in each ethnic group individually (data not shown). DYS651 also had one common allele at high frequency (the 11-repeat allele, 78.4%), with the other alleles occurring at lower frequencies; this locus also had a comparatively low diversity in Pakistan (0.384). DYS648 had four alleles, of which only two were common: the eight-repeat allele (46.2%) and the nine-repeat allele (53.2%) (Table 2). The overall GST value for all the loci was 9.9%, and for the individual markers it ranged from about 7.6% (DYS648, DYS650) to 14.3% (DYS652) (Table 2).
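GST was obtained with Dispan, whose exact estimator is not described here. As a rough illustration only, assuming Nei's basic definition GST = (HT - HS)/HT, with populations weighted equally and no sample-size correction, the statistic could be computed from per-population allele frequencies as below. The frequencies in the example are invented; the first case shows how a locus can be highly diverse within populations yet have a GST of zero when all populations share the same allele distribution.

```python
def gst(freqs_by_pop: list[dict[int, float]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T from allele frequencies (no sample-size correction)."""
    k = len(freqs_by_pop)
    alleles = sorted(set().union(*freqs_by_pop))
    # H_S: mean within-population diversity.
    h_s = sum(1 - sum(f * f for f in pop.values()) for pop in freqs_by_pop) / k
    # H_T: diversity of the pooled frequencies (populations weighted equally).
    pooled = [sum(pop.get(a, 0.0) for pop in freqs_by_pop) / k for a in alleles]
    h_t = 1 - sum(f * f for f in pooled)
    if h_t == 0:
        return 0.0  # every population fixed for the same allele
    return (h_t - h_s) / h_t


# Invented example: two groups with the same broad distribution (H_S = 0.7).
shared = {16: 0.2, 17: 0.4, 18: 0.3, 19: 0.1}
print(gst([shared, shared]))           # 0.0
# Two groups fixed for different alleles.
print(gst([{8: 1.0}, {10: 1.0}]))      # 1.0
```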
Haplotype diversity in the Pakistani ethnic groups
Individual haplotypes were constructed by combining the data for all five loci; this resulted in a total of 152 haplotypes among the 740 samples. Of these, 70 haplotypes were present in single individuals, that is, they were unique and not shared by any other individual in the study population. In addition, 26 haplotypes were shared by two individuals each and 11 haplotypes by three individuals each, and the remaining 45 haplotypes were each shared by four or more individuals (data not shown). Two haplotypes were found more frequently, 9_8_17_11_24 and 9_8_17_11_25 (repeat numbers listed in the order DYS648, DYS649, DYS650, DYS651, and DYS652). The first was shared by 100 individuals (13.5%), 38 of whom were Mohanna, that is, 57.6% of the Mohanna sample. The second was shared by 64 individuals (8.6%), 27 of whom were Brahui (29% of the Brahui sample) (Fig. 1). Another haplotype, 9_8_18_11_23, also showed a strong population-specific pattern: it was shared by 19 individuals, 18 of whom were Pathan, representing 20% of the Pathan sample (Fig. 1). The other haplotypes were less frequent. The overall haplotype diversity in Pakistan was 0.965; the Mohanna (0.662) were the least diverse and the Makrani (0.970) the most diverse (Table 3). The phylogenetic network of the most common haplotypes shows that the Y chromosomes of the majority of the Mohanna carry a single haplotype (Fig. 2), and that the remaining chromosomes are separated from this haplotype by one or two mutation events.
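The haplotype bookkeeping described in this paragraph (joining the five repeat numbers into one label, counting how many distinct haplotypes are carried by one, two, three, or more individuals, and computing the share of one population that carries a frequent haplotype) can be sketched in Python. The function names and the eight records below are invented for illustration and are not study data.

```python
from collections import Counter

LOCI = ("DYS648", "DYS649", "DYS650", "DYS651", "DYS652")


def haplotype(alleles: dict[str, int]) -> str:
    """Join repeat numbers in a fixed locus order, e.g. '9_8_17_11_24'."""
    return "_".join(str(alleles[locus]) for locus in LOCI)


def sharing_spectrum(haplotypes: list[str]) -> Counter:
    """Map number of carriers -> number of distinct haplotypes with that many carriers."""
    return Counter(Counter(haplotypes).values())


# Invented (population, haplotype) records.
records = [("Mohanna", "9_8_17_11_24")] * 4 + [
    ("Brahui", "9_8_17_11_24"), ("Brahui", "9_8_17_11_25"),
    ("Brahui", "9_8_17_11_25"), ("Pathan", "9_8_18_11_23"),
]
haps = [h for _, h in records]
spectrum = sharing_spectrum(haps)
print(spectrum[1])                    # 1 haplotype carried by a single individual
top, carriers = Counter(haps).most_common(1)[0]
print(top, carriers / len(haps))      # 9_8_17_11_24 0.625
# Share of one population's chromosomes that carry the most frequent haplotype.
mohanna = [h for pop, h in records if pop == "Mohanna"]
print(mohanna.count(top) / len(mohanna))  # 1.0
```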
Analysis of the network shows that only the Mohanna are concentrated around the most common haplotype in the Pakistani sample. Other populations, such as the Brahui, form a central cluster around the second most common haplotype but are also found in other haplotypes that are separated from the most common haplotype by several mutation events. Similarly, the Pathan form a cluster of 18 chromosomes around their own most common haplotype but, like the Brahui, are also spread across the whole network. The other populations studied here show a haplotype distribution similar to that of the Brahui and the Pathan, except for some smaller populations whose sample sizes are too small for an effective analysis; thus, only the Mohanna are unique in their haplotype distribution pattern. The network of all 15 Mohanna haplotypes shows that most of the samples group in a single haplotype (Fig. 2).
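Statements such as "separated by one or two mutation events" rest on counting repeat differences between haplotypes. A minimal sketch, assuming a strict single-step mutation model in which every one-repeat change is one event, is shown below; it computes only pairwise distances and is not the median-joining algorithm implemented in Network 4.5. The function names are illustrative; the haplotypes are ones named in this article.

```python
def step_distance(a: str, b: str) -> int:
    """Minimum number of single-repeat changes between two haplotype labels."""
    xs, ys = a.split("_"), b.split("_")
    if len(xs) != len(ys):
        raise ValueError("haplotypes must cover the same loci")
    return sum(abs(int(x) - int(y)) for x, y in zip(xs, ys))


def neighbours(center: str, haplotypes: list[str], max_steps: int = 2) -> list[str]:
    """Haplotypes within max_steps single-step mutations of center (excluding itself)."""
    return [h for h in haplotypes if 0 < step_distance(center, h) <= max_steps]


haps = ["9_8_17_11_24", "9_8_17_11_25", "9_8_18_11_23", "8_10_15_10_26"]
print(step_distance(haps[0], haps[1]))  # 1
print(step_distance(haps[0], haps[2]))  # 2
print(step_distance(haps[0], haps[3]))  # 8
print(neighbours(haps[0], haps))        # ['9_8_17_11_25', '9_8_18_11_23']
```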

Fig. 1. Overall distribution pattern of the 22 most frequent haplotypes in the populations studied. Color images available online at www.liebertonline.com/gtmb.

Fig. 2. Distribution pattern of 15 Mohanna haplotypes.
SNP analysis
The YAP polymorphism was typed in the Mohanna samples, with YAP+ chromosomes taken as indicative of African ancestry. None of the 66 Mohanna chromosomes studied carried the Alu element insertion. In contrast, other groups with a reported African ancestry, such as the Makrani, had at least some YAP+ chromosomes, consistent with their oral history and with their physical resemblance to present-day African populations (Qamar et al., 1999).
Discussion
We have previously demonstrated that identifying new microsatellites from sequence database information is feasible and very efficient (Ayub et al., 2000), and we have also demonstrated the utility of these markers in population genetic studies (Qamar et al., 2002). The present study validates these findings by showing the utility of these microsatellite markers in population genetic studies and forensic analysis. We also show that the current set of markers is polymorphic and male specific. Two of these loci, the tetranucleotide repeat DYS650 and the pentanucleotide repeat DYS652, had very high diversity values (Table 3), with 11 and 12 different alleles, respectively. Notably, of the 21 Y-STRs studied in these populations, these two loci are among the most diverse, with DYS652 showing the highest diversity (Table 4). Such polymorphic markers have a high power of discrimination and are suitable for forensic applications, including paternity determination and individual identification. However, only DYS652 had a relatively high GST value (0.143), whereas DYS650 had a comparatively low value (0.077), indicating less genetic differentiation at the latter locus among the populations studied. Although DYS650 had a high diversity and a large number of alleles (11), its low GST value might preclude it from being a good marker for population studies. This dichotomy between high diversity and low GST is clear from the distribution of allele frequencies across the different populations: the raw data showed that, for DYS650, eight of the ethnic groups had the same modal allele of 17 repeats, and only in the remaining four was the modal allele different.
In contrast, the DYS652 locus showed a more even distribution of alleles across the different ethnic groups, which was also supported by its high GST value. Markers with high diversity and high GST values are ideal for population genetic studies; however, less polymorphic loci, when used in combination with other microsatellite markers, can provide a less biased view of the evolutionary relationships between Y chromosomes from different ethnic backgrounds and are especially useful for comparing closely related populations.
Another locus, DYS651, showed an intermediate level of variation, with eight different alleles and a comparatively low GST value (0.094). A qualitative review of the allele distribution showed that the 11-repeat allele was modal in all the ethnic groups studied. Thus, the lower diversity of DYS651 results from its smaller number of alleles and their distribution in the different ethnic groups, and its low GST value results from the concentration of every ethnic group on a single modal allele, so that no significant genetic difference occurred between the groups. DYS648 and DYS649 were the least variable loci, each with only four alleles. The former had a relatively higher diversity because two major alleles were present in all the ethnic groups, whereas the latter had a lower diversity because the samples clustered around a single allele, the 8-repeat allele.
The present study revealed an interesting aspect of the Hazara population: a high diversity value (0.510) was observed at the DYS649 locus. We have previously shown that the Hazara are the least diverse of the Pakistani ethnic groups (Qamar et al., 2002). However, the high diversity value at DYS649 is not as surprising as it may seem. This locus is the least polymorphic in the other ethnic groups, all of which have the 8-repeat allele as the modal allele, carried by more than 80% of group members, whereas in the Hazara only 63% of the sampled individuals carried the 8-repeat allele and about 33% carried the 10-repeat allele. This agrees well with the structure of this ethnic group: our previous study showed that it carries mostly Y-chromosomal haplogroups R and C (previously labeled 1 and 10), and that its Y-STR alleles are usually distributed over two to five closely related modal alleles (separated by one to five mutational steps) (Qamar et al., 2002). We therefore postulate that the Hazara probably arose from the Y-chromosomal contribution of more than one individual; our previous admixture analysis also points to a substantial contribution from the Mongols (Qamar et al., 2002) but does not rule out contributions from multiple lineages. The other four loci show a similar pattern in the Hazara: two to three alleles account for 93%-100% of the subjects, which could be due either to a particular pattern of repeat expansion in this ethnic group or to contributions from more than one lineage.
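As a consistency check on the Hazara DYS649 value, assume Nei's unbiased gene diversity estimator and allele counts of 17 (8-repeat), 9 (10-repeat), and 1 (another allele) among the 27 Hazara. These counts are inferred from the rounded percentages above rather than reported directly. Under these assumptions the estimator reproduces the reported diversity:

```latex
h = \frac{n}{n-1}\Bigl(1 - \sum_i p_i^2\Bigr)
  = \frac{27}{26}\Bigl(1 - \frac{17^2 + 9^2 + 1^2}{27^2}\Bigr)
  = \frac{27}{26}\cdot\frac{358}{729}
  = \frac{358}{702} \approx 0.510
```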
Using all five microsatellites, we constructed highly informative haplotypes that provided insight into the ancestral relationships and male lineages of some of the populations. For example, the Hazara ethnic group had a comparatively low haplotype diversity (0.835) (Table 3), with two haplotypes (9_8_17_11_23 and 8_10_15_10_26) accounting for 80% of the ethnic group. This was also demonstrated in earlier studies of this ethnic group based on variation at 16 Y-biallelic and 16 Y-STR loci (Ayub et al., 2000; Qamar et al., 2002).
In addition to the Hazara, another population of interest, reported here for the first time, is the Mohanna. This ethnic group represents the ancient fishermen tribes who reside along the Indus river in southern Pakistan, and according to their oral history they are the oldest population of the Indian subcontinent. The group has been largely endogamous and has lived in small groups in select areas of southern Pakistan. In addition, they have undergone multiple bottlenecks caused by different invaders, which substantially reduced their population size. This is reflected in the haplotype network of this ethnic group (Fig. 2), in which all the chromosomes cluster around the most common haplotype, 9_8_17_11_24. In a previous study of the DRD4 locus, we observed that the Mohanna had considerable African, and particularly Ethiopian, ancestry (Mansoor et al., 2008), in agreement with the observations of Quintana-Murci et al. (2004), who proposed an Ethiopian link for the populations residing in the Indus valley based on the presence of haplogroup U9 in this area. In the present study, the absence of YAP+ chromosomes in the Mohanna could result either from the loss of these chromosomes through subsequent admixture with different invading populations or from the origin of this group in a YAP-negative subset of an African population.
The low haplotype diversity in the Balti (0.758) (Table 3) could be a result of the small sample size (n = 14) studied for this ethnic group. Population-specific haplotypes have also been observed previously in the Brahui, Kalash, and Parsi ethnic groups (Mohyuddin et al., 2001). The population-specific clustering of haplotype 9_8_17_11_25 in 29% of the Brahui (Fig. 1) agrees with our previous data, in which 16 individuals of this ethnic group shared a single haplotype across 16 STRs (Mohyuddin et al., 2001); this clustering probably reflects population substructure specific to the Brahui. Similarly, 20% of the Pathan shared haplotype 9_8_18_11_23, which also appears highly population specific (at least for these five Y-STRs), as only one other individual among all the ethnic groups studied carried it (Fig. 1). This sharing of a haplotype by the Pathan seems to be specific to these five Y-STRs, because our previous studies did not show such sharing of haplotypes by the Pathan.
Conclusions
The detailed analysis of these polymorphic Y-chromosomal microsatellites offers new possibilities for forensic casework and for studies of substructure in closely related human populations. Markers should be selected on the basis not only of their diversity values but also of their haplotype distribution, which is an equally important criterion when choosing between markers for forensic casework and human population studies. Of the five markers reported here, DYS652 is a strong candidate for forensic casework, whereas the other markers can provide insight into population substructure and should therefore be useful in human population studies.
Footnotes
Acknowledgments
The authors are grateful to all the blood donors for their help in this project. The authors also thank a number of colleagues for their assistance in this work, especially Chris Tyler-Smith for his valuable advice on setting up the multiplex PCR. This work was supported by a core grant to the Institute of Biomedical and Genetic Engineering from the Government of Pakistan.
Disclosure Statement
No competing financial interests exist.
