Abstract
Campylobacter is a leading cause of foodborne illness in humans, and improving our understanding of the epidemiology of this organism is essential. The objective of this study was to identify the genes that discriminate isolates of C. jejuni by analysis with whole-genome DNA microarrays. Statistical analyses of whole-genome data from 95 geographically diverse cattle, chicken, and human C. jejuni isolates identified 142 most significant variable genes. Of this total, 125 (88%) belonged to genomic prophage and hypervariable regions. The significance of genomic prophage and hypervariable regions in determining C. jejuni isolate genomic diversity is emphasized by these results. These genes will be useful as biomarkers and components of genotyping systems for C. jejuni to improve our understanding of the epidemiology and population genetics of this major foodborne pathogen.
Introduction
C
Over the last decade, the sequences of a large number of bacterial genomes have been completed and made available, including C. jejuni (
Methods
Bacterial isolates and culture conditions
A total of 95 C. jejuni isolates were included in this study. Twenty-nine isolates from chicken carcass rinses and 31 C. jejuni isolates from beef cattle feces were selected from the National Antimicrobial Resistance Monitoring System (NARMS) (
Microarray construction
Microarrays were constructed with 70mer DNA oligonucleotides using the commercially available C. jejuni NCTC11168 array set (1601 oligonucleotides) and seven randomly generated control oligonucleotides from Operon Technologies (Alameda, CA). The sequences of an additional 326 genes present in C. jejuni RM1221, a wild-type chicken isolate, were acquired and processed with OligoWiz 2.0 (Neilson, 2003; Wernersson, 2005). The oligonucleotide sequences generated using this program were synthesized by Operon and added to the array, for a total of 1934 features (1927 genes plus seven controls). Arrays were printed in triplicate on Corning Ultra Gaps slides (Corning Life Sciences, Corning, NY) in a QArray Mini instrument (Genetix, Boston, MA) using a 50% dimethyl sulfoxide (DMSO) solution and a final oligonucleotide concentration of 20 pmol DNA/μL.
Microarray hybridizations and data acquisition
Genomic DNA from each isolate was extracted using the PUREGEGE Yeast & Gram-Positive Bacteria kit (Gentra Systems, Minnealpolis, MN) according to the manufacturer's directions. For each hybridization reaction, 3 μg of DNA from the reference isolates (NCTC11168 and RM1221) and a test isolate were fluorescently labeled with indodicarbocyanine (Cy5) and indocarbocyanine (Cy3), respectively, as previously described (Porwollik et al., 2003). Labeled DNA was purified from unincorporated label using Qiaquick PCR Cleanup kits (Qiagen, Valencia, CA) following the manufacturer's protocol and dried in a centrifugal evaporator (SpeedVac, Savant Instruments, Holbrook, NY).
Hybridizations were performed as previously described by Porwollik et al. (2003). The DNA microarrays were then scanned using a ScanArray Lite microarray laser scanner (Perkin Elmer, Waltham, MA). Features (spots) and local background intensities were measured and quantified using ScanArray Express software (PerkinElmer). Spots that contained abnormalities or were within regions of high fluorescent background were excluded from further analysis. The data were filtered so that spots with mean signal intensities lower than background plus two standard deviations of the background were discarded.
Statistical analysis
A statistical method was developed using the SAS program (SAS Institute, Cary, NC) to analyze the whole-genome content of the different isolates to compare the relative presence or absence (i.e., variability) of each of the genes. The Cy5 and Cy3 fluorescence intensity values were first converted to log2 values (denoted as Z5 and Z3) to remove data skew and improve the symmetry of the data distributions. The presence or absence of each gene for each isolate was then determined using a three-step statistical procedure. First, a best fitting line was obtained using Least-Median-Squares (LMS) regression (Rousseeuw, 1984) for all three replications of all non-control genes for each of the 95 isolates in our study. This method finds the line that minimizes the sum-of-squared-error for the middle 50% of all data points. Assuming that most data pairs arise from the case where Cy5 and Cy3 are both present, this procedure finds a line close to that characterizing the true {Z5, Z3} relationship for such genes. The calculated line represents the mean behavior of genes by both Cy5 and Cy3 for the isolate in question. A lower bound on this line was then established by drawing a line parallel 1.96 standard deviations below the LMS line, where the standard deviation is measured by the scale of the LMS regression fit. This procedure was performed for the three replicates of each gene before determining its presence or absence. The “majority rule” was used for the final determination; that is, if a gene was absent for at least two of the three replications, it was considered absent overall. Once each gene was classified as either present or absent, Fisher's Exact Test was used to find the genes that were most variable between isolate sources. Genes were considered significantly variable if p<0.005. Since the analysis involves performing a number of tests on the same data, a low p-value was chosen to minimize false significance.
Results
The statistical analysis yielded a total of 142 significant genes (approximately 7% of the 1927 C. jejuni genes on the microarray), and these are listed in Table 1. Only 12 NCTC11168 genes were significantly variable. Although comprising only a small percentage (8.5%, 12/142) of the total, the significant NCTC11168 genes covered a broad range of functions, including Type I restriction-modification (RM; Cj1553), a haemolysin (tylA), haemin uptake (chuA), and a flagellin gene (flaD), in addition to hypothetical proteins. Six of the NCTC11168 variable genes were previously identified with C. jejuni hypervariable loci (Taboada et al., 2004; Parker et al., 2006), representing regions 2 (Cj0055), 13 (Cj1420 and Cj1427), 14 (Cj1550 and Cj1553), and 21 (Cj1152).
As reported by Parkhill et al. (2000) for C. jejuni NCTC11168 (Cj-) and by Fouts et al. (2005) for C. jejuni RM1221 (CJE-).
Hypervariable gene region number.
The majority of significant C. jejuni genes (91.5%, 130/142) were found in RM1221 (Table 1). Most of the 130 RM1221 genes (71.3%, 93/130) represented portions of prophage (i.e., genomic island or integrated element) regions (Fouts et al., 2005). Specifically, 18.4% (9/49) of the C. jejuni integrated element 1 (CJIE1; also identified as Campylobacter Mu-like phage 1 [CMLP1]) genes were identified, 65.3% (32/49) of the CJIE2 genes, 67.2% (43/64) of the CJIE3 genes, and 22.5% (9/40) of the CJIE4 genes. Nearly all of the rest of the significant RM1221 genes were associated with smaller C. jejuni hypervariable regions. These included regions 2 (CJE0052-53), 5 (CJE0472), 10 (CJE1050), 11 (CJE1279, CJE1281), 12 (CJE1489, CJE1497, CJE1500-02, CJE1515), 13 (CJE1602-06, CJE1612-16), and 14 (CJE1719, CJE1723-25, CJE1728).
Among the functions of the hypervariable region genes identified, lipooligosaccharide biosynthesis (region 11, n=2), flagellar modification and o-linked glycosylation (region 12, n=6), capsular biosynthesis (region 13, n=11), and restriction modification (region 14, n=7) were most prominent. Only 12% (17/142) of the all significant C. jejuni genes found were not part of a previously described C. jejuni prophage or hypervariable region.
A more detailed picture of the distribution of the variable genes among the 95 C. jeuni isolates is illustrated in Figures 1 –3. Most striking is how commonly the prophage genes (Fig. 1) were found among the human C. jejuni isolates, particularly considering that RM1221, the source strain for these genes on the microarray, was isolated from a chicken (Parker et al., 2006). In 37.1% (13/35) of the human C. jejuni isolates, all 93 genes representing the four prophage categories were identified. By contrast, in only one chicken isolate (PC0079) were all prophage genes present, and none of the cattle isolates were positive for every prophage variable gene. This observation is in accord with the overall distribution of the prophage genes within the isolate sources, as these genes were found less frequently in C. jejuni from chickens and cattle compared to those from humans (Fig. 1). The largest prophage region, CJIE3, also had the largest representation of genes present among isolates from all three sources.

Distribution of prophage region genes identified in the Campylobacter jejuni isolates obtained from cattle, chickens and humans. CMLP1, Campylobacter mu-like phage 1; CJIE2, CJIE3, CJIE4, Campylobacter jejuni insertion elements 2, 3, and 4. Gene locus designations are taken from Fouts et al. (2005) for C. jejuni RM1221 (CJE-). Black indicates that the gene is present (p≤0.005), and white indicates that the gene is absent (p>0.005).

Distribution of genes in the eight hypervariable regions (numbered 2–21) identified in the Campylobacter jejuni isolates obtained from cattle, chickens, and humans. Gene locus designations are taken from Parkhill et al. (2000) for C. jejuni NCTC11168 (Cj-) and Fouts et al. (2005) for C. jejuni RM1221 (CJE-). Black indicates that the gene is present (p≤0.005), and white indicates that the gene is absent (p>0.005).

Distribution of the 17 uncategorized variable genes identified in the Campylobacter jejuni isolates obtained from cattle, chickens, and humans. Gene locus designations are taken from Parkhill et al. (2000) for C. jejuni NCTC11168 (Cj-) and Fouts et al. (2005) for C. jejuni RM1221 (CJE-). Black indicates that the gene is present (p≤0.005), and white indicates that the gene is absent (p>0.005).
The distribution of the hypervariable region genes among the 95 C. jeuni isolates is shown in Figure 2. As with the prophage genes, the hypervariable genes were most commonly found among the human C. jejuni isolates. However, only one human isolate (PC0055) had all 32 genes representing the eight C. jejuni hypervariable regions that were identified. Also, in only one chicken isolate (PC0079) were all 32 hypervariable genes present; none of the cattle isolates was positive for every hypervariable region gene. The overall distribution of the hypervariable genes within the isolate sources was similar to that for the prophage genes, and these genes were less frequently identified in C. jejuni from chickens and cattle compared to those from humans (Fig. 2). Hypervariable regions 14 and 21 were represented most commonly among isolates from all three sources.
The distribution of uncategorized variable genes among the 95 C. jeuni isolates is illustrated in Figure 3. These variable genes followed the same general pattern noted for the prophage and hypervariable genes, being most common among the human C. jejuni isolates followed by the chicken and cattle isolate groups. However, just three of the human isolates (PC0044, PC0052, and PC0054) had all 17 uncategorized genes present, and in only one chicken isolate (PC0079) and one cattle isolate (PC0129) were all uncategorized variable genes identified. All six uncategorized genes from C. jejuni NCTC11168 (Cj0588-Cj1614) were frequently identified among all 95 of the study isolates, as were the RM1221 genes CJE1548 and CJE1552 (Fig. 3). There was no common functional theme among the eight frequently identified uncategorized variable genes (Table 1).
Discussion
Previous CGI studies have identified regions of divergent genes in the Campylobacter genome, which suggested that these variable regions might contain genes useful for discriminating isolates. One of the first studies utilizing Campylobacter microarrays examined the genomic diversity C. jejuni strains from several sources and identified seven hypervariable plasticity regions (PR1 to PR7) within the C. jejuni genome (Pearson et al., 2003). Many of these regions are believed to help with adaptation to different ecological niches along with production and modification of antigenic surface structures, including capsular polysaccharide (Karlyshev et al., 2005) and lipooligosaccharide biosynthesis (Parker et al., 2006, 2007). Taboada et al. (2004) showed that a large proportion of the variable genes were found to be absent or divergent in single strains only, and that these uniquely variable genes could be mapped to previously defined variable loci, indicating that large regions of the C. jejuni genome are genetically stable. Attempts have also been made to identify genetic markers to help identify specific disease manifestations, but these attempts have been unsuccessful (Leonard et al., 2004).
The addition of 326 RM1221 genes to the microarray design in our study provided considerably more information on which to base isolate discrimination compared to using the NCTC1168 genome alone. NCTC11168 was originally isolated from a case of human campylobacteriosis in 1977, and was part of a laboratory collection more than 20 years prior to its sequencing, apparently losing some of its original genetic characteristics over time (Ahmed et al., 2002). Consistent with this observation, our analysis identified just 12 significant genes from NCTC11168, while the wild-type chicken isolate, RM1221, contributed 130 of the 142 most variable genes we identified. The majority of these were associated with genomic islands (i.e., prophage genes) and hypervariable regions. A number of studies have cited the contribution of these variable regions to the genetic diversity of C. jejuni (Taboada et al., 2004; Fouts et al., 2005; Parker et al., 2006, 2007; Clark and Ng, 2008; Quiñones et al., 2008). Consequently, it was not surprising to find these genetic elements among the significantly variable genes identified in this study. However, we did not anticipate finding RM1221 variable genes to be more widely distributed in cattle and human C. jejuni than those from chickens.
Prophages are known to be particularly conspicuous in bacterial pathogens, and prophage DNA has had a significant role in their evolution (Canchaya et al., 2004). In C. jejuni, Gaasbeek et al. (2009) have recently shown that CJIE1 incorporation inhibits the ability of the isolate to undergo natural transformation. A key extracellular DNase gene in the inhibition process (dns, CJE0256) was not among the 18.4% (n=9) divergent CJIE1 genes we identified. Therefore, like the majority of CJIE1 genes, dns is apparently relatively stable within this element and not suitable as a C. jejuni marker gene. This group (Gaasbeek et al., 2010) has also identified nonspecific endonucleases in CJIE2 and CJIE4 that contribute to the inhibition of natural transformation in C. jejuni. Interestingly, the putative endonuclease genes CJE0556 (CJIE2) and CJE1441 (CJIE4) highlighted in that study were among the significantly variable genes we found. This may add further significance to the usefulness of these genes in distinguishing isolates of C. jejuni.
Of the eight C. jejuni hypervariable regions represented in the results of this study, surface antigenic structure and function figured prominently. Lipoologosaccharide synthesis (region 11), flagellar modification (region 12), and capsular biosynthesis (region 13), which together facilitate immune system evasion and colonization of the host, were conspicuous among the variable genes. The C. jejuni type I restriction-modification system, represented by hypervariable region 14, was also notable among the divergent genes identified. The type I enzyme complex component HsdM amino acid sequence has been shown to be quite conserved within the same family but varies widely among different families (Miller et al., 2005). This variability may be related to survival in differing environmental niches which in turn suggests the potential utility of the hsdM gene (CJE1724) in characterizing different C. jejuni isolates. However, hsdM variants have not been correlated with specific host sources, although using this gene as a source attribution marker remains an interesting possibility.
Few of the other significantly variable genes identified in our study have been well characterized. Two of these, the hemolysin gene tylA (Cj0588) and the heme transport gene chuA (Cj1614), were NCTC11168 genes. The tylA gene is involved in cell adherence and appears to have an important role in host colonization and virulence (Salamaszyńska-Guz and Klimuszko, 2008). The divergence of tylA within populations of C. jejuni has not been well characterized, but its role in virulence warrants further investigation as a genotyping marker. Iron acquisition is also essential for bacterial colonization, and survival in the host and the iron regulon gene system is typically associated with virulence. The chuA gene (Cj1614) has a major transport function as a component of the heme acquisition gene cluster (Ridley et al., 2006). We identified chuA as significantly variable and thus a second notable virulence factor with potential utility as a genetic marker. Similar to hsdM, tylA and chuA have thus far not been associated with particular C. jejuni hosts.
The major outer membrane protein of Campylobacter is encoded by porA, a highly diverse gene that has been cited for its value in epidemiological studies (Clark et al., 2007; Cody et al., 2009). Multiple lineages of porA have been described and Cody et al. (2009) found nearly 200 gene variants in their investigation. porA was also among the few well-characterized divergent genes identified in our study. Thus, porA could be employed in both sequence-based and presence-absence genotyping schemes.
The stringent statistical analysis used here and the resulting predominance of prophage and hypervariable genes among 142 genes identified are consistent with previous studies on the significance of these regions in C. jejuni isolate variability. Moreover, as previously noted, in addition to chicken isolates, the divergent genes identified in this study are widely distributed in both cattle and human C. jejuni, further emphasizing their value as genetic markers for distinguishing C. jejuni isolates from widely varying host sources.
Footnotes
Acknowledgments
We thank Sandra House of the Bacterial Epidemiology and Antimicrobial Resistance Research Unit excellent for technical assistance. This work was supported by the United States Department of Agriculture, Agricultural Research Service CRIS (projects 6612-32000-035 and 5325-42000-0450.
Disclosure Statement
No competing financial interests exist.
