Abstract
Yersinia pseudotuberculosis is a foodborne pathogen often detected and identified using polymerase chain reaction (PCR) with primers targeted to virulence genes. Sequence variability of the virulence genes in strains representing different serotypes is unknown. Sequence variability could hinder the recognition of this pathogen by PCR and affect the host–pathogen interactions. Sequencing of inv, virF, and yadA of 18 Y. pseudotuberculosis strains showed limited variability of inv and virF, whereas the sequences of yadA varied considerably.
Introduction
Y. pseudotuberculosis is currently divided into 21 serotypes, with O:1a and O:1b being the most common in Europe, Australasia, and North America, and O:4b and O:5b being prevalent in East Asia (Carniel et al., 2006). Serotype O:15 strains are common in human patients in South Korea (Fukushima et al., 2001; Laukkanen-Ninios et al., 2011; De Castro et al., 2009). Serotypes O:6 to O:14 have been isolated mainly from animals and environmental sources (Carniel et al., 2006; Laukkanen-Ninios et al., 2011). The presence of the highly conserved 70-kb virulence plasmid (pYV) is required for pathogenicity and is found in all human and animal pathogenic Yersinia strains (Carniel et al., 2006). pYV has not been found in serotypes O:8, O:9, O:11, O:12, and O:13 isolated from animals and environmental sources (Carniel et al., 2006; Iwata et al., 2008).
To initiate Y. pseudotuberculosis infection, the chromosomally encoded outer membrane protein invasin (Inv) is used for passage of the bacterium through the intestinal epithelium in the terminal ileum (Leo and Skurnik, 2011). In addition to its role in traversing the epithelium, Inv induces an inflammatory response in intestinal epithelial cells, which contributes to the spread of Yersinia in the host (Grassl et al., 2003). After passage through the epithelium, the Yersinia adhesin YadA, encoded by pYV, mediates binding to collagen, fibronectin, and laminin, and also to intestinal mucus, epithelial cells, and macrophages (Leo and Skurnik, 2011). YadA protects Y. pseudotuberculosis from complement and from antimicrobial peptides generated by granulocytes. It also shields Yersinia cells from phagocytosis (Leo and Skurnik, 2011). In addition, YadA functions as a hemagglutinin and autoagglutinin, and it enhances biofilm formation (Leo and Skurnik, 2011; Heise and Dersch, 2006). VirF is a pYV-encoded transcriptional activator needed for the expression of several virulence genes in Yersinia, including yadA and the genes encoding the Yops (Yersinia outer proteins) (Cornelis et al., 1998).
The isolation of pathogenic Yersinia, especially Y. pseudotuberculosis, is demanding due to slow growth and poor competition (Skurnik et al., 2009). Furthermore, identification of Y. pseudotuberculosis based on biochemical reactions is challenging because its phenotype is indistinguishable from that of the closely related Yersinia similis and Yersinia pekkanenii (Sprague et al., 2008; Niskanen et al., 2009; Murros-Kontiainen et al., 2011). To overcome these problems, detection and identification of Y. pseudotuberculosis is often based on polymerase chain reaction (PCR) with primers targeted to the virulence genes inv, virF, and yadA (Skurnik et al., 2009; Fredriksson-Ahomaa et al., 2010). However, while the genetic diversity within the species Y. pseudotuberculosis is generally narrow (Laukkanen-Ninios et al., 2011), there is little information on the sequence variability of virulence genes in the different serotypes of Y. pseudotuberculosis. Variation in virulence gene sequences would complicate the design of effective PCR primers and hinder detection, distorting the epidemiological understanding of this pathogen. The objective of this study was to determine the sequence similarity of inv, virF, and yadA of 18 Y. pseudotuberculosis and two Y. similis strains originating from 12 different countries.
Materials and Methods
Bacterial strains and growth conditions
The 18 Y. pseudotuberculosis strains and two Y. similis strains (Laukkanen-Ninios et al., 2011) (Table 1) were grown on Luria-Bertani (LB) agar plates (BD, Franklin Lakes, NJ) or in LB broth (BD) at 30°C. The genomic DNA was extracted from overnight cultures by using Pitcher's method (Pitcher et al., 1989). Plasmid DNA was extracted using the Qiagen Plasmid Midi Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's instructions.
BT, biotype; pYV, virulence plasmid; ST, sequence type based on multilocus sequence analysis (Laukkanen-Ninios et al., 2011); NK, not known; MS, Mikael Skurnik; DFHEH, Department of Food Hygiene and Environmental Health, University of Helsinki, Finland.
PCR amplification and sequencing
Primers for amplification and sequencing were designed for the strain IP32953 (Chain et al., 2004) (GenBank accession number BX936398) by using the Primer3 software (Rozen and Skaletsky, 2000) and are listed in Supplementary Table S1 (Supplementary Data are available online at
Sequence analysis
Base-calling and quality assignment of the raw sequences were done using Phred (Ewing and Green, 1998). Gene sequences for each strain were constructed from multiple fragments with the gap4 program in the Staden package (Staden, 1996). Phylogenetic trees were constructed using MEGA (Tamura et al., 2007). Gene sequences were aligned with CLUSTAL and trimmed at both ends to have equal lengths. All nucleotide sequences were confirmed to contain an ATG start codon before converting them into amino acid sequences. Neighbor-joining trees with 1000 bootstrap replicates were constructed with the Kimura 2-parameter model for nucleotide sequences. Pairwise sequence similarities shown in Tables 2–4 were calculated without an evolutionary model.
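The two distance measures named above can be illustrated with a minimal Python sketch. It is not the MEGA implementation used in the study. The function names are hypothetical, and the handling of gaps (columns with a gap in either sequence are skipped) is an assumption, since the text does not state how gaps were treated. Sequences are assumed to contain only A, C, G, T, and "-".

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def _ungapped_pairs(seq_a, seq_b):
    """Return aligned base pairs, skipping columns with a gap (assumption)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return pairs


def percent_identity(seq_a, seq_b):
    """Similarity without an evolutionary model: share of identical sites."""
    pairs = _ungapped_pairs(seq_a, seq_b)
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance from transition and transversion rates."""
    pairs = _ungapped_pairs(seq_a, seq_b)
    transitions = transversions = 0
    for a, b in pairs:
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    p = transitions / len(pairs)
    q = transversions / len(pairs)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For closely related sequences such as those compared here, the two measures stay close to each other, because the model correction grows only as substitutions accumulate.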
Y. similis.
Column NT and row AA represent similarity with the reference strain IP32953.
NT, nucleotide; AA, amino acid.
Results and Discussion
We sequenced the virulence genes in 18 Y. pseudotuberculosis strains originating from four continents and 11 countries, representing five different serotypes, and in two Y. similis strains (the sequence database accession numbers are HE805213–HE805268). Y. similis is a close relative of Y. pseudotuberculosis, but it is considered to be nonpathogenic because it lacks pYV (Fukushima et al., 2001; Laukkanen-Ninios et al., 2011; Sprague et al., 2008). The smallest sequence variation was observed in virF and the greatest in yadA (Tables 2–4).
The similarity of the nucleotide and amino acid sequences of inv encoding the invasin, a virulence factor needed for the passage of Y. pseudotuberculosis through the intestinal epithelium and spread in the host (Leo and Skurnik, 2011; Grassl et al., 2003), is presented in Table 2. At the amino acid level, the Inv sequences of the Y. pseudotuberculosis strains differed from each other by at most 1.5% (Table 2). Nucleotide sequences of inv in the Y. pseudotuberculosis strains differed from each other by at most 1.1% (Table 2). The tentative clustering of the Y. pseudotuberculosis strains based on inv sequences (Fig. 1) requires further support from a greater number of strains. Previously, three allelic variants were found when a 600-bp fragment of the 2958-bp inv was sequenced for 84 Y. pseudotuberculosis strains, 83 of which originated from the Russian Federation (Adgamov et al., 2010). The 84 strains belonged to serotypes 1 and 3 and originated from humans, wild rodents, and the environment (Adgamov et al., 2010). In that study, allele distribution depended on the source of isolation of a strain, whereas serotype and geographical origin had no significant effect (Adgamov et al., 2010).

FIG. 1. Phylogenetic tree with bootstrap values on branches of inv of 18 Yersinia pseudotuberculosis and three Yersinia similis strains based on nucleotide sequences (EMBL accession numbers HE805213–HE805232). IP32953, sequence of inv of the reference strain IP32953.
Based on inv sequences, the Y. similis strains formed an outgroup (Fig. 1). The similarity of the nucleotide and amino acid sequences of the inv genes of Y. similis strains is presented in Table 2. The inv sequences of Y. similis strains R626R and R220, and that of another Y. similis strain, N916Ysi (retrieved from whole genome sequence, accession number ERS008562), were compared with that of Y. pseudotuberculosis strain 283 (Supplementary Fig. S1). The inv sequences of the three Y. similis strains were not identical. When compared with inv of strain 283, the Y. similis inv sequences contained a 1-bp deletion at position 40, causing a frameshift mutation. However, another start codon is located immediately downstream, at position 52–54. This is preceded by a perfect ribosomal binding site AGGAG (position 39–44). Furthermore, as the signal peptide of the predicted Y. pseudotuberculosis invasin is 48 amino acids long, a protein starting from the codon at position 52–54 would still contain a fully functional signal peptide of 31 amino acids. Translation of the Y. similis invasin protein likely initiates from this start codon.
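The signal peptide length quoted above follows from simple codon arithmetic, sketched here in Python. The sketch assumes that the positions are given in the coordinates of strain 283 and that the downstream ATG lies in the original reading frame. The function and variable names are hypothetical.

```python
def codon_index(first_nt_position):
    """1-based codon number for a codon starting at a 1-based nucleotide position."""
    if (first_nt_position - 1) % 3 != 0:
        raise ValueError("position is not the first base of an in-frame codon")
    return (first_nt_position - 1) // 3 + 1


SIGNAL_PEPTIDE_LENGTH = 48                 # residues, from the text
alt_start_codon = codon_index(52)          # ATG at positions 52-54
residues_lost = alt_start_codon - 1        # codons upstream of the new start
remaining_signal_peptide = SIGNAL_PEPTIDE_LENGTH - residues_lost

print(alt_start_codon, residues_lost, remaining_signal_peptide)  # 18 17 31
```

Under these assumptions the alternative start is codon 18, so 17 residues are lost and 31 of the 48 signal peptide residues remain, matching the figure given in the text.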
When the published inv-specific PCR detection primers (Kaneko et al., 1995; Kageyama et al., 2002; Nakajima et al., 1992; Thoerner et al., 2003) were compared to the DNA sequences of the Y. pseudotuberculosis and Y. similis inv genes studied here, 0 to 1 and 0 to 9 mismatches per primer pair were found, respectively. Thus, the commonly used primers are expected to perform well in the detection of Y. pseudotuberculosis. However, some Y. similis strains are also likely to be detected with these primers, which may cause problems in diagnostics.
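A mismatch count per primer pair of the kind reported above can be sketched as follows. This is an illustrative approach, not the procedure used in the study. Each primer is slid along the target to find its best ungapped match, the reverse primer is reverse-complemented first, and the two minima are summed. The function names and example sequences are hypothetical, and degenerate bases are not handled.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def min_mismatches(primer, target):
    """Fewest mismatches between the primer and any ungapped window of the target."""
    primer, target = primer.upper(), target.upper()
    if len(primer) > len(target):
        raise ValueError("primer is longer than the target")
    width = len(primer)
    return min(
        sum(p != t for p, t in zip(primer, target[i:i + width]))
        for i in range(len(target) - width + 1)
    )


def primer_pair_mismatches(forward, reverse, target):
    """Sum of best-site mismatches for a forward and a reverse primer."""
    return (min_mismatches(forward, target)
            + min_mismatches(reverse_complement(reverse), target))


# Made-up target and primers, for illustration only.
target = "ATGCGTACGTTAGCCGATCGATTACG"
print(primer_pair_mismatches("ATGCGTAC", "AATCGTTC", target))  # 1
```

A count alone does not predict amplification. Mismatches near the 3' end of a primer are generally more disruptive than those near the 5' end, so a fuller check would also record mismatch positions.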
At the nucleotide level, only small differences were found in virF of the Y. pseudotuberculosis strains investigated, and none were found at the amino acid level (Table 3). Comparison of the virF-specific primer sequences (Kaneko et al., 1995; Kageyama et al., 2002; Thoerner et al., 2003; Harnett et al., 1996; Lambertz and Danielsson-Tham, 2005; Wren and Tabaqchali, 1990) to the virF sequences of our 18 Y. pseudotuberculosis strains revealed only 0–2 mismatches per primer pair. VirF is a member of the AraC family of transcriptional regulators and a key activator of virulence genes (Cornelis et al., 1998). Thus, the high conservation of virF in pathogenic Y. pseudotuberculosis is not surprising. The pYV plasmid encoding virF is absent in Y. similis (Fukushima et al., 2001; Laukkanen-Ninios et al., 2011; Sprague et al., 2008).
The similarity of the nucleotide and amino acid sequences of the pYV-encoded adhesin gene yadA of the Y. pseudotuberculosis strains investigated is presented in Table 4. Y. similis does not carry pYV (Fukushima et al., 2001; Laukkanen-Ninios et al., 2011; Sprague et al., 2008). The maximum difference between the Y. pseudotuberculosis strains was 5.2% at the amino acid level and 2.2% at the nucleotide level (Table 4). The constructed phylogenetic tree (Fig. 2) did not support the presence of subgroups or clusters. However, the observed sequence differences can affect annealing of PCR primers and thus the ability to detect pathogenic Y. pseudotuberculosis by PCR. Hence, detection primers should be designed for the conserved regions of yadA (Supplementary Fig. S2). The commonly used yadA detection primers (Thoerner et al., 2003; Fukushima et al., 2003) have 2–3 predicted mismatches per primer pair with the 18 Y. pseudotuberculosis strains studied.

FIG. 2. Phylogenetic tree with bootstrap values on branches of yadA of 18 Yersinia pseudotuberculosis strains based on nucleotide sequences (EMBL accession numbers HE805233–HE805250). IP32953, sequence of yadA of the reference strain IP32953.
Apart from challenging diagnostics, a high variation at the amino acid level of YadA could have an impact on the dissemination of this bacterium in host tissues (Heise and Dersch, 2006) or on the immune responses of the host. In Clostridium botulinum, a difference of 7% at the amino acid level in neurotoxin gene sequences has been shown to affect the antigenic properties of botulinum neurotoxin, but no data are available on the effects of smaller differences on the binding of antibodies (Lou et al., 2010).
Detailed comparison between the yadA sequences (Supplementary Fig. S2) revealed that most of the nucleotide substitutions resulted in amino acid substitutions; the nucleotide substitutions were synonymous for only three of the 40 affected codons. The substitutions were heavily concentrated in the N-terminal proline-rich brim-like domain at the top of the YadA head (Supplementary Fig. S3), and a few substitutions affected the head domain β-rolls, while the C-terminal half of the protein comprising the stalk and the membrane-anchoring barrel was virtually unaffected. In the left-handed parallel β-rolls, the 100% conserved glycine residues (highlighted in Supplementary Fig. S3) allow very tight turns (Nummelin et al., 2004). In each β-roll, the four amino acids preceding the glycine face inside the head, and the four residues following the glycine face outside. In a highly consistent manner, substitutions occurred only in residues following the glycine residue, indicating that these substitutions were tolerated and that those introducing a charged residue could potentially change the surface properties of the YadA head. The changes affecting the proline-rich brim included, in addition to substitutions, deletions of 3 and 9 bp and an insertion of 6 bp, all of which could also potentially affect the functional properties of YadA.
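The distinction between synonymous and non-synonymous codon changes drawn above can be illustrated with a short Python sketch. It assumes two in-frame coding sequences of equal length without gaps, so the indels described for the brim region would need separate handling. It uses the standard genetic code, and the function name and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_changes(ref_cds, alt_cds):
    """Return (synonymous, non_synonymous) lists of 1-based codon numbers."""
    if len(ref_cds) != len(alt_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    synonymous, non_synonymous = [], []
    for i in range(0, len(ref_cds), 3):
        ref_codon = ref_cds[i:i + 3].upper()
        alt_codon = alt_cds[i:i + 3].upper()
        if ref_codon == alt_codon:
            continue
        codon_number = i // 3 + 1
        if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
            synonymous.append(codon_number)
        else:
            non_synonymous.append(codon_number)
    return synonymous, non_synonymous


# Made-up sequences: GGT->GGC keeps glycine, AAA->GAA changes lysine to glutamate.
print(classify_codon_changes("ATGGGTCCAAAA", "ATGGGCCCAGAA"))  # ([2], [4])
```

Counting by affected codon, as in the text, means that a codon carrying two substitutions is counted once.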
Based on the distribution of the amino acid substitutions in the YadA sequences, the strains were divided into nine groups (Supplementary Fig. S2). Groups 2 and 3 differed from group 1 at only one non-synonymous nucleotide position, while group 4 differed at three positions. Group 6, in addition to nucleotide substitutions, had a 9-bp deletion, groups 7 and 8 had 3- and 9-bp deletions, and group 9 had a 6-bp insertion. It is possible that at some point the genes have been subjected to recombination, since several identical substitutions are present in the sequences of groups 5, 6, and 9 (Supplementary Fig. S2).
Finally, we made pairwise comparisons of the concatenated housekeeping genes (total length 2627 bp) for the nine strains for which sequence type (ST) data were available (Supplementary Table S2) (Laukkanen-Ninios et al., 2011). Identity percentage between the concatenated sequences of Y. pseudotuberculosis strains ranged from 99.543% (12 mismatches) to 99.924% (two mismatches). It was 99.467% between the Y. similis strains (STs 76 and 83) and ranged from 95.166% to 95.965% between the Y. pseudotuberculosis and Y. similis strains. For Y. pseudotuberculosis, the corresponding identity percentages for inv and yadA roughly followed the ST percentage pattern (Tables 2 and 4 and Supplementary Table S2), but were 0.057–0.795 percentage points lower for inv and −0.043 to 1.595 percentage points lower for yadA (Supplementary Table S2). Thus, the virulence genes show relatively narrow diversity.
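The identity percentages quoted for the concatenated housekeeping genes can be reproduced from the mismatch counts and the total length of 2627 bp. The Python sketch below assumes an ungapped comparison over the full length. The function name is hypothetical.

```python
def identity_percent(mismatches, length):
    """Percent identity for an ungapped comparison of a given length."""
    if not 0 <= mismatches <= length:
        raise ValueError("mismatches must be between 0 and length")
    return 100.0 * (length - mismatches) / length


CONCATENATED_LENGTH = 2627  # bp, from the text

print(round(identity_percent(12, CONCATENATED_LENGTH), 3))  # 99.543
print(round(identity_percent(2, CONCATENATED_LENGTH), 3))   # 99.924
```

At this length a single mismatch changes identity by about 0.038 percentage points, which gives a sense of scale for the differences reported between the housekeeping genes and the virulence genes.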
Conclusion
The virulence genes inv and virF are very similar among Y. pseudotuberculosis strains representing different serotypes and originating from diverse sources. In contrast, the variation observed in yadA is reasonably high, which may affect the sensitivity or specificity of PCR-based detection, as well as the spread of Y. pseudotuberculosis in the host and the immune responses it induces.
Acknowledgments
This work was supported by the Finnish Center of Excellence in Microbial Food Safety Research, Academy of Finland (grants 118602 and 141140), the Doctoral Program of the Faculty of Veterinary Medicine of the University of Helsinki, the Finnish Veterinary Foundation, the Walter Ehrström Foundation, and the Medical Fund of the University of Helsinki.
Disclosure Statement
No competing financial interests exist.
