Multilaboratory Validation Study of Standardized Multiple-Locus Variable-Number Tandem Repeat Analysis Protocol for Shiga Toxin–Producing Escherichia coli O157: A Novel Approach to Normalize Fragment Size Data Between Capillary Electrophoresis Platforms

Abstract

The PulseNet USA subtyping network recently established a standardized protocol for multiple-locus variable-number tandem repeat analysis (MLVA) to characterize Shiga toxin–producing Escherichia coli O157. To enable data comparisons from different laboratories in the same database, reproducibility and high quality of the data must be ensured. The aim of this study was to test the robustness and reproducibility of the proposed standardized protocol by subjecting it to a multilaboratory validation process and to address any discrepancies that may have arisen from the study. A set of 50 strains was tested in 10 PulseNet participating laboratories that used capillary electrophoresis instruments from two manufacturers. Six out of the 10 laboratories were able to generate correct MLVA types for 46 (92%) or more strains. The discrepancies in MLVA type assignment were caused mainly by difficulties in optimizing polymerase chain reactions that were attributed to technical inexperience of the staff and suboptimal quality of reagents and instrumentation. It was concluded that proper training of staff must be an integral part of technology transfer. The interlaboratory reproducibility of fragment sizing was excellent when the same capillary electrophoresis platform was used. However, sizing discrepancies of up to six base pairs for the same fragment were detected between the two platforms. These discrepancies were attributed to different dye and polymer chemistries employed by the manufacturers. A novel software script was developed to assign alleles based on two platform-specific (Beckman Coulter CEQ™8000 and Applied Biosystems Genetic Analyzer 3130xl) look-up tables containing fragment size ranges for all alleles. The new allele assignment method was validated at the PulseNet central laboratory using a diverse set of 502 Shiga toxin–producing Escherichia coli O157 isolates. The validation confirmed that the script reliably assigned the same allele for the same fragment regardless of the platform used to size the fragment.

Introduction

Many microbial genes and intergenic regions contain loci of repetitive DNA sequences that may vary among strains with respect to the number of repeat units present. These so-called “variable-number tandem repeat” (VNTR) regions have been identified in essentially all pro- and eukaryotic species and have been successfully used for subtyping purposes (Kashi et al., 1997; van Belkum et al., 1998). In the simplest form, VNTR analysis may only include a single locus (Shopsin et al., 1999). When multiple loci are targeted, the technology is often referred to as multiple-locus VNTR analysis (MLVA) (Keim et al., 2000). A typical MLVA protocol consists of multiplex polymerase chain reaction (PCR) amplification of VNTR targets followed by fragment sizing using high-resolution capillary electrophoresis (Lindstedt et al., 2003; Hyytia-Trees et al., 2006). Allele types for each VNTR locus are assigned either manually using arbitrary numbers as new types are discovered (Lindstedt et al., 2003) or automatically using specialized software to assign the actual copy number as the allele type (Hyytia-Trees et al., 2006; Sperry et al., 2008). When no amplification is detected, the allele is called a “null allele.” Null alleles are due to a change in one or more primer sites, loss of an entire genomic region, or loss of a plasmid if the VNTR is located on one (Keys et al., 2005).

MLVA-related research on foodborne organisms has mainly focused on Shiga toxin–producing Escherichia coli O157 (STEC O157) (Keys et al., 2005; Lindstedt et al., 2007), Salmonella enterica serotypes Typhimurium (Lindstedt et al., 2003) and Enteritidis (Boxrud et al., 2007; Beranek et al., 2009), and Listeria monocytogenes (Murphy et al., 2007; Sperry et al., 2008). PulseNet USA, the national molecular subtyping network for foodborne disease surveillance, recently established a standardized MLVA protocol for STEC O157 (Hyytia-Trees et al., 2006). This protocol was intended to be used as a complementary technique to pulsed-field gel electrophoresis (PFGE), which the PulseNet system uses as the “gold standard” method for subtyping foodborne bacterial pathogens (Gerner-Smidt et al., 2006). This MLVA assay shows great promise as a molecular epidemiologic tool, particularly when used to further discriminate PFGE clusters of isolates with commonly seen restriction patterns (Hyytia-Trees et al., 2006). PulseNet USA was originally built on the concept of decentralized laboratory testing and centralized data management (Swaminathan et al., 2001) to generate and share subtyping results in a more timely manner. Thus far, PulseNet USA has used MLVA in a centralized manner, that is, isolates requiring MLVA testing have been analyzed at the PulseNet central laboratory at the Centers for Disease Control and Prevention (CDC, Atlanta, GA). The current format often results in delays in the availability of MLVA results by 1 to 2 weeks. Therefore, implementing MLVA at the local public health laboratories has become a high priority for PulseNet USA. Implementation of MLVA protocols into a system like PulseNet, where data will be generated locally by different laboratories and compared nationally in a reference database, demands that such data be reproducible and of the highest quality possible.

Multilaboratory external validations have been part of PulseNet USA's protocol development strategy since the network was established in 1996 (Swaminathan et al., 2001). The external validation phase is designed to further test the robustness of the PulseNet protocol in laboratories outside the agency that developed it. The idea is to include laboratories with different levels of expertise and resources, and to assess how their preferences, in terms of some of the reagents and equipment, may influence the performance of the protocol being tested.

The aim of this study was to test the robustness and reproducibility of the proposed standardized PulseNet MLVA protocol for STEC O157 by subjecting it to a multilaboratory external validation process and to address any discrepancies that may have arisen from the study. As a follow-up, over 500 STEC O157 strains with a wide range of alleles were tested to fully assess the allelic diversity in each VNTR locus.

Materials and Methods

Multilaboratory validation study

The PulseNet standardized MLVA protocol for STEC O157 was validated in 10 PulseNet participating laboratories. The PulseNet USA steering committee selected the participating laboratories based on their research proposals and their overall track record in the PulseNet system. Five laboratories used the Beckman Coulter (Fullerton, CA) CEQ™8000 capillary electrophoresis platform and the other five the Applied Biosystems (Foster City, CA) Genetic Analyzer 3100-Avant™ (2) or 3130xl (3) platforms (Table 1). Only 1 (Laboratory A) of the 10 laboratories that volunteered for the study had considerable experience with capillary electrophoresis and MLVA. All 10 laboratories analyzed the same blinded sample set of 50 STEC O157 strains provided by CDC as live cultures following the standard operating procedure (SOP) for each platform. The validation panel included strains that cover a wide spectrum of fragment sizes for each VNTR. Four strains were included in duplicate to test the intralaboratory reproducibility of the method. Six test strains, including one duplicate pair, had one or more null loci. STEC O157 strain EDL933 (ATCC 43895) was included in the panel as a reference control strain from which the fragments generated for each VNTR locus were predicted based on the published sequence. The VNTR raw data generated by the laboratories were sent to CDC as peak files (comma delimited or text delimited) and were evaluated by the CDC staff for their quality. Accuracy of sizing for each strain was compared to sizing results produced at CDC, and reproducibility of sizing was determined by comparing results between laboratories.

Table 1.

Summary of Results on the Ability of the 10 Laboratories to Generate Correct Multiple-Locus Variable-Number Tandem Repeat Analysis Types and Allele Types

			Correct allele type a
Laboratory	Platform	Correct MLVA type a	O157-3	O157-34	O157-9	O157-25	O157-17	O157-19	O157-36	O157-37
A	CEQ 8000	50	50	50	50	50	50	50	50	50
B	CEQ 8000	43	50	50	50	49	49	49	44	47
C	CEQ 8000	48	50	50	49	50	50	50	49	50
D	CEQ 8000	47	50	50	50	49	50	50	48	50
E	CEQ 8000	29	43	37	42	41	42	41	39	42
F	3130xl	49	49	50	50	50	50	50	50	50
G	3130xl	46	48	50	50	50	50	49	49	50
H	3100-Avant	50	50	50	50	50	50	50	50	50
I	3100-Avant	28	50	36	48	36	48	46	45	46
J	3130xl	43	50	50	48	50	49	46	47	48

a Number of isolates with a correct MLVA type or allele type out of the 50 strains tested. Alleles were assigned in the BioNumerics software using the look-up table method for the CEQ 8000 and the Genetic Analyzer 3130xl data, and manually for the 3100-Avant data.

MLVA, multiple-locus variable-number tandem repeat analysis.

Standardized MLVA protocols for STEC O157

The MLVA protocol for the Beckman Coulter CEQ 8000 platform was slightly modified from that published earlier by Hyytia-Trees et al. (2006). The original protocol was a refined version of the protocol published by Keys et al. (2005), and hence employed the same locus nomenclature. The most significant modification was the exclusion of one VNTR locus (O157-10) because it showed too much variability during outbreaks, generating data that confounded epidemiological investigations. The eight VNTR loci included in the final protocol were amplified in two multiplex PCRs. Reaction 1 contained primers for loci O157-3, O157-34, O157-9, and O157-25 at final concentrations of 0.6, 0.18, 0.13, and 0.05 μM, respectively. Reaction 2 contained primers for loci O157-17, O157-19, O157-36, and O157-37 at final concentrations of 0.1, 0.02, 0.012, and 0.03 μM, respectively. The PCR primer concentrations recommended in the standard protocol served as a starting point, but in most cases each laboratory had to reoptimize the primer concentrations to achieve optimal amplification of all targets. Optimization included increasing primer concentrations for targets that did not amplify well (fluorescence below 5000 units) and decreasing primer concentrations for targets that generated amplicons that were too intense (fluorescence above 80,000 units). No other parameters specified in the SOP were to be changed. The WellRed dyes D3 and D4 (Beckman Coulter) were replaced with the less expensive substitutes Quas705 and Quas670, respectively (Biosearch Technologies, Novato, CA). The WellRed D2 dye was used as the label for the forward primers of loci O157-3 and O157-17. All primers were synthesized by the CDC Biotechnology Core Facility (Atlanta, GA). The DNA template preparation, PCR amplification, fragment analysis setup, and data analysis in the BioNumerics software (Applied Maths, Kortrijk, Belgium) were performed as described by Hyytia-Trees et al. (2006).

The MLVA protocol for the Applied Biosystems Genetic Analyzer 3100-Avant and 3130xl platforms was developed by modifying the CEQ 8000–specific MLVA protocol described above as follows: (1) the fluorescent dyes Quas670 and Quas705 and WellRed D2 in the forward primers were substituted with FAM, HEX, and CalRed590 (Biosearch Technologies); (2) the following primer concentrations were used as a starting point in Reaction 1: O157-3, 0.5 μM; O157-34, 0.08 μM; O157-9, 0.06 μM; O157-25, 0.05 μM; and in Reaction 2: O157-17, 0.15 μM; O157-19, 0.015 μM; O157-36, 0.012 μM; O157-37, 0.015 μM; (3) after PCR amplification, the PCR products were diluted 1:20 with distilled water, and a 1 μL aliquot of the dilution was mixed with 8 μL of Hi-Di Formamide (Applied Biosystems) and 1 μL of GeneFlo 625 ROX-labeled DNA size standard (Chimerx, Madison, WI); (4) the PCR products were sized in the Genetic Analyzer 3130xl using the default running conditions for a 50 cm capillary array and POP7 polymer; laboratories using the Genetic Analyzer 3100-Avant and POP6 polymer had to increase the default running time from 4000 s to 6500 s to be able to record all size standard peaks. As with the laboratories using the Beckman Coulter CEQ 8000 platform, the Applied Biosystems instrument users were allowed to optimize the primer concentrations in the PCR amplifications (target peak heights 1000–6000 fluorescence units).
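The optimization rule described for both platforms can be summarized in a short sketch. It is only an illustration of the decision logic: the function name, the platform labels, and the three return values are hypothetical, and the thresholds are the fluorescence windows quoted in the two protocol descriptions above.

```python
# Acceptable peak-height windows (fluorescence units) quoted in the text.
# The dictionary keys are hypothetical labels chosen for this sketch.
TARGET_WINDOW = {
    "CEQ 8000": (5000, 80000),
    "Genetic Analyzer": (1000, 6000),
}


def primer_adjustment(platform: str, peak_height: float) -> str:
    """Suggest the direction of the next primer-concentration change."""
    low, high = TARGET_WINDOW[platform]
    if peak_height < low:
        return "increase"  # target amplified too weakly
    if peak_height > high:
        return "decrease"  # amplicon signal too intense
    return "keep"  # peak height already inside the window


if __name__ == "__main__":
    print(primer_adjustment("CEQ 8000", 3200))          # increase
    print(primer_adjustment("Genetic Analyzer", 7500))  # decrease
```

The sketch says nothing about how large each concentration step should be; the text leaves that to each laboratory.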

Development and validation of a strategy to normalize fragment size data in BioNumerics

A total of 502 diverse STEC O157 isolates from the CDC strain collection were tested using both the Beckman Coulter CEQ 8000 and the Applied Biosystems Genetic Analyzer 3130xl at CDC. Fragment size ranges for each allele in both platforms were manually recorded in two fragment size range tables. The allele (copy number) assignment in the CEQ 8000–specific table (Table 2) was based on the approach that was described by Hyytia-Trees et al. (2006), that is, the alleles were assigned in the BioNumerics software by using a mathematical algorithm that subtracted the offset in base pairs (flanking sequences outside the VNTR region) from the observed fragment size and divided the remaining fragment size by the repeat unit size. In the Genetic Analyzer 3130xl–specific table (Table 3), adjustments were made manually so that the same allele was always assigned to the same fragment regardless of the platform used to size it. A novel BioNumerics script was developed by Applied Maths that referred to the platform-specific fragment size range tables (look-up tables, i.e., Tables 2 and 3) for allele assignment instead of using the mathematical algorithm described above. The script and the tables were integrated into the BioNumerics software by saving them in specific subfolders in the database file. The new script and the tables were validated by reanalyzing the data from the 502 isolates used to construct the tables in the same database using the look-up table approach. The datasets for the 50 isolates generated by the eight laboratories that possessed either the CEQ 8000 or the Genetic Analyzer 3130xl were also reanalyzed using the look-up table approach.
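The difference between the two allele assignment methods can be illustrated with a minimal Python sketch for locus O157-3. This is not the BioNumerics script. The 6-bp repeat size is inferred from the spacing of the O157-3 ranges in Table 2, the 321-bp offset is a hypothetical value chosen so that CEQ 8000 sizes reproduce the published copy numbers, and rounding sizes to whole base pairs before the look-up is an assumption. The two example sizes fall within the EDL933 ranges reported in Table 4.

```python
# Illustrative constants only (see the assumptions in the lead-in).
REPEAT_BP = 6    # inferred from the spacing of the O157-3 ranges in Table 2
OFFSET_BP = 321  # hypothetical flanking-sequence offset

# Excerpts of the O157-3 columns of Tables 2 and 3: allele -> (min bp, max bp).
LOOKUP = {
    "CEQ 8000": {8: (368, 370), 9: (374, 377), 10: (380, 383)},
    "3130xl": {8: (373, 376), 9: (380, 382), 10: (385, 388)},
}


def allele_by_algorithm(size_bp):
    """Copy number = (fragment size - offset) / repeat unit size, rounded."""
    return round((size_bp - OFFSET_BP) / REPEAT_BP)


def allele_by_lookup(platform, size_bp):
    """Return the allele whose platform-specific size range contains the fragment."""
    size = round(size_bp)  # assumption: sizes are matched as whole base pairs
    for allele, (low, high) in LOOKUP[platform].items():
        if low <= size <= high:
            return allele
    return None  # size outside every known range, e.g. a candidate new allele


if __name__ == "__main__":
    # Example sizes within the EDL933 O157-3 ranges of Table 4.
    for platform, size in (("CEQ 8000", 375.4), ("3130xl", 380.8)):
        print(platform, allele_by_algorithm(size), allele_by_lookup(platform, size))
```

Under these assumptions the algorithm assigns allele 9 to the CEQ 8000 size and allele 10 to the 3130xl size, whereas the look-up tables return allele 9 for both.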

Table 2.

Beckman Coulter CEQ 8000–Specific Fragment Size Range Table for Analyzing Multiple-Locus Variable-Number Tandem Repeat Analysis Data in BioNumerics

	VNTR locus (bp)
Allele	O157-3	O157-34	O157-9	O157-25	O157-17	O157-19	O157-36	O157-37
1			471–472	116–119
2	333–334			122–124	130–133	282–284
3	339–340		482–484	128–130	137–140	291–292	123–125	159–160
4	345–346	170–172	488–490	135–136	143–146	296–299	130–132	164–166
5	350–353	188–190	494–497	141–142	150–152	302–304	136–139	170–173
6	356–358	206–209	499–502	146–148	156–159	307–311	143–146	177–178
7	362–364	224–227	507–509	153–154	162–165	314–316	150–153	183–185
8	368–370	241–245	512–515	158–160	169–172	320–321	158–161	188–191
9	374–377	260–262	518–521	165–166	175–178	326–328	165–167	195–197
10	380–383	278–280	525–527	170–171	182–184	332–333	172–174	200–203
11	386–389	295–297	530–533		188–190	338–339	179–181	207–209
12	392–395		536–539		195–197	344–345	187–188	213–215
13	398–401		543–546	188–190	199–201		193–195	219–220
14	404–407		549–551				201–202	225–226
15	410–413		555–558				207–209	231–232
16	417–419		561–563					237–238
17	423–425		566–570					241–244
18	429–431		572–576		227–228
19	435–437		579–581
20	442–443		584–586
21	448–449		590–593
22	454–456		597–598					273–274
23	460–464		603–604
24	466–468		608–611
25			614–615

VNTR, variable-number tandem repeat.

Table 3.

Applied Biosystems Genetic Analyzer 3130xl–Specific Fragment Size Range Table for Analyzing Multiple-Locus Variable-Number Tandem Repeat Analysis Data in BioNumerics

	VNTR locus (bp)
Allele	O157-3	O157-34	O157-9	O157-25	O157-17	O157-19	O157-36	O157-37
1			474–475	122–124
2	339–340			127–128	135–136	283–284
3	343–346		485–486	132–134	140–142	291–292	123–124	157–158
4	349–352	169–171	491–492	138–140	147–148	297–298	129–131	163–165
5	355–358	187–188	496–498	144–146	153–155	302–304	136–137	169–171
6	361–364	205–206	502–504	151–152	159–161	309–310	141–144	175–176
7	367–370	223–224	508–510	157–158	166–167	315–316	150–151	181–182
8	373–376	242–243	514–516	162–164	172–173	321–322	157–158	187–188
9	380–382	260–262	520–522	168–169	177–179	327–328	164–165	193–194
10	385–388	278–280	526–528	174–175	184–185	332–334	170–172	199–200
11	391–393	295–298	531–534		190–191	338–340	177–179	205–206
12	397–401		536–539		196–197	344–345	184–185	211–212
13	404–406		543–545	191–192	201–202		191–192	217–218
14	410–412		549–551				198–199	223–224
15	416–417		555–557				204–205	229–231
16	422–424		560–563					236–237
17	429–430		566–569					241–244
18	434–435		572–575		227–228
19	440–441		578–581
20	447–448		584–585
21	453–455		590–591
22	459–461		596–597					273–274
23	466–467		601–603
24	472–473		607–608
25			613–614

Results

Interlaboratory reproducibility of generating correct MLVA types

Six out of the 10 laboratories were able to generate correct MLVA types for 46 (92%) or more strains out of the 50 included in the study (Table 1). Only two laboratories (A and H) generated correct MLVA types for all 50 isolates. The main reason for incorrect MLVA types was the failure to amplify all eight VNTRs in the two multiplex PCRs due to poorly optimized primer concentrations. This was particularly notable with laboratories E and I. Some of the laboratories that performed well otherwise (B, D, J) had trouble with the locus O157-36 because it frequently contains null alleles. A total of 5 out of the 50 isolates had a null allele in this locus. False-positive results were associated with problems such as excessive primer concentrations that resulted in either high background (peaks with fluorescence level below 1000 units) or fluorescence overflow between channels (an overly strong signal in one channel spills over as an artifact into the adjacent channel), causing nonspecific peaks with fragment sizes that could be correct for locus O157-36. On the other hand, all 10 laboratories correctly identified null alleles (one strain each) for loci O157-3 and O157-37. Less frequent reasons for discrepancies in MLVA type assignment included incorrect fragment sizing due to the DNA size standard not running properly because of suboptimal quality of the capillary array or reagents, and an apparent sample transposition. Six laboratories (A, B, D, F, H, and J) were able to generate the same MLVA pattern for the four duplicate isolate pairs. Again, locus O157-36 caused most difficulties for laboratories in terms of reproducibility, since one of the duplicate isolate pairs had a null allele in this locus. Three laboratories (E, G, and I) were not able to reproducibly identify the null allele in this locus.
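The headline figures in this paragraph can be recomputed from the "Correct MLVA type" column of Table 1. The following Python sketch only restates that arithmetic; the variable names are chosen for illustration.

```python
# "Correct MLVA type" column of Table 1 (out of 50 strains per laboratory).
CORRECT_MLVA_TYPES = {
    "A": 50, "B": 43, "C": 48, "D": 47, "E": 29,
    "F": 49, "G": 46, "H": 50, "I": 28, "J": 43,
}
N_STRAINS = 50
THRESHOLD = 46  # 46 of 50 strains corresponds to 92%

# Laboratories at or above the 92% threshold, and those with a perfect score.
at_least_92_percent = sorted(
    lab for lab, n in CORRECT_MLVA_TYPES.items() if n >= THRESHOLD
)
all_correct = sorted(
    lab for lab, n in CORRECT_MLVA_TYPES.items() if n == N_STRAINS
)

if __name__ == "__main__":
    print(100 * THRESHOLD / N_STRAINS)  # 92.0
    print(at_least_92_percent)          # ['A', 'C', 'D', 'F', 'G', 'H']
    print(all_correct)                  # ['A', 'H']
```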

Interlaboratory reproducibility of fragment sizing

The interlaboratory reproducibility of the fragment sizing for the same strain was within 1.5 base pairs when the exact same platform was used, with the exception of the Applied Biosystems Genetic Analyzer 3100-Avant. Size ranges of 1.5–3.0 base pairs were detected for the large fragments of locus O157-9 with the 3100-Avant (Table 4). Even more significant sizing discrepancies were detected between the Beckman Coulter CEQ 8000 and the Applied Biosystems Genetic Analyzer 3100-Avant/3130xl (Tables 4 and 5). There were also sizing discrepancies in all three platforms between the observed fragment sizes and the fragment sizes that were predicted from the genome sequence of the reference strain EDL933 (Table 4). The sizing differences were entirely locus dependent, that is, sizing was not consistently off by a certain number of base pairs in all eight loci. In two loci (O157-34 and O157-19) sizing agreed well between the platforms. In some loci (O157-3, O157-25, O157-17) sizing varied by as much as three to six base pairs, which is a discrepancy large enough to result in a difference in the allele type if the allele type is assigned using the mathematical algorithm designed to subtract the offset region from each amplicon (Table 6). In the loci O157-9 and O157-36 the sizing discrepancies were not linear but were decreasing or increasing with the increasing fragment size (Table 5). The locus-dependent sizing discrepancies could not be explained by an association with any particular fluorescent dye, fragment size, repeat unit size, or the sequence GC content. Even though the sizing agreed better between the Genetic Analyzer versions 3100-Avant and 3130xl, there were notable discrepancies in three out of the eight loci (O157-9, O157-17, and O157-37).

Table 4.

Fragment Size Ranges for the Shiga Toxin–Producing Escherichia coli O157-Positive Control Strain EDL933 in the Beckman Coulter CEQ 8000, Applied Biosystems Genetic Analyzer 3130xl and 3100-Avant

	VNTR fragment size (bp)
Locus	Predicted a	CEQ 8000 b	3130xl c	3100-Avant d
O157-3	377	374.9–376.0	380.3–381.3	380.4–381.6
O157-34	277	277.9–279.1	278.9–279.7	279.5–279.8
O157-9	532	531.0–531.4	532.0–532.8	533.8–535.6
O157-25	142	133.9–135.0	138.5–139.1	139.0–139.4
O157-17	156	157.0–158.3	159.9–160.3	158.9–159.1
O157-19	308	309.0–309.5	309.1–309.9	309.4–310.2
O157-36	156	159.2–159.8	157.1–157.6	157.3–157.4
O157-37	184	188.9–189.9	187.3–187.9	186.4–186.7

a Amplicon size based on the EDL933 genome.

b Size range observed in six laboratories (study participants and CDC).

c Size range observed in four laboratories (study participants and CDC).

d Size range observed in two laboratories (study participants).

Table 5.

Examples of Fragment Sizes for the Same Alleles in Beckman Coulter CEQ 8000 and Applied Biosystems Genetic Analyzer 3130xl and 3100-Avant in the Eight Variable-Number Tandem Repeat Loci Included in the Shiga Toxin–Producing Escherichia coli O157 Multiple-Locus Variable-Number Tandem Repeat Analysis Assay

VNTR allele size range a (no. of isolates)	CEQ 8000 (bp) b	3130xl (bp) c	3100-Avant (bp) d
O157-3
Small (1)	351.1–351.6	355.7–357.2	356.3–357.6
Medium (9)	398.0–400.8	404.0–405.0	404.1–405.5
Large (1)	441.6–442.6	447.0–447.9	446.1–447.3
O157-34
Small (24)	224.2–226.4	223.2–223.8	223.7–224.2
Medium (14)	259.9–261.9	261.0–262.0	261.1–261.9
Large (11)	277.7–279.5	279.0–280.0	279.5–280.1
O157-9
Small (2)	493.4–495.0	496.5–497.5	497.4–498.6
Medium (10)	543.1–543.8	543.4–545.6	543.6–547.6
Large (2)	579.2–580.0	578.5–579.0	581.2–583.4
O157-25
Small (6)	121.6–123.1	127.3–127.9	127.6–128.0
Medium (37)	133.9–135.5	138.5–139.4	138.5–139.4
Large (1)	140.6–141.2	144.7–145.2	145.0–145.4
O157-17
Small (1)	137.5–138.7	141.1–141.2	140.2–140.6
Medium (6)	150.2–151.6	153.8–154.2	152.6–153.0
Large (3)	169.2–171.3	172.3–172.6	171.1–171.3
O157-19
Small (2)	297.3–297.9	297.3–297.8	297.2–297.8
Medium (16)	314.2–315.6	315.0–316.0	315.5–316.3
Large (1)	332.5–332.7	333.1–333.7	333.7–334.3
O157-36
Small (4)	138.0–138.6	136.1–136.4	136.2–136.4
Medium (17)	158.8–159.8	157.1–157.6	157.3–157.4
Large (2)	187.4–187.9	184.5–184.9	185.1–185.2
O157-37
Small (2)	170.4–171.2	169.7–170.2	168.4–168.6
Medium (19)	188.5–189.6	187.4–188.0	186.3–186.7
Large (1)	200.7–201.2	199.5–199.7	198.1–198.3

a Fragment size ranges for each locus given for the smallest, the medium-sized, and the largest allele included in the multilaboratory study sample set.

b Fragment size ranges in six laboratories (CDC and study participants).

c Fragment size ranges in four laboratories (CDC and study participants).

d Fragment size ranges in two laboratories.
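The size-dependent discrepancy reported for locus O157-9 can be checked directly from the Table 5 values. The Python sketch below compares the midpoints of the CEQ 8000 and 3130xl size ranges for the small, medium, and large alleles. Using range midpoints is a simplification made here for illustration and is not a procedure described in the study.

```python
# O157-9 rows of Table 5: allele class -> (CEQ 8000 range, 3130xl range), in bp.
O157_9 = {
    "Small": ((493.4, 495.0), (496.5, 497.5)),
    "Medium": ((543.1, 543.8), (543.4, 545.6)),
    "Large": ((579.2, 580.0), (578.5, 579.0)),
}


def midpoint(size_range):
    """Midpoint of a (min, max) fragment size range."""
    low, high = size_range
    return (low + high) / 2


# Offset of the 3130xl sizing relative to the CEQ 8000 sizing, per allele class.
offsets = {
    label: midpoint(abi) - midpoint(ceq)
    for label, (ceq, abi) in O157_9.items()
}

if __name__ == "__main__":
    for label, offset in offsets.items():
        print(f"{label}: {offset:+.2f} bp")
```

The offset is about +2.8 bp for the small allele, about +1 bp for the medium allele, and about -0.9 bp for the large allele, so a single constant correction cannot align the two platforms at this locus.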

Table 6.

Allele Types (Copy Numbers) for Each of the Eight Variable-Number Tandem Repeat Loci in the Shiga Toxin–Producing Escherichia coli O157 Strain EDL933 Based on the Fragment Sizes Generated by the Beckman Coulter CEQ 8000 and the Applied Biosystems Genetic Analyzer 3130xl

		VNTR allele (copy number)
Platform	Allele assignment method	3 a	34 a	9 a	25 a	17 a	19 a	36 a	37 a
CEQ 8000	Algorithm	9.0	10.0	11.0	4.0	6.0	6.0	8.0	8.0
3130xl	Algorithm	10.0	10.0	11.0	5.0	7.0	6.0	8.0	8.0
CEQ 8000	Look-up table	9.0	10.0	11.0	4.0	6.0	6.0	8.0	8.0
3130xl	Look-up table	9.0	10.0	11.0	4.0	6.0	6.0	8.0	8.0

Comparison between the mathematical algorithm and the look-up table allele assignment methods.

a VNTR locus nomenclature as originally described by Keys et al. (2005).

Validation of the look-up tables

Fragment size data from the CEQ 8000 and the Genetic Analyzer 3130xl for 502 diverse STEC O157 isolates were analyzed in the same BioNumerics database. Allele assignment based on the platform-specific look-up tables was in complete agreement between the two platforms for all isolates. Table 6 shows, for one isolate, how the sizing discrepancies resulted in a difference in the allele assignment when the mathematical algorithm was used and how reanalyzing the data using the look-up table approach corrected the allele discrepancy. The discrepancies in the allele assignment between the laboratories that participated in the external validation (Table 1) were mainly caused by false-positive or false-negative PCRs, as discussed earlier, not by platform-dependent sizing discrepancies.

Discussion

The most commonly seen problems encountered during external validations of PulseNet protocols, regardless of the method being validated, usually stem from differences in reagents and equipment used by the participating laboratories (Kam et al., 2008). Laboratory-specific conditions, such as the type and calibration status of the thermocycler, the type of Taq polymerase used, and the overall familiarity of the staff with small pipetting volumes, can drastically affect the amplification efficiency. Therefore, each laboratory in this study had to reoptimize the primer concentrations so that all targets would amplify with approximately the same efficiency. The reoptimization process was especially challenging for many laboratories, since most of them had no significant prior experience in fragment analysis. As a consequence, poorly optimized PCRs and nonauthorized deviations (use of a different type of Taq polymerase) from the SOP were the main reasons for a failure to produce high-quality data or correct MLVA types. If anything, this study demonstrated clearly that staff training is an essential part of technology transfer. It is crucial that the technical staff understand which parameters can be modified in the process of optimization, and how changing different parameters will affect the outcome of the assay. For example, the use of Taq polymerase with a different level of stringency than what was specified in the SOP (Platinum Taq; Invitrogen, Carlsbad, CA) can cause either nonspecific peaks and/or incorrect sizing (lower stringency) or a failure of the target to amplify (higher stringency). As a consequence of this validation study, some adjustments to the SOP were made. For example, some primer working concentrations were made more dilute to enable larger pipetting volumes for easier pipetting. Additionally, the parameters that are flexible are now explicitly stated. Hawes et al. (2006) reported a multilaboratory study assessing the ability of DNA sequencing core facilities to successfully sequence a set of well-defined templates containing difficult repeats. They, too, found that laboratory-specific variables such as instrument robustness, reagent quality, and technical experience affected the results even when a standardized protocol was used.

In addition to the issues in optimizing the PCRs, the comparability of fragment sizing between laboratories using different capillary electrophoresis platforms turned out to be a major problem. In the recent multilaboratory study that evaluated the reproducibility of a microsatellite-based typing assay for Aspergillus fumigatus, de Valk et al. (2009) also reported significant sizing discrepancies between different platforms. The Beckman Coulter and the Applied Biosystems capillary electrophoresis platforms differ in the dye chemistries, DNA size standards, and the types of polymers used. The Applied Biosystems Genetic Analyzer 3130xl and 3100-Avant employ the same dye chemistry and size standard but different polymers (POP7 vs. POP6). The different dye chemistries and DNA size standards probably accounted for some of the sizing discrepancies seen in this study, since different fluorescent labels have differing dye mobilities due to different structures (Tu et al., 1998). The size standards also had a differing number of fragments in them, although the size range covered was similar. However, a more important source for the discrepancies most likely was the different polymers used by these three platforms. Capillary electrophoresis systems employ interpolation algorithms that can convert mobility data from an unknown peak into size information, based on the size versus mobility data from the standard set (Rosenblum et al., 1997). The electrophoretic mobility of DNA is sequence dependent; therefore, DNA fragments of the same length can have different mobility, and hence can vary in the calculated fragment size. The different polymers are likely to have different compositions in terms of types and concentrations of denaturing agents (the exact composition of each polymer is proprietary information). Consequently, each system will have slightly different denaturing conditions that result in various degrees of secondary and tertiary structure and anomalous migration of DNA (Rosenblum et al., 1997). As a result, in most loci the observed fragment size did not agree with the size predicted from the sequence (Table 4) and fragment sizes from different platforms did not agree with each other. To draw an analogy to more commonly used PFGE technology, it is known that the type of agarose used to generate PFGE patterns will affect run time and band resolution, hence resulting in differences in banding patterns between laboratories using different agaroses (PulseNet USA, unpublished data).

Since the sizing discrepancies between the Beckman Coulter and Applied Biosystems platforms were significant in many loci, a strategy had to be developed to allow direct comparison of data in the same database. In their study, de Valk et al. (2009) proposed accomplishing this by using allelic ladders that would contain reference peaks for each allele. From PulseNet's viewpoint such an approach would be impractical, since constructing an allelic ladder would require considerable effort and the ladder would also have to be made available for all laboratories participating in the network. Larsson et al. (2009) recently suggested that fragment sizing data for each platform should be normalized to be comparable to the fragment size predicted from the actual sequence. They proposed achieving this by sending out a reference set of isolates that covers the most common alleles. Since all alleles for each locus were not covered by the reference set even though the authors sequenced at least one representative of each allele, an assumption was made that a mathematical algorithm can be developed to normalize the data for all alleles. The theoretical benefit of this approach is that VNTR data could be compared even when the same primer sequences are not used for the same locus, since multiple competing partially overlapping protocols are being used around the world (Keys et al., 2005; Hyytia-Trees et al., 2006; Noller et al., 2006; Lindstedt et al., 2007). However, the disadvantage is that because the sizing discrepancies between different platforms and the actual sequence are not always linear, developing an algorithm that accurately accounts for nonlinear discrepancies would be extremely challenging. Additionally, sequencing just one representative of each allele may not give adequate information on the actual sequence length, since partial repeats are prevalent in some loci. 
Therefore, PulseNet USA's strategy was to normalize size data from one platform so that it was comparable with the other platform, without attempting to normalize data from both platforms to the actual sequence data. Since PulseNet USA has for the last 4 years collected data using the Beckman Coulter CEQ 8000 platform, it was natural to use these data as the basis of the national database and to normalize the Genetic Analyzer 3130xl data to be comparable with the CEQ 8000 data. Because the older version (3100-Avant) of the Genetic Analyzer is being phased out by Applied Biosystems, PulseNet USA decided not to support that platform in the network, particularly since the sizing did not completely agree between the new and old versions, and since the reproducibility of sizing large fragments seemed to be compromised with the 3100-Avant. The platform-specific look-up tables (Tables 2 and 3) that were constructed to automate the analysis of MLVA data in the BioNumerics software are flexible in that they can be easily adjusted when new alleles are discovered. In practice, only PulseNet USA's database managers will be able to modify these tables, and only after the identity of the new allele has been confirmed by PulseNet USA's central laboratory at CDC. In the future, when platforms are upgraded or replaced, new tables will most likely have to be constructed to maintain backward comparability of the data.
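
The look-up table approach described above can be illustrated with a minimal sketch in Python. It is not the BioNumerics script used by PulseNet USA. The platform keys, locus names (`LOCUS_A`, `LOCUS_B`), function names, and all size ranges are hypothetical and chosen only for illustration; they are not the values in Tables 2 and 3.

```python
from typing import Dict, List, Optional, Tuple

# One table row: (minimum size in bp, maximum size in bp, allele number).
SizeRange = Tuple[float, float, int]

# Hypothetical platform-specific look-up tables.
# All size ranges are invented for illustration only.
LOOKUP_TABLES: Dict[str, Dict[str, List[SizeRange]]] = {
    "CEQ8000": {
        "LOCUS_A": [(150.0, 152.5, 5), (156.0, 158.5, 6), (162.0, 164.5, 7)],
        "LOCUS_B": [(305.0, 307.5, 10), (312.0, 314.5, 11)],
    },
    "ABI3130xl": {
        "LOCUS_A": [(146.0, 148.5, 5), (152.0, 154.5, 6), (158.0, 160.5, 7)],
        "LOCUS_B": [(300.0, 302.5, 10), (307.0, 309.5, 11)],
    },
}


def assign_allele(platform: str, locus: str, fragment_size: float) -> Optional[int]:
    """Return the allele whose size range contains fragment_size, else None."""
    for low, high, allele in LOOKUP_TABLES[platform][locus]:
        if low <= fragment_size <= high:
            return allele
    # No range matched: the fragment may be a new allele or a sizing artifact.
    return None


def assign_mlva_profile(platform: str, sizes: Dict[str, float]) -> Dict[str, Optional[int]]:
    """Assign an allele to every locus in a {locus: fragment size} mapping."""
    return {locus: assign_allele(platform, locus, size) for locus, size in sizes.items()}


if __name__ == "__main__":
    # The same hypothetical isolate sized on both platforms.
    print(assign_mlva_profile("CEQ8000", {"LOCUS_A": 157.2, "LOCUS_B": 306.1}))
    print(assign_mlva_profile("ABI3130xl", {"LOCUS_A": 153.1, "LOCUS_B": 301.0}))
```

In this sketch, fragment sizes that differ between platforms resolve to the same allele numbers because each platform is read against its own table. A size that falls outside every range returns `None`, which could serve as a flag for follow-up before a table is extended with a new allele.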

Conclusions

The results from the multilaboratory validation study of the PulseNet STEC O157 MLVA protocol reported here highlighted the typical challenges that must be overcome by laboratory networks that collect and analyze data generated in different laboratories. The challenges with PCR-based protocols are magnified by the availability of multiple different platforms (thermocyclers, capillary electrophoresis equipment) and reagents of various price ranges that do not always produce data of comparable quality. The fact that many of the laboratories that participated in this study performed well, even when experience in fragment analysis technology was lacking, suggests that the challenges can be overcome by adhering strictly to the SOPs, providing training, and establishing a comprehensive quality assurance/quality control program. One of the major challenges that PulseNet USA had to overcome was to provide public health laboratories in the United States with the ability to directly compare and analyze MLVA data obtained with the different capillary electrophoresis platforms described herein. This was achieved by developing platform-specific look-up tables that can be used in the BioNumerics software to automatically assign allele types based on the fragment size data.

Acknowledgments

The authors wish to thank the following PulseNet USA participating laboratories: Arkansas Department of Health Public Health Laboratories, Colorado Department of Public Health and Environment, Florida Department of Health, Hawaii Department of Health, Michigan Department of Community Health, New Jersey State Department of Health, New Mexico Department of Health, Orange County Public Health Laboratory, Commonwealth of Virginia Division of Consolidated Laboratories; and the following PulseNet International partners: National Microbiology Laboratory, Canada, and National Institute of Infectious Diseases, Japan.

Disclosure Statement

No competing financial interests exist.

References

Beranek, Mikula, Rabold et al. Multiple-locus variable-number tandem repeat analysis for subtyping of Salmonella enterica subsp. enterica serovar Enteritidis. Int J Med Microbiol, 2009; 299:43–51.

Boxrud, Pederson-Gulrud, Wotton et al. Comparison of multiple-locus variable-number tandem repeat analysis, pulsed-field gel electrophoresis, and phage typing for subtype analysis of Salmonella enterica serotype Enteritidis. J Clin Microbiol, 2007; 45:536–543.

de Valk, Meis, Bretagne et al. Interlaboratory reproducibility of a microsatellite-based typing assay for Aspergillus fumigatus through the use of allelic ladders: proof of concept. Clin Microbiol Infect, 2009; 15:180–187.

Gerner-Smidt, Hise, Kincaid et al. PulseNet USA: a five-year update. Foodborne Pathog Dis, 2006; 3:9–19.

Hawes, Knudtson, Escobar et al. Evaluation of methods for sequence analysis of highly repetitive DNA templates. J Biomol Tech, 2006; 17:138–144.

Hyytia-Trees, Smole, Fields et al. Second generation subtyping: a proposed PulseNet protocol for multiple-locus variable-number tandem repeat analysis of Shiga toxin-producing Escherichia coli O157 (STEC O157). Foodborne Pathog Dis, 2006; 3:118–131.

Kam, Luey, Parsons et al. Evaluation and validation of a PulseNet standardized pulsed-field gel electrophoresis protocol for subtyping Vibrio parahaemolyticus: an international multicenter collaborative study. J Clin Microbiol, 2008; 46:2766–2773.

Kashi, King, Soller. Simple sequence repeats as a source of quantitative genetic variation. Trends Genet, 1997; 13:74–78.

Keim, Price, Klevytska et al. Multiple-locus variable-number tandem repeat analysis reveals genetic relationships within Bacillus anthracis. J Bacteriol, 2000; 182:2928–2936.

Keys, Kemper, Keim. Highly diverse variable number tandem repeat loci in the E. coli O157:H7 and O55:H7 genomes for high-resolution molecular typing. J Appl Microbiol, 2005; 98:928–940.

Larsson, Torpdahl, Petersen et al. Development of a new nomenclature for Salmonella Typhimurium multilocus variable number of tandem repeats analysis (MLVA). Eur Surveill, 2009; 14:1–5.

Lindstedt, Brandal, Aas et al. Study of polymorphic variable-number of tandem repeats loci in the ECOR collection and in a set of pathogenic Escherichia coli and Shigella isolates for use in a genotyping assay. J Microbiol Methods, 2007; 69:197–205.

Lindstedt, Heir, Gjernes et al. DNA fingerprinting of Salmonella enterica subsp. enterica serovar Typhimurium with emphasis on phage type DT104 based on variable number of tandem repeat loci. J Clin Microbiol, 2003; 41:1469–1479.

Murphy, Corcoran, Buckley et al. Development and application of multiple-locus variable number of tandem repeat analysis (MLVA) to subtype a collection of Listeria monocytogenes. Int J Food Microbiol, 2007; 115:187–194.

Noller, McEllistrem, Shutt et al. Locus-specific mutational events in a multilocus variable-number tandem repeat analysis of Escherichia coli O157:H7. J Clin Microbiol, 2006; 44:374–377.

Rosenblum, Oaks, Menchen et al. Improved single-strand DNA sizing accuracy in capillary electrophoresis. Nucleic Acids Res, 1997; 25:3925–3929.

Shopsin, Gomez, Montgomery et al. Evaluation of protein A gene polymorphic region DNA sequencing for typing of Staphylococcus aureus strains. J Clin Microbiol, 1999; 37:3556–3563.

Sperry, Kathariou, Edwards et al. Multiple-locus variable-number tandem-repeat analysis as a tool for subtyping Listeria monocytogenes strains. J Clin Microbiol, 2008; 46:1435–1450.

Swaminathan, Barrett, Hunter et al. PulseNet: the molecular subtyping network for foodborne bacterial disease surveillance, United States. Emerg Infect Dis, 2001; 7:382–389.

, Knott, Marsh et al. The influence of fluorescent dye structure on the electrophoretic mobility of end-labeled DNA. Nucleic Acids Res, 1998; 26:2797–2802.

van Belkum, Scherer, van Alphen et al. Short-sequence DNA repeats in prokaryotic genomes. Microbiol Mol Biol Rev, 1998; 62:275–293.