Abstract
Background:
Population-, assay-, and trimester-specific reference intervals for thyroid function tests are necessary to assess thyroid status accurately and manage thyroid disease throughout pregnancy. This study's objective was to determine whether the manufacturer's recommended trimester-specific reference intervals for thyroid tests and the American Thyroid Association's recommended total thyroxine (TT4) pregnancy reference intervals could be verified as appropriate for use in the authors' multicultural population.
Methods:
Blood samples were obtained from the following sources: stored frozen surplus blood from women undergoing routine aneuploidy screening (first- and second-trimester samples, n = 274), women participating in an observational cohort study (second- and third-trimester samples, n = 135), and blood collected from women presenting for assessment to the labor and delivery ward (third-trimester samples, n = 35). Exclusions included thyroid medication or disease and positive thyroid peroxidase antibodies (anti-TPO). Samples were analyzed for thyrotropin (TSH), free T4 (fT4), free triiodothyronine (fT3), TT4, and anti-TPO using the Roche Cobas 8000 Modular e602 electrochemiluminescence immunoassay.
Results:
Nine percent of the aneuploidy screening samples were excluded prior to thyroid testing due to maternal use of thyroid medication. Six percent of analyzed samples were excluded: 5.9% with positive anti-TPO and one with a TSH >10 mIU/L. The manufacturer's recommended trimester-specific reference intervals for TSH were not verified by described standardized methods. Therefore, 95th percentile reference intervals were determined using a minimum number of samples. Reference intervals for TSH and fT4 were as follows: 9–12 weeks, 0.18–2.99 mIU/L and 11–19.2 pmol/L; second trimester, 0.11–3.98 mIU/L and 10.5–18.2 pmol/L; and third trimester, 0.48–4.71 mIU/L and 9.0–16.1 pmol/L, respectively. The TT4 reference interval after 19 weeks' gestation was 77–186 nmol/L.
Conclusions:
This study provides a simple approach to verify or establish trimester-specific thyroid function reference intervals in local populations. The TT4 reference interval was lower than the interval proposed by the American Thyroid Association, suggesting the need for further study of TT4 in pregnancy and reliance on locally established fT4 reference intervals after 19 weeks, especially when there are no equivalent reference intervals for TT4.
Introduction
Thyroid dysfunction is common in pregnancy. It is important to diagnose and treat overt thyroid disease effectively throughout pregnancy, as complications including spontaneous abortion, preeclampsia, stillbirth, and heart failure can be avoided with appropriate and timely therapy (1,2). Accurate assessment of thyroid function during pregnancy is challenging. The 2017 American Thyroid Association (ATA) pregnancy guidelines strongly recommend the determination of local trimester-specific reference intervals for thyrotropin (TSH) (1). However, local trimester-specific reference intervals are often not available. In these circumstances, these guidelines recommend the use of “pregnancy-specific TSH reference intervals obtained from similar patient populations performed using similar TSH immunoassays.” They also caution that the accuracy of immunoassays for free thyroxine (fT4) measurement is influenced in pregnancy and that reference intervals on immunoassays for fT4 vary substantially by assay used (1,3). Therefore, the ATA pregnancy guidelines strongly recommend the use of total T4 (TT4; with a pregnancy-adjusted reference interval 50% higher than the non-pregnant limit) instead of fT4 as “a highly reliable way of estimating hormone concentrations during the last part of pregnancy” (1). Lack of locally established reference intervals for thyroid function tests throughout pregnancy can lead to misclassification of thyroid status in pregnant women. Such misclassification may result in unnecessary and potentially harmful therapy for some pregnant women and failure to diagnose and treat accurately other pregnant women who would benefit from therapy (1,2,4,5).
The ATA 2017 pregnancy guidelines provide a helpful summary table of reference intervals for TSH and fT4 in early pregnancy, but there is a relative lack of published laboratory thyroid function reference intervals after 18 weeks' gestation (1). The only assay platform similar to the one used at the authors' institution (the Roche Cobas 8000 Modular e602 electrochemiluminescence immunoassay), with published reference intervals in pregnancy, is the Roche Modular E170 (6,7). Using the Roche Modular E170 assay platform, Vaidya et al. established TSH and fT4 reference intervals in British women at <12 weeks' gestation but provided no data on trimester-specific thyroid function test reference intervals after 12 weeks' gestation (6). Gong and Hoffman determined the fT4 reference interval in each trimester of pregnancy, also using the Roche Modular E170 electrochemiluminescence immunoassay, but did not clearly establish trimester-specific reference intervals for TSH (7).
Given the importance of using appropriate reference intervals for the interpretation of thyroid function tests in pregnancy and the variability between assays, the goals of this study were (i) to determine if the manufacturer's recommended trimester-specific reference intervals could be verified for use in the study population (8); (ii) to establish trimester-specific reference intervals in the study's local multi-ethnic population for TSH, fT4, free triiodothyronine (fT3), and TT4 with the Roche Cobas 8000 Modular e602 electrochemiluminescence immunoassay if the manufacturer's recommended reference intervals were not verifiable; (iii) to verify if the ATA's recommendation to use a calculated reference interval for TT4 of 50% higher than the non-pregnant limits after 19 weeks' gestation was appropriate at the authors' site; and (iv) to describe a simple method to guide other centers in verifying and determining local trimester-specific reference intervals for laboratory thyroid function test results.
Methods
Study participants
This study was conducted in Calgary, Canada. Calgary is a large multicultural city with >1.2 million residents and where immigrants make up 30% of the population (9). Universal salt iodization has been mandatory in Canada since the 1920s.
Blood samples were obtained from the following sources: stored frozen surplus blood from women undergoing routine aneuploidy screening (first- and second-trimester samples) from 2014 and 2015, stored frozen blood from healthy pregnant women participating in an observational cohort study, All Our Families (second- and third-trimester samples), and blood collected from women presenting for assessment to the labor and delivery ward (third-trimester samples) (10). In Calgary, in 2014 and 2015, 64.5% of pregnant women underwent first-trimester aneuploidy screening (J.A. Johnson, Calgary Early Risk Assessment program, pers. commun.). These women were of a similar age to the general local population of all pregnant women during this same time period (i.e., 31.2 and 31.0 years, respectively; J.A. Johnson, pers. commun., and S. Crawford, Alberta Perinatal Health Program, pers. commun.). The All Our Families (formerly All Our Babies) cohort study has been described elsewhere (10). Briefly, the All Our Families study established a birth cohort of 2000 women to examine the predictive potential of blood markers for preterm birth in non-laboring women. Participants were healthy women without thyroid disease with singleton pregnancies who received prenatal care in Calgary, and they provided random samples collected between 27 and 32 weeks' gestation.
Exclusion criteria included known thyroid disease. For the samples that came from surplus blood from aneuploidy screening, this was based upon administrative data, that is, hospitalization data and pharmacy records were reviewed to exclude samples from women using thyroid medication (levothyroxine, liothyronine, desiccated thyroid, propylthiouracil, or methimazole) and/or with International Classification of Disease v10 Canadian Modification (ICD10-CA) codes for thyroid disease (E00-E07, E35.0, E89.0, R94.6, T38.1, T38.2, Y42.1, Y42.2) during the previous two years. Samples from the other two sources excluded women with known thyroid disease from study entry based on the participants' self-reported medical history and medication list. Other exclusion criteria included thyroid peroxidase antibodies (anti-TPO) that were positive (>34 kIU/L) and overt thyroid dysfunction.
Thyroid hormone testing
Samples were tested for anti-TPO, TSH, fT4, fT3, and TT4 by the methods routinely used in Calgary (i.e., the Roche Cobas 8000 Modular e602 electrochemiluminescence immunoassay). The anti-TPO, fT4, fT3, and TT4 are competitive immunoassays, while the TSH method is a sandwich immunoassay (Roche Diagnostics GmbH, Mannheim, Germany) (8).
Verifying and establishing reference intervals
As outlined in Figure 1 (steps 1–3), a standardized approach was followed to verify the manufacturer's reference intervals for TSH (8,11). For each trimester, TSH results of 20 randomly selected participants were examined to eliminate outliers by the Reed/Dixon method described by the Clinical and Laboratory Standards Institute (11). Briefly, the ratio D/R is calculated for each extreme observation, where D is the absolute difference between the extreme observation and the next largest (or smallest) observation and R is the range of all observations. When that ratio is >1/3, the value is discarded, and the next participant's sample is included in the 20-participant group. Since the distribution of TSH is not Gaussian, the log of TSH values was used to eliminate outliers. Once the 20 samples for inclusion were identified, they were compared to the trimester-specific reference interval of the manufacturer (Fig. 1).

Steps 1–3 are used for validation of a proposed reference interval. Steps 4–7 are for establishment of 95th percentile reference intervals using a minimum number of samples, as recommended by the Clinical and Laboratory Standards Institute (11).
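The Reed/Dixon outlier check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the example TSH values are hypothetical, the log base (base 10) is an assumption because the text says only "the log", and only the single extreme value with the larger gap is tested per pass.

```python
import math

def reed_dixon_outlier_index(values, threshold=1 / 3):
    """Return the index of the extreme value flagged by the D/R rule, or None.

    D is the gap between an extreme observation and its nearest neighbour,
    R is the range of all observations; the value is flagged when D/R > 1/3.
    """
    if len(values) < 3:
        return None
    order = sorted(range(len(values)), key=lambda i: values[i])
    lowest, next_lowest = values[order[0]], values[order[1]]
    highest, next_highest = values[order[-1]], values[order[-2]]
    value_range = highest - lowest
    if value_range == 0:
        return None
    gap_low = next_lowest - lowest
    gap_high = highest - next_highest
    # Test whichever extreme sits farther from its neighbour.
    if gap_high >= gap_low:
        gap, index = gap_high, order[-1]
    else:
        gap, index = gap_low, order[0]
    return index if gap / value_range > threshold else None

def tsh_outlier_index(tsh_values):
    """Apply the rule to log-transformed TSH (base 10 is an assumption)."""
    return reed_dixon_outlier_index([math.log10(v) for v in tsh_values])

# Hypothetical TSH values in mIU/L, for illustration only.
clean_group = [0.8, 1.0, 1.3, 1.5, 1.9, 2.2]
group_with_extreme = clean_group + [45.0]
```

In this sketch, a flagged sample would be dropped and replaced by the next randomly selected participant's sample before the group of 20 is compared with the proposed interval.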
In the clinical laboratory, it is accepted practice that a reference interval can be considered verified locally if no more than 2 of 20 participants' results in a selected population fall outside of a proposed reference interval (11). Therefore, if any trimester-specific group had more than two participants' values fall outside of the proposed reference interval, a second group of 20 participants' samples was randomly selected and the process was repeated. According to the Clinical and Laboratory Standards Institute guidelines, usually 20 samples are used. However, extra samples were obtained to allow for at least 20 samples after anti-TPO-positive women were excluded (11). When more than two participants' results of the second 20 samples are again outside of the proposed reference interval, the interval is to be rejected. With this binomial test approach to verification of a reference interval, the probability that more than two test results will fall outside the comparison reference interval, when 95% of the site-specific population falls within those limits, is only 7.5%. When a second set of 20 reference individuals is tested after more than two results in the original set of 20 fell outside the proposed limits, the probability of false rejection drops to <1% (11). When using this binomial test to validate a reference interval, if none of the 20 sample values fall outside the proposed reference interval, one should consider the possibility that the proposed reference interval may be too wide for the target population (11). When reference intervals are not verified by this binomial test, additional samples to total at least 120 anti-TPO-negative samples per trimester are required, so that local 95th percentile reference intervals can be determined using a minimum number of values, as recommended by the Clinical and Laboratory Standards Institute (11) and outlined in Figure 1 (steps 4–7).
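The false-rejection probabilities quoted above follow from the binomial distribution, and the short sketch below reproduces them alongside the 2-of-20 decision rule. The function names are hypothetical, and treating the two sets of 20 as independent draws is an assumption made for the illustration.

```python
from math import comb

def prob_more_than_k_outside(n=20, k=2, p_outside=0.05):
    """P(more than k of n results fall outside) when each result has
    probability p_outside of lying outside the proposed interval."""
    p_at_most_k = sum(
        comb(n, i) * p_outside**i * (1 - p_outside) ** (n - i)
        for i in range(k + 1)
    )
    return 1 - p_at_most_k

def passes_verification(results, lower, upper, max_outside=2):
    """True if no more than max_outside results lie outside [lower, upper]."""
    outside = sum(1 for r in results if r < lower or r > upper)
    return outside <= max_outside

# Chance of wrongly failing one set of 20 when the interval truly fits.
single_set = prob_more_than_k_outside()
# Chance of wrongly failing two independent sets of 20 in a row.
both_sets = single_set**2

print(f"one set: {single_set:.3f}, two sets: {both_sets:.4f}")
```

Under these assumptions the single-set value is about 0.075 and the two-set value is below 0.01, in line with the 7.5% and <1% figures cited from the guideline.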
As an additional step, the study also evaluated whether the ATA recommended reference interval for TT4 after 19 weeks' gestation of 50% higher than the non-pregnant reference interval was verifiable locally.
Statistical analysis
Kruskal–Wallis tests were used to assess for differences between trimesters for TSH, fT3, fT4, and TT4 followed by post hoc Mann–Whitney U-tests. Relationships between TSH and fT4 in the first trimester and for TSH, fT4, and TT4 after 19 weeks' gestation were visually examined, and Pearson's correlation coefficients were determined. TT4 values were compared to the ATA pregnancy recommended reference intervals. Statistical analysis was performed using IBM SPSS Statistics for Windows v19 (IBM Corp., Armonk, NY). This study was approved by the Conjoint Health Research Ethics Board at the University of Calgary.
Results
A total of 595 samples from healthy women with healthy fetuses were considered for inclusion (Fig. 2). This included 425 samples from women undergoing routine aneuploidy screening, 135 from the All Our Families study, and 35 from women presenting to the labor and delivery ward (case room) with no known history of thyroid disease or thyroid medication use. Hospitalization data and pharmacy records for aneuploidy samples were reviewed, resulting in the exclusion of 39 (9%) samples because of thyroid medication use and/or ICD10-CA codes for thyroid disease in the previous two years. From the remaining 386 samples, 132 first- and 142 second-trimester samples were randomly selected for analysis. No samples were excluded because they were outliers based on the Reed/Dixon method described above. One sample drawn at 14 weeks' gestation was excluded because the TSH was compatible with overt hypothyroidism (i.e., 10.3 mIU/L). The reasons for and number of samples excluded from further analysis are indicated in Figure 2.

Flow diagram of sample sources and exclusions.
Maternal characteristics
The mean maternal age, weight, and gestational age at time of sample collection are shown in Table 1. Maternal age, weight, TSH, and fT4 measurements of anti-TPO-negative and -positive women were similar (Appendix A1). Of the All Our Families participants included in this study, 75% were Canadian born. The remaining 25% of women were born in other places, including Southeast Asia, Africa, Russia, Europe, the Philippines, South America, China, New Zealand, or the Middle East. The country of origin of the other study participants is not known.
Maternal Characteristics
All values are presented as means ± standard deviations.
Verification of manufacturer's reference intervals and establishment of reference intervals
More than two of 20 randomly selected sample values fell outside the manufacturer's provided TSH reference intervals for the first-, second-, and third-trimester samples. When this step was repeated with a second set of 20 randomly selected samples, more than two values again fell outside the manufacturer's reference intervals. Therefore, the manufacturer's recommended trimester-specific reference intervals for TSH could not be verified for the authors' site, and trimester-specific 95th percentile reference intervals for TSH, fT4, fT3, and TT4 were determined by standardized methods (11), as outlined in Figure 1 (steps 4–7). These results are provided in Table 2, along with 90% confidence intervals (CI) for these reference intervals. The reference intervals that were considered for verification are also provided in Table 2.
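For readers who want to reproduce the establishment step, a rank-based nonparametric estimate of the reference limits might look like the sketch below. It assumes that the "95th percentile reference interval" means the central 95% of values (2.5th to 97.5th percentiles) and that the limits are taken at ranks 0.025(n + 1) and 0.975(n + 1) with linear interpolation. The function names are hypothetical, the 90% CIs of the limits are not computed here, and the study's exact computation is not described in the text.

```python
def value_at_rank(sorted_values, rank):
    """Value at a 1-based, possibly fractional rank (linear interpolation)."""
    n = len(sorted_values)
    lower_rank = min(max(int(rank), 1), n)
    upper_rank = min(lower_rank + 1, n)
    fraction = rank - int(rank)
    lower_value = sorted_values[lower_rank - 1]
    upper_value = sorted_values[upper_rank - 1]
    return lower_value + fraction * (upper_value - lower_value)

def nonparametric_reference_interval(values, min_samples=120):
    """Return (lower, upper) limits covering the central 95% of values."""
    n = len(values)
    if n < min_samples:
        raise ValueError(f"need at least {min_samples} samples, got {n}")
    ordered = sorted(values)
    lower = value_at_rank(ordered, 0.025 * (n + 1))
    upper = value_at_rank(ordered, 0.975 * (n + 1))
    return lower, upper

# Synthetic values 1..120, used only to make the rank arithmetic visible.
synthetic = [float(i) for i in range(1, 121)]
lower_limit, upper_limit = nonparametric_reference_interval(synthetic)
```

With 120 values, the limits fall between the 3rd and 4th and between the 117th and 118th ordered observations, which is why roughly 120 samples per trimester is treated as the minimum for this approach.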
Reference Intervals Established by This Study and Manufacturer's and ATA Recommended Reference Intervals
ATA, American Thyroid Association; RI, reference interval; CI, confidence interval.
Trimester-specific results of TSH, fT4, fT3, and TT4
The trimester-specific results of TSH, fT4, fT3, and TT4 are shown in Figure 3. All tests differed significantly between trimesters, with the exception of fT3 in the first and second trimesters. Kruskal–Wallis tests showed significant differences in TSH, fT4, fT3, and TT4 between trimesters (p < 0.001). Post hoc analysis via Mann–Whitney U-tests comparing the first and second trimesters revealed significantly different concentrations of TSH (p = 0.001), fT4 (p < 0.001), and TT4 (p < 0.001) but not fT3 (p = 0.372). Post hoc analysis also showed that all thyroid laboratory values significantly differed between second and third trimesters (TSH, p = 0.017; fT4, p < 0.001; TT4, p < 0.001; fT3, p < 0.001). Notably, as expected, the TSH reference intervals increased and fT4 reference intervals decreased between the first, second, and third trimesters. The reference intervals for TT4 and fT3 between trimesters did not follow the same pattern as fT4, as shown in Figure 3C and D.

Relationships between TSH and fT4, and TT4
The correlation coefficient between TSH and fT4 in the first trimester was −0.348 (p = 0.01). After 19 weeks' gestation, the relationships between TSH and fT4 (r = −0.06, p = 0.442) and between TSH and TT4 (r = 0.079, p = 0.319) appeared similar without one measurement being superior to the other. After 19 weeks' gestation, there was good correlation between fT4 and TT4 (r = 0.77, p < 0.001).
Assessment of ATA pregnancy recommended reference interval for TT4
The TT4 reference interval for the second half of pregnancy (after 19 weeks' gestation) was 76–186 nmol/L. These limits are much lower than the reference intervals suggested for use by the ATA 2017 guidelines (i.e., 50% higher than the non-pregnant reference interval 107–207 nmol/L) (1) for the manufacturer's reported reference interval in non-pregnant women aged 20–39 years not on contraception (8). If the ATA recommended pregnancy TT4 reference interval had been applied in the study population, it would have resulted in classifying 25% of the population as having a low TT4, despite values for TSH and fT4 that fell within the trimester-specific reference interval (Appendix A2). The TT4 results throughout pregnancy are provided in Appendix A2, along with the ATA pregnancy recommended reference intervals shown as black lines (i.e., an increase in the non-pregnant TT4 reference interval by 5% per week beginning with week 7 to a maximum of 50% at and after 17 weeks' gestation; Appendix A2). For example, at 13 weeks' gestation (6 weeks beyond week 7), the reference interval for T4 is increased by 30% (six weeks × 5%/week) (12).
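The week-by-week ATA adjustment described above can be written as a small calculation. This sketch assumes the percentage increase is applied to both limits of the non-pregnant interval, as the text describes, and the non-pregnant limits in the example (60 and 160 nmol/L) are hypothetical placeholders rather than the manufacturer's values.

```python
def ata_tt4_increase_percent(gestational_week):
    """Percent increase over the non-pregnant TT4 limits:
    5% per week beyond week 7, capped at 50%."""
    weeks_beyond_7 = max(0, gestational_week - 7)
    return min(50, 5 * weeks_beyond_7)

def ata_adjusted_tt4_interval(nonpregnant_lower, nonpregnant_upper, gestational_week):
    """Scale both non-pregnant limits by the week-specific increase."""
    factor = (100 + ata_tt4_increase_percent(gestational_week)) / 100
    return nonpregnant_lower * factor, nonpregnant_upper * factor

# Worked example from the text: 13 weeks is 6 weeks beyond week 7.
increase_at_13_weeks = ata_tt4_increase_percent(13)

# Hypothetical non-pregnant limits in nmol/L, for illustration only.
adjusted_at_20_weeks = ata_adjusted_tt4_interval(60, 160, 20)
```

The cap is reached at 17 weeks (10 weeks × 5%/week), so every gestational age in the second half of pregnancy receives the same 50% increase.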
Discussion
The main findings of this study are that the manufacturer's recommended trimester-specific reference intervals for TSH on the Roche electrochemiluminescence immunoassay were not verifiable for the study population, nor was the ATA pregnancy recommended reference interval for TT4 in the latter half of pregnancy (i.e., 50% higher than the non-pregnant reference interval). Although the Roche electrochemiluminescence immunoassays perform similarly across different Roche instrument platforms according to external proficiency surveys, reference intervals provided by the manufacturer were not transferable to the study population. Therefore, each laboratory should verify the transferability of reference intervals to its own population or determine its own reference intervals if necessary. The simple methods described herein are a realistic way to determine reference intervals for each laboratory's own specific immunoassay and population by trimester.
Had the manufacturer's reference ranges been implemented in the study population, 8% of all TSH sample results would have fallen outside of the manufacturer's recommended trimester-specific reference intervals. Specifically, 8% of first-trimester samples would fall below the manufacturer's recommended reference interval, 6.5% of second-trimester samples (i.e., 4.3% below and 2.2% above the reference interval) would be misclassified, and 9.2% of third-trimester samples would be above the manufacturer's recommended reference interval. The difference between the manufacturer's recommended reference intervals for the Roche Modular electrochemiluminescence immunoassays and the findings of the current study may be due to the study methodology, subject recruitment, and population used to establish the manufacturer's recommended reference intervals. Roche's values were developed from 957 samples of healthy pregnant women (<40 years old) from Germany with no known pregnancy complications and without a history of thyroid dysfunction or thyroid replacement (8). The manufacturer does not provide any information about country of origin or ethnicity of the women used to determine their reference intervals. Since ethnicity has an impact on thyroid test reference intervals in pregnancy (13), this may partially explain why the manufacturer's reference ranges for TSH were not confirmed in this study. Unlike the trimester-specific reference intervals for TSH provided by the manufacturer, but in keeping with others (1) and the study by Gong and Hoffman, which used a Roche Modular E170 (7), this study found that TSH progressively increased from the first trimester to the third trimester. The first-trimester TSH and fT4 reference intervals were similar to the ones found by Vaidya et al. from the United Kingdom on 1089 mostly Caucasian women with mild-moderate iodine insufficiency (6).
The 95th percentile reference interval for fT4 decreased from the first trimester to the third trimester for the locally derived reference interval, similar to the results of Gong and Hoffman using the Roche Modular E170, but the reference intervals were slightly higher for the second and third trimesters. This might be explained by the mean time of sample collection in this study at 15.5 (3.0) and 31.6 (4.3) weeks' gestation for the second and third trimesters, respectively.
Locally established assay- and trimester-specific reference intervals for fT4 or TT4 throughout pregnancy are essential for the proper diagnosis and management of hyperthyroidism in pregnancy. This is because when antithyroid medications are titrated to normalization of TSH in hyperthyroid pregnant women, this can result in fetal hypothyroidism and fetal goiter (14). The ATA 2017 guidelines recommend that antithyroid drugs be titrated to the upper limit or moderately above the trimester-specific reference interval of fT4, if available, or 50% higher than the TT4 non-pregnancy reference interval in the second half of pregnancy (after 19 weeks' gestation) (1). This ATA recommended TT4 reference interval was higher than the TT4 reference interval found in the study population. The reference interval after 19 weeks' gestation was so much lower than the ATA 2017 recommended reference interval that it is feared that use of the ATA recommended TT4 reference interval in the second half of pregnancy at the study site could lead to failure to diagnose and manage hyperthyroidism appropriately in pregnancy and to overdiagnosis of hypothyroidism. The trimester-specific TT4 reference intervals were also much lower than the TT4 reference intervals developed in Iranian women with the use of a different assay platform (15). This was a surprising finding, since TT4 measurements have been regarded as much less subject to method-dependent inaccuracies that hamper fT4 immunoassay performance in pregnancy (16). Indeed, the use of differing TT4 methods and populations may contribute to these discordant ranges. Thus, further study of TT4 measurements throughout pregnancy on different immunoassays and further investigation of the different ethnic populations are merited to assess TT4 immunoassay performance between assay platforms.
The strengths of this study are that women with positive anti-TPO, a personal history of thyroid dysfunction, or thyroid medication use were excluded from the study prior to determining reference intervals. Determination of thyroid function reference intervals for all pregnancy trimesters is another strength of this study. The gestational age of the sample collection was well distributed throughout the third trimester, thus providing a trimester reference interval that is clinically useful throughout the third trimester. The methodology that is reported in order to verify and establish trimester-specific reference intervals is a simple and pragmatic way for other centers to establish population-, assay-, and trimester-specific thyroid function reference intervals. Using this approach, a relatively small number of samples can be used to estimate reference intervals that may be more appropriate than those provided by the assay manufacturer.
A limitation of this study was that we did not have laboratory confirmation that the samples were from iodine-replete women. However, iodine status is best defined at a population level, and such data indicate that Canadians aged 20–39 years are iodine replete (17). Furthermore, universal salt iodization has been mandatory in Canada since the 1920s, and a study of pregnant women from Calgary indicated that the median intake of iodine during pregnancy from non-dietary sources alone was >150 μg/day (18). Another limitation is that trimester-specific reference intervals were determined cross-sectionally rather than longitudinally. However, others have shown that there is no significant difference between these study designs for determining reference intervals for thyroid function in pregnant women (19). The mean maternal age of women whose samples were used for this study was less than a year older than the mean age of women with live births in Calgary during the same time period (31.8 ± 5.0 and 31.0 ± 5.0 years, respectively; S. Crawford, Alberta Perinatal Health Program, pers. commun.). It is acknowledged that age differences impact thyroid function laboratory results in pregnancy (20). However, this slight difference in mean age should not limit generalizability to the local general pregnant population. Finally, data about the country of birth are not available for the entire sample. However, among the 135 women for whom information is available, the distribution of women born in and outside of Canada is fairly representative of the distribution of Canadians and Calgarians born in and outside of Canada, as indicated by 2016 census data (9).
In summary, this study presents a simple method that can be used by others to verify and establish trimester-specific reference intervals in their populations for thyroid function tests. Trimester- and assay-specific reference intervals established from a multicultural location using the Roche Cobas 8000 Modular e602 immunoassay method are provided. The study also showed that the widely recommended reference interval for TT4 after 19 weeks' gestation of 50% above the non-pregnant reference interval was too high for use in pregnancy at the authors' center. Thus, careful evaluation of the TT4 ranges in pregnancy is required. Until such data are available, it may be more appropriate to use locally established fT4 reference intervals after 19 weeks' gestation, especially if there are no equivalent locally established reference intervals for TT4.
Footnotes
Acknowledgments
We are grateful to the Calgary Early Risk Assessment program and the All Our Families study (formerly All Our Babies [AOB/F]) for providing samples for analysis. The authors acknowledge the contribution and support of AOB/F participants and AOB/F team members. AOB/F is funded through Alberta Innovates Interdisciplinary Team Grant #3200700595. This study was supported by a seed grant from Calgary Laboratory Services. We thank Patricia Johnson, Lori Gervais, and Bernice Frandle from Calgary Laboratory Services for their support with sample analysis. Thank you to Dr. Kara Nerenberg for her very constructive feedback on this manuscript. We are indebted to Dr. Richard Krause. Without his guidance and support, this project would not have occurred, and we are saddened by his death.
Author Disclosure Statement
No competing financial interests exist for any of the authors.
