Abstract
Background:
A population-based reference interval (RI) for thyroid hormones in pregnancy, derived with a standardized methodology, is crucial for clinicians to make accurate diagnoses and important for comparing test results obtained from different analytic platforms.
Methods:
We enrolled 600 healthy Chinese women to obtain longitudinal serum samples across gestation, after exclusion of subjects with antibodies to thyroid peroxidase, thyroglobulin or thyrotropin receptor. Gestational age-specific RIs were constructed by using polynomial regression equations with MLwiN.
Results:
Free thyroxine (fT4) levels rose to a peak at the 7th–8th gestational weeks and then declined gradually until the 28th week, while thyrotropin (TSH) levels decreased from early pregnancy to a nadir at the 9th week. The data support the recent recommendation by the American Thyroid Association to raise the TSH upper RI to 4.0 mIU/L. We also demonstrate that thyroid hormone reference ranges are not affected by mild iodine deficiency in the population or by including otherwise healthy women with antibodies against thyroid peroxidase and thyroglobulin.
Conclusions:
The study highlights a methodology for constructing gestational age-specific thyroid function test RIs on different analytic platforms, allowing better interpretation and comparison of results across platforms.
Introduction
Overt maternal hyperthyroidism and hypothyroidism are associated with pregnancy complications, namely fetal growth restriction, preterm delivery, pre-eclampsia, and stillbirth. However, whether subclinical hypothyroidism or isolated hypothyroxinemia is linked to adverse pregnancy outcomes and/or impaired neurocognitive development of the unborn child remains uncertain (1–3). Two large randomized controlled trials consistently showed that routine screening of maternal thyroid function and thyroxine replacement for those with subclinical hypothyroidism or hypothyroxinemia did not result in better cognitive outcomes in their children (4–6). Nevertheless, both studies were limited by late initiation of thyroxine replacement, at the gestational age at which the fetal thyroid starts functioning. It remains uncertain whether detection of the two conditions at an earlier gestation with immediate replacement would improve outcomes. For this reason, many clinicians are more inclined to treat subclinical hypothyroidism once it is recognized in pregnancy.
The levels of thyroid hormone vary with gestational age due to physiological changes, including changes in human choriogonadotropin (hCG) and binding proteins and the increase in circulatory volume with gestation. Hence, guidelines and expert committees recommend that institutions have their own platform-specific and trimester-specific reference intervals (RI) for the interpretation of thyroid function in pregnant women (7,8). Further, thyroid hormone reference values can be influenced by other factors, including ethnicity, geographic location, and diet (9–12). It is, therefore, important for laboratories to establish their own RI based on the populations they serve.
In practice, most laboratories adopt the RI provided by the company that markets the platform, for which the methodology and the number and characteristics of the subjects are sometimes unspecified. Alternatively, if population- and trimester-specific RIs are not available, they follow the 2017 American Thyroid Association (ATA) guidelines and use 4.0 mIU/L as the upper RI of thyrotropin (TSH) (8). A recent study indicated that 10% of normal pregnant women would be diagnosed with subclinical hypothyroidism at 4–7 weeks of gestation according to a company-provided RI (13). It is also important to note that thyroid hormone levels measured on the same sample differ remarkably across platforms, which can result in misclassification and, hence, under- or overtreatment (14,15).
The first gestational age-specific thyroid hormone RI in a Chinese population was reported in 2001, but only 300 subjects were included (16). Several studies thereafter described thyroid hormone levels in pregnancy, but none seems to have provided details of the RI on different platforms across gestation (17–22). This study aims to derive more comprehensive, gestational age-specific thyroid function test (TFT) RIs for Chinese populations on four commonly used assay platforms.
Methods
Pregnant women attending an antenatal visit in their first trimester were recruited by research staff in the antenatal clinic of Prince of Wales Hospital, a tertiary teaching hospital in Hong Kong, and consented to participate in the study. Women who carried a singleton pregnancy without any history of thyroid dysfunction, hyperemesis gravidarum, autoimmune disease, or any other major medical condition were eligible.
All participants underwent an early ultrasound scan to confirm their gestational age. Where the gestational age determined by the fetal crown-rump length differed from that calculated from the last menstrual period, the former was adopted. All participants were asked to collect a morning urine sample into an acid-washed trace element urine bottle within one week of recruitment for determination of urinary iodine concentration (UIC) and urinary iodine to creatinine ratio (UICr), to establish the iodine status of the study population. UIC was determined in all participants by inductively coupled plasma mass spectrometry (ICPMS 7700; Agilent Technologies, Inc.), and the measurement was standardized by using reference materials from the Centers for Disease Control and Prevention. Subjects with UICr below 150 μg/g at the initial screening were counseled on dietary iodine intake and prescribed an iodine supplement of 150 μg per day throughout pregnancy. Each participant was invited to have two more blood draws during subsequent antenatal visits, at least four weeks apart, and one final blood draw on admission for delivery.
The subjects' medical records were reviewed by a research nurse to obtain information on pregnancy complications such as pre-eclampsia, preterm delivery, stillbirth, neonatal death, and neonatal thyroid problems. Subjects who eventually delivered elsewhere were contacted and followed up by the research staff to obtain the relevant medical information. All participants were invited to return for thyroid hormone determination between six weeks and six months postpartum to confirm their euthyroid status outside pregnancy.
The study was approved by the Joint Chinese University of Hong Kong—New Territories East Cluster Clinical Research Ethics Committee (CRE-2013.500), and written informed consent was obtained from all women.
Assay methods
All serum samples were either processed immediately or stored at −80°C until analysis, and all were processed within 18 months. Thyroid hormones (TSH, free thyroxine [fT4], and free triiodothyronine [fT3]) in each serum sample obtained during pregnancy were assayed on four analytical platforms, namely, Abbott Architect i2000SR (Abbott Laboratories), Beckman Coulter DxI 800 (Beckman Coulter), Roche Cobas Elecsys 601, and Siemens Advia Centaur XPT (Siemens Healthcare Diagnostics, Tarrytown, NY). The postpartum TSH and fT4 levels were assayed by Roche Cobas Elecsys 601 only. Antibodies to the thyrotropin receptor (TSHR-Ab) were assayed by using Roche Cobas Elecsys 601, while antibodies to thyroid peroxidase (TPO-Ab) and thyroglobulin (TG-Ab) were determined by using chemiluminescent immunoassays on Siemens Immulite 2000 XPi (Siemens Healthcare Diagnostics) on the participants' first serum sample. The functional sensitivity of each assay and the intra- and inter-assay coefficients of variation (CV) of serum TSH, fT4, fT3, TPO-Ab, TG-Ab, TSHR-Ab, and UIC are listed in Supplementary Tables S1 and S2. All intra- and inter-assay CVs were within 6% and 8%, respectively.
Reference intervals
Thyroid hormone concentrations from subjects who were negative for TPO-Ab, TG-Ab, and TSHR-Ab (defined as <35 kIU/L, <40 kIU/L, and <1.22 IU/L, respectively, according to the manufacturers' expected values stated in the reagent inserts) were included for the construction of nomograms. As all the thyroid hormone levels (TSH, fT4, and fT3) were skewed, the data were transformed by using Box-Cox transformations before they were modeled on gestational age with a polynomial regression equation. To account for repeated sampling from the same individual, we carried out multilevel modeling (also known as linear mixed-effects modeling) by using MLwiN Version 2.36 (downloaded from the Centre for Multilevel Modelling, University of Bristol, Bristol, United Kingdom) (23). The nomograms were constructed by back-transforming the median and variance from the polynomial regression equation. The 2.5th and 97.5th percentiles of the thyroid hormone levels defined the lower and upper RI of each analytic platform. We used Bland–Altman plots for multiple observations per individual to determine the agreement of thyroid hormone levels assayed by different platforms (24).
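The transform, model, and back-transform steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the MLwiN model used in the study: it omits the multilevel (repeated-measures) component, assumes a constant standard deviation on the transformed scale, and uses a hypothetical Box-Cox parameter and hypothetical polynomial coefficients (`LAM`, `MEAN_COEFFS`, `SD`) rather than fitted values.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with power parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def inv_box_cox(y, lam):
    """Back-transform a Box-Cox value y to the original measurement scale."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)

def poly(coeffs, t):
    """Evaluate c0 + c1*t + c2*t**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result

def reference_interval(week, mean_coeffs, sd, lam, z=1.96):
    """Return (2.5th percentile, median, 97.5th percentile) at a given week.

    The mean on the transformed scale is a polynomial in gestational week;
    the limits are mean -/+ z*sd, back-transformed to the original scale.
    """
    mu = poly(mean_coeffs, week)
    return tuple(inv_box_cox(mu + k * sd, lam) for k in (-z, 0.0, z))

# Hypothetical values for illustration only (not estimates from this study).
LAM = 0.5
MEAN_COEFFS = [0.9, -0.12, 0.004]  # c0, c1, c2 on the transformed scale
SD = 0.45                          # assumed constant across gestation

for week in (6, 9, 20, 40):
    lower, median, upper = reference_interval(week, MEAN_COEFFS, SD, LAM)
    print(f"week {week}: {lower:.2f} / {median:.2f} / {upper:.2f}")
```

Because the Box-Cox transform is monotonic, percentiles computed on the transformed scale map directly to percentiles on the original scale, which is why the limits can be back-transformed without further correction.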
We also used Fisher's exact test to compare the rates of clinical hypothyroidism, subclinical hypothyroidism, and isolated hypothyroxinemia classified according to the first-trimester RI provided by each company with the rates classified according to the RI derived from this study. The company-provided first-trimester RIs were: TSH = 0.16–3.78 mIU/L (Abbott), 0.05–3.70 mIU/L (Beckman), and 0.35–4.59 mIU/L (Roche); fT4 = 10.9–17.7 pmol/L (Abbott), 6.67–14.1 pmol/L (Beckman), and 12.1–19.6 pmol/L (Roche); Siemens did not provide one.
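A comparison of classification rates under two sets of RIs reduces to a test on a 2×2 table. As an illustration only, a two-sided Fisher's exact test can be sketched in plain Python as below. The function name and the counts are hypothetical and are not data from this study; the two-sided p-value follows the common convention of summing the probabilities of all tables that are no more probable than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    x_min = max(0, row1 - (n - col1))
    x_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties are not lost to rounding error.
    total = sum(prob(x) for x in range(x_min, x_max + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts: rows are the two RI sources, columns are
# "classified as subclinical hypothyroidism" vs "not classified".
p_value = fisher_exact_two_sided(15, 485, 40, 460)
print(f"p = {p_value:.4g}")
```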
The Box-Cox transformations and Bland–Altman plot were performed by using the MedCalc Statistical Software version 16.4.3 (MedCalc Software bvba, Ostend, Belgium; 2016). The nomograms were constructed by using GraphPad Prism version 8.00 for Windows (GraphPad Software, La Jolla, CA).
Sample size and power calculation
The International Federation of Clinical Chemistry and Laboratory Medicine recommends that RIs be established with at least 120 reference individuals when a nonparametric ranking method is used (25). We divided the entire gestation into 8 blocks of gestational age intervals and targeted 250 samples for each block to construct a nomogram. We estimated that 600 subjects would provide about 2000 samples, assuming that each participant could contribute 4 serum samples and anticipating a 15% attrition rate.
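The arithmetic behind this estimate can be checked with a short calculation. The sketch below reflects one reading of the figures given above, in which the 15% attrition is applied to the total number of planned samples; the variable names are illustrative.

```python
BLOCKS = 8               # gestational age blocks
SAMPLES_PER_BLOCK = 250  # target samples per block
SUBJECTS = 600           # planned enrollment
SAMPLES_PER_SUBJECT = 4  # serum samples per participant
ATTRITION = 0.15         # anticipated attrition rate

# Total samples needed to fill every block.
target_samples = BLOCKS * SAMPLES_PER_BLOCK

# Samples expected after attrition (assumed to act on the sample total).
expected_samples = round(SUBJECTS * SAMPLES_PER_SUBJECT * (1 - ATTRITION))

print(target_samples, expected_samples)
print("target met:", expected_samples >= target_samples)
```

Under this reading, the planned enrollment yields slightly more than the 2000 samples targeted.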
Results
We enrolled a total of 600 women from the antenatal clinic between July 2014 and January 2016; 76 were excluded for the reasons summarized in Figure 1, most commonly because of the presence of thyroid antibodies. The mean age of the included subjects was 31.1 ± 3.9 years; 64.3% were primigravidae. The mean gestational age at delivery and birth weight were 39.2 ± 1.4 weeks and 3.1 ± 0.4 kg, respectively. A total of 1760 serum samples were available for the measurement of hormone levels by all four platforms. The median (95% CI) postnatal TSH and fT4 levels were 1.28 (0.35–3.7) mIU/L and 14.6 (10.8–17.8) pmol/L, respectively.

Figure 1. Flow chart illustrating the total participants, number of samples, reasons for exclusion, and adverse pregnancy outcomes.
The gestational age-specific nomograms depicting the 2.5th (lower RI) and 97.5th (upper RI) percentiles of TSH, fT4, and fT3 from the 5th to the 40th gestational week on the four platforms are shown in Figure 2. The TSH levels decreased to a nadir at the ninth gestational week, while fT4 levels rose to a peak between the seventh and eighth gestational weeks. The timing of the fT3 peak varied among platforms, from the 6th to the 12th week. Both fT4 and fT3 declined gradually from the peak and leveled off by 28 weeks. The 95% CI (RI) of TSH was narrowest at the 9th (1.92–2.84 mIU/L) and widest at the 40th (3.25–4.93 mIU/L) gestational week. The change in the 95% CI of fT4 was less remarkable, being widest at the 7th–8th (7.26–12.42 pmol/L) and narrowest at the 31st–33rd (4.26–7.91 pmol/L) gestational weeks, while that of fT3 was fairly constant across gestation. Likewise, the upper RIs of TSH were lowest at the 9th (1.95–2.87 mIU/L) and highest at the 40th (3.59–5.48 mIU/L) gestational week. The mean difference in the upper RIs of TSH between the 9th and 40th weeks of gestation across all four platforms was 90%. In the sensitivity analysis in which serum samples from participants with positive TG-Ab and TPO-Ab were included in the nomograms, we found only small rises in the upper RIs of TSH (by 0.01–0.15 mIU/L; 2% on average) and negligible falls in the upper RIs of fT4 (by 0.18–0.25 pmol/L; 0.33% on average) and fT3 (by 0.02–0.05 pmol/L; 0.14% on average) (Fig. 2). The gestational age-specific RIs (i.e., 2.5th and 97.5th percentiles) of the thyroid hormones based on subjects who were negative for TG-Ab and TPO-Ab are tabulated in Table 1.

Figure 2. Gestational age-specific nomograms of TSH, fT4, and fT3 on the four platforms.
Table 1. Reference Intervals of Thyroid Hormone Levels for All Four Platforms
Roche: Electrochemiluminescence immunoassay on Cobas Elecsys 601 (Roche Diagnostics, Mannheim, Germany); Abbott: Chemiluminescent Microparticle Immunoassay (CMIA) on Architect i2000SR (Abbott Laboratories); Beckman: Chemiluminescent immunoassay on Beckman Coulter DxI 800 (Beckman Coulter); Siemens: Direct Chemiluminescence technology on Advia Centaur XPT (Siemens Healthcare Diagnostics).
fT3, free triiodothyronine; fT4, free thyroxine; TSH, thyrotropin.
Table 2 and Supplementary Figures S1 to S3 show the mean differences, the limits of agreement, and their 95% CIs from the Bland–Altman plots. There is good agreement between the Beckman and Siemens platforms for TSH levels, without significant fixed or proportional bias (Supplementary Fig. S1); however, the agreement among different platforms for fT4 and fT3 is poor.
Table 2. Mean Differences, Limits of Agreement and Their 95% Confidence Intervals in the Bland–Altman Plot of Thyroid Hormone Levels Between Different Analytic Platforms
Roche: Electrochemiluminescence immunoassay on Cobas Elecsys 601 (Roche Diagnostics, Mannheim, Germany); Abbott: Chemiluminescent Microparticle Immunoassay (CMIA) on Architect i2000SR (Abbott Laboratories); Beckman: Chemiluminescent immunoassay on Beckman Coulter DxI 800 (Beckman Coulter); Siemens: Direct Chemiluminescence technology on Advia Centaur XPT (Siemens Healthcare Diagnostics).
*95% CI of mean differences cross 0.
CI, confidence interval; LOA, limit of agreement.
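The mean difference and limits of agreement reported from a Bland–Altman analysis can be computed as sketched below. This is a simplified illustration that treats each pair of measurements as independent; the study used the variant for multiple observations per individual, which additionally accounts for within-subject correlation. The paired values are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(platform_a, platform_b, z=1.96):
    """Return (mean difference, lower LOA, upper LOA) for paired measurements."""
    if len(platform_a) != len(platform_b) or len(platform_a) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [x - y for x, y in zip(platform_a, platform_b)]
    bias = mean(diffs)   # fixed bias between the two platforms
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd

# Hypothetical paired TSH results (mIU/L) from two platforms.
a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 1.5, 3.5, 3.5]
bias, loa_low, loa_high = bland_altman(a, b)
print(f"bias = {bias:.2f}, LOA = {loa_low:.2f} to {loa_high:.2f}")
```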
The rates of clinical hypothyroidism (TSH level above the upper RI and fT4 below the lower RI), subclinical hypothyroidism (TSH level above the upper RI but fT4 within the RI), and isolated hypothyroxinemia (TSH level within the RI but fT4 below the lower RI) diagnosed in the first trimester based on the RI from this study and on the trimester-specific RI provided by the commercial companies, together with the rates of TSH levels above the universal cut-off of 4.0 mIU/L (per the 2017 ATA guideline), are shown in Table 3. In this analysis, only the first sample was considered if a subject had provided more than one sample in the first trimester. We observed that 2.77–3.37% of the subjects would be diagnosed with subclinical hypothyroidism based on the upper RI of TSH defined by the 97.5th percentile, while 0.2–2.97% of them would be diagnosed with isolated hypothyroxinemia. Subjects who had no previous history of thyroid disease but had positive thyroid antibodies had a higher rate of clinical and subclinical hypothyroidism based on the RI from this study.
Table 3. Number (Percentage) of Subjects Classified as Clinical Hypothyroidism, Subclinical Hypothyroidism, and Hypothyroxinemia on the Serum Levels Taken at the First Trimester (<13 Weeks of Gestation) Based on Different Criteria
ATA, American Thyroid Association; TG, thyroglobulin; TPO, thyroid peroxidase.
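The three diagnostic categories defined above translate directly into a small decision rule. The sketch below is illustrative: the function name and return labels are assumptions, and the example intervals are the Roche company-provided first-trimester RIs quoted in the Methods.

```python
def classify(tsh, ft4, tsh_ri, ft4_ri):
    """Classify one sample given (lower, upper) RIs for TSH and fT4."""
    tsh_high = tsh > tsh_ri[1]
    tsh_within = tsh_ri[0] <= tsh <= tsh_ri[1]
    ft4_low = ft4 < ft4_ri[0]
    ft4_within = ft4_ri[0] <= ft4 <= ft4_ri[1]

    if tsh_high and ft4_low:
        return "clinical hypothyroidism"
    if tsh_high and ft4_within:
        return "subclinical hypothyroidism"
    if tsh_within and ft4_low:
        return "isolated hypothyroxinemia"
    return "none of the three categories"

# Roche company-provided first-trimester RIs, as quoted in the Methods.
TSH_RI = (0.35, 4.59)  # mIU/L
FT4_RI = (12.1, 19.6)  # pmol/L

print(classify(5.0, 15.0, TSH_RI, FT4_RI))
```

Swapping in a different pair of intervals (for example, gestational age-specific limits read from a nomogram) changes the classification of borderline samples, which is the effect Table 3 quantifies.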
Finally, the UIC and UICr of the study population were reported previously and showed that our population is mildly iodine deficient according to the WHO definition (26). Nevertheless, the multiples of the median (MoM) of the thyroid hormone levels in the first serum samples, calculated by dividing each assayed thyroid hormone level by the gestational age-specific median level from the nomogram, were not significantly different between subjects classified as iodine deficient (UICr <150 μg/g) and those classified as sufficient (UICr ≥150 μg/g) by the Mann–Whitney U test, except for slightly higher levels of fT4 (by 2%) and fT3 (by 3%) among those who were iodine deficient with the Abbott Architect i2000SR assay (Supplementary Table S3).
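The multiple of the median used in this comparison is a simple ratio that removes the effect of gestational age before groups are compared. A minimal sketch follows; the lookup table of weekly medians is hypothetical and stands in for values read from a nomogram.

```python
def multiple_of_median(value, gestational_week, median_by_week):
    """Divide an assayed level by the median for the same gestational week."""
    return value / median_by_week[gestational_week]

# Hypothetical fT4 medians (pmol/L) by gestational week, for illustration only.
FT4_MEDIAN_BY_WEEK = {8: 12.0, 9: 11.8, 10: 11.5}

mom = multiple_of_median(15.0, 8, FT4_MEDIAN_BY_WEEK)
print(mom)
```

A value of 1.0 means the sample sits exactly on the gestational age-specific median, so samples taken at different weeks become directly comparable.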
Discussion
Most published studies report, and most guidelines propose, trimester-specific rather than truly gestational age-specific RIs for thyroid hormone levels in pregnancy. This study shows a >90% difference in the upper RI for TSH between the 9th and 40th weeks of gestation, which is consistent with the trajectory of TSH levels derived from a large study in Caucasians (27,28). We also demonstrate that TSH levels are lowest at the ninth week, while both fT4 and fT3 levels peak at the sixth to eighth week (except fT3 with the Abbott assay), in line with the expected maximal stimulation by serum hCG, which also peaks at this time during gestation (29). This finding is consistent with the previous observation that fT4 declines steadily from the eighth week toward the end of the second trimester (16). We consider the present mathematical model superior to the use of cubic splines, which apparently results in undulating RIs across gestation (28). We, therefore, propose that future guidelines consider the use of nomograms constructed by polynomial equations for the RIs in pregnancy. If such nomograms are not available, it would be better to use blocks of gestational age windows such as <6, 6–10, 10–14, 14–18, 18–24, 24–32, and >32 weeks instead of trimesters, because such blocks are more in line with the physiological changes of hormone levels across gestation. Contrary to a previous study suggesting that fT4 and fT3 are normally distributed while only TSH is right skewed (30), our data demonstrate that both fT4 and fT3 are also right skewed.
Another strength of this study compared with previous publications is the inclusion of participants as early as the 4th week of gestation. This enabled adequate sampling between 4 and 13 weeks to explore the excursion of thyroid hormones related to the hCG effect. We observed that the TSH levels fell from 3.5–5 mIU/L in early gestation, depending on the platform, to their trough at 8–10 weeks of gestation, consistent with reports in the literature (22,31). The result helps to explain previous criticism of the remarkable difference, 2.5 versus 4.5 mIU/L, in the upper percentile limit of first-trimester TSH between two studies involving similarly healthy Chinese subjects (16,32,33). The discrepancy is most likely attributable to the different gestational ages at sampling in the first trimester, as one of the studies included very few samples between four and seven weeks of gestation, which may have resulted in underestimation of the upper RI in the first trimester (16). In addition to the comparison of four different assay platforms, the other strengths of this study include the homogeneous ethnicity and the longitudinal data from individual participants. Nevertheless, it remains arguable whether a longitudinal or a self-sequential approach has significant advantages over cross-sectional data (19,34).
This study also assessed the participants' iodine status. Similar to the result of the Norwegian Mother and Child Cohort study, we did not find any significant change in maternal thyroid hormone levels with mild iodine deficiency (35). Hence, pregnancy RIs can also be determined based on data from mildly iodine-deficient populations such as that of Hong Kong. Although the inclusion of otherwise healthy women with TG-Ab and TPO-Ab only slightly altered the thyroid hormone RIs, this subgroup of women was found to have a higher rate of clinical and subclinical hypothyroidism. This is in agreement with data from the recent literature showing that the presence of TPO-Ab increases the risk of subclinical hypothyroidism (36).
Nevertheless, the study has some limitations. The major drawback is the lack of a gold standard measurement such as LC-MS/MS or measurement after equilibrium dialysis. The thyroid hormone measurements determined on a given platform showed poor agreement with those on the others, in particular the fT3 assay from Abbott in very early gestation. It would be useful if thyroid hormone levels assayed with LC-MS/MS could be used as a reference for comparison. However, these data are not available because of limited funding. Moreover, the data were all from singleton pregnancies, so we cannot extend the RIs to women with multiple gestations. Previous studies have shown that women with twin pregnancies have slightly lower TSH levels, especially in the first trimester, probably related to higher hCG levels, but not necessarily associated with higher fT4 levels (26,37). We did not stratify subjects by smoking status, but a previous study showed only limited effects of smoking on mean TSH and fT4 during pregnancy (38).
In conclusion, the study illustrates a methodology to construct nomograms of thyroid hormone RIs based on gestational age and platform-specific characteristics. Construction and implementation of such RIs for other ethnic groups should allow more appropriate classification, study, and management of women with true, rather than physiological or technical, abnormalities of thyroid function in pregnancy.
Footnotes
Acknowledgments
The authors would like to thank Prof. Carrie Macdonald-Wallis, The University of Bristol, and Prof. Tong Sit, The Chinese University of Hong Kong, for statistical advice. They are also grateful for the assistance of their research nurse, Ms. Sharon Lai-kwai Chan, on subject recruitment and data collection.
Role of Sponsor
The funding organizations played no role in the design of the study, choice of enrolled patients, review and interpretation of data, or preparation or approval of the article.
Authors' Contributions
All authors confirmed that they have contributed to the intellectual content of this article and have met the following three requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This study is supported by the Hong Kong Hospital Authority as a project to provide reference intervals for all immunoassay platforms available in the obstetric units in the Hospital Authority and Department of Health, Hong Kong SAR Government.
Supplementary Material
Supplementary Figure S1
Supplementary Figure S2
Supplementary Figure S3
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
