Abstract
Background:
Initial evaluation of the hypothalamus–pituitary–thyroid axis is done by measuring serum free thyroxine (fT4) and thyrotropin concentrations. For correct interpretation of these measurements, reliable age-specific reference intervals (RIs) are fundamental. Since neonatal fT4 RIs conforming to the Clinical and Laboratory Standards Institute guidelines are not available for all assays, we set out to create literature-based uniform age-specific neonatal fT4 RIs that may be used for every assay.
Methods:
For meta-analysis of individual participant fT4 concentrations, we systematically searched MEDLINE and Embase (search date December 6, 2023; PROSPERO registration CRD42016041871). We searched for studies reporting fT4 concentrations in healthy term newborns aged 2–27 days, born to mothers without thyroid disease in iodine-sufficient regions. Authors were invited to supply data. Due to standardization differences between assays, data could not be combined for meta-analysis directly, and we attempted to normalize the data using two distinct methods.
Results:
We obtained 4206 fT4 concentrations from 20 studies that used 13 different assays from 6 manufacturers. First, we set out to normalize fT4 data using the mean and standard deviation of (assay-specific) adult RIs. fT4 concentrations were transformed into Z-scores, assuming a normal distribution. After this normalization, a linear mixed-effects model (LMM) still showed a significant difference in fT4 concentrations across studies (p < 0.001). As a second approach, we normalized the fT4 concentrations using data from a method/assay comparison study. We used the relationship between the Cobas assay and the other assays to convert all values to Cobas values. However, this method also failed to produce consistent results, with significant differences between the normalized data (LMM p < 0.001).
Conclusions:
We conclude that our attempts at normalizing fT4 assay results were unsuccessful. Confounders that may explain this outcome may be assay related and/or biological. These findings have significant implications for patient care, since relying on RIs from literature may result in erroneous interpretation of results. Therefore, we strongly recommend establishing local RIs for accurate interpretation of serum fT4 concentrations in neonates.
Introduction
Laboratory testing for disorders of the hypothalamus–pituitary–thyroid (HPT) axis in children consists of serum free thyroxine (fT4) and thyrotropin (TSH) measurements. 1 Reliable reference intervals (RIs) are fundamental for correct interpretation. However, creating reliable fT4 RIs for the neonatal period remains a major challenge.
Within the first 30 minutes after birth, newborns experience a TSH surge, secondary to cold-induced thyrotropin-releasing hormone secretion. Due to the TSH surge, thyroid hormone production is briefly increased. fT4 concentrations peak around the third day of life and slowly decline thereafter until they reach their set point. 2 This pattern is associated with a considerable day-to-day change in fT4 concentrations in the first 4 weeks of life, stressing the importance of neonatal continuous RIs. 3,4
A number of studies, employing different assays, have reported fT4 RIs for (parts of) the neonatal period. However, neonatal fT4 RIs conforming to the Clinical and Laboratory Standards Institute guidelines 5 are not available for every commercially available assay, even though assay-specific RIs are necessary because of standardization differences between these assays. 6
Therefore, we aspired to construct continuous fT4 RIs for the 3rd to the 28th day of life, suitable for frequently used assays. To attain this goal, this study aimed to make use of previously published data on fT4 concentrations in healthy term neonates, born to mothers without thyroid disease in iodine-sufficient regions, in an individual participant data meta-analysis. We hypothesized that it would be possible to normalize data from different fT4 assays, thereby making the results comparable.
We considered two potential normalization methods for fT4 concentrations: (1) normalization based on RI comparison, which assumed a normal distribution of fT4 concentrations and involved normalizing individual fT4 values against the adult RIs, and (2) normalization using a method comparison study (i.e., transference), which assumed that fT4 concentrations across assays are linearly related and involved using data from a method comparison study, including Passing–Bablok analyses that provide a slope and intercept for converting a value from one assay to the corresponding value of another assay.
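To illustrate where such a slope and intercept come from, the following is a minimal sketch of a Passing–Bablok-style fit. It is not the code used in the method comparison study: the function name `passing_bablok` is hypothetical, pairs with identical x values are simply skipped (a simplification), confidence intervals are omitted, and the example data are synthetic.

```python
from statistics import median


def passing_bablok(x, y):
    """Return (slope, intercept) so that y is approximately intercept + slope * x.

    Simplified sketch: the slope is the shifted median of all pairwise
    slopes, and the intercept is the median of y - slope * x.
    Assumes the two methods are positively related.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx == 0:
                continue  # simplification: skip pairs with identical x
            s = dy / dx
            if s == -1:
                continue  # slopes of exactly -1 are discarded
            slopes.append(s)
    if not slopes:
        raise ValueError("no usable pairwise slopes")

    slopes.sort()
    m = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset for the shifted median
    if m % 2 == 1:
        slope = slopes[(m - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[m // 2 - 1 + k] + slopes[m // 2 + k])

    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Synthetic, perfectly linear data (not measurements from any study).
x_demo = [10.0, 12.0, 15.0, 18.0, 22.0]
y_demo = [1.886 * v - 7.475 for v in x_demo]
slope, intercept = passing_bablok(x_demo, y_demo)
print(round(slope, 3), round(intercept, 3))  # 1.886 -7.475
```

With real paired measurements the pairwise slopes differ, and the shifted median makes the estimate robust to outliers in either assay.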
Unfortunately, the normalization efforts were unsuccessful. However, our findings highlighted important caveats in the interpretation of neonatal fT4 concentrations, and an imperative need for local and laboratory-specific neonatal fT4 RIs.
Materials and Methods
Protocol and registration
This study was conducted and reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analysis statement for individual patient data (PRISMA-IPD) systematic reviews. 7,8 We prospectively registered our systematic review in PROSPERO (CRD42016041871).
Information sources and literature search
To identify all studies of serum fT4 concentrations in healthy term neonates, born to mothers without thyroid disease in iodine-sufficient regions, a comprehensive literature search was performed in MEDLINE and Embase (Supplementary Data S1; search date December 6, 2023). No language restrictions were applied. We crosschecked the reference lists and citing articles of identified relevant articles using Web of Science and adapted the search in case of additional relevant hits. The bibliographic records retrieved were imported and deduplicated in EndNote 20.
Eligibility criteria and study selection
Inclusion criteria were defined before the start of the literature search. All observational studies (including cohort, cross-sectional, and case–control studies) published since 1990—as analytical information about assays before that time is not available—that included assessment of serum fT4 of healthy term neonates, born to mothers without thyroid disease, aged >48 hours (third day of life; age 2 days) to the 28th day of life (age 27 days), were selected. Studies with wider populations (e.g., aside from healthy newborns also diseased newborns) were also selected.
Studies were excluded if their full text was not available. Also, we excluded studies that involved a possible iodine-deficient population. Iodine status of study populations was assessed according to the World Health Organization Global Database of Iodine Deficiency, 9 and the Iodine Global Network Global Iodine Nutrition Scorecard. 10 Studies were also excluded if performed before iodine status assessments in both databases or if appropriate regional or national data were unavailable in the literature. Studies reporting measurements in dried blood spots (DBSs) were excluded because of the general use of serum measurements for diagnosing HPT axis disorders, and the significant differences in analysis methodology and outcome between serum measurements and DBSs.
Titles and abstracts of identified citations were screened by two reviewers. Full-text versions of selected citations were retrieved and assessed for eligibility by two reviewers independently, resolving differences by consensus.
Assessment of risk of bias
Risk of bias of selected studies was assessed by two reviewers using a version of the Newcastle-Ottawa Scale adapted for cross-sectional studies (Supplementary Data S2). 11 No changes were made to the main domains of selection, comparability, and outcome, but we adapted the selection and outcome domains to enable more appropriate evaluation of methods of descriptive studies. We only extracted data of healthy subjects from cohort and case–control studies; we therefore evaluated all studies with this cross-sectional study evaluation form. Studies were considered at low risk of bias if they achieved full rating in at least two domains. Studies that achieved no stars in one of the domains were considered at high risk of bias and were excluded.
Data collection
Corresponding authors of included studies were approached by e-mail and invited to provide anonymized individual patient data (age at sampling and fT4 concentrations) of healthy newborns. fT4 concentrations reported in conventional units were converted to SI units (ng/dL to pmol/L by multiplying by a conversion factor of 12.87). Adult RIs of fT4 assays were supplied by corresponding authors or collected from manufacturer kit inserts. If corresponding authors could not be reached, other listed authors were contacted. If data were unobtainable, the study was excluded from the meta-analysis.
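The unit conversion can be sketched as follows. This is an illustrative helper only; the function name `ft4_to_pmol_per_l` and the unit labels are assumptions, and only the factor 12.87 comes from the text.

```python
NG_DL_TO_PMOL_L = 12.87  # conversion factor stated in the text


def ft4_to_pmol_per_l(value, unit):
    """Convert an fT4 concentration to pmol/L (hypothetical helper)."""
    if unit == "pmol/L":
        return value  # already in SI units
    if unit == "ng/dL":
        return value * NG_DL_TO_PMOL_L
    raise ValueError(f"unsupported unit: {unit!r}")


# Example: 1.2 ng/dL expressed in pmol/L, rounded to one decimal.
print(round(ft4_to_pmol_per_l(1.2, "ng/dL"), 1))  # 15.4
```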
Data normalization
To normalize the obtained fT4 data, two distinct methods were used.

Method 1: Individual participant fT4 concentrations were transformed to Z-scores using the mean and standard deviation (SD) of author-supplied adult RIs (Supplementary Table S1). This transformation hinged on the assumption that fT4 RIs reflect a normal distribution. 3 As an example, with an adult fT4 RI of 12–22 pmol/L (which represents the 95% interval), an fT4 concentration of 19.55 pmol/L was transformed to a Z-score of +1. The SD of each RI was calculated with the Z-critical value:

$$\mathrm{SD} = \frac{\mathrm{RI}_{\mathrm{upper}} - \mathrm{RI}_{\mathrm{lower}}}{2 \times Z_{\mathrm{critical}}}$$

For assays whose RIs are reported with a 95% interval, the Z-critical value is 1.96.

Method 2: Individual participant fT4 concentrations were transformed using data from an fT4 method comparison study (30 serum samples from healthy adults measured with 5 different assays in 1 laboratory). 13 Using the slope and intercept of the method comparisons, we converted every concentration toward one assay (we arbitrarily chose the Cobas assay) (Supplementary Data S3). For example, taking an fT4 concentration of 12 pmol/L measured with the Alinity assay (Abbott Laboratories, Chicago, IL, USA), conversion to the Cobas assay would lead to an fT4 concentration of (12 × 1.886) − 7.475 = 15.2 pmol/L.
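Both normalization methods can be sketched in a few lines. The function names are hypothetical, and taking the midpoint of the adult RI as the mean is an assumption that follows from treating the RI as a symmetric interval of a normal distribution. The two worked examples from the text are reproduced.

```python
def ri_to_mean_sd(lower, upper, z_critical=1.96):
    """Derive mean and SD from a reference interval.

    Assumes a normal distribution, so the mean is the RI midpoint and
    the RI spans mean +/- z_critical * SD (1.96 for a 95% interval).
    """
    mean = (lower + upper) / 2
    sd = (upper - lower) / (2 * z_critical)
    return mean, sd


def z_score(ft4, lower, upper, z_critical=1.96):
    """Method 1: express an fT4 value as a Z-score against an adult RI."""
    mean, sd = ri_to_mean_sd(lower, upper, z_critical)
    return (ft4 - mean) / sd


def convert_linear(ft4, slope, intercept):
    """Method 2: convert an fT4 value to another assay via slope and intercept."""
    return ft4 * slope + intercept


# Method 1 example from the text: adult RI 12-22 pmol/L, fT4 19.55 pmol/L.
print(round(z_score(19.55, 12, 22), 2))  # 1.0

# Method 2 example from the text: Alinity 12 pmol/L converted to Cobas.
print(round(convert_linear(12, 1.886, -7.475), 1))  # 15.2
```

Keeping the Z-critical value as a parameter allows an RI reported with a different coverage to be handled with the same function.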
Statistical analysis
Differences in fT4 concentrations across studies were investigated with a linear mixed-effects model (LMM), with normalized fT4 as the dependent variable, study as the independent variable, and age (day of life) as a random effect. Statistical analyses and data visualization were performed in R (version 4.3.1) with Bioconductor (version 3.18) packages.
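One way to write such a model is sketched below. The exact specification is not stated in the text, so the random-intercept form and all symbols are assumptions made for illustration.

```latex
y_{ij} = \beta_0 + \beta_{s(i)} + u_j + \varepsilon_{ij}, \qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is the (normalized) fT4 value of measurement $i$ taken on day of life $j$, $s(i)$ is the study that contributed measurement $i$, $\beta_{s(i)}$ is the fixed effect of that study, and $u_j$ is a random intercept for day of life. Under this reading, the reported p-value tests the null hypothesis that all study effects $\beta_s$ are equal.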
Ethics statement
In this study, we included individual participant data obtained from other studies. We confirm that all original studies encompassed in our meta-analysis obtained the necessary approval from institutional ethics boards.
Results
We first set out to identify all studies of serum fT4 concentrations in healthy term neonates, born to mothers without thyroid disease in iodine-sufficient regions, through MEDLINE and Embase, and requested anonymized data. The full texts of 76 studies were assessed after screening 1790 studies. Reasons for exclusion of studies, risk of bias assessment results, and the PRISMA flowchart are given in Supplementary Data S4. Authors of 30 studies were approached by e-mail and invited to provide anonymized individual patient data. In total, we obtained 4206 fT4 concentrations from 20 studies, which used 13 different assays from 6 manufacturers (Supplementary Table S1). 3,4,14–31
We could not obtain fT4 concentrations from 11 studies. Of these, 8 studies together reported 1675 fT4 concentrations measured in the first 4 weeks of life, whereas for the remaining 3 studies we could not ascertain the exact number of fT4 concentrations measured in that period (Supplementary Data S4). Excluding these 3 studies, we obtained ∼70% of the fT4 concentration data.
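The ∼70% figure is consistent with the following arithmetic, assuming the denominator is the sum of the obtained concentrations and the countable unobtained concentrations (this reading is an assumption; the text does not give the calculation):

```latex
\frac{4206}{4206 + 1675} = \frac{4206}{5881} \approx 0.715
```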
Since the obtained data sets were generated using assays with differences in standardization (Supplementary Table S1), data could not directly be pooled for meta-analysis (Figs. 1, 2A). Considerable differences between study results were easily distinguishable before normalization. For example, from days 13 to 24, data from 2 studies (Omuse et al., 23 which involved the Roche Cobas e601 immunoassay, and Wong et al., 31 which involved the Beckman Coulter DxI 800 immunoassay) show only partly overlapping intervals.

FIG. 1. Collected free thyroxine (fT4) concentrations. All collected data, colored according to study.

FIG. 2. Collected fT4 concentrations and normalization results.
As expected, a significant difference between mean fT4 concentration across studies was found with an LMM, with fT4 as dependent variable, study as the independent variable, and age (day of life) as a random effect (p < 0.001). We, therefore, attempted to normalize data with two different methods.
First, assuming that fT4 is normally distributed, all individual fT4 concentrations were transformed to Z-scores using the mean and SD of author-supplied adult RIs (Supplementary Table S1). Despite normalization, we still observed substantial nonoverlapping intervals (Fig. 2B) and a significant difference in fT4 concentrations across studies (LMM p < 0.001).
Second, we tried to normalize fT4 concentrations using data from an fT4 method comparison study (Supplementary Data S3). 13 Again, the results revealed nonoverlapping intervals, signifying that the normalization approach was unsuccessful in achieving alignment (Fig. 2C). In addition, there was a significant difference in mean fT4 concentration across studies (LMM p < 0.001).
To summarize, neither normalization method yielded comparable data across the different assays, whether judged visually or statistically.
Discussion
fT4 concentrations follow a distinctive course during the first month of life. A number of studies have reported discrete fT4 RIs for the neonatal period, using different assays. However, these RIs neglect the dynamic character of the HPT axis in the neonatal period. Accurate continuous RIs are, therefore, needed, which can preferably be employed irrespective of the assay platform, as there are standardization differences between the different assays. 6 To accomplish this, we performed a systematic review and meta-analysis of previously published fT4 concentrations in healthy term neonates, in an attempt to construct continuous fT4 RIs for the neonatal period, suitable for frequently used assays.
In total, 4206 fT4 concentrations of 20 studies were included. We hypothesized that data could be normalized, to overcome the standardization differences. Unfortunately, our normalization attempts resulted in substantial nonoverlapping intervals and a significant difference in mean fT4 concentrations across studies. Based on these results, we conclude that our attempts at normalizing fT4 assay results were unsuccessful.
The method comparisons performed by Jansen et al. (Supplementary Data S3) show good correlations, which suggests that transference is possible. 5,13 Therefore, it seems that possible confounders related to our unsuccessful analysis may be methodological (e.g., kit variation over the years or inadequate adult RIs in kit inserts) or biological, such as ethnicity. 32
This holds an important implication for patient care. Although previously published fT4 RIs seem useful in clinical/diagnostic practice, using external neonatal RIs may lead to false interpretation, considering the aforementioned potential confounders. Local/laboratory-specific and age-specific RIs are, therefore, needed. However, collection of healthy neonatal samples requires extensive medical ethical considerations and is laborious. Using indirect methods to calculate RIs, with big data from routine diagnostics, may be an alternative. 33
Incorrect interpretation of neonatal fT4 concentrations is particularly a problem in the context of central congenital hypothyroidism diagnostics. Central congenital hypothyroidism, characterized by dysfunction of the HPT axis at hypothalamic and/or pituitary level, often eludes detection through traditional congenital hypothyroidism markers such as TSH. 34,35 Given the pivotal role of fT4 in the detection of central congenital hypothyroidism, misinterpreting its levels can lead to delayed or overlooked diagnosis of this condition, which may cause long-term developmental damage.
In summary, this study underlines the imperative need for laboratory-specific and age-specific fT4 RIs to avoid incorrect interpretation, and particularly to improve diagnostics in the context of central congenital hypothyroidism.
Footnotes
Acknowledgments
The authors sincerely thank the authors who generously shared their raw data: Akin et al. (2013), Bailey et al. (2013), Bohn et al. (2019), Bokulic et al. (2021), Fideleff et al. (2010), Higgins et al. (2018), Hubner et al. (2002), Karbasy et al. (2016), Kapelari et al. (2008), Lem et al. (2012), Naafs et al. (2020), Omuse et al. (2014), Omuse et al. (2023), Ozdemir et al. (2013), Romero-Villarreal et al. (2014), Santini et al. (1999), Turan et al. (2007), Ucler et al. (2016), Verburg et al. (2011), and Wong et al. (2020).
Authors' Contributions
P.L. contributed to acquisition of data, literature search, interpretation of data, statistical analyses, conceptualization, and writing of first draft. C.A.H. was involved in acquisition of data and literature search. A.W.M.G. carried out acquisition of data and literature search. A.M. carried out literature search. P.H., A.C.H., N.Z.-S., and A.B. took charge of interpretation of data and conceptualization. A.S.P.v.T. was in charge of project coordination, writing of first draft, interpretation of data, and conceptualization. All authors critically reviewed article drafts and approved the final article as submitted.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
Supplementary Material
Supplementary Data S1
Supplementary Data S2
Supplementary Data S3
Supplementary Data S4
Supplementary Table S1
