Clinical Interpretation of Thyroid Tests: Considerations for Reference Intervals

Abstract

As thyroid disease is common and can present with nonspecific symptoms, thyroid function tests are one of the most widely requested laboratory investigations. The serum thyrotropin (TSH) concentration is an effective rule-out test for primary thyroid disease when used in the context of appropriate reference intervals. The dependence of the TSH reference intervals on iodine status, ethnicity, age, and pregnancy is well described and the use of relevant reference intervals is widely recommended.^1–4 However, the provision of appropriate reference intervals can be challenging due to the complex and costly clinical, ethical, operational, and financial requirements associated with establishing prospective reference cohorts as well as the methodological differences that still plague thyroid function tests.

In this issue, Yamada et al. describe a retrospective approach for deriving reference intervals for TSH, free thyroxine (FT4), and free triiodothyronine (FT3) using data collected from routine health screening.⁵ This retrospective method for the determination of reference intervals is known as “indirect” or “data mining.”⁶ The study is helpful for illustrating important considerations for deriving and applying reference intervals.

As the reference intervals generated by indirect methods are highly sensitive to the reference population used for the study, it is crucial to have a good understanding of the physiology and pathophysiology of the biomarker in question. This will inform the design of clinical and statistical exclusion criteria to minimize the inclusion of data from patients with potentially confounding conditions that will invariably be present in a clinical laboratory database.⁶ It also informs the need for data partitioning, for example, by age, sex, or ethnicity. Of note, it is generally recommended that at least 400 subjects are included for each partition when using this method.⁶
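The partition-size recommendation above can be checked mechanically before any interval is calculated. The following is a minimal sketch, assuming each result is a record carrying hypothetical `sex` and `age_band` fields; the field names and the example counts are illustrative and are not taken from the Yamada study.

```python
from collections import Counter

MIN_PER_PARTITION = 400  # minimum per partition suggested in the text


def partition_counts(records, keys):
    """Count how many records fall into each partition defined by `keys`."""
    return Counter(tuple(record[key] for key in keys) for record in records)


def underpowered_partitions(records, keys, minimum=MIN_PER_PARTITION):
    """Return the partitions holding fewer than `minimum` records."""
    counts = partition_counts(records, keys)
    return {part: n for part, n in counts.items() if n < minimum}


# Hypothetical screening data: one well-populated and one sparse partition
records = (
    [{"sex": "F", "age_band": "40-49"}] * 450
    + [{"sex": "M", "age_band": "80-89"}] * 120
)
small = underpowered_partitions(records, ("sex", "age_band"))
print(small)  # {('M', '80-89'): 120}
```

A partition flagged in this way could be merged with a neighboring age band or reported with an explicit caveat, depending on the purpose of the study.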

Clinical exclusion criteria typically include the selection of clinical locations or services where the proportion of abnormal results is relatively low, for example, health screening or outpatient settings.⁶ For thyroid function tests, information such as clinical features of thyroid disease, a history of thyroid conditions or treatment, pregnancy status, and other associated laboratory tests, where available, can also be used to remove subjects with established pathophysiology as well as other clinical conditions or pharmacological treatments that may affect thyroid function.

Thyroid autoantibody (thyroid peroxidase antibodies) status is also often used as an exclusion criterion as the inclusion of patients with positive thyroid antibodies has been shown to widen derived reference intervals.^7–9 As previously mentioned, iodine status, which affects thyroid autoimmunity, will also affect thyroid function test results,^10,11 but this is rarely evaluated in routine practice.

After this initial clinical exclusion, appropriate statistical techniques are required to extract the nondiseased distribution or to exclude the pathological distribution before reference intervals can be calculated from the remaining data. When appropriately applied, these statistical methods can effectively remove the contribution from affected subjects.^6,12,13 If affected subjects are not removed from the reference cohort, generated reference intervals will be artefactually widened. Analytical stability during the study period and a lack of significant interlaboratory method bias should be demonstrated before the data can be pooled for statistical analysis.
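To illustrate how statistical exclusion affects a derived interval, the sketch below applies iterative Tukey fences to log-transformed TSH values and then takes the 2.5th and 97.5th percentiles. Tukey fences are used here only as one simple, well-known option; this is not the method of Yamada et al. or of any cited recommendation, and the simulated data, the function names, and the fence multiplier `k=1.5` are assumptions made for illustration.

```python
import math
import random
import statistics


def tukey_trim(values, k=1.5, max_iter=20):
    """Repeatedly drop values outside Q1 - k*IQR .. Q3 + k*IQR until stable."""
    kept = list(values)
    for _ in range(max_iter):
        q1, _median, q3 = statistics.quantiles(kept, n=4, method="inclusive")
        spread = q3 - q1
        low, high = q1 - k * spread, q3 + k * spread
        trimmed = [v for v in kept if low <= v <= high]
        if len(trimmed) == len(kept):
            break
        kept = trimmed
    return kept


def reference_interval(tsh_values, trim=True):
    """Central 95% interval (2.5th to 97.5th percentile) on the log scale."""
    logs = [math.log(v) for v in tsh_values]
    if trim:
        logs = tukey_trim(logs)
    cuts = statistics.quantiles(logs, n=40, method="inclusive")  # 39 cut points
    return math.exp(cuts[0]), math.exp(cuts[-1])


# Simulated, purely illustrative TSH results in mIU/L (not real patient data)
rng = random.Random(0)
healthy = [rng.lognormvariate(math.log(1.5), 0.45) for _ in range(2000)]
hypothyroid = [rng.lognormvariate(math.log(12.0), 0.40) for _ in range(100)]
mixed = healthy + hypothyroid

naive_low, naive_high = reference_interval(mixed, trim=False)
trim_low, trim_high = reference_interval(mixed, trim=True)
print(f"untrimmed: {naive_low:.2f} to {naive_high:.2f} mIU/L")
print(f"trimmed:   {trim_low:.2f} to {trim_high:.2f} mIU/L")
```

In this simulation the untrimmed upper limit is pulled upward by the hypothyroid subgroup, which is the artefactual widening described above. Real laboratory data are less cleanly separated, so the choice of exclusion method deserves formal evaluation.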

The results from the Yamada study need to be interpreted in light of the study's inclusion criteria and the size of the subpopulations analyzed, as these may temper the conclusions that can be drawn from this type of study. The upper reference limits for TSH are particularly high in the oldest age group (80–89 years) and are severalfold higher than the manufacturers' recommendations (with some exceeding 20 mIU/L). This finding would need to be replicated in a larger cohort with clearly defined exclusion criteria before use in a clinical setting.

TSH generally increases with age,¹⁴ which may reflect changes in TSH bioactivity through glycosylation of the molecule, or subclinical hypothyroidism. Ongoing clinical trials of thyroxine replacement in older patients that target age-appropriate reference intervals may clarify the clinical significance of elevated TSH in this population.¹⁵

This study also highlights the methodological differences that persist between commercial providers of thyroid function tests. Notably, the Abbott median TSH concentration appears negatively biased compared with the Siemens method, which, as stated by the authors, will have a modest impact on the diagnosis of thyroid disease if method-specific intervals are not employed. This observation is unsurprising, as the data included periods preceding global harmonization initiatives, and the bias may be reduced in more recent data.¹⁶

It is, however, important to recognize that the diagnosis of thyroid disease cannot be made using reference limits alone and that a 95% reference interval may not reflect the optimum clinical decision limits.¹⁷ TSH has a relatively narrow within-subject biological variation of 18% compared with a between-subject biological variation of 36%. This indicates that the within-subject physiological fluctuation is much smaller than the difference in physiological set point between subjects. Similar considerations apply to FT4 and FT3.¹⁸
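The comparison of within- and between-subject variation is often summarized as a ratio, commonly called the index of individuality (a term not used in this commentary). The sketch below computes it from the two percentages quoted above. The conventional reading that a ratio below about 0.6 marks an analyte for which population-based reference intervals are relatively insensitive to change in an individual is stated here as background, not as a finding of the study.

```python
def index_of_individuality(cv_within, cv_between):
    """Ratio of within-subject to between-subject biological variation."""
    return cv_within / cv_between


# Percentages quoted in the text for TSH
tsh_index = index_of_individuality(cv_within=18.0, cv_between=36.0)
print(f"{tsh_index:.2f}")  # 0.50
```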

For example, consider a subject who has a physiological TSH set point near the upper reference limit (e.g., 4 mIU/L). This patient will require a fall of 3.5 mIU/L to reach the lower reference limit of 0.5 mIU/L. The fall of 3.5 mIU/L represents nearly six times the within-subject biological variation (4 mIU/L × 16% = 0.6 mIU/L). In other words, the TSH must be highly abnormal for this subject before it falls outside the reference limits, which limits the ability of population-based reference intervals to detect subclinical and early thyroid dysfunction.¹⁹ In addition, the relationship between TSH and FT4 that underpins the clinical utility of these tests is complex, individualized, and often not the presumed log-linear form.^20,21
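The arithmetic of this example can be reproduced in a few lines. The sketch below uses the set point, lower reference limit, and 16% within-subject variation given in the example; the function name is hypothetical, and a different coefficient of variation changes the multiple accordingly.

```python
def change_in_cv_units(set_point, limit, cv_within):
    """Express the change needed to reach `limit` as a multiple of one
    within-subject variation (set_point * cv_within)."""
    one_cv = set_point * cv_within
    return abs(set_point - limit) / one_cv


# Values from the worked example: 4 mIU/L set point, 0.5 mIU/L lower limit
one_cv = 4.0 * 0.16
multiple = change_in_cv_units(set_point=4.0, limit=0.5, cv_within=0.16)
print(f"one within-subject variation: {one_cv:.2f} mIU/L")  # 0.64 mIU/L
print(f"required fall: {multiple:.1f} times that variation")  # 5.5 times
```

The unrounded calculation gives a multiple of about 5.5, which the text rounds up to "nearly six".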

In summary, thyroid disease is common in the general population. Its diagnosis requires careful clinical assessment followed by appropriate laboratory testing using reference intervals that have been derived for the intended method and population. A retrospective data mining approach can be a cost-effective and useful method for determining appropriate reference intervals, provided that careful consideration is given to population stratification, selection of appropriate exclusion criteria, and the statistical methods used for data analysis. Ongoing standardization of laboratory methods is also required to allow the implementation of common reference intervals.

Footnotes

Authors' Contributions

All authors contributed to the conception and writing of the commentary and final approval of the version to be published and agreed to be accountable for all aspects of the study.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

No funding was received for this article.

References

1. National Guideline Centre (UK). Thyroid Function Tests: Thyroid Disease: Assessment and Management: Evidence Review C. National Institute for Health and Care Excellence (NICE): London, UK; 2019. Available from: https://www.nice.org.uk/guidance/ng145 [Last accessed: February 24, 2023].

2. Alexander, Pearce, Brent, et al. Guidelines of the American Thyroid Association for the diagnosis and management of thyroid disease during pregnancy and the postpartum. Thyroid, 2017; 27(3):315–389; doi: 10.1089/thy.2016.0457

3. Taylor, Razvi, Pearce, et al. Clinical review: A review of the clinical consequences of variation in thyroid function within the reference range. J Clin Endocrinol Metab, 2013; 98(9):3562–3571; doi: 10.1210/jc.2013-1315

4. Garber, Cobin, Gharib, et al. American Association of Clinical Endocrinologists and American Thyroid Association Taskforce on Hypothyroidism in Adults. Clinical practice guidelines for hypothyroidism in adults: Cosponsored by the American Association of Clinical Endocrinologists and the American Thyroid Association. Thyroid, 2012; 22(12):1200–1235; doi: 10.1089/thy.2012.0205

5. Yamada, Horiguchi, Akuzawa, et al. The impact of age- and sex-specific reference ranges for serum TSH and FT4 on the diagnosis of subclinical thyroid dysfunction: A multi-center study from Japan. Thyroid, 2023 [Epub ahead of print]; doi: 10.1089/thy.2022.0567

6. Jones GRD, Haeckel, Loh, et al. IFCC Committee on Reference Intervals and Decision Limits. Indirect methods for reference interval determination – Review and recommendations. Clin Chem Lab Med, 2018; 57(1):20–29; doi: 10.1515/cclm-2018-0073

7. Takeda, Mishiba, Sugiura, et al. Evaluated reference intervals for serum free thyroxine and thyrotropin using the conventional outliner rejection test without regard to presence of thyroid antibodies and prevalence of thyroid dysfunction in Japanese subjects. Endocr J, 2009; 56(9):1059–1066; doi: 10.1507/endocrj.k09e-123

8. Sriphrapradang, Pavarangkoon, Jongjaroenprasert, et al. Reference ranges of serum TSH, FT4 and thyroid autoantibodies in the Thai population: The national health examination survey. Clin Endocrinol (Oxf), 2014; 80(5):751–756; doi: 10.1111/cen.12371

9. Eskelinen, Suominen, Vahlberg, et al. The effect of thyroid antibody positivity on reference intervals for thyroid stimulating hormone (TSH) and free thyroxine (FT4) in an aged population. Clin Chem Lab Med, 2005; 43(12):1380–1385; doi: 10.1515/CCLM.2005.236

10. Völzke, Alte, Kohlmann, et al. Reference intervals of serum thyroid function tests in a previously iodine-deficient area. Thyroid, 2005; 15(3):279–285; doi: 10.1089/thy.2005.15.279

11. Cai, Fang, Jing, et al. Reference intervals of thyroid hormones in a previously iodine-deficient but presently more than adequate area of Western China: A population-based survey. Endocr J, 2016; 63(4):381–388; doi: 10.1507/endocrj.EJ15-0574

12. Tan, Markus, Vasikaran, et al. APFCB Harmonization of Reference Intervals Working Group. Comparison of 8 methods for univariate statistical exclusion of pathological subpopulations for indirect reference intervals and biological variation studies. Clin Biochem, 2022; 103:16–24; doi: 10.1016/j.clinbiochem.2022.02.006

13. Ammer, Schützenmeister, Prokosch, et al. A proposed benchmark for the standardized evaluation of indirect methods for reference interval estimation. Clin Chem, 2022:hvac142; doi: 10.1093/clinchem/hvac142

14. Hollowell, Staehling, Flanders, et al. Serum TSH, T(4), and thyroid antibodies in the United States population (1988 to 1994): National Health and Nutrition Examination Survey (NHANES III). J Clin Endocrinol Metab, 2002; 87(2):489–499; doi: 10.1210/jcem.87.2.8182

15. Razvi, Ryan, Ingoe, et al. Age-related serum thyroid-stimulating hormone reference range in older patients treated with levothyroxine: A randomized controlled feasibility trial (SORTED 1). Eur Thyroid J, 2020; 9(1):40–48; doi: 10.1159/000504047

16. Thienpont, Van Uytfanghe, De Grande LAC, et al. IFCC Committee for Standardization of Thyroid Function Tests (C-STFT). Harmonization of serum thyroid-stimulating hormone measurements paves the way for the adoption of a more uniform reference interval. Clin Chem, 2017; 63(7):1248–1260; doi: 10.1373/clinchem.2016.269456

17. Waise, Price. The upper limit of the reference range for thyroid-stimulating hormone should not be confused with a cut-off to define subclinical hypothyroidism. Ann Clin Biochem, 2009; 46(Pt 2):93–98; doi: 10.1258/acb.2008.008113

18. Andersen, Pedersen, Bruun, et al. Narrow individual variations in serum T(4) and T(3) in normal subjects: A clue to the understanding of subclinical thyroid disease. J Clin Endocrinol Metab, 2002; 87(3):1068–1072; doi: 10.1210/jcem.87.3.8165

19. Sterenborg RBTM, Galesloot, Teumer, et al. The effects of common genetic variation in 96 genes involved in thyroid hormone regulation on TSH and FT4 concentrations. J Clin Endocrinol Metab, 2022; 107(6):e2276–e2283; doi: 10.1210/clinem/dgac136

20. Rothacker, Brown, Hadlow, et al. Reconciling the log-linear and non-log-linear nature of the TSH-free T4 relationship: Intra-individual analysis of a large population. J Clin Endocrinol Metab, 2016; 101(3):1151–1158; doi: 10.1210/jc.2015-4011

21. Hadlow, Rothacker, Wardrop, et al. The relationship between TSH and free T₄ in a large population is complex and nonlinear and differs by age and sex. J Clin Endocrinol Metab, 2013; 98(7):2936–2943; doi: 10.1210/jc.2012-4223