Abstract

As thyroid disease is common and can present with nonspecific symptoms, thyroid function tests are among the most widely requested laboratory investigations. The serum thyrotropin (TSH) concentration is an effective rule-out test for primary thyroid disease when used in the context of appropriate reference intervals. The dependence of the TSH reference intervals on iodine status, ethnicity, age, and pregnancy is well described and the use of relevant reference intervals is widely recommended. 1–4 However, the provision of appropriate reference intervals can be challenging due to the complex and costly clinical, ethical, operational, and financial requirements associated with establishing prospective reference cohorts as well as the methodological differences that still plague thyroid function tests.
In this issue, Yamada et al. describe a retrospective approach for deriving reference intervals for TSH, free thyroxine (FT4), and free triiodothyronine (FT3) using data collected from routine health screening. 5 This retrospective method for the determination of reference intervals is known as “indirect” or “data mining.” 6 The study is helpful for illustrating important considerations for deriving and applying reference intervals.
As the reference intervals generated by indirect methods are highly sensitive to the reference population used for the study, it is crucial to have a good understanding of the physiology and pathophysiology of the biomarker in question. This will inform the design of clinical and statistical exclusion criteria to minimize the inclusion of data from patients with potentially confounding conditions that will invariably be present in a clinical laboratory database. 6 It also informs the need for data partitioning, for example, by age, sex, or ethnicity. Of note, it is generally recommended that at least 400 subjects are included for each partition when using this method. 6
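The partition-size recommendation above can be checked before any intervals are calculated. The following is a minimal sketch, assuming hypothetical records of (sex, age) pairs and 10-year age bands; the function names and the example cohort are illustrative only and are not taken from the Yamada study.

```python
from collections import Counter

MIN_PER_PARTITION = 400  # minimum per partition suggested in the text for indirect methods


def age_band(age, width=10):
    """Return a label such as '80-89' for the band containing `age`."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def undersized_partitions(records, minimum=MIN_PER_PARTITION):
    """Count subjects per (sex, age band) and return the partitions below `minimum`.

    `records` is an iterable of (sex, age) tuples (a hypothetical data layout).
    """
    counts = Counter((sex, age_band(age)) for sex, age in records)
    return {key: n for key, n in counts.items() if n < minimum}


# Hypothetical cohort: two adequately sized partitions and one small elderly group.
records = [("F", 45)] * 450 + [("M", 47)] * 410 + [("F", 83)] * 120
print(undersized_partitions(records))  # {('F', '80-89'): 120}
```

Partitions flagged in this way could be merged with adjacent bands or reported with explicit caution, depending on the clinical question.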
Clinical exclusion criteria typically include the selection of clinical locations or services where the proportion of abnormal results is relatively low, for example, health screening or outpatient settings. 6 For thyroid function tests, information such as clinical features of thyroid disease, a history of thyroid conditions or treatment, pregnancy status, and other associated laboratory tests, where available, can also be used to remove subjects with established pathophysiology as well as other clinical conditions or pharmacological treatments that may affect thyroid function.
Thyroid autoantibody (thyroid peroxidase antibodies) status is also often used as an exclusion criterion, as the inclusion of patients with positive thyroid antibodies has been shown to widen derived reference intervals. 7–9 As previously mentioned, iodine status, which affects thyroid autoimmunity, will also affect thyroid function test results, 10,11 but this is rarely evaluated in routine practice.
After this initial clinical exclusion, appropriate statistical techniques are required to extract the nondiseased distribution or to exclude the pathological distribution before reference intervals can be calculated from the remaining data. When appropriately applied, these statistical methods can effectively remove the contribution from affected subjects. 6,12,13 If affected subjects are not removed from the reference cohort, generated reference intervals will be artefactually widened. Analytical stability during the study period and a lack of significant interlaboratory method bias should be demonstrated before the data can be pooled for statistical analysis.
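To illustrate why statistical exclusion matters, the sketch below applies one simple option: iterative Tukey-fence outlier exclusion on log-transformed TSH values, followed by nonparametric 2.5th and 97.5th percentiles. This is an assumption chosen for brevity, not the method used by Yamada et al., and the simulated "healthy" and "affected" distributions are invented for demonstration. Tukey fences also trim a small fraction of genuinely healthy values, so dedicated indirect methods may be preferable in practice.

```python
import math
import random
import statistics


def tukey_trim(values, k=1.5, max_iter=20):
    """Repeatedly drop values outside [Q1 - k*IQR, Q3 + k*IQR] until none are removed."""
    data = sorted(values)
    for _ in range(max_iter):
        q1, _, q3 = statistics.quantiles(data, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        kept = [v for v in data if low <= v <= high]
        if len(kept) == len(data):
            break
        data = kept
    return data


def central_95(values):
    """Nonparametric 2.5th and 97.5th percentiles (first and last of 39 cut points)."""
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]


def indirect_reference_interval(tsh_values):
    """Log-transform, trim outliers, then back-transform the central 95% limits."""
    logs = [math.log(v) for v in tsh_values if v > 0]
    low, high = central_95(tukey_trim(logs))
    return math.exp(low), math.exp(high)


# Simulated data (hypothetical): a healthy majority plus a hypothyroid minority.
random.seed(0)
healthy = [random.lognormvariate(math.log(1.5), 0.45) for _ in range(2000)]
affected = [random.lognormvariate(math.log(12.0), 0.5) for _ in range(100)]
values = healthy + affected

naive_low, naive_high = central_95(values)
lower, upper = indirect_reference_interval(values)
print(f"no exclusion:   {naive_low:.2f} to {naive_high:.2f} mIU/L")
print(f"with exclusion: {lower:.2f} to {upper:.2f} mIU/L")
```

In this simulation, the upper limit calculated without exclusion is pulled upward by the affected subgroup, which mirrors the artefactual widening described above.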
The results of the Yamada study need to be interpreted in light of the inclusion criteria of the study and the size of the subpopulations analyzed, as these may temper the conclusions that can be drawn from this type of study. The upper reference limits for TSH are particularly high in the oldest age group (80–89 years) and are severalfold higher than the manufacturers' recommendations (with some exceeding 20 mIU/L). This finding would need to be replicated in a larger cohort with clearly defined exclusion criteria before use in a clinical setting.
TSH generally increases with age, 14 which may reflect changes in TSH bioactivity through altered glycosylation or subclinical hypothyroidism. Ongoing clinical trials of thyroxine replacement in the geriatric population, targeting age-appropriate reference intervals, may clarify the clinical significance of elevated TSH in this population. 15
This study also highlights the methodological differences that persist between different commercial providers of thyroid function tests. Notably, the Abbott median TSH concentration appears negatively biased compared with the Siemens method, which, as stated by the authors, will have a modest impact on the diagnosis of thyroid disease if method-specific intervals are not employed. This observation is unsurprising, as the data included periods preceding global harmonization initiatives, and the bias may be reduced in more recent data. 16
It is, however, important to recognize that the diagnosis of thyroid disease cannot be made using reference limits alone and that a 95% reference interval may not reflect the optimum clinical decision limits. 17 TSH has a relatively narrow within-subject biological variation of 16% compared with a between-subject biological variation of 36%. This indicates that the within-subject physiological fluctuation is much smaller than the difference in physiological set point between subjects. Similar considerations apply to FT4 and FT3. 18
For example, consider a subject who has a physiological TSH set point near the upper reference limit (e.g., 4 mIU/L). This subject will require a fall of 3.5 mIU/L to reach the lower reference limit of 0.5 mIU/L. The fall of 3.5 mIU/L represents nearly six times the within-subject biological variation (4 mIU/L × 16% ≈ 0.6 mIU/L). In other words, the TSH must become highly abnormal for this subject before it falls outside the reference limits, which restricts the ability of population-based reference intervals to detect subclinical and early thyroid dysfunction. 19 In addition, the relationship between TSH and FT4 that underpins the clinical utility is complex, individualized, and often not the presumed log-linear form. 20,21
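The arithmetic in this example can be reproduced with a short calculation. This is a minimal sketch using only the figures quoted above (set point 4 mIU/L, lower limit 0.5 mIU/L, within-subject variation 16%); the function name is illustrative.

```python
def fluctuations_to_limit(set_point, limit, cv_within):
    """Distance from the set point to a reference limit, expressed in multiples
    of the subject's typical within-subject fluctuation (set_point * cv_within)."""
    typical_fluctuation = set_point * cv_within
    return abs(set_point - limit) / typical_fluctuation


# Figures from the example above: set point 4 mIU/L, lower limit 0.5 mIU/L, CV 16%.
ratio = fluctuations_to_limit(set_point=4.0, limit=0.5, cv_within=0.16)
print(round(ratio, 1))  # 5.5
```

With the unrounded fluctuation of 0.64 mIU/L the multiple is about 5.5; rounding the fluctuation to 0.6 mIU/L, as in the text, gives about 5.8, hence "nearly six times."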
In summary, thyroid disease is common in the general population. Its diagnosis requires careful clinical assessment followed by appropriate laboratory testing using reference intervals that have been derived for the intended method and population. A retrospective data mining approach can be a cost-effective and useful method for determining appropriate reference intervals, provided that careful consideration is given to population stratification, selection of appropriate exclusion criteria, and the statistical methods used for data analysis. Ongoing standardization of laboratory methods is also required to allow the implementation of common reference intervals.
Footnotes
Authors' Contributions
All authors contributed to the conception and writing of the commentary and final approval of the version to be published and agreed to be accountable for all aspects of the study.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
