Abstract
Background:
There has been recent debate within the thyroid field regarding whether current upper limits of the thyrotropin (TSH) reference range should be lowered. This debate can be better informed by investigation of whether variations in thyroid function within the reference range have clinical effects. One important target organ for thyroid hormone is the brain, but little is known about variations in neurocognitive measures within the reference range for thyroid function.
Methods:
This was a cross-sectional study of 132 otherwise healthy hypothyroid subjects receiving chronic replacement therapy with levothyroxine (LT4) who had TSH levels across the full span of the laboratory reference range (0.34–5.6 mU/L). Subjects underwent detailed tests of health status, mood, and cognitive function, with an emphasis on memory and executive functions.
Results:
Subjects with low-normal (≤2.5 mU/L) and high-normal (>2.5 mU/L) TSH levels did not differ on most tests of health status, mood, or cognitive function, and there were no correlations between TSH, free T4, or free T3 levels and most outcomes. There was, however, a suggestion that thyroid function affected performance on the Iowa Gambling Task, which mimics real-life decision making. Subjects with low-normal TSH levels made more advantageous decisions than those with high-normal TSH levels.
Conclusions:
Variations in thyroid function within the laboratory reference range do not appear to have clinically relevant effects on health status, mood, or memory in LT4 treated subjects. However, decision making, which encompasses many executive functions, may be affected. Unless further studies strengthen this finding, these data do not support narrowing the TSH reference range.
Introduction
This debate can be better informed by knowledge of the clinical consequences of variations in thyroid function within the reference range. Such consequences would strengthen the argument that the TSH reference range should be more narrowly defined. In fact, a number of studies have reported clinically relevant health consequences based on variations in thyroid function within the reference range. These outcomes have mainly been cardiovascular risk factors and events, metabolic parameters, bone density, and fracture risk [reviewed in Taylor et al. (3)].
The brain is another important target organ for thyroid hormone, but less is known regarding how psychological or cognitive function varies across TSH or thyroid hormone reference ranges. In euthyroid subjects without thyroid disease, depression, anxiety, or cognitive decrements have been linked to variations in TSH or free thyroxine (fT4) levels (4–17). However, the correlations have shown improvements or decrements with higher or lower thyroid function, depending on the report. Other recent large population-based studies found no correlations between normal and near-normal TSH levels and depression, anxiety, or cognitive tests (18–21). These latter studies circumvented biases inherent in studies of selected subjects, but they often utilized relatively insensitive screening tests for global cognitive function, which may miss subtle but important effects in specific, relevant cognitive domains.
Hypothyroid patients receiving levothyroxine (LT4) therapy represent a special subset of “euthyroid” subjects. Despite normal TSH levels, many patients continue to complain of impaired health status, mood, and cognitive function. This often leads patients to request increases in their LT4 doses or to try alternate thyroid preparations. However, it is not clear that targeting the lower end of the TSH reference range will achieve patients’ desired outcomes. The observational literature is divergent, and there is only one controlled trial addressing this, which did not show significant effects (11,22–28).
To address this issue, we recruited subjects with primary hypothyroidism receiving replacement doses of LT4, who then underwent extensive testing for health status, mood, and cognitive function. Rather than using less sensitive global screening tests, we employed intensive, sensitive measures that targeted two specific cognitive domains: executive function and memory. Our decision to focus on memory was based on our previous data and other studies suggesting that memory is preferentially affected in subjects with mild thyroid dysfunction (21,29,30), as well as animal studies that support a major role for LT4 in brain areas that mediate memory (31,32). Our decision to focus on executive function was based on the relative lack of information regarding thyroid effects on this cognitive domain, although studies and clinical observations suggest that this critical cognitive process is also affected [reviewed by Samuels (33)]. We hypothesized that LT4 treated subjects with lower TSH levels within the reference range have better health status, mood, memory, and executive function compared with subjects with higher TSH levels within the reference range. We also correlated these outcomes with free T4 and free triiodothyronine (fT3) levels, since some studies report associations with thyroid hormone, rather than TSH, levels.
Materials and Methods
Experimental subjects
One hundred thirty-two subjects receiving LT4 therapy for hypothyroidism, with TSH levels within the laboratory reference range, were recruited as a convenience sample between February 2009 and August 2012 from the authors’ clinics, through review of electronic health records, by flyers, and with mailings. Twelve were male and 120 were female. They were aged 21–70 years and were receiving LT4 for primary hypothyroidism (n = 102); hypothyroidism following 131I therapy for Graves’ disease (n = 13); postpartum thyroiditis leading to permanent hypothyroidism (n = 4); or a history of thyroid surgery (n = 13). All were diagnosed as adults and had received LT4 therapy for 5 months to 50 years (mean 12 years). All subjects had past elevated TSH levels. Levothyroxine doses were stable for at least 3 months (mean time on current LT4 dose 2 years). None of the subjects had any acute or chronic illness or were on medications that affect thyroid hormone levels, mood, or cognition. Stable doses of oral contraceptive or estrogen therapy were allowed. Testing was done during the first 14 days after onset of menstrual bleeding or an oral contraceptive cycle in premenopausal women.
Experimental design
The protocol was approved by the Oregon Health and Science University (OHSU) Institutional Review Board, and subjects gave written informed consent.
Screening visit
Subjects were screened for general health, medicines, thyroid status, and mood or cognitive disorders by history, physical examination, and laboratory testing. General intelligence was estimated by the Wechsler Adult Intelligence Scale–Revised (WAIS-R) Vocabulary subtest (34).
Testing visit
Within six weeks of the screening visit, subjects returned for a three- to four-hour testing visit. Subjects refrained from taking their LT4 dose that morning. Serum TSH, fT4, and fT3 levels were obtained at the beginning of each visit. Sixty-one of the samples were collected between 0700 and 0859 hours, 59 between 0900 and 1159 hours, and 12 between 1200 and 1359 hours, due to scheduling limitations for some subjects. Subjects self-completed the following measures of health status and mood:
A. The Thirty-Six Item Short Form Health Survey (SF-36), a questionnaire about general health (35). Higher scores on the summary scales and subscales reflect better health status and well-being.
B. The Profile of Mood States (POMS), a questionnaire about mood (36). Higher scores on POMS subscales reflect mood decrements, except for the vigor subscale, where higher scores reflect improved mood.
C. The Affective Lability Scale (ALS), a questionnaire on which subjects rate their agreement with statements regarding the tendency of their moods to shift between baseline and anger, depression, elation, and anxiety (37). Higher scores indicate increased lability of mood.
Cognitive tests were administered by a single experienced research assistant. Based on the existing literature and our previous studies, we did not utilize a battery of general cognitive measures, but rather sensitive measures targeted to specific domains likely to be affected by altered thyroid status:
Test of declarative memory
A. Paragraph Recall Test (verbal memory). Subjects were read a brief story and verbally recalled it immediately and after a 30 minute interval. The outcome measure was the total number of story elements recalled at each interval (38).
Tests of executive function
A. Attention/concentration: The Letter Cancellation Test. This test consisted of a sheet of paper with six lines of 52 letters in random sequence. The subject was instructed to circle two specified target letters each time they appeared in the sequence, as quickly as possible. The score was the number of errors and the time taken to complete the task (39).
B. Cognitive flexibility: The Trail Making Test. In this task, the subject connected circles as quickly as possible without lifting the pen from the paper. In Part A, the circles were numbered and the subject was instructed to draw lines to connect the numbers in ascending order (1 to 2, 2 to 3, etc.). In Part B, the circles included both numbers and letters; the subject again drew lines to connect the circles in an ascending pattern, but also had to alternate between numbers and letters (1-A-2-B-3-C, etc.). Subjects were scored on the number of errors and time to complete the task (40).
C. Decision making: The Iowa Gambling Task. This task consisted of four decks of cards on a computer screen, shown face down. The subject had to choose cards from any deck, and each card resulted in the gain or loss of money. The subject was unaware that two of the decks were advantageous (small gains but smaller losses), while the other two decks were disadvantageous (large gains but even larger losses). The subject's choices were classified as advantageous (X) or disadvantageous (Y), with a net score of X – Y, over five trials of 100 cards each. A net score of zero is chance performance. This task assesses real-life decision making and responses to rewards and punishments (41).
D. Working memory tests.
i. N-Back Test. A series of letters was presented one at a time on a computer screen. Subjects responded each time a letter appeared that they had seen on the previous screen (one-back). The task was repeated with an increase in memory load by imposing intervening letters while the subjects had to hold in mind letters that had appeared two-back and then three-back. Outcome measures were the total number correct (on target) and the total number incorrect (off target) for each condition (42).
ii. Subject Ordered Pointing. Subjects were presented with a series of computer screens which had abstract drawings on them (6, 8, 10, or 12 per screen). Each screen in a set showed the same array of abstract drawings, but in a different spatial arrangement. The subject was instructed to indicate one drawing per screen using computer keys. They were to avoid choosing the same drawing on subsequent screens in the set. Subjects erred when they chose a drawing that had been previously chosen. Each set was repeated three times. The outcome measure was the total number of errors across each screen set (43). Similar to the N-Back test, the subject must hold the drawings in mind and inhibit responses to previously selected drawings.
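The Iowa Gambling Task scoring described in item C above can be illustrated with a minimal Python sketch that computes a net score (advantageous minus disadvantageous selections, X – Y) for each consecutive block of cards. The deck labels A to D, the choice of C and D as the advantageous decks, and the function name `igt_net_scores` are assumptions made for illustration only; the task description specifies just that two decks were advantageous and two were not.

```python
from typing import List, Sequence

# Assumption for illustration: decks are labelled A-D, with C and D advantageous.
ADVANTAGEOUS = {"C", "D"}
DISADVANTAGEOUS = {"A", "B"}


def igt_net_scores(choices: Sequence[str], block_size: int) -> List[int]:
    """Return the net score (X - Y) for each consecutive block of card choices."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    scores = []
    for start in range(0, len(choices), block_size):
        block = choices[start:start + block_size]
        x = sum(1 for deck in block if deck in ADVANTAGEOUS)     # advantageous picks
        y = sum(1 for deck in block if deck in DISADVANTAGEOUS)  # disadvantageous picks
        scores.append(x - y)
    return scores


# Toy example: two blocks of four cards each (real blocks are much longer).
example = ["A", "B", "A", "C", "C", "D", "C", "A"]
print(igt_net_scores(example, block_size=4))  # [-2, 2]
```

In this toy example the net score moves from negative to positive, which is the pattern expected when a subject learns to shift toward the advantageous decks; a net score of zero corresponds to chance performance.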
Tests of motor learning and motor memory
A. Pursuit Rotor. Subjects held a photosensitive wand to maintain contact with a 2 cm light disk rotating on a turntable (Model 30014, Lafayette Instrument Company). Two blocks of eight 20 second trials were administered, with a 20 second rest after each trial, and a 60 second rest period after four trials. After a 30 minute interval, the two blocks were repeated. The outcome measure was the mean total time the stylus remained on target (44).
B. Motor Sequence Learning Test. The subject memorized two keypress sequences on a computer, each associated with a letter of the alphabet. A “T” was associated with the sequence 1-3-2 and an “H” was associated with 3-1-2. As soon as the T or H appeared on the screen, the subject performed the appropriate sequence as quickly as possible. Subjects performed 10 blocks of 18 trials each. The outcome was the average total movement time (time from character presentation to completion of the keypress sequence) (45).
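The Motor Sequence Learning Test outcome described above can be sketched as follows. The cue-to-sequence mapping comes from the task description; the trial record layout, the millisecond unit, and the decision to average over all trials (including any incorrect ones) are assumptions for illustration, since the description does not state how errors were handled.

```python
from statistics import mean
from typing import Iterable, Tuple

# Cue-to-keypress mapping taken from the task description.
SEQUENCES = {"T": (1, 3, 2), "H": (3, 1, 2)}

# One trial: (cue shown, keys pressed, movement time in milliseconds).
# The millisecond unit is an assumption.
Trial = Tuple[str, Tuple[int, ...], float]


def is_correct(cue: str, keys: Tuple[int, ...]) -> bool:
    """True when the keys pressed match the sequence memorized for the cue."""
    return SEQUENCES[cue] == tuple(keys)


def mean_movement_time(trials: Iterable[Trial]) -> float:
    """Average time from cue presentation to completion of the keypress sequence.

    Assumption: every trial is included, whether or not it was correct.
    """
    return mean(time for _, _, time in trials)


trials = [
    ("T", (1, 3, 2), 900.0),
    ("H", (3, 1, 2), 1100.0),
    ("T", (3, 1, 2), 1000.0),  # wrong sequence for the "T" cue
]
print(mean_movement_time(trials))  # 1000.0
```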
Analytic methods
TSH was measured by immunochemiluminometric assay (Beckman Coulter); functional sensitivity 0.02 mU/L, reference range 0.34–5.60 mU/L, interassay coefficient of variability (CV) 5% at 0.70 mU/L. Free T4 was measured by direct equilibrium dialysis (Quest Diagnostics); sensitivity 0.08 ng/dL, reference range 0.8–2.7 ng/dL, interassay CV 6.8% at 0.3 ng/dL and 1.6% at 3.8 ng/dL. Free T3 was measured by tracer dialysis (Quest Diagnostics); sensitivity 25 pg/dL, reference range 210–440 pg/dL, interassay CV 4%. TSH levels were measured at the time of testing, with no change in assay characteristics during the period of the study. Free T4 and fT3 levels were batched and analyzed at the end of the study. All samples were run in duplicate.
Statistical methods
Subjects were divided into two groups based on serum TSH levels. The Low-Normal TSH group was defined as subjects with a TSH between 0.34 (the lower limit of the assay reference range) and 2.50 mU/L. The High-Normal TSH group was defined as subjects with a TSH between 2.51 and 5.60 mU/L (the upper limit of the assay reference range). The cut-off of 2.50 mU/L for the two groups was decided based on recent debate within the thyroid field over whether the TSH reference range should be restricted to an upper limit of 2.5 mU/L to achieve a Gaussian distribution of TSH levels in healthy populations and exclude subjects with possible incipient hypothyroidism (46). There were 85 subjects in the Low-Normal TSH group and 47 subjects in the High-Normal TSH group. Seven of the 12 men were in the Low-Normal TSH group and 5 were in the High-Normal TSH group.
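The grouping rule above can be written as a short Python sketch. The constants are the reference limits and cut-off stated in the text; the function name `tsh_group` and the choice to raise an error for out-of-range values are assumptions for illustration.

```python
LOWER_LIMIT = 0.34  # mU/L, lower limit of the assay reference range
CUT_OFF = 2.50      # mU/L, boundary between the two groups
UPPER_LIMIT = 5.60  # mU/L, upper limit of the assay reference range


def tsh_group(tsh: float) -> str:
    """Assign a within-reference-range TSH value (mU/L) to a study group."""
    if not LOWER_LIMIT <= tsh <= UPPER_LIMIT:
        raise ValueError(f"TSH {tsh} mU/L is outside the reference range")
    return "low-normal" if tsh <= CUT_OFF else "high-normal"


for value in (1.35, 2.50, 2.51, 3.60):
    print(value, tsh_group(value))
```

Because TSH is reported to two decimal places, "greater than 2.50" and "2.51 or above" describe the same High-Normal group.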
Health status, mood, and cognitive outcomes were compared between the two groups. Subscales of each measure were analyzed together using linear repeated measures analyses in R version 3.2.1 (47), with the lme function of the nlme (linear and nonlinear mixed effects models) package (48) or, for binary outcomes, generalized estimating equations (geeglm function of the geepack package) (49). These methodologies allow for correlation between subscale measures for each subject. For continuous outcomes, a compound symmetric covariance structure was used to analyze the data. These models included adjustments for age, WAIS-R vocabulary score, years in school, body mass index, estrogen status, duration of time on LT4, duration of time at current LT4 dose, and LT4 dose (μg/kg). Analysis of binary outcomes used compound symmetric covariance matrices and was unadjusted due to the more limited nature of the data.
An initial assessment of interaction between group and subscale was obtained for each set of subscales. Likelihood ratio tests were conducted to determine whether models with the interaction were significant at level 0.10, in which case a comparison of groups was conducted for each subscale. If the addition of the interaction was not significant, the comparison of groups was conducted for the set of scales as a whole (the main effect of group was analyzed, dropping the interaction from the model). To limit the effect of multiple comparisons, the original plan was to conduct follow-up comparisons of groups for each subscale individually only if evidence of a group effect was observed at level 0.05. However, since none of these tests were significant, follow-up comparisons for all individual subscales were tested to confirm the lack of significance.
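The originally planned testing sequence described above can be summarized as a small decision function. This is a sketch of the plan as stated, not of the analysis code itself: the function name, the returned labels, and the use of strict inequalities at the 0.10 and 0.05 levels are assumptions.

```python
from typing import Optional


def planned_comparison(
    p_interaction: float,
    p_group: Optional[float] = None,
    alpha_interaction: float = 0.10,
    alpha_group: float = 0.05,
) -> str:
    """Return which group comparison the original plan calls for.

    p_interaction: likelihood ratio test p-value for the group-by-subscale interaction.
    p_group: p-value for the main effect of group, from the model without the
             interaction (only needed when the interaction is not significant).
    """
    if p_interaction < alpha_interaction:
        # Groups differ in their pattern across subscales: compare per subscale.
        return "compare groups within each subscale"
    if p_group is None:
        raise ValueError("p_group is required when the interaction is not significant")
    if p_group < alpha_group:
        return "main effect of group: follow up with per-subscale comparisons"
    return "no group effect: no follow-up comparisons planned"


print(planned_comparison(0.049))
print(planned_comparison(0.40, p_group=0.30))
```

As noted above, follow-up comparisons for all individual subscales were ultimately run regardless, to confirm the lack of significance.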
In addition, we examined relationships between outcomes and TSH, fT4, and fT3 as continuous variables using the same repeated measures methodology but substituting, in separate models, the selected hormone for the categorical group variable of low-normal and high-normal TSH.
Results
Clinical parameters and thyroid function tests
The low-normal and high-normal TSH groups were well matched for age, WAIS-R Vocabulary score, years in school, sex, estrogen status, body mass index, duration of LT4 treatment, and duration at current LT4 dose (Table 1). As would be predicted, LT4 doses were higher in the low-normal TSH group compared with the high-normal TSH group (1.51 ± 0.05 vs. 1.26 ± 0.06 μg/kg/day, p = 0.002). By design, all subjects had TSH levels within the reference range, with mean TSH levels lower in the low-normal TSH group than the high-normal TSH group (1.35 ± 0.07 vs. 3.60 ± 0.11 mU/L, p < 0.0001). Mean fT4 and fT3 levels were similar in the two groups. No subject had an fT4 level outside the reference range. Seventy subjects had low fT3 levels, between 118 and 209 pg/dL (reference range 210–440 pg/dL). Forty of these subjects were in the low-normal TSH group (47% of this group) and 30 were in the high-normal TSH group (64% of this group).
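The proportions of subjects with low fT3 levels follow directly from the group sizes given in the Statistical methods (85 and 47 subjects). A short Python check of that arithmetic, with a helper name chosen here for illustration:

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


low_normal = percent(40, 85)    # low fT3 in the low-normal TSH group
high_normal = percent(30, 47)   # low fT3 in the high-normal TSH group
overall = percent(70, 132)      # low fT3 across all subjects

print(f"{low_normal:.0f}% vs {high_normal:.0f}% (overall {overall:.0f}%)")
# 47% vs 64% (overall 53%)
```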
Significant differences between groups are shown in bold with corresponding p-values.
BMI, body mass index; T3, triiodothyronine; LT4 dose time, duration of current levothyroxine (LT4) dose; LT4 time, duration of LT4 therapy; Pre-none, premenopausal, no hormone treatment; Pre-on, premenopausal on hormone treatment; Post-none, postmenopausal, no hormone treatment; Post-on, postmenopausal on hormone treatment; TSH, thyrotropin; WAIS-R, Wechsler Adult Intelligence Scale–Revised.
Health status and mood: SF-36, POMS, ALS
There were no significant differences between the low-normal TSH and high-normal TSH groups in SF-36, POMS, or ALS overall scales or subscales (Table 2). Analyzing TSH, fT4, and fT3 as continuous variables across both groups (Table 3), the POMS anger subscale was negatively correlated with increasing fT4 levels (p = 0.04), although the magnitude of the correlation was small (0.5 unit decrease in POMS-Anger for each 1.0 ng/dL increase in fT4 level). There were no other significant correlations between TSH, fT4, or fT3 and health status or mood measures.
p-Values for continuous outcomes were adjusted for age, years of education, WAIS-R, BMI, estrogen status, LT4 time, LT4 dose time, and LT4 dose (μg/kg).
For these variables, the distributions within each group were highly skewed. The highest observed values of each subscale were used as the cut-points for producing a dichotomous measure. For BP, the highest observed value was 90, whereas for the other scales, the highest observed value was 100.
Profile of Mood States values were natural log-transformed prior to analysis because the raw data were skewed. All values were increased by one before the transformation due to the presence of zeros. The vigor subscale was analyzed separately since it was the only scale for which higher values represented improved mood.
These scores were compared as the proportion positive between the groups, since the measures were skewed and contained a large proportion of zeros.
Correlations were analyzed by repeated measures methodology using separate models for each hormone. Positive coefficients indicate that the measure increased with increasing hormone levels, while negative coefficients indicate that the measure decreased with increasing hormone levels.
For continuous measures, the magnitude of the coefficient indicates the estimated change in the measure with a 1 unit increase in free T4 (fT4) or TSH, or a 10 unit increase in fT3.
For binary measures, coefficients were transformed to estimate the percent change in the predicted odds of the measure for a 1 unit increase in fT4 or TSH, or a 10 unit increase in fT3. The transformed coefficients are estimates of the risk ratios associated with the 1 or 10 unit increase in the respective hormone level.
p-Values for continuous outcomes were adjusted for age, years of education, WAIS-R, BMI, estrogen status, LT4 time, LT4 dose time, and LT4 dose (μg/kg). Significant coefficients are shown in bold with corresponding p-values.
For these variables, the distributions within each group were highly skewed. The highest observed values of each subscale were used as the cut-points for producing a dichotomous measure. For BP, the highest observed value was 90, whereas for the other scales, the highest observed value was 100.
Profile of Mood States values were natural log–transformed prior to analysis because the raw data were skewed. All values were increased by one before the transformation due to the presence of zeros. The vigor subscale was analyzed separately since it was the only scale for which higher values represented improved mood. The magnitude of the coefficient indicates the estimated change in the natural log of the measure plus one with a 1 unit (10 units for fT3) increase.
These scores were compared as the proportion positive between the groups, since the measures were skewed and contained a large proportion of zeros.
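Two of the transformations mentioned in the table notes above can be sketched in Python. The first is the natural log of the score plus one, as applied to the Profile of Mood States values. The second converts a coefficient into a percent change in odds; it assumes the coefficient is on the log-odds scale per one unit of hormone, which is the usual output of a logistic model but is not stated explicitly in the notes. Function and parameter names are illustrative.

```python
import math


def log_plus_one(score: float) -> float:
    """Natural log of (score + 1), so that zero scores remain defined."""
    return math.log1p(score)


def percent_change_in_odds(coef: float, step: float = 1.0) -> float:
    """Percent change in odds for a `step`-unit increase in the hormone.

    Assumption: `coef` is a log-odds coefficient per one unit of the hormone.
    Use step=10 for a 10 unit increase, as reported for fT3.
    """
    return (math.exp(coef * step) - 1.0) * 100.0


print(log_plus_one(0))                       # 0.0
print(percent_change_in_odds(math.log(2)))   # about 100: the odds double
print(percent_change_in_odds(-0.05, step=10))
```

A positive result means higher odds of the outcome with increasing hormone level, and a negative result means lower odds.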
Cognitive tests
The interaction between thyroid groups and the Iowa Gambling Task (IGT) net scores was significant (p = 0.049), indicating that the low-normal TSH and high-normal TSH groups exhibited different trends across the five trials of their IGT net scores. Both groups started with disadvantageous decision making with the first deck of cards (Net-1), with the low-normal TSH group significantly worse than the high-normal TSH group (p = 0.02). Both groups learned to make advantageous decisions with subsequent decks, but the low-normal TSH group improved more with each deck than the high-normal TSH group. The low-normal TSH group showed a pattern of increased learning to choose advantageously (better decision making), whereas the high-normal TSH group plateaued earlier and did not show consistent improvement across trials (Fig. 1). There were no significant differences between the two groups for other cognitive outcomes (Table 4).

Net scores for each card deck for the Iowa Gambling Task in the low-normal and high-normal thyrotropin (TSH) groups. Mean Net-1 score was significantly lower in the low-normal TSH group (p = 0.02). Mean Net-4 and Net-5 scores were significantly higher in the low-normal TSH group (p = 0.01, p = 0.04 respectively).
Individual tests are grouped by cognitive subdomains (first column). p-Values for continuous outcomes were adjusted for age, years of education, WAIS-R, BMI, estrogen status, LT4 time, LT4 dose time, and LT4 dose (μg/kg).
Significant differences between the two groups are shown in bold with corresponding p-values.
These values were calculated as proportion of subjects for which each measure was ≥15 (for correct on target) or >0 (for incorrect/nontarget), since there were floor effects. The N-Back p-values for comparing groups do not include three-back, since this variable was not calculated as a proportion.
ABC, Trail Making Test Part B.
Analyzing TSH, fT4, and fT3 as continuous variables across both groups (Table 5), the IGT Net-1 (baseline) score was positively correlated with TSH levels, indicating better performance (1.65 unit increase for every 1 mU/L increase in TSH level, p = 0.02). There were no correlations with the N-Back number correct target or incorrect nontarget, except the three-back number incorrect nontarget was negatively correlated with fT4 levels (p = 0.04). Further, the magnitude of the correlation was small (1.05 unit decrease in three-back number incorrect nontarget for every 1 ng/dL increase in fT4 level). There were no correlations with accuracy on the Trail Making Test, although the time to complete the test was positively correlated with fT3 levels (p = 0.049). This indicated worse performance, although again the magnitude of the correlation was small (0.26 second increase in time for each 10 pg/dL increase in fT3 level). There were no other significant correlations between TSH, fT4, or fT3 and cognitive measures.
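To give a sense of scale for the coefficients reported above, the following Python sketch multiplies a coefficient by a change in hormone level. It is purely arithmetic: it assumes the relationship is linear over the whole interval and ignores the uncertainty of the estimates, so the outputs are illustrations rather than study results. The function name and the choice of the full reference ranges as intervals are assumptions.

```python
def change_in_outcome(coef_per_step: float, hormone_change: float, step: float = 1.0) -> float:
    """Estimated change in an outcome for a given change in hormone level.

    coef_per_step: reported change in the outcome per `step` units of hormone.
    """
    return coef_per_step * hormone_change / step


# IGT Net-1: 1.65 units per 1 mU/L TSH, across the TSH reference range.
igt_span = change_in_outcome(1.65, 5.60 - 0.34)

# Trail Making Test time: 0.26 s per 10 pg/dL fT3, across the fT3 reference range.
tmt_span = change_in_outcome(0.26, 440 - 210, step=10)

print(f"IGT Net-1: {igt_span:.2f} units; Trail Making time: {tmt_span:.2f} s")
```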
Individual tests are grouped by cognitive subdomains (first column).
Correlations were analyzed by repeated measures methodology using separate models for each hormone.
Positive coefficients indicate that the measure increased with increasing hormone levels, while negative coefficients indicate that the measure decreased with increasing hormone levels.
For continuous cognitive measures, the magnitude of the coefficient indicates the estimated change in the measure with a 1 unit increase in fT4 or TSH, or a 10 unit increase in fT3.
For binary cognitive measures, coefficients were transformed to estimate the percent change in the predicted odds of the measure for a 1 unit increase in fT4 or TSH, or a 10 unit increase in fT3. The transformed coefficients are estimates of the risk ratios associated with the 1 or 10 unit increase in the respective hormone level.
p-Values for continuous outcomes were adjusted for age, years of education, WAIS-R, BMI, estrogen status, LT4 time, LT4 dose time, and LT4 dose (μg/kg). Significant coefficients are shown in bold with corresponding p-values.
These values were calculated as proportion of subjects for which each measure was ≥15 (for correct target) or >0 (for incorrect/nontarget), since there were floor effects. The N-Back p-values for comparing groups do not include 3-Back since this variable was not calculated as a proportion.
Discussion
In this cohort of LT4 treated subjects who had TSH levels across the full span of the laboratory reference range, we found little evidence that variations in thyroid function were correlated with health status, mood, or memory. This was true whether the data were analyzed as dichotomous variables (low-normal vs. high-normal TSH) or as continuous variables. There was a suggestion in both analyses of the IGT that higher thyroid function (lower TSH levels) within the reference range was associated with better “real-life” decision making. However, given the numbers of measures and the weak effects, additional study would be needed to verify this finding.
The published literature regarding these neurocognitive outcomes in LT4 treated subjects has been divergent and inconclusive. Some studies have shown that LT4 treated subjects have decreased psychological well-being or cognitive measures compared with control groups, while other studies have reported no differences in these measures (11,22–27). Only four studies further investigated whether outcomes varied by TSH or thyroid hormone levels within the reference range in LT4 treated subjects, again with divergent findings (11,23,25,27). Our study extends these limited published data in LT4 treated euthyroid subjects. We did not find clinically significant associations between thyroid function and health status or mood. Most relevant, ours is the first report that correlates thyroid function in this population with sensitive tests of two cognitive domains, memory and executive function, which map to brain areas known to be responsive to thyroid hormone (30–32).
There were no significant differences in tests of memory between the low-normal and high-normal TSH groups, including the domains of declarative memory, working memory, and motor learning/motor memory. We did find some associations with memory in our analysis across both groups using TSH and thyroid hormones as continuous variables, but the magnitude of the associations was small and of limited clinical relevance.
Executive functions have not been extensively studied in thyroid disease, since rodent models do not adequately represent complex executive function in humans, and many laboratory-based measures of executive function are labor intensive and/or relatively insensitive to “real-world” daily life scenarios. For these reasons, most of the published studies referenced above did not include sophisticated tests of executive functions. We utilized five tests that measure different aspects of this executive cognitive domain. Two of them are relatively simple and widely used tests which measure attention and concentration (Letter Cancellation Test) (39) and attention and cognitive flexibility (Trail Making Test) (40). Two of them are sensitive measures of working memory (N-Back and Subject Ordered Pointing) (42,43). In concordance with the few available reports in non-LT4 treated (7,17) and LT4 treated euthyroid subjects (23,26,27), we did not find associations between thyroid function and these four measures.
A novel strength of our study is the inclusion of an additional test of executive function, the IGT, as a representation of real-world decision making. In the IGT, individuals experience rewards and punishments for selecting cards from decks that provide either high immediate rewards and larger punishments or smaller immediate rewards and smaller punishments. Advantageous decision making relies on shifting choices away from disadvantageous card decks toward advantageous card decks (41). We found that the low-normal TSH group initially made poor decisions compared to the high-normal TSH group (Net-1), but then the low-normal TSH group outperformed the high-normal TSH group with larger gains across subsequent trials. There was also a clinically relevant relationship between TSH and performance across all subjects. The pattern we observed in the low-normal TSH group (consistent improvement across trials) appears similar to that reported in healthy control subjects (49), and better than the plateau observed in the high-normal TSH group. To our knowledge, this is the first report that utilized such a “real-life” measure of decision making in LT4 treated subjects (27). However, these results are preliminary, given our limited sample size and multiple comparisons. In addition, it is difficult to explain why the low-normal TSH group would have worse scores initially but then improve to a greater extent; there were no systematic differences, such as season or time of day, in testing between the two groups. It is possible that this represents regression to the mean, since the low-normal group had more room to improve. For these reasons, we interpret this finding as preliminary, and its major importance is to suggest a focus for future studies on thyroid status and executive function.
Complementing our observational data, there are two interventional trials of LT4 therapy in subjects with normal TSH levels. Walsh et al. varied LT4 doses in a blinded fashion in LT4 treated subjects to achieve low-normal or high-normal TSH levels and did not find any effects on hypothyroid symptoms, quality of life, or cognitive function, although executive function was only tested with the Trail-Making Test (28). Pollock et al. administered LT4 or placebo in a blinded fashion to euthyroid (non-LT4 treated) subjects who had hypothyroid-type symptoms, as well as asymptomatic controls, and found no improvement in psychological measures in either group, although cognitive tests were not done (50).
An interesting side finding in our study was the high prevalence of low serum fT3 levels in many LT4 treated subjects with normal TSH levels, even in the low-normal TSH group. This has been described in previous reports [reviewed in Jonklaas et al. (51)], and has led to suggestions that hypothyroid subjects may benefit from L-triiodothyronine (LT3) treatment. However, studies of LT3 add-on or monotherapy in hypothyroidism have not shown improvements in quality of life, mood, or cognitive outcomes, and LT3 is not recommended for standard treatment of hypothyroidism [reviewed in Jonklaas et al. (51)]. Our data showing lack of correlation between fT3 levels and these outcomes is concordant with this recommendation.
Despite our novel emphasis on executive function, our study also has limitations. Our sample size is limited, and it is possible that we missed small effects. However, the small magnitude of effects we report suggests that clinically meaningful alterations for each measure are unlikely, with the possible exception of the IGT. In addition, two studies reported that subjects with subclinical hypothyroidism (mean TSH = 14.7 and 19.4) performed more poorly than euthyroid subjects on the N-Back test. The number of subjects in each study was quite small (n = 11 and 16), suggesting that the sensitive tests we employed are responsive to small changes in thyroid function and do not require large sample sizes for effects to be apparent (30,52). We performed a relatively large number of correlations for our sample size, although we accounted for this in our analysis, and it is likely that some of our minor findings were due to chance. We did not include a control group of non-LT4 treated euthyroid subjects due to resource constraints, so we cannot state that our subjects had outcomes similar to those of healthy subjects. However, our results are similar to population norms and our previous studies in healthy control subjects for our test measures (53,54). Most of our subjects were women, and although we included sex as a covariate, it is unclear whether our results would be similar in a larger group of men. Subjects also tended to be younger and slimmer than the overall U.S. population, and our results may not be generalizable to older or heavier subjects receiving LT4 for hypothyroidism. Subjects were heterogeneous in terms of underlying thyroid diagnosis and length of LT4 treatment, although this reflects the reality of recruiting subjects for intensive clinical studies and provides relevance for clinical practice.
We attempted to collect blood samples at a consistent time of day to avoid circadian variations in TSH levels, but this was not always possible due to scheduling limitations. In healthy subjects with typical sleep–wake cycles, serum TSH levels generally decrease slightly between 0700 and 0900 hours, and then remain stable until the evening (55). Thus, there may have been slight variation in TSH levels in our study due to sampling time that could limit our results.
It is a common clinical observation that some otherwise healthy patients with hypothyroidism continue to complain of fatigue, poor mood, inability to concentrate, and vague cognitive difficulties (often described as “brain fog”) despite normal TSH levels. Published studies have attempted to document this (11,22–27), but it can be difficult to map these subjective complaints to specific objective domains. Our test measures most relevant for these complaints include subsets of the SF-36, POMS and ALS for fatigue and mood, as well as the Letter Cancellation and Trail Making Tests (attention, concentration, and cognitive flexibility). We did not find any correlations between thyroid function and these measures. The relevance of the IGT results to these complaints is less clear, as patients may have trouble elucidating specific deficits in executive function beyond phrases like “brain fog.”
Also of note, psychological well-being in subjects with normal TSH levels seems to depend on whether they have a diagnosed thyroid condition, suggesting that self-knowledge of a thyroid disorder impairs well-being regardless of the TSH level (11,54). All of our subjects knew they had hypothyroidism, and they may have been less satisfied with their health status and therefore more likely to volunteer for a research study. We did not query them regarding dissatisfaction with their LT4 treatment, but it would be fruitful in future studies to correlate this parameter with self-reported health status and mood.
In summary, we found no differences in measures of health status, mood, memory or executive functions in LT4 treated subjects based on whether their TSH levels were above or below 2.50 mU/L, a level that has been suggested as a target for LT4 therapy (46). We also found few correlations between continuous measures of TSH, fT4, or fT3 levels within the reference range and health status, mood, or cognition. These findings augment the limited body of literature that suggests that variations in thyroid function within the reference range do not adversely affect these neurocognitive measures in a clinically significant way. On the other hand, our preliminary findings with the IGT raise the intriguing possibility that complex “real life” decision making may be affected by small variations in thyroid function. Our results suggest future directions for research that include more sensitive and specific validated tests that encompass symptoms typically reported by patients rather than general tests of health status and mood; studies that specifically target dissatisfied patients; and studies that examine the complexities of executive function in more depth than has traditionally occurred. In the absence of further data, complaints of quality of life, mood, or cognitive decrements should not be used as the sole reason to alter thyroid hormone doses in treated hypothyroid patients in attempts to achieve lower TSH levels within the reference range.
Acknowledgments
We would like to thank the staff of the OHSU Clinical and Translational Research Center for excellent patient care and research support, and the Biostatistics and Design Program for data analysis expertise. This work was supported by grants R01 DK075496 (M.H.S.; National Institutes of Health) and UL1 RR024120 (OHSU Clinical and Translational Science Awards); the clinical trial registration number is NCT00565864.
Author Disclosure Statement
No competing financial interests exist.
