Abstract
This study investigates the reliability of symptom data reported by individuals with chronic fatigue syndrome (CFS) across three recall time frames (the past week, the past month, and the past six months) at two assessment points one week apart. Multilevel model analyses were used to determine the optimal recall time frame, in terms of test–retest reliability, for each of the Fukuda et al. (1994) case-defining symptoms. Results suggested that the optimal time frame for reliably reporting CFS symptoms was six months for sore throat, lymph node pain, muscle pain, post-exertional malaise, headaches, memory/concentration difficulties, and unrefreshing sleep. For joint pain, the optimal time frame was one month. Researchers who assess CFS symptoms need to take recall time frame into account, especially when the goal is to standardize and improve the methods used to reliably and accurately diagnose this complex illness.
Introduction
Retrospective self-report data are widely used in research on chronic fatigue syndrome (CFS). This reliance on retrospective self-report may introduce biases that decrease the reliability and validity of diagnostic assessment. In addition, individuals with CFS often experience substantial cognitive difficulties (Komaroff & Buchwald, 1991; Lange et al., 2005), which make retrospective health assessments more challenging and more vulnerable to recall bias. To improve the reliability, and ultimately the validity, of CFS assessment, it is important to investigate the factors that may introduce bias and reduce the integrity of illness data. One such factor is the reporting period, or time frame, over which patients are asked to recall their health symptoms.
Many measures used to assess health symptoms for CFS have varying recall time frames, and it is unclear which time frame is optimal. Two symptoms that are commonly cited in the recall literature, and that are experienced by individuals with CFS, are pain and fatigue. Pain has been studied in a variety of contexts, and within this literature, researchers have found that longer reporting periods lead to greater biases and reduced accuracy (Broderick et al., 2008; Stone, Broderick, Kaell, DelesPaul, & Porter, 2000; Stone, Broderick, Shiffman, Litcher-Kelly, & Calvanese, 2003; Stone, Schwartz, Broderick, & Shiffman, 2005; Williams et al., 2004). Research on the recall of pain and fatigue levels across different reporting periods suggests that when patients are asked to recall their symptoms, their recall accuracy weakens over the course of seven days (Broderick et al., 2008). Furthermore, recalled symptom ratings tend to be consistently higher than averaged momentary ratings across seven days (Broderick et al., 2008). This discrepancy has also been shown for chronic fatigue and CFS-related fatigue (Friedberg & Sohl, 2008; Sohl & Friedberg, 2008). Interestingly, Broderick et al. (2008) found that some correlations between recalled symptom ratings and averaged momentary ratings for pain and fatigue were higher for a 28-day recall time frame than for a seven-day recall time frame. Broderick et al. theorize that individuals with chronic illnesses may have a good idea of their typical symptom pattern over time, thus allowing them to make an overall assessment of the last 28 days based on their symptom beliefs.
There is very limited research within the CFS field comparing the reliability of symptom assessment across varying recall time frames. One study by Hawk, Jason, and Torres-Harding (2007) evaluated the test–retest reliability of a CFS diagnostic instrument (Jason et al., 1997), which assesses Fukuda et al. (1994) CFS symptoms using a six month time frame, as well as symptoms of energy and fatigue using the “past day” and “past week” time frames. Hawk et al. (2007) found that when using a six month time frame, the average intraclass correlation scores for items assessing the eight CFS case-defining symptoms were very good (.77). Two symptoms (tender/sore lymph nodes and pain in multiple joints) had somewhat lower reliability scores (.58 and .49, respectively). Items asking participants to rate their perceived energy, expended energy, and fatigue experienced over the past day had lower reliability scores (.59, .40, and .22, respectively) compared to the same items recalled over the past week (.77, .59, and .81, respectively). The authors have suggested that these symptoms likely fluctuate often and can be more consistently recalled over a longer time frame.
Justification is rarely given for why a particular time frame is used in health research (Broderick et al., 2008). Given the strong need for objective and reproducible CFS criteria, it would be beneficial to determine the degree to which varying time frames impact recall of CFS symptoms. Furthermore, it has been argued that the accuracy and reliability of symptom recall are important for the development of appropriate and valid treatments (Fienberg, Loftus, & Tanur, 1985). To fill the gap in research on the potential connection between time frame and the reliability of symptom recall, the current study serves as a preliminary investigation of the test–retest reliability of CFS symptoms reported at three different recall time frames (the past week, the past month, and the past six months). The purpose of this study is to identify which time frame or time frames are optimal for reliably recalling symptoms of CFS. Based on the previous research by Hawk et al. (2007), we hypothesize that participants will recall the majority of their CFS symptoms with high reliability when using the six month time frame.
Method
Participants
Participants were identified through an Institutional Review Board (IRB)-approved research advertisement published in a CFS Chicago newsletter. The sample also included individuals who participated in an earlier non-pharmacological intervention at DePaul University’s Center for Community Research (Jason et al., 2007). Participants received a US$5 Amazon gift card upon completion of the study.
Procedure
Data collection occurred at two time points, one week apart. Researchers obtained verbal consent from participants over the phone and scheduled two phone interviews. To ensure that all participants completed the questionnaires under the same conditions, both interviews took place over the phone, at the same time of day and on the same day of the week. During the first interview, participants were not told that they would be asked the same questions a week later; instead, they were informed that they would be taking another short symptom survey during the second interview. This was to ensure that participant responses at the second interview were not primed by the first.
During the first phone interview, participants were read questions aloud from the Symptom Inventory–Revised (SI-R), which was altered from the original Symptom Inventory developed by the Centers for Disease Control and Prevention (CDC; Wagner et al., 2005) to provide more time frames. Phone interviewers repeated items for participants as necessary. During the second phone assessment, phone interviewers read items from the SI-R to participants a second time. Following completion of the second phone assessment, the phone interviewers debriefed participants on the purposes of the study.
Measures
IRB-approved graduate students and staff members at the Center for Community Research at DePaul University administered the study measures over the phone. Interviewers read the same set of instructions to all participants and recorded responses as they were given.
CFS Symptom Assessment
The SI-R assesses the presence, frequency, and severity of the case-defining symptoms of CFS (post-exertional malaise [PEM], unrefreshing sleep, problems with memory/concentration, muscle aches and pains, joint pain, sore throat, tender lymph nodes/swollen glands, and headaches) according to the Fukuda et al. (1994) case criteria. The SI-R is a revision of an earlier Symptom Inventory that was developed by the CDC. The CDC Symptom Inventory (Wagner et al., 2005) assesses the frequency and severity of symptoms over the past month, and it has been shown to have good internal consistency, with a Cronbach’s α coefficient of .88 for the total inventory score. Furthermore, a short-form version of the measure consisting of only six symptoms (fatigue after exertion, unrefreshing sleep, muscle aches, sleeping problems, problems with memory, and problems with concentration) was shown to have a Cronbach’s α coefficient of .87 for the total score (Wagner et al., 2005). For the purposes of this study, the revision that produced the SI-R added four time frames: right now, the past week, the past month, and the past six months. Participants’ frequency and severity ratings on the SI-R were multiplied to create a composite score for each symptom at the past week, the past month, and the past six month intervals, with scores ranging from 0 to 25 (Wagner et al., 2005). The authors are not aware of any previous studies on the test–retest reliability of the CDC Symptom Inventory (Wagner et al., 2005).
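The composite scoring described above can be sketched in a few lines of Python. The 0 to 25 range suggests that each rating has a maximum of 5, but the rating bounds are not stated here, so the `max_rating` default, the function name, and the example ratings are assumptions made only for illustration.

```python
def composite_score(frequency: int, severity: int, max_rating: int = 5) -> int:
    """Multiply a frequency rating by a severity rating for one symptom.

    The 0..max_rating bounds are an assumption inferred from the stated
    0 to 25 composite range; they are not taken from the instrument itself.
    """
    for name, value in (("frequency", frequency), ("severity", severity)):
        if not 0 <= value <= max_rating:
            raise ValueError(f"{name} must be between 0 and {max_rating}, got {value}")
    return frequency * severity


# Hypothetical (frequency, severity) ratings for one symptom at the three
# time frames that were scored.
ratings = {
    "past week": (2, 3),
    "past month": (3, 3),
    "past six months": (5, 5),
}
composites = {frame: composite_score(f, s) for frame, (f, s) in ratings.items()}
```

A symptom that is absent (frequency or severity of 0) scores 0 under this scheme, and the highest possible ratings yield the stated ceiling of 25.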
Demographic Information
A short demographic survey was administered directly following the completion of the SI-R, at time one only. The demographic survey included eight questions, which assessed age, gender, weight, height, race, marital status, occupational status, number of children, and highest grade level.
Statistical Analysis
The present study used a multilevel modeling (MLM) approach within a repeated measures design to assess the reliability of symptom reports across two interview assessments. Reliability was assessed by examining the slope coefficients, with a slope of 1.0 indicating a perfectly reliable prediction from one time to the next, without overestimating or underestimating. Because each participant contributed repeated observations, the data were substantially dependent, as evidenced by the intraclass correlation coefficients (ICCs). The MLM approach was considered ideal, as it takes this dependency into account while also providing information on the strength of the reliability slopes at each recall time frame.
Results
Participant Traits
The study population consisted of 51 adults (45 women and six men), between the ages of 29 and 66 (M = 50.39), with a current diagnosis of CFS. The majority of participants self-identified as White (94%), one participant self-identified as Asian/Pacific Islander, and two self-identified as “other.” Two participants self-identified as Latino/Hispanic origin. Approximately half of all participants reported that they were married (N = 27), 13 reported that they were never married, and 11 reported that they were divorced. The majority of participants reported that they received a standard college degree or higher (70.6%), and all 51 participants reported at least a high school degree. Over half of participants indicated that they were on disability (58.8%), and the large majority reported that CFS was the cited reason for their disability claim. Only one participant reported working full time, and six reported working part time. A large proportion of participant diagnoses (84%) was confirmed with letters of documentation that were submitted by independent physicians. All 51 participants met criteria for the Fukuda et al. (1994) case definition, as well as the CDC’s empiric case definition (Reeves et al., 2005) for CFS.
Primary Results
Below is the MLM equation used for determining the optimal time frame, in terms of test–retest reliability, for participants’ recall of their CFS symptoms:
For ease of description, Level 2 of the model tested (1) the extent to which symptom composite scores at Interview one (γ01) predicted composite scores at Interview two, and (2) how time frame moderated (γ11; γ21) the way symptom composite scores at Interview one predicted scores at Interview two. Level 1 of the model tested the main effect of time frame (γ10; γ20). Analyses were conducted using all CFS symptom composite scores. Symptom composite score means and standard deviations were calculated for both interviews and at all three time frames, and are shown in Table 1. Grand mean centering was conducted for the Level 2 variables in order to ease interpretation. The random effect around the intercept was also examined (υij).
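One two-level specification consistent with the coefficients named above is sketched here. It is an illustration under stated assumptions, not a reproduction of the authors' equation: the dummy-coded time frame indicators D1 and D2 (the two non-reference time frames), the predictor symbol X (the grand-mean-centered Interview one composite), the residual e, and the subscript on the random intercept are all assumed notation.

```latex
% Level 1: Interview-two composite Y for time frame i within participant j
Y_{ij} = \beta_{0j} + \beta_{1j} D1_{ij} + \beta_{2j} D2_{ij} + e_{ij}

% Level 2: Interview-one composite X_j predicts the intercept and
% moderates the two time frame effects
\beta_{0j} = \gamma_{00} + \gamma_{01} X_{j} + \upsilon_{0j}
\beta_{1j} = \gamma_{10} + \gamma_{11} X_{j}
\beta_{2j} = \gamma_{20} + \gamma_{21} X_{j}
```

Under this coding, γ01 is the reliability slope for the reference time frame, and γ01 + γ11 and γ01 + γ21 are the slopes for the other two time frames.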
Table 1. Means and Standard Deviations of Symptom Composites on the SI-R at Interview one and Interview two, N = 51.
Note. M = mean; SD = standard deviation; SI-R = Symptom Inventory–Revised.
Results of the above analyses revealed that the slope coefficients for all symptoms except joint pain were optimal at the six month time frame. For joint pain, the slope coefficient indicated that the past month was optimal for reliable reporting. The slope coefficients for each symptom at all three time frames are listed in Table 2. Coefficients that are closest to the value 1.0 represent more reliable symptom reporting.
Table 2. Slope Coefficients of ME/CFS Symptoms Across Time Frame.
Note. b = slope coefficient; CFS = chronic fatigue syndrome; df = degrees of freedom; ME = myalgic encephalomyelitis; p = p value/probability of obtaining test statistic; PEM = post-exertional malaise; SE = standard error; t = test statistic.
The symbol * refers to the optimal time frame (coefficients closest to 1.0).
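The selection rule (the time frame whose slope is closest to 1.0) can be illustrated with a short Python sketch. The slope values are those reported in the symptom-by-symptom results that follow; the dictionary layout and function name are hypothetical and are not part of the original analysis.

```python
# Slope coefficients (b) by symptom and recall time frame, as reported
# in the symptom-by-symptom results.
SLOPES = {
    "sore throat":        {"past week": -0.06, "past month": 0.33,  "past six months": 0.75},
    "lymph node pain":    {"past week": -0.10, "past month": 0.12,  "past six months": 1.15},
    "PEM":                {"past week": 0.30,  "past month": 0.28,  "past six months": 0.72},
    "muscle pain":        {"past week": -0.48, "past month": 0.43,  "past six months": 0.74},
    "joint pain":         {"past week": -0.09, "past month": 0.81,  "past six months": 0.54},
    "unrefreshing sleep": {"past week": 0.29,  "past month": 0.35,  "past six months": 0.47},
    "headaches":          {"past week": -0.01, "past month": -0.33, "past six months": 0.92},
    "memory":             {"past week": -0.09, "past month": 0.02,  "past six months": 0.38},
    "concentration":      {"past week": -0.18, "past month": 0.06,  "past six months": 0.42},
}


def optimal_time_frame(slopes_by_frame: dict[str, float]) -> str:
    """Return the time frame whose slope has the smallest distance from 1.0."""
    return min(slopes_by_frame, key=lambda frame: abs(slopes_by_frame[frame] - 1.0))


optimal = {symptom: optimal_time_frame(frames) for symptom, frames in SLOPES.items()}
```

The rule uses absolute distance from 1.0, so a slope above 1.0 (such as 1.15 for lymph node pain) is treated the same way as a slope that falls short by the same amount.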
Sore Throat
For sore throat scores, the slope coefficient for the six month reference was closest to 1.0 at .75, compared to −.06 for the past week, and .33 for the past month, suggesting that 6 months was the optimal time frame for reliably reporting sore throats (see Table 2). The within variance of the distribution residuals was 5.65 and the between variance of distribution residuals was 26.06. The ICC score was calculated as .82, suggesting that 82% of the variance in predicting sore throat scores at Interview two is explained by the nesting of individual factors and sore throat scores at Interview one.
Lymph Node Pain
For lymph node scores, the slope coefficient for the six month reference was closest to 1.0 at 1.15, compared to −.10 for the past week, and .12 for the past month, suggesting that 6 months was the optimal time frame for reliably reporting lymph node pain (see Table 2). The within variance of the distribution residuals was 3.82 and the between variance of distribution residuals was 25.90. The ICC score was calculated as .87, suggesting that 87% of the variance in predicting lymph node scores at Interview two is explained by the nesting of individual factors and lymph node scores at Interview one.
PEM
For PEM scores, the slope coefficient for the six month reference was closest to 1.0 at .72, compared to .30 for the past week, and .28 for the past month, suggesting that 6 months was the optimal time frame for reliably reporting PEM (see Table 2). The within variance of the distribution residuals was 7.21 and the between variance of distribution residuals was 16.27. The ICC score was calculated as .69, suggesting that 69% of the variance in predicting PEM scores at Interview two is explained by the nesting of individual factors and PEM scores at Interview one.
Muscle Pain
For muscle pain scores, the slope coefficient for the six month reference was closest to 1.0 at .74, compared to −.48 for the past week, and .43 for the past month, suggesting that 6 months was the optimal time frame for reliably reporting muscle pain (see Table 2). The within variance of the distribution residuals was 9.26 and the between variance of distribution residuals was 25.35. The ICC score was calculated as .73, suggesting that 73% of the variance in predicting muscle pain scores at Interview two is explained by the nesting of individual factors and muscle pain scores at Interview one.
Joint Pain
For joint pain scores, the slope coefficient for the past month reference was closest to 1.0 at .81, compared to −.09 for the past week, and .54 for the past 6 months, suggesting that the past month was the optimal time frame for reliably reporting joint pain (see Table 2). The within variance of the distribution residuals was 6.25 and the between variance of distribution residuals was 36.24. The ICC score was calculated as .85, suggesting that 85% of the variance in predicting joint pain scores at Interview two is explained by the nesting of individual factors and joint pain scores at Interview one.
Unrefreshing Sleep
For sleep scores, the slope coefficient for the six month reference was closest to 1.0 at .47, compared to .29 for the past week, and .35 for the past month, suggesting that 6 months was the optimal time frame for reliably reporting unrefreshing sleep scores (see Table 2). The within variance of the distribution residuals was 6.04 and the between variance of distribution residuals was 33.14. The ICC score was calculated as .85, suggesting that 85% of the variance in predicting sleep scores at Interview two is explained by the nesting of individual factors and sleep scores at Interview one.
Headaches
For headache scores, the slope coefficient for the six month reference was closest to 1.0 at .92, compared to −.01 for the past week, and −.33 for the past month, suggesting that 6 months was the optimal time frame for reliably reporting headaches (see Table 2). The within variance of the distribution residuals was 8.93 and the between variance of distribution residuals was 22.72. The ICC score was calculated as .72, suggesting that 72% of the variance in predicting headache scores at Interview two is explained by the nesting of individual factors and headache scores at Interview one.
Memory Problems
For memory scores, the slope coefficient for the six month reference was closest to 1.0 at .38, compared to −.09 for the past week, and .02 for the past month, suggesting that 6 months was the optimal time frame for reliably reporting memory problems (see Table 2). The within variance of the distribution residuals was 5.65 and the between variance of distribution residuals was 41.93. The ICC score was calculated as .88, suggesting that 88% of the variance in predicting memory scores at Interview two is explained by the nesting of individual factors and memory scores at Interview one.
Concentration Problems
For concentration scores, the slope coefficient for the six month reference was closest to 1.0 at .42, compared to −.18 for the past week, and .06 for the past month, suggesting that 6 months was the optimal time frame for reliably reporting concentration problems (see Table 2). The within variance of the distribution residuals was 6.91 and the between variance of distribution residuals was 34.45. The ICC score was calculated as .83, suggesting that 83% of the variance in predicting concentration scores at Interview two is explained by the nesting of individual factors and concentration scores at Interview one.
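The ICC values reported for each symptom are consistent with the standard variance-partition formula, ICC = between variance / (between variance + within variance). The following Python sketch recomputes them from the variance components given above; the data structure and function name are illustrative assumptions, and the formula is inferred from the reported numbers rather than stated by the authors.

```python
# (within variance, between variance) of the residuals for each symptom,
# as reported in the preceding paragraphs.
VARIANCE_COMPONENTS = {
    "sore throat":        (5.65, 26.06),
    "lymph node pain":    (3.82, 25.90),
    "PEM":                (7.21, 16.27),
    "muscle pain":        (9.26, 25.35),
    "joint pain":         (6.25, 36.24),
    "unrefreshing sleep": (6.04, 33.14),
    "headaches":          (8.93, 22.72),
    "memory":             (5.65, 41.93),
    "concentration":      (6.91, 34.45),
}


def icc(within: float, between: float) -> float:
    """Share of total residual variance that lies between participants."""
    return between / (between + within)


iccs = {
    symptom: round(icc(within, between), 2)
    for symptom, (within, between) in VARIANCE_COMPONENTS.items()
}
```

For sore throat, for example, 26.06 / (26.06 + 5.65) rounds to .82, matching the reported value.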
Discussion
The present study serves as a preliminary investigation of the test–retest reliability of CFS symptom reporting across three different recall time frames (past week, past month, and past six months) at two assessment points one week apart. Consistent with our hypothesis, results of this study suggested that the optimal time frame for reliably reporting CFS symptoms was six months for all symptoms except joint pain, which had an optimal time frame of one month. While some past literature shows a reduction in reporting accuracy for pain and fatigue as length of recall time frame increases (Broderick et al., 2008; Friedberg & Sohl, 2008; Sohl & Friedberg, 2008), the results of this study showed that longer time frames might actually improve reliability in the context of CFS. This finding is consistent with a previous finding by Hawk et al. (2007), which revealed that CFS case-defining symptoms were recalled with good reliability using a six month time frame.
Individuals with chronic illnesses may have a good grasp of their symptom pattern over time (Broderick et al., 2008), which may at least partially explain why individuals in this study were able to reliably make a global assessment of their symptoms at the six month time frame. People afflicted with a chronic illness such as CFS may be more reliable in making a broad and global estimate of their symptoms over a longer time frame because a short time frame may be more susceptible to small changes that deviate from the overall symptom pattern. Clarke, Fiebig, and Gerdtham (2008) assert that there is a trade-off between reporting accuracy and loss of information when deciding between a shorter or longer recall time frame. More work is needed in this area in order to determine if the six month time frame is optimal in understanding the experience of CFS symptoms.
The current study has several potential limitations. For instance, the “adjustment and anchoring” heuristic (Tversky & Kahneman, 1974) may have influenced recall reliability across the time frames. This heuristic describes how people use information that they know as an anchor to help estimate information that they do not know. It is possible that this anchoring effect was present in the current study; however, in an attempt to control for it, symptom ratings were grouped by time frame rather than by symptom, so that the time frames were spaced out. Additionally, the sample was not selected through random sampling. The large majority of participants were White, middle-aged women; however, Jason et al. (1999) found evidence suggesting that CFS occurs at higher rates in African American and Latino samples. Additionally, the data used in the analyses for this study were significantly and positively skewed, and therefore were not normally distributed. Positively skewed data are common within CFS due to the severity of symptoms experienced by this illness population. The purpose of this study was solely to determine which time frame was optimal based on the extent to which slope coefficients approached 1.0. The goal was not to determine whether there were significant differences across time frames, an analysis that would have been more sensitive to positively skewed data.
Another limitation of this study was the inability to control for the potential influence of severe cognitive difficulties and other symptoms that may impact cognitive functioning (e.g., sleep disturbance) on the reliability coefficients. Unfortunately, the size of the data set was not large enough to both control for cognitive difficulties and retain necessary statistical power. It is recommended that future studies examine the potential impact of severe cognitive difficulties on the reliability of symptom reporting across varying time frame lengths, especially since cognitive difficulties are considered a key symptom of this illness.
While the six month time frame was found to be optimal for reliably reporting CFS symptoms, it is possible that the symptom reports were inflated due to the peak effects that are common with longer recall time frames, thus affecting the overall validity of the measure (Stone et al., 2005). Therefore, it is recommended that additional research in this area explore the possible trade-off between reduced reporting accuracy and the potential to gain more information about a phenomenon using longer time frames. One way to assess the validity of the longer, six month time frame might be to examine the degree of convergent validity that symptom scores measured at longer time frames have with other diagnostic measures. Future research might assess the degree to which symptom ratings at each time frame correlate with or predict measures of substantial reductions in functioning. Future research could also incorporate paper daily diaries, electronic diaries, ecological momentary assessment (EMA) techniques, and a combination of prospective and retrospective survey methods (McColl, 2004; Stull, Leidy, Parasuraman, & Chassany, 2009) to test the validity of time frame recall. For instance, Yoshiuchi et al. (2007) used an EMA method to monitor the time frame in which symptoms develop in individuals with CFS following exercise. Others have also used EMA to assess momentary and retrospective reporting of CFS symptom intensity (Friedberg & Sohl, 2008; Sohl & Friedberg, 2008). In addition to understanding the validity of various recall time frames, it may be useful to understand additional contextual factors that influence the reliability of symptom reporting, such as recent stressful life events, symptom stability, social support, and one’s stage/progression of the illness.
Overall, findings from the present study revealed that time frame does influence the reliability of reporting CFS symptoms. Furthermore, results showed that in general, individuals with this illness are capable of reliably recalling the frequency and severity of their symptoms over longer time frames (e.g., six months), which is contrary to what might be expected based on literature documenting reduced accuracy of reports using longer time frames. It will be important for researchers who are interested in the assessment of CFS to take time frame and additional contextual factors into account, especially if the intended goal of the research is to standardize and improve the methods used to reliably and accurately diagnose this complex illness. Accurate and reliable assessment is a crucial first step in understanding and treating this debilitating and often misunderstood illness.
Footnotes
Authors’ Note
The authors appreciate the statistical consultation received from Steven Miller, PhD.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors appreciate the financial assistance provided by the National Institute of Allergy and Infectious Diseases (grant number AI055735).
