Abstract
Background:
Symptom cluster analysis is a new frontier of research in symptom management. This study clustered patients by their symptom profiles to identify subgroups that may be at higher risk for poor quality of life (QOL) and that may, therefore, benefit most from targeted interventions.
Methods:
Longitudinal study of metastatic cancer patients using the Edmonton Symptom Assessment Scale (ESAS). We generated two-, three-, and four-cluster subgroups and examined the relationship of cluster membership with patient outcomes. To address the problem of missing longitudinal data, we developed a novel outcome variable (QualTime) that measures both QOL and time in study.
Results:
Two hundred and twenty-one patients with a mean Palliative Performance Scale (PPS) score of 59.1 were enrolled. The three-cluster model was chosen for further analysis. The low-burden subgroup had low-severity scores on all symptoms. The intermediate-burden subgroup separated from the low-burden subgroup on the “debility” profile of fatigue, drowsiness, appetite, and well-being. The high-burden subgroup separated from the intermediate-burden subgroup on pain, depression, and anxiety. At baseline, PPS (p=0.0003) and cluster membership (p<0.0001) contributed significantly to global QOL. In univariate analysis, cluster membership was related to the longitudinal outcome, QualTime. In a multivariate model, the relationship of PPS to QualTime was still significant (p=0.0002), but subgroup membership was no longer significant (p=0.1009).
Conclusion:
PPS is a stronger predictor of the longitudinal variable than cluster subgroups; however, cluster subgroups provide a target for clinical interventions that may improve QOL.
Introduction
Co-occurring symptoms have long been used for diagnosis of psychological disorders, but this idea is relatively new to cancer care. It has been proposed that two or more symptoms may compose a cluster.7 Depending on the research question, two directions in symptom cluster research are described in the literature.8 If the purpose is to determine which symptoms relate to each other in order to elucidate common etiologic pathways, then the analytic approach is to group symptoms. If, on the other hand, the purpose is to identify groups of patients with similar symptom profiles that may be at higher risk for poor outcomes, the approach is to group patients (rather than symptoms). The latter is the focus of this article, in which we sought to answer the question: are there subgroups of patients clustered by their symptom profiles that have a higher risk of poor QOL and performance status? The primary objective of this longitudinal study was to test the temporal relationship of patients grouped by their symptom profiles at baseline to patient outcomes. We used a novel analytic approach to handling longitudinal data in an advanced cancer population with significant rates of noncompletion and attrition. Because previous studies have found performance status to be related to QOL and predictive of survival,9,10 a secondary objective was to determine if grouping by symptom cluster profiles adds predictive value to performance status measurement.
Methods
This descriptive 12-week longitudinal study surveyed a convenience sample of patients over a period of 12 months from five settings spanning outpatient radiotherapy, inpatient palliative care, and home palliative care. Inclusion criteria were age over 18 years, documented metastatic cancer (pathologic diagnosis or imaging studies), ability to complete questionnaires in English, and ability to provide informed consent. The study was approved by the institutional research ethics boards of Mount Sinai and Sunnybrook Hospitals in Toronto, Canada.
Procedures
Questionnaires were administered in person at study entry (week 0). Thereafter, they were administered either by telephone or in person (weeks 2, 4, 8, and 12). If necessary, a proxy provided information for performance status, but not for symptoms or QOL.
Measures
The Edmonton Symptom Assessment Scale (ESAS) was designed to measure symptom severity in a palliative care setting.11 It is a nine-item self-report numerical rating scale (0 to 10), with 10 representing the worst possible experience of the symptom.12,13 The most consistent symptom cutoff recommendation is that a score <4 is considered low severity, 4–6 represents moderate severity, and ≥7 is high severity.14,15
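The cutoffs above can be expressed as a small helper. This is a minimal sketch rather than part of the study's analysis; the function name `esas_severity` and the example patient are hypothetical, and scores between 6 and 7 are treated as moderate, following the "≥4 and <7" wording used with Table 3.

```python
def esas_severity(score: float) -> str:
    """Map a 0-10 ESAS item score to the severity band quoted in the text."""
    if not 0 <= score <= 10:
        raise ValueError("ESAS item scores range from 0 to 10")
    if score < 4:
        return "low"
    if score < 7:
        return "moderate"
    return "high"

# Hypothetical patient: item names and scores are made up for illustration.
patient = {"pain": 2, "tiredness": 8, "depression": 5}
profile = {item: esas_severity(score) for item, score in patient.items()}
print(profile)
```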
The 15-item European Organization for Research and Treatment of Cancer Quality of Life Group Questionnaire (EORTC QLQ-C15-PAL) was developed for a palliative care population.16 The items on this questionnaire measure physical and emotional function, 10 common symptoms, and global QOL. The global QOL item (QLQ-15), the outcome variable in this study, is a single item that ranges from 1 (very poor) to 7 (excellent).16
The Palliative Performance Scale (PPS) is widely used in palliative care settings17 and has been shown to be a robust prognostic indicator.18
Statistical analysis
Descriptive statistics were calculated. We clustered patients around their symptom profiles using a procedure in the statistical software SAS (Proc FASTCLUS), which uses Euclidean distances as a measure of similarity. The cluster centers are based on least-squares estimation or nearest centroid sorting.19 Although there are a number of statistical stopping rules for determining the number of clusters in the final solution, using clinical context is an equally valid approach.20,21 In the absence of standards for analytic choices in cluster analysis and acknowledging that the choice of final cluster solution is inherently subjective,20,21 we used clinical judgment and relationship to QOL outcomes to arrive at a practical final cluster solution (detailed in the Results section). Despite limitations to this method, cluster analysis was the appropriate technique for our research question.22
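To make nearest centroid sorting with Euclidean distances concrete, the following is a minimal k-means sketch in plain Python. It is not Proc FASTCLUS and does not reproduce its seed selection or options; the function names, the choice of seeds, and the three-symptom example data are all hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def kmeans(points, seeds, max_iter=100):
    """Alternate nearest-centroid assignment and centroid (mean) updates."""
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical (pain, tiredness, depression) scores, not study data.
patients = [(7, 8, 6), (6, 9, 5), (2, 7, 1), (1, 8, 2), (1, 2, 0), (0, 1, 1)]
labels, centroids = kmeans(patients, seeds=[patients[0], patients[2], patients[4]])
print(labels)
```

In this toy data the three groups loosely mirror a high-burden, an intermediate, and an all-low profile; with real ESAS data the solution depends on the seeds and on the number of clusters requested.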
One-way analysis of variance (ANOVA) and χ2 analyses were used to determine if significant differences existed among the cluster subgroups in their demographic and disease characteristics and in their outcomes (PPS and QOL). Post-hoc contrasts were done using the Tukey method, which is a single-step multiple comparison procedure used in conjunction with an ANOVA to find which means are significantly different from one another. Tukey's test corrects for the greater probability of making a type I error with multiple comparisons.
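As an illustration of the one-way ANOVA step, the sketch below computes the F statistic and its degrees of freedom from between-group and within-group sums of squares. The function name and the PPS values are hypothetical. The p-value and the Tukey post-hoc contrasts require distribution functions from a statistics package and are not shown.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical PPS values for three cluster subgroups (not study data).
pps_by_cluster = [[50, 60, 50], [60, 50, 70], [70, 60, 80]]
f_stat, df_b, df_w = one_way_anova(pps_by_cluster)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```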
Because these data are from patients with metastatic cancer at different points in their trajectory with rates of attrition and noncompletion that are typical for this patient population, we developed a novel analytic approach to the longitudinal data. To address the problems of meaningful anchors for the time points of measurement and missing data that are likely nonrandom, a variable was created (QualTime) that combines QOL and the time that the patient was in study. An algorithm was applied to extrapolate missing QOL data, which allowed the calculation of this variable for almost all patients (described in the Results section).
We performed an analysis of covariance (ANCOVA) using the baseline global QOL (QLQ-15) item from the EORTC QLQ-C15-PAL as the dependent variable with cluster subgroup membership and PPS as covariates. A regression model was run with the dependent variable QualTime. Cluster profile group assignment and PPS were included in the model. Other covariates found in univariate analyses to be potentially relevant to the model were tested. Analyses were run using SAS 9.2 (Windows version, SAS Institute Inc., Cary, NC).
Results
A total of 221 patients were enrolled (Table 1). Lung cancer was the most common diagnosis (24%), followed by breast and colorectal cancers. Bone was the most common site of metastases (45.2%). The mean PPS at baseline was 59.1.
Severity of symptoms, global QOL, and PPS did not vary significantly over time, although this must be interpreted in light of changes in the composition of the sample at each week of data collection (Table 2). Of the 221 patients enrolled, 52 (23.6%) completed each data collection point, whereas the remaining patients were missing at least one collection point due to death, illness, or inability to complete the questionnaires. During the 12 weeks of data collection, 51 (21.3%) patients died.
n varies among the 9 symptoms because of missing data.
ESAS, Edmonton Symptom Assessment Scale; QLQ-15, global quality-of-life item from European Organization for Research and Treatment of Cancer Quality of Life Group Questionnaire; PPS, Palliative Performance Scale; SD, standard deviation.
Cluster analysis
The cluster analysis yielded patient groupings along two, three, and four clusters (Table 3). In the two-cluster solution, the mean scores of all the symptoms are worse in Cluster 2-1 than in Cluster 2-2. One group had moderate to high scores on 7 of 9 ESAS symptoms, whereas the other had low scores on all 9 symptoms. Tiredness, pain, depression, anxiety, drowsiness, appetite, and well-being separated the two groups.
Low severity: <4, moderate severity: ≥4 and <7, high severity ≥7.
In the three-cluster solution, Cluster 3-3 is similar to Cluster 2-2 in that all the symptoms are low severity. Although the composition of the groups changes, for the most part Clusters 3-1 and 3-2 seem to split Cluster 2-1. Leaving out nausea and shortness of breath, which do not vary by category over the three clusters, these groups could be described as:
3-1: HIGH-tired; MODERATE-drowsy, appetite, well-being, pain, depression, anxiety. 3-2: HIGH-tired; MODERATE-drowsy, appetite, well-being; LOW-pain, depression, anxiety. 3-3: LOW-tired, drowsiness, appetite, well-being, pain, depression, anxiety.
The four-cluster solution defines a group that differs on anxiety, well-being, and shortness of breath, but it does not distinguish between outcomes more than the three-cluster solution. The group with all low-severity symptoms is in Cluster 4-4 and is almost unchanged from Cluster 3-3. A group similar in profile to Cluster 3-1 appears in Cluster 4-1. As before, Cluster 4-1 separates from Cluster 4-2 on pain, depression, and anxiety; what is new is that well-being and shortness of breath also split these groups. Cluster 4-2 separates from Cluster 4-3 on shortness of breath alone.
In this article, we present the next steps of analysis using the three-cluster solution, based on clinical reasoning and our analysis as follows. Although the two-cluster solution yields the most distinct groupings, it is unlikely to provide any more insight than what is clinically intuitive, that is, people with low-severity symptoms are likely to do better than people with more severe symptoms. The three-cluster solution helps to better define a group that is intermediate in symptom burden, still provides adequate sample size in each category for multivariate analyses, and appeared to differentiate between QOL outcomes. The four-cluster solution did not add to the ability to differentiate between different QOL outcomes. Furthermore, a three-cluster system of categorization is sufficiently parsimonious to make it practical for screening in clinical practice.
Patient characteristics by cluster subgroup
There were no differences in any of the demographic or disease characteristics across the subgroups except for radiotherapy program (p<0.0001) and bone metastases (p=0.0054). (See Table 4.) PPS was distributed differently across the cluster subgroups (p=0.0009) with post-hoc contrast analysis showing that the all-low severity cluster (3-3) had significantly higher PPS scores (63.6±17.9) than the other two subgroups. On the global QOL score, the subgroups varied significantly (p=0.0003), and this difference was due mostly to the contrast between the highest symptom burden group (3-1) and the lowest symptom burden group (3-3), although a consistent trend is observed in the intermediate group (3-2).
Tukey's test is a single-step multiple comparison procedure and statistical test for a difference in means, which corrects for experiment-wise Type I error rate.
Trend toward significance therefore controlled for in multivariate analyses.
Baseline relationship of subgroups to PPS and QOL
We performed an ANCOVA using global QOL item (QLQ-15) at baseline as the dependent variable, and baseline PPS and cluster assignment number as covariates. The result shows that PPS (p=0.0003) and cluster membership (p<0.0001) both contribute significantly to global QOL at baseline. A higher PPS score is associated with higher global QOL. Higher cluster assignment number, meaning less severe symptoms, is associated with higher global QOL. Modeling was performed for all the potential confounding variables including diagnoses. However, as these were not found to be significant, they were not included in the final model. Specifically, controlling for the possible confounding effects of rapid radiotherapy program, diagnosis of lung cancer, and bone metastases did not alter the subgroup differences in performance status and global QOL.
Relationship of subgroups to QOL over time in study
Calculation of longitudinal variable of QualTime
Due to attrition and incomplete data, there were different subsets of the baseline group at each time point making it difficult to compare cluster assignments over time. Instead, we posed the following questions:
1) Does cluster subgroup at baseline predict QOL over time? 2) Does cluster subgroup at baseline predict QOL over time independent of PPS?
To measure QOL for up to 12 weeks of the study's duration, we developed an outcome variable (QualTime). For each patient, we made a plot of global QOL (QLQ-15) versus weeks in study and calculated the area under the curve. For example, two hypothetical curves for patient X are presented in Figs. 1 and 2.

Example of calculation of QualTime for patient 4 if patient completed 12 weeks of study.

Global quality of life (QLQ-15) vs. time (weeks) in study.
To calculate the area under the curve, we multiplied the weeks at each data collection by the associated QLQ-15 scores for those weeks, and summed those products.
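The area-under-the-curve idea can be sketched as follows. The exact interval convention is not spelled out in the text, so this example assumes the trapezoidal rule over the plot of QLQ-15 against weeks; the function name `qualtime` and the scores are hypothetical and are not study data.

```python
def qualtime(weeks, scores):
    """Trapezoidal area under the QLQ-15-versus-weeks curve (assumed convention)."""
    if len(weeks) != len(scores):
        raise ValueError("weeks and scores must have the same length")
    area = 0.0
    for i in range(1, len(weeks)):
        width = weeks[i] - weeks[i - 1]  # weeks between consecutive collections
        area += width * (scores[i] + scores[i - 1]) / 2
    return area

# Hypothetical patient who completes all five collection points.
completer = qualtime([0, 2, 4, 8, 12], [5, 4, 4, 3, 2])
# Same early scores, but the patient leaves the study after week 4.
early_exit = qualtime([0, 2, 4], [5, 4, 4])
print(completer, early_exit)
```

Under this assumption the early-exit patient accumulates a smaller area than the completer, even though the scores up to week 4 are identical.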
A patient could therefore have a lower QualTime by having lower QLQ-15 scores or by leaving the study early. For each patient, we examined in detail the reasons for noncompletion and, when available, the related QLQ-15 scores. We then extrapolated QLQ-15 values for missing data points using the algorithm in Table 5. We extrapolated only once for each reason given in the data. According to the algorithm, if patient X had incomplete data at week 8 because of illness, we extrapolated a QLQ-15 value of 2.5 at week 8. If the reason for missing data at week 12 was “can't contact,” we continued that value at week 12. However, if there was no entry for week 12, we stopped the calculation of QualTime at week 8. In this way, we were able to calculate QualTime for all patients except two, for whom the algorithm could not be applied because no reason for noncompletion was recorded. QualTime and last-recorded PPS were significantly correlated (Pearson's correlation=0.3264, p<0.0001, n=216).
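The worked example for patient X can be sketched in code. This encodes only the three cases described in the text (illness, “can't contact,” and no entry); the full rule set in Table 5 is not reproduced here. The function name, the data layout, and the reading of “only once for each reason” as at most one extrapolation per reason per patient are assumptions.

```python
ILLNESS_VALUE = 2.5  # the only extrapolated value quoted in the text

def fill_missing(visits):
    """visits: list of (week, score, reason); score is None when missing.

    Returns the (week, score) pairs that would feed the QualTime calculation.
    """
    filled = []
    used_reasons = set()  # assumption: each reason is extrapolated at most once
    for week, score, reason in visits:
        if score is None:
            if reason is None or reason in used_reasons:
                break  # no entry (or reason already used): stop here
            if reason == "illness":
                score = ILLNESS_VALUE
            elif reason == "can't contact" and filled:
                score = filled[-1][1]  # continue the previous value
            else:
                break  # reasons not described in the text are not guessed
            used_reasons.add(reason)
        filled.append((week, score))
    return filled

# Hypothetical patient X: ill at week 8, not reachable at week 12.
contacted = fill_missing([(0, 5, None), (2, 4, None), (4, 4, None),
                          (8, None, "illness"), (12, None, "can't contact")])
# Same patient, but with no entry at all for week 12.
no_entry = fill_missing([(0, 5, None), (2, 4, None), (4, 4, None),
                         (8, None, "illness"), (12, None, None)])
print(contacted[-1], no_entry[-1])
```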
QLQ-15, global quality-of-life item from European Organization for Research and Treatment of Cancer Quality of Life Group Questionnaire; QOL, quality of life.
Univariate analysis of subgroup membership and QualTime
In the analyses that follow, we used QualTime as a longitudinal outcome capturing QOL for a period up to 12 weeks of the study's duration. An ANOVA showed that patients in Cluster 3-1 had a significantly lower QualTime than patients in Cluster 3-3 (p=0.0294), with a trend toward a lower QualTime than patients in Cluster 3-2.
Regression model of relationship of subgroups, PPS and QualTime
The cluster subgroups are based on numerical severity scores on the ESAS and can be ordered. We used cluster subgroup membership as a covariate in linear regression, which showed a significant relationship between subgroup number and QualTime (p=0.0098). The overall results of the regression model are F[2, 216]=3.44, p=0.0340. The separate regression of QualTime on PPS was also significant (F[1, 211]=16.71, p<0.0001). However, if both PPS and cluster subgroup number were included in the model, PPS remained significant (p=0.0002), but subgroup membership did not (p=0.1009; for overall model F[3, 209]=6.96, p=0.0002).
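The three models compared in this paragraph can be written schematically as follows. This is a sketch under stated assumptions rather than the exact specification fitted in SAS: the symbols are introduced here for illustration, and cluster membership is shown as a single ordinal term, whereas indicator coding would replace it with two terms.

```latex
\begin{align}
\text{QualTime}_i &= \beta_0 + \beta_1\,\text{Cluster}_i + \varepsilon_i
  && \text{(cluster subgroup only)} \\
\text{QualTime}_i &= \beta_0 + \beta_1\,\text{PPS}_i + \varepsilon_i
  && \text{(PPS only)} \\
\text{QualTime}_i &= \beta_0 + \beta_1\,\text{PPS}_i + \beta_2\,\text{Cluster}_i + \varepsilon_i
  && \text{(both predictors)}
\end{align}
```

Here $i$ indexes patients, $\text{Cluster}_i \in \{1, 2, 3\}$ is the baseline subgroup number, and $\varepsilon_i$ is the residual. The reported result corresponds to $\beta_1$ remaining significant in the third model while the cluster term does not.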
Discussion
Summary of main findings in context of previous findings
In this study, patients were clustered by their symptom profiles and a subgroup at higher risk for poor QOL was identified. To date, there has been little consistency in the composition of reported clusters.23 In 2004, Gift and colleagues examined the impact of insomnia, pain, and fatigue on elderly patients newly diagnosed with cancer and found that the presence of any of the three was associated with lower functioning at 6 to 8 weeks postdiagnosis.24 Miaskowski and colleagues also examined this trio of symptoms along with dyspnea,25 but in both studies, the symptoms were identified a priori. In contrast, we used a wider inventory of symptoms without an a priori assumption of the symptoms that may be relevant to the patient groupings.
In 221 patients with metastatic cancer we identified two-cluster, three-cluster, and four-cluster models based on symptom severity. Of these, the three-cluster model presented the greatest potential for clinical utility. It suggests a group of patients that at baseline had generally low symptom burden (Cluster 3-3), and two further groups (Clusters 3-2 and 3-1) with increasing burden.
The highest-burden group (Cluster 3-1) separates from Cluster 3-2 on pain, anxiety, and depression. A possible clinical explanation is that the appearance of pain on the backdrop of a significant “debility” profile of fatigue, anorexia, and well-being contributes to psychological distress reflected in high ratings for anxiety and depression. This hypothesis corresponds with a recent national endorsement of the depression, anxiety, and well-being ESAS items combining to serve as an appropriate screen for distress levels.26 In addition, previous work has identified that the highest levels of distress accompany the symptoms of pain, dyspnea, and fatigue.18 Aggressively treating pain may also help manage distress, although this has yet to be systematically examined.
This three-cluster model, which separates patients by their pain, anxiety, and depression scores on the background of the debility profile, supports the previous findings of Miaskowski and colleagues,25 Dodd and colleagues,27 and Pud and colleagues.28 Each of these studies looked at predetermined sets of symptoms of fatigue/sleep disturbance/pain/depression and showed that QOL was higher in the patient cluster that ranked all four “low” and significantly lower in the group that ranked all as “high.” Furthermore, across all three studies, performance status was significantly better in the low score cluster as compared with other clusters. Similarly, our data showed a significant association between cluster grouping, QOL, and PPS, with Cluster 3-1 (i.e., high-burden profile) having the lowest QOL. The previous studies were in oncology patients receiving active treatment25,28 and breast cancer patients receiving chemotherapy.27 Only one of the studies was longitudinal, showing that QOL and functional status were associated with subgroup membership at each of three time points during and after chemotherapy for breast cancer, with varying subgroup membership at each time point.27 Our study is longitudinal, in advanced cancer patients, and expands the number of symptoms used for generating patient groupings. Consistent with the earlier study, varying subgroup membership at different time points was a problem in our study, so we chose to examine the predictive value of baseline subgroup membership with a new longitudinal variable incorporating QOL over time.
Research including a temporal component is problematic in patients with advanced cancer who have high rates of attrition and missing data. The novel QualTime variable circumvents the challenge of determining clinically meaningful anchors for data collection and may be used in future studies with similar data. Clinical validity of this variable is supported by its significant correlation with the last performance status recorded (Pearson's correlation=0.3264, p<0.0001, n=216).
Limitations
This is a descriptive longitudinal study and therefore causality cannot be established. The longitudinal data were both a strength of this study and an analytic challenge. Despite procedures to optimize data completion, including the option of face-to-face visits in the home, rates of attrition and missing data were characteristic of this population. To minimize the burden of data collection in this advanced cancer population, a limited number of covariates were collected, and therefore additional variables may be important to control for in the analyses performed.
Implications for research and practice
The study found that cluster subgroup membership is associated with QOL over the period of the study, but that relationship is no longer significant when performance status is added to the model. That is, baseline performance status is a stronger predictor of the longitudinal QOL variable than cluster subgroups. This finding raises the question, “Why study cluster subgroups?” Unlike performance status, cluster subgroups provide a target for clinical interventions that may improve QOL. Therefore, in addition to performance status, optimal assessment should include an inventory of symptoms in order to identify subgroups of patients with higher-risk symptom profiles. This study adds to the limited experience with the analytic approach of clustering patients around their symptom profiles.
Acknowledgments
David F. Andrews (Professor, Statistics, University of Toronto), Erica Moran, Christopher Obwanga, Jennifer Jones, Kalli Stilos, Margaret Bennett, Tracey Das Gupta and the Rose and Arthur Brooks Memorial Fund, Windfields Farms Clinical Research Unit.
Author Disclosure Statement
No competing financial interests exist.
