Abstract
Background:
The Ability to Perform Physical Activities of Daily Living Questionnaire (APPADL) measures the self-reported ability of individuals with type 2 diabetes mellitus (T2DM) and obesity to perform daily physical activities. The primary objective of this study was to estimate APPADL test–retest reliability, responsiveness, and minimal important change (MIC).
Subjects and Methods:
Study participants were individuals with T2DM and body mass index ≥30 kg/m2 enrolled in clinical weight loss programs in the United States. Data were obtained for clinical measures, APPADL, and other patient-reported instruments. APPADL test–retest reliability was estimated with intraclass correlation coefficient. To estimate responsiveness in a subgroup of participants, baseline and 6-month data were analyzed using paired t test and calculation of responsiveness indices (e.g., effect size [ES]). To estimate MIC, both distribution-based and anchor-based methods were used.
Results:
Test–retest data for 106 study participants (mean age, 52 years; 69% female; 31% white; mean body mass index, 38 kg/m2) yielded an intraclass correlation coefficient of 0.91. In the subgroup (n=40) used to estimate responsiveness, weight was significantly less at end point than at baseline (mean, 222.0 vs. 231.9 pounds; P<0.001, ES=0.24), and APPADL scores were significantly better than at baseline (mean, 77.0 vs. 70.8; P=0.01, ES=0.32). Results of distribution- and anchor-based methods to establish MIC suggest values of 6–14 points (0–100 scale).
Conclusions:
The APPADL has demonstrated reliability and validity. In addition, it has demonstrated responsiveness to weight loss in individuals with T2DM and obesity, thereby making it a potentially valuable tool in the evaluation of weight loss interventions (e.g., antihyperglycemic medications that produce weight loss) targeted toward patients with T2DM.
Introduction
Although the preliminary validation of the APPADL lent support for its validity as a measure of self-reported performance of physical activities of daily living, additional evidence is needed if the APPADL is to be used as a PRO end point in the evaluation of weight loss interventions targeted toward individuals with T2DM. It must demonstrate the ability to detect significant change in the individuals' self-reported ability to perform physical activities when significant weight loss occurs. 3 Prior to that, however, there must be evidence that there is insignificant intrasubject variability in the measure when no weight loss occurs (i.e., test–retest reliability). 3 Finally, if the APPADL is responsive to change in self-reported ability to perform physical activities as a result of weight loss, it would be important to know the change in APPADL scores that is consistent with patient-perceived and/or clinical benefit (i.e., minimal important change [MIC]) 4,5 rather than simply a change that is statistically significant. Because the previously published study conducted by Hayes et al. 1 was a cross-sectional study, it was not possible to obtain any of this essential information. Therefore, the primary objective of this prospective study was to estimate APPADL test–retest reliability, responsiveness, and MIC in a sample of individuals with T2DM and obesity.
Research Design and Methods
Study participants
Study participants were recruited primarily from four sites in the United States. Three of the four sites were managed by a company that is no longer in operation but offered a holistic approach to weight loss. The fourth site is a university-based site that was testing a standard behavioral weight loss program plus a portion-controlled diet (Nutrisystem®, Horsham, PA) for weight loss versus a diabetes support and education program. In addition, a few study participants were recruited from a specialty clinic in which all physicians are board-certified in Endocrinology and Metabolism and provide diabetes management training or from a weight loss program that offered behavioral and lifestyle modification in addition to an oral serotonin supplement. Eligibility criteria included (1) actively seeking or currently engaged in a weight loss intervention, (2) diagnosed with T2DM at least 6 months prior to screening, (3) 25–65 years of age, and (4) BMI ≥30 kg/m2. Although the APPADL was developed for a target population of patients with T2DM and BMI of 30–40 kg/m2, no upper limit of BMI was imposed to facilitate recruitment. In addition, no criteria were imposed as to when a participant began or how long a participant had been involved in a weight loss program. Sample size was targeted at 300 patients for robust psychometric analyses.
Procedure
At baseline (Visit 1), site coordinators provided study participants with a study packet that included a sociodemographic form, the APPADL, and other PRO instruments. Site coordinators captured clinical data including height, weight, co-morbidities, diabetes medications, and verification of T2DM (glucose reading or confirmation of prescription for diabetes medication) on a standardized form. At Visit 2, approximately 5 days postbaseline, participants were administered the APPADL. Six months postbaseline (Visit 3), participants were administered the APPADL, the PRO instruments administered at Visit 1, and a global impression of change question. Clinical variables (i.e., weight) were also captured at Visit 3. Study approval was obtained from the Independent Investigational Review Board, Inc.
PRO instruments
The packet administered to participants at Visits 1 and 3 included, along with the APPADL, the Short-Form 36 Version 2.0 (SF-36), 6 Weight-Related Symptom Measure (WRSM), 7,8 and the Obesity and Weight Loss-Quality of Life Measure (OWLQOL). 7,8 The seven-item APPADL is designed to assess self-reported ability to perform physical activities of daily living. 1 The SF-36 consists of eight domains, including a physical function domain, and is designed to provide a general self-evaluation of health status. 6 The WRSM is designed to assess existence and bothersomeness of 20 symptoms related to obesity, and the 17-item OWLQOL is designed to assess the impact of obesity or weight loss on aspects of quality of life. 7,8 The SF-36, WRSM, and OWLQOL were administered for comparison purposes (see Statistical analysis below). Additional details of PRO instruments administered are found in Table 1. Finally, at Visit 3, participants were asked to respond to a global question pertaining to change in ability to perform physical activities since the last visit on a 3-point scale from “worse” to “better.”
APPADL, Ability to Perform Physical Activities of Daily Living; OWLQOL, Obesity and Weight Loss-Quality of Life Measure; WRSM, Weight-Related Symptom Measure.
Statistical analysis
Descriptive statistics (e.g., mean and SD) were calculated for demographics, clinical variables, and each PRO instrument's domain and total scores. Total and domain scores were reverse-scored when necessary so that higher scores for all PRO measures corresponded to a better outcome (e.g., greater ability to perform physical activities of daily living, better health status). Subsequently, all total scores were linearly transformed to a 0–100 scale for easier comparison among instruments. However, APPADL individual item scores are reported as original scores (1–5) to more easily compare item statistics with those in the previous study conducted by Hayes et al. 1
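The reverse scoring and 0–100 transformation described above can be sketched as follows. This is a minimal illustration, not the study's scoring code: the function names are hypothetical, and it assumes a total score formed as the simple sum of seven items scored 1–5, which the text does not state explicitly.

```python
def reverse_score(raw, low, high):
    """Flip a score on the range [low, high] so the best response is highest."""
    return low + high - raw


def to_0_100(raw_total, min_total, max_total):
    """Linearly rescale a total score from [min_total, max_total] to [0, 100]."""
    return 100.0 * (raw_total - min_total) / (max_total - min_total)


# Hypothetical responses to a seven-item instrument scored 1-5 per item,
# so the assumed raw total ranges from 7 to 35 (not study data).
items = [3, 4, 2, 5, 3, 4, 3]
raw_total = sum(items)               # 24
print(to_0_100(raw_total, 7, 35))    # about 60.7
print(reverse_score(2, 1, 5))        # 4
```

Because the transformation is linear, it changes the scale of means and SDs but leaves standardized indices such as ES unchanged.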
APPADL test–retest reliability was estimated with intraclass correlation coefficient (absolute agreement two-way random effects model) 9 using data from Visits 1 and 2. APPADL internal consistency reliability was estimated with coefficient α 10 using data from Visits 1, 2, and 3. Reliability coefficients of 0.80 or greater were considered satisfactory.
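For readers who want to see how these two reliability coefficients are computed, a minimal sketch follows. It assumes the single-measurement form of the two-way random effects, absolute agreement intraclass correlation coefficient (often written ICC(2,1)); the text does not say whether the single or average form was used. Function names and the example data are illustrative only.

```python
from statistics import mean, variance


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: one row per participant, each row holding k administrations.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares: participants, administrations, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def cronbach_alpha(item_scores):
    """Coefficient alpha; item_scores holds one row of item responses per participant."""
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up Visit 1 / Visit 2 totals for three participants (not study data).
# A uniform 5-point shift lowers an absolute-agreement ICC below 1.
print(icc_2_1([[10, 15], [20, 25], [30, 35]]))
```

The absolute agreement form penalizes systematic shifts between administrations, which is why it suits a test–retest question in which no change is expected.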
To establish whether there was a significant weight loss in the subgroup of participants followed prospectively, a paired t test was performed using Visits 1 and 3 data. To estimate APPADL responsiveness, a paired t test was likewise performed on the Visits 1 and 3 APPADL scores. The common responsiveness indices of effect size (ES) (mean change in score/SD of baseline scores) and standardized response mean (SRM) (mean change in score/SD of change scores) were calculated. 11,12 To compare APPADL responsiveness with change in clinical measures (weight, BMI) and the responsiveness of other PRO instruments that have been used as end points in clinical trials of pharmaceutical and commercial weight loss interventions (e.g., SF-36 domains), 8,13,14 similar analyses were performed using Visits 1 and 3 data for the other measures.
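The responsiveness indices defined above can be illustrated with a short sketch. The function name and the example scores are hypothetical; the sketch returns the paired t statistic only, since converting it to a P value requires a t distribution that the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev


def responsiveness(baseline, followup):
    """Return mean change, paired t statistic, ES, and SRM for paired scores."""
    change = [f - b for b, f in zip(baseline, followup)]
    mean_change = mean(change)
    sd_change = stdev(change)
    return {
        "mean_change": mean_change,
        # Paired t statistic with n - 1 degrees of freedom.
        "t": mean_change / (sd_change / sqrt(len(change))),
        # Effect size: mean change divided by SD of baseline scores.
        "es": mean_change / stdev(baseline),
        # Standardized response mean: mean change divided by SD of change scores.
        "srm": mean_change / sd_change,
    }


# Made-up scores for five participants (not study data).
visit1 = [60.0, 70.0, 80.0, 65.0, 75.0]
visit3 = [68.0, 72.0, 85.0, 75.0, 80.0]
result = responsiveness(visit1, visit3)
print(result["mean_change"])   # 6.0
```

ES and SRM share a numerator and differ only in the denominator, so the SRM exceeds the ES whenever change scores vary less than baseline scores.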
To estimate MIC, Crosby et al. 15 recommend an integrated approach that includes both distribution- and anchor-based methods. Distribution-based methods use a statistical property (e.g., SD) of the sample or PRO measure to estimate MIC, while anchor-based methods compare changes in the PRO measure with, for example, patient impression of change or other clinically relevant variables. For this study, distribution-based methods were 0.5 SD and 1 SE of measurement (1 SEM, calculated as the baseline SD multiplied by the square root of 1 minus reliability). These two statistics have been shown to be good approximations of MIC determined by other methods. 4,5,16,17
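The two distribution-based statistics reduce to one-line formulas, sketched below. The function names and input values are hypothetical and are not the study's data; the text does not specify which reliability coefficient entered the SEM calculation.

```python
from math import sqrt


def half_sd(baseline_sd):
    """Distribution-based MIC estimate: half the baseline standard deviation."""
    return 0.5 * baseline_sd


def sem(baseline_sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return baseline_sd * sqrt(1.0 - reliability)


# Hypothetical inputs: baseline SD of 20 points and reliability of 0.91.
print(half_sd(20.0))     # 10.0
print(sem(20.0, 0.91))   # about 6.0
```

Because the SEM shrinks as reliability rises, a highly reliable instrument yields a 1 SEM estimate well below 0.5 SD.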
Participants' responses to the global question pertaining to change in ability to perform daily physical activities since the last visit were dichotomized into two categories: “better” and “no change/worsening.” Percentage of body weight loss was computed by dividing weight change by Visit 1 weight. Because a 5% body weight loss is considered to be both important to individuals with T2DM and obesity 2 and clinically beneficial, 18,19 participants were divided into those who achieved a 5% or greater weight loss from Visit 1 to Visit 3 and those who did not. To provide an anchor-based approach to determining an MIC, the difference in the APPADL mean change score between participants who reported having better ability to perform physical activities and those who reported no change/worsening was calculated. The difference in the APPADL mean change score between those who achieved 5% or more weight loss and those who achieved less than 5% was also calculated. Independent t tests were performed to establish the statistical significance of the observed difference in means for both group comparisons (global impression of change and percentage weight loss). The α level for all analyses was set at 0.05.
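The anchor-based calculation described above can be sketched as follows. This is an illustration under stated assumptions: the function names and the six participants' data are made up, weight loss is expressed as a positive percentage, and the independent t test is omitted.

```python
from statistics import mean


def pct_weight_loss(weight_v1, weight_v3):
    """Percentage of body weight lost from Visit 1 to Visit 3 (positive = loss)."""
    return 100.0 * (weight_v1 - weight_v3) / weight_v1


def anchor_mic(change_scores, in_anchor_group):
    """Difference in mean change score between the anchor group and the rest."""
    improved = [c for c, flag in zip(change_scores, in_anchor_group) if flag]
    others = [c for c, flag in zip(change_scores, in_anchor_group) if not flag]
    return mean(improved) - mean(others)


# Made-up weights (pounds) and APPADL change scores (not study data).
v1 = [200.0, 220.0, 250.0, 240.0, 210.0, 230.0]
v3 = [188.0, 218.0, 235.0, 238.0, 209.0, 215.0]
appadl_change = [14.0, 2.0, 16.0, 0.0, 4.0, 12.0]

# Anchor: achieved at least a 5% body weight loss.
lost_5pct = [pct_weight_loss(a, b) >= 5.0 for a, b in zip(v1, v3)]
print(anchor_mic(appadl_change, lost_5pct))   # 12.0
```

The same `anchor_mic` function applies to the global impression anchor by passing a flag for participants who answered “better.”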
Results
Study participants
Because of recruitment issues, including the ceasing of operation of the business that managed three study sites, the study was terminated after 119 participants had been recruited. Study participants (n=106) with complete data for Visits 1 and 2 were included in the test–retest analyses. These participants were mostly African-American (54%), female (69%), middle-aged (mean [SD]=52 [10] years), and moderately to severely obese (BMI mean [SD], 38 [6] kg/m2) (Table 2). Complete data needed for the responsiveness analysis (Visits 1 and 3 data) were only available for 40 participants from the university-based study site. With the exception of one individual who did not participate in the test–retest study, these 40 participants were a subgroup of the participants for the test–retest analyses. Characteristics of the subgroup were similar to those of the larger test–retest group with the exception that the responsiveness group was more racially diverse (31% white vs. 15% white) and was recruited from one clinical site.
Data are mean (SD) values or percentages.
Subgroup (n=40) of test–retest study participants (n=106).
All scores have been transformed to a scale from 0 to 100, with higher scores corresponding to better outcome.
APPADL, Ability to Perform Physical Activities of Daily Living; OWLQOL, Obesity and Weight Loss-Quality of Life Measure; SF-36, Short-Form 36; WRSM, Weight-Related Symptom Measure.
Item statistics, internal consistency, and test–retest reliability
Item statistics for APPADL at Visit 1 indicated that most participants reported moderate difficulty (APPADL item score, approximately 2.5–3.5 on a 1–5 scale) in performing the seven physical activities. Ceiling effects (percentage of participants responding “not at all difficult”) were less than 25% (Table 3). Cronbach's α coefficients calculated for Visits 1, 2, and 3 APPADL administrations were all ≥0.89. Test–retest reliability for Visits 1 and 2 was 0.91.
Visit 1 Cronbach's α=0.93 (n=106), 0.89 (n=40); Visit 2 Cronbach's α=0.94 (n=106); test–retest reliability (Visit 1 and 2)=0.91 (n=106); and Visit 3 Cronbach's α=0.93 (n=40).
APPADL, Ability to Perform Physical Activities of Daily Living Questionnaire.
Responsiveness
A significant difference was found in participants' mean weight between Visits 1 and 3 (231.9 vs. 222.0 pounds; P<0.001) (Table 4), with an average body weight loss of 4%. A significant difference was also found between Visits 1 and 3 in mean BMI (37.0 vs. 35.4 kg/m2; P<0.001), mean APPADL scores (70.8 vs. 77.0; P=0.01), mean SF-36 Vitality scores (59.3 vs. 66.2; P=0.04), and mean OWLQOL scores (64.8 vs. 72.2; P=0.01). No significant differences in any of the other PRO instruments were observed. The ES for change in weight was 0.24, and that for change in BMI was 0.31. The values of ES for change in APPADL, SF-36 Vitality scores, and OWLQOL scores were 0.32, 0.38, and 0.31, respectively. The SRMs for change in APPADL, SF-36 Vitality scores, and OWLQOL scores were 0.42, 0.34, and 0.42, respectively.
Mean change is calculated by subtracting Visit 1 value from Visit 3 value.
* P<0.05, ** P≤0.01, *** P≤0.001.
APPADL, Ability to Perform Physical Activities of Daily Living Questionnaire; BMI, body mass index; ES, effect size; OWLQOL, Obesity and Weight Loss-Quality of Life Measure; SF-36, Short-Form 36; SRM, standardized response mean; WRSM, Weight-Related Symptom Measure.
MIC
The two distribution-based methods used to estimate MIC in APPADL scores were 0.5 SD and 1 SEM. These were calculated as 9.6 and 6.3, respectively. MIC was also estimated by “anchoring” APPADL scores to weight loss and patient global impression of improvement. The mean change in APPADL score from baseline to end point for participants who achieved a weight loss of 5% or more (n=12) was 15.5 compared with 1.9 (P=0.01) for those who did not lose at least 5% (n=27), suggesting an MIC of 13.6 (i.e., difference between 15.5 and 1.9). The mean change in APPADL score for participants who reported having better ability to perform daily physical activities since the last visit (n=20) was 11.1 compared with 1.3 (P=0.03) for those who reported no change/worsening (n=20), suggesting an MIC of 9.8 (i.e., difference between 11.1 and 1.3).
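As a worked check of the arithmetic, the four MIC estimates reported in this section can be collected and their range computed. The dictionary keys are descriptive labels chosen here for illustration; the numeric values are those stated in the text.

```python
# MIC estimates on the 0-100 APPADL scale, as reported in the text.
estimates = {
    "1 SEM": 6.3,
    "0.5 SD": 9.6,
    # Anchor-based estimates are differences in mean change between groups.
    "global impression anchor": round(11.1 - 1.3, 1),   # 9.8
    "5% weight loss anchor": round(15.5 - 1.9, 1),      # 13.6
}

low, high = min(estimates.values()), max(estimates.values())
print(low, high)   # 6.3 13.6
```

Rounded to whole points, this range corresponds to the 6–14 points cited in the abstract.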
Discussion
The overall goal of this study was to provide additional support for use of the APPADL as a potential secondary end point in trials of weight loss interventions targeted at individuals with T2DM and moderate to severe obesity. This could include those antihyperglycemic medications that produce weight loss. The first objective was to demonstrate acceptable item statistics and test–retest reliability with the APPADL measure. Item statistics showed that, in general, the racially diverse and moderately to severely obese sample in the present study reported being able to do most of the physical activities of daily living with moderate difficulty, and the internal consistency of the items was quite high. These item statistics were similar to those observed by Hayes et al. 1 in a sample of primarily white individuals with T2DM and moderate obesity. Test–retest reliability was acceptable and comparable to the test–retest coefficients reported for generic physical function measures (e.g., SF-36 Physical Function Domain 20 ) and obesity-specific physical function measures (e.g., Impact of Weight on Quality of Life-Lite Physical Function Domain 15 ).
The previous study by Hayes et al. 1 showed an association between APPADL scores and change in weight as calculated from participants' reports of their current weight and their previous year's weight. This relationship was supported by significant differences in weight loss across APPADL groups (low, medium, and high scores). However, although the ability for scores to discriminate groups is suggestive of responsiveness, a true test of responsiveness requires a prospective study in which a true difference from baseline to end point occurs. Therefore, our second objective was to estimate responsiveness by testing the APPADL within a prospective observational study of weight loss interventions in individuals with T2DM and obesity. The change in weight from baseline to end point was statistically significant. The APPADL demonstrated its ability to detect that weight change by the statistically significant change in APPADL scores. The responsiveness indices (ES and SRM) for the APPADL mean change were relatively small, but the ES for weight change was also small. The correspondence between weight change and APPADL scores further supported the correspondence between weight loss and performing physical activities that was anticipated by individuals with T2DM and obesity in the qualitative study. 2
The SF-36 Vitality Scale and the OWLQOL measure also were responsive to the change in weight. Although these are important tools for measuring the patient perspective of weight loss, these measures have limitations for the support of labeling claims in the development of pharmaceutical interventions that produce weight loss. 21 The SF-36 Vitality Scale was designed to assess health status across different diseases and conditions 6 and was not developed with input from any target population such as individuals with T2DM and obesity. The OWLQOL includes quality of life items that measure possible negative psychological impacts of obesity (i.e., shame, frustration, and stress) that could be influenced by other factors (e.g., patient expectations and social support) besides weight. Moreover, as defined by the Food and Drug Administration, a quality of life measure contains “non health-related aspects of life, and because the term generally is accepted to mean what the patient thinks it is, it is too general and undefined to be considered appropriate for a medical product claim.” 21 The APPADL, in contrast, was developed according to regulatory guidance by obtaining input from the target patient population (i.e., individuals with T2DM and moderate obesity) in its development and limiting its conceptual framework to those concepts deemed important and relevant to weight loss in that target population.
The last objective was to suggest a change in the APPADL score that may be considered meaningful for purposes of interpretation. The distribution-based methods for determining MIC indicated that a change of approximately 6–10 points may be meaningful. However, anchor-based approaches, which were based on patient input or a weight loss change deemed meaningful by patients, suggested a larger MIC of 10–14 points. Thus, for sample size estimation aimed at showing statistical significance, 6 points may be adequate, 5 but the change at which patients actually perceive a difference in their ability to perform physical activity and/or that corresponds to meaningful weight loss may be closer to 10 points on a 0–100 scale. It should be noted, however, that, as with other psychometric properties of instruments, the MIC is sample-specific. Additional research is needed to provide support for these estimates.
Limitations
The current sample was mostly African-American (54%) and therefore not representative of the general population of individuals with T2DM and obesity. However, given that Hayes et al. 1 conducted a preliminary validation of the APPADL in a primarily white sample, recruitment in this study was aimed at providing support for APPADL validation in a more racially diverse sample.
Inclusion criteria required study participants to be actively seeking or currently engaged in a weight loss intervention. Fontaine et al. 22 observed that obese populations seeking treatment differ from those who do not seek treatment in that they tend to report greater impairment, specifically in self-reported physical function. Therefore, the participants in this study may not be representative of all individuals with T2DM who are obese. Nevertheless, individuals who enroll in clinical trials of antihyperglycemic medications that also produce weight loss are, by virtue of their enrollment, seeking treatment and may be similarly impaired. Consequently, the results reported in this study may not be generalizable to the population of individuals with T2DM and obesity as a whole but are likely representative of the target population for clinical trials.
Another limitation, not of the study design, but potentially of the APPADL, is that it is a self-reported assessment of ability to perform physical activities as opposed to an assessment of physical performance as rated by trained professionals. Summary performance measures of, for example, lower extremity function (walking speed, timed chair stands, and standing balance) are useful because they have been shown to be predictive of disability. 23 However, these measures capture performance on physical tests that are not necessarily linked to daily physical activities. The APPADL asks respondents to provide their perception of their ability to perform various physical activities. These activities were identified by the target population (individuals with T2DM and moderate obesity) as relevant and important to them because they experience doing them, or attempting to do them, on a nearly daily basis. 2 Therefore, although physical performance tests are valued tools in research and geriatric assessment, especially as predictors of deterioration, 23 patients' self-reported improvement in daily physical activities may be more relevant as a direct measure of weight loss treatment benefit. Moreover, the simplicity of administering a questionnaire rather than a physical performance measure provides a clear advantage for use in clinical trials. The APPADL takes less than 5 min to complete, has a Flesch-Kincaid reading level of 9th grade, and has been linguistically validated in several languages. The APPADL is publicly available and can be obtained by contacting the first author.
Conclusions
Per the Food and Drug Administration guidance for the use of PRO measures in medical product development, “Use of a PRO instrument [in clinical trials] is advised when measuring a concept best known by the patient or best measured from the patient perspective.” 21, p.2 Although it cannot be denied that clinical measures (i.e., weight loss) are the appropriate primary end points for weight loss interventions, the relevance of this weight loss to a patient can only be reported by the patient. A previous study 2 has shown that for individuals with T2DM and obesity, relevance of weight loss is linked to their perceived improvement in their ability to perform physical daily activities. Therefore, to fully evaluate the benefit of weight loss interventions, it would seem important to collect not only clinical data but also patients' perceptions of their physical function.
This study provided additional information on the psychometric properties of the APPADL. The data indicated acceptable test–retest reliability, responsiveness, and an MIC of approximately 10–13 points in a racially diverse sample. Thus, the APPADL has shown the potential to be a useful tool in the evaluation of weight loss interventions, including antihyperglycemic medications that produce weight loss, targeted at individuals with T2DM and moderate obesity.
Acknowledgments
This study was funded by Eli Lilly and Company. The authors would like to thank both Teri Tucker, a full-time employee of PharmaNet/i3, an inVentiv Health Company, and Michael Meldahl for their assistance in the preparation of this manuscript.
Author Disclosure Statement
All authors are full-time employees and shareholders of Eli Lilly and Company.
