Abstract
We scaled a measure of physical functioning to a population-based normative sample by extending self-reported basic and instrumental activities of daily living with items from the Medical Outcomes Study 12-Item Short Form Survey. We used item response theory to place items administered to a sample of older elective surgery patients on a common metric linked to the Patient Reported Outcomes Measurement Information System (PROMIS) normative sample using published data. The summary measure for physical functioning was internally consistent (Cronbach’s α = .83), reliable across a broad range of functioning, and was moderately correlated with walking speed (r = .52) and energy expenditure (r = .40). Demonstrating predictive criterion validity, less impaired scores were associated with lower risk of discharge to a rehabilitation facility (odds ratio = .38, 95% confidence interval [CI] [0.22, 0.66]) and shorter hospital stays (incidence rate ratio [IRR] = 0.87, 95% CI [0.79, 0.97]). Our approach may facilitate direct comparison of physical functioning measures across existing and future studies using a common, population-based metric, when overlapping items with the National Institutes of Health (NIH) PROMIS item bank are present.
Introduction
Physical functioning refers to the capacity to perform activities requiring physical ability (Cella et al., 2010; Rose et al., 2014). The construct encompasses activities related to independent functioning as well as more vigorous activities requiring strength and endurance. Functioning with respect to self-care can be classified into basic activities of daily living (ADL) essential for self-care, such as bathing and grooming (Katz, Ford, Moskowitz, Jackson, & Jaffe, 1963), and more complex instrumental activities of daily living (IADL) needed to maintain independence in the environment, such as managing money and shopping (Lawton & Brody, 1969; Spector, Katz, Murphy, & Fulton, 1987). Functioning is commonly assessed among older adults and plays an important role in clinical decision making and differential diagnosis, in determining service needs at the individual and community level (Tonner & Harrington, 2003), and as a key outcome in observational and intervention studies in aging populations. In the Patient Reported Outcomes Measurement Information System (PROMIS) initiative, physical functioning includes the ability to care for oneself as well as lower extremity, upper extremity, and central body activities (Rose, Bjorner, Becker, Fries, & Ware, 2008).
Summary measures of physical functioning based on both self-report and objective performance have been developed in community-living older adults as well as for more impaired samples (LaPlante, 2010; McHorney, 2002; Spector & Fleishman, 1998). Lawton and Brody (1969) proposed that IADL and ADL activities can be sorted along a continuum from the basic ADLs to more complex IADLs and social behaviors. In an analogous fashion, the Medical Outcomes Study (MOS) Short Form (SF) physical component score is designed to assess functional health and ability to perform tasks independently. Physical functioning can be measured by asking questions about difficulty or health limitations performing specific ADL and IADL tasks. For example, difficulty or limitations in grocery shopping might indicate a less severe disability than difficulty with feeding. Previous studies have demonstrated empirically that ADL and IADL tasks reflect points along a common continuum (Asberg & Sonn, 1989; Bjorner, Kreiner, Ware, Damsgaard, & Bech, 1998; Fieo, Austin, Starr, & Deary, 2011; Fisher, Eubanks, & Marier, 1997; Granger, Hamilton, Linacre, Heinemann, & Wright, 1993; Haley, McHorney, & Ware, 1994; Hays, Liu, Spritzer, & Cella, 2007; Heinemann, Linacre, Wright, Hamilton, & Granger, 1993; Jenkinson, Fitzpatrick, Garratt, Peto, & Stewart-Brown, 2001; Kempen & Suurmeijer, 1990; Linacre, Heinemann, Wright, Granger, & Hamilton, 1994; McHorney, 2002; McHorney, Haley, & Ware, 1997; Raczek et al., 1998; Spector & Fleishman, 1998; Stewart & Kamberg, 1992; Tonner & Harrington, 2003; Tsuji et al., 1995). Together, different scales provide a broad range of assessment of physical functioning, which is desirable in heterogeneous samples (Spector & Fleishman, 1998; Stone & Murtaugh, 1990).
Nearly 100 standardized questionnaires of physical functioning ability have been developed since the 1950s (e.g., Hanman, 1958; Mahoney, Wood, & Barthel, 1958; McHorney, 2002; Moskowitz & McCann, 1957; Nagi, 1976; Verbrugge & Jette, 1994). Each measure provides its own yardstick and does not relate easily to others on a common underlying continuum. Recent national efforts, culminating in PROMIS, have endeavored to use state-of-the-art psychometric methods to create summary scores with good measurement properties. PROMIS is an initiative funded by the National Institutes of Health (NIH; http://nihroadmap.nih.gov). The goal of PROMIS is to provide researchers and clinicians with flexible instruments that measure constructs reliably in a manner that reduces respondent burden. The PROMIS normative sample (N = 21,133) is comprised of adults aged 18 and over (mean age: 53; 82% White and 52% female). PROMIS instruments are appropriate for measuring patient-reported health outcomes ranging from physical health to mental health and social well-being, with physical functioning representing a primary domain (Rose et al., 2008, 2014). Findings from studies that use PROMIS measures can be more easily compared, thus facilitating the synthesis of findings across different studies.
It has been shown that combining ADL and IADL items together into a single scale provides an enhanced range of measurement of physical functioning. The present study further integrates ADL and IADL items with physical functioning questions from the MOS SF-12 and scales the resulting factor in an existing study to the PROMIS normative sample using publicly available information. The MOS SF-12, a widely used measure (McHorney et al., 1997), presents more challenging activities that extend the range of daily activities and thus expand the range of measurement. Our overall goal was to derive a summary measure of physical functioning that synthesizes information from several common physical functioning items. We developed the measure in a sample of elective surgery patients and examined its psychometric properties. We examined validity of the new metric by describing associations with performance-based measures of physical functioning and clinical outcomes. We hypothesized that the summary measure would be highly correlated with mobility and energy expenditure but not with cognitive status. Finally, we examined predictive criterion validity against clinically important outcomes.
Research Design
Participants
Our sample consisted of the first 300 consecutively enrolled elective surgery patients recruited for the Successful AGing after Elective Surgery (SAGES) study. SAGES is a prospective cohort study of cognitive and functioning outcomes of hospitalization that began in 2010 and did not use PROMIS measures. Eligibility criteria were being scheduled for a major elective surgery, being aged 70 years or older, and having no evidence of dementia. The Institutional Review Board at Beth Israel Deaconess Medical Center approved the study procedures.
Indicators of Physical Functioning
We considered 7 ADL items, 7 IADL items, and 4 items from the MOS SF-12 for inclusion as physical functioning indicators in the present study (Table 1). For basic ADLs, participants were asked if they had any difficulty in the past month with bathing, personal grooming, dressing, feeding, getting from a bed to a chair, using the toilet, and walking across a small room with no help needed (responses: no help with no difficulty/difficulty or help needed). For IADLs, participants were asked whether they had any difficulty in the past month using the telephone, getting to places outside of walking distance, shopping for groceries, preparing meals, doing housework, taking medications, and managing money (responses: help needed/no help needed). Self-reported difficulty using the telephone was nonexistent (N = 0) in our sample, so we excluded this item, as was done in previous studies (LaPlante, 2010) and in the PROMIS item bank. We also excluded medication use and money management because only small percentages of participants endorsed these items. From the MOS SF-12, we used the 4 items with the highest loadings on the physical component summary. These items addressed physical functioning and role limitations due to physical health problems in the course of a typical day: (1) limitations in moderate activities, (2) limitations in climbing several flights of stairs, (3) limitations with work and regular activities, and (4) limitations in the kind of work and activities that can be done (Table 1).
Physical Functioning Items Used in the Harmonization: Results from SAGES (N = 300).
a. These items were dropped from the final scale due to small variance.
Statistical analysis
Estimation of physical functioning
We examined the dimensionality of the functioning items using parallel analysis with scree plots (Buja & Eyuboglu, 1992; Horn, 1965). We then performed a factor analysis of the 15 categorical indicators of physical functioning described earlier. We used the statistically appropriate polychoric correlations in our confirmatory factor analysis, thereby estimating a model equivalent to a logistic graded response item response theory (IRT) model (Lord, 1953; Takane & de Leeuw, 1987). IRT relates responses to items to a latent trait using probabilistic models. An equation that describes the relationship between a dichotomous item and the underlying physical functioning factor θ is given by:
P(u_ij = 1 | θ_i) = F[a_j(θ_i − b_j)]
Here, the expected probability P(θ), which is shorthand for P(u_ij = 1 | θ_i) for person i on item j, is a function of a discrimination parameter, a; an item difficulty parameter, b; a latent variable, θ, which here is physical functioning; and a nonlinear linking function, F, typically logistic. The model can be easily extended to ordinal response variables. The model, estimated in Mplus version 7.11, used an expectation–maximization algorithm for maximum likelihood estimation with robust standard errors (Muthén & Muthén, 1998–2011). Factor scores from the model, estimated by the regression-based approach in Mplus, represent the physical functioning scale.
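As a minimal sketch of the response function described above, assuming a logistic link F, the probability for a dichotomous item and the category probabilities for an ordinal (graded response) item could be computed as follows. The function names and the parameter values are hypothetical illustrations, not estimates from the study or calls from Mplus.

```python
import math


def logistic(x):
    """Logistic link function F (assumed here; a normal ogive is the alternative)."""
    return 1.0 / (1.0 + math.exp(-x))


def p_endorse(theta, a, b):
    """P(u = 1 | theta) = F[a(theta - b)] for a dichotomous item."""
    return logistic(a * (theta - b))


def graded_category_probs(theta, a, thresholds):
    """Graded response model: probability of each ordered response category.

    thresholds must be increasing; k thresholds define k + 1 categories.
    """
    # Cumulative probabilities P(u >= k | theta), padded with 1 and 0.
    cum = [1.0] + [p_endorse(theta, a, b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Illustrative values only.
print(p_endorse(theta=0.0, a=1.5, b=0.0))            # 0.5 when theta equals b
print(graded_category_probs(0.0, 1.5, [-1.0, 0.5]))  # three category probabilities
```

The ordinal case reuses the dichotomous function once per category boundary, which is why an item has as many thresholds as it has boundaries between adjacent response categories.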
The model separately estimates a measurement slope for each item, which provides information about how well an item differentiates between people at different levels of physical functioning. In addition, item threshold parameters identify where along the physical functioning trait an item provides the most information. There are as many thresholds for an item as category boundaries for that item. In this study, we report item slopes and threshold parameters on a standardized metric with respect to the latent variable and the indicators, so that parameters are interpretable in correlation and z-score units, respectively. Information functions were calculated based on parameters on a normal ogive scale, which is sometimes used in IRT (Lord, 1953).
Measurement slopes and threshold parameters for the NIH PROMIS normative sample are publicly available on the PROMIS website (http://www.assessmentcenter.net). We linked our physical functioning scale to the metric of the NIH PROMIS Physical Function item bank (version 1.0) using 2 items in common between SAGES and PROMIS. The two items, from the MOS SF-12, asked about limitations in moderate activities and in climbing several flights of stairs. We fixed model parameters for these items to their values in PROMIS (which are on a logistic ogive scale), which places the metric of the latent variable on the scale of the nationally representative PROMIS sample (Wave 1, N = 5,239; Liu et al., 2009). In PROMIS, responses to both items were not at all, very little, somewhat, quite a lot, and cannot do. In SAGES, responses to these items were no, not limited at all, yes—limited a little, and yes—limited a lot. We assigned the PROMIS threshold between the not at all and very little response options to the first threshold in SAGES, and the PROMIS threshold between somewhat and quite a lot to the second threshold in SAGES. In a sensitivity analysis, we estimated factors based on only the ADL/IADL items and on only the MOS SF-12 items and compared their distributional properties and validity to the expanded physical functioning measure.
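The threshold assignment described above can be sketched as follows. This is an illustration under stated assumptions: the numeric threshold values are placeholders (the published PROMIS parameters are not reproduced here), and the helper names are hypothetical.

```python
# PROMIS response options for the two linking items, in questionnaire order.
PROMIS_OPTIONS = ["not at all", "very little", "somewhat", "quite a lot", "cannot do"]


def boundary_index(options, first, second):
    """Index of the threshold separating two adjacent response options."""
    i = options.index(first)
    if options[i + 1] != second:
        raise ValueError("response options are not adjacent")
    return i


def link_thresholds(promis_thresholds):
    """Select the two PROMIS thresholds assigned to the three SAGES categories."""
    first = boundary_index(PROMIS_OPTIONS, "not at all", "very little")
    second = boundary_index(PROMIS_OPTIONS, "somewhat", "quite a lot")
    return [promis_thresholds[first], promis_thresholds[second]]


# Placeholder values for the four boundaries of one five-category item,
# listed in the same order as the boundaries between adjacent options.
example_thresholds = [-2.0, -1.2, -0.5, 0.3]
print(link_thresholds(example_thresholds))  # [-2.0, -0.5]
```

Five response options have four boundaries, so two of the four PROMIS thresholds are carried over and the remaining two have no counterpart in the three-category SAGES items.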
Psychometric properties of the physical functioning measure
We examined precision (or reliability) and internal consistency. We report precision of the measure over the range of physical functioning ability using a test information curve (Hambleton, Swaminathan, & Rogers, 1991). Internal consistency of the scale was assessed using Cronbach’s α (Nunnally, 1967). To judge the fit of the model to data, we examined standardized differences between empirical probabilities and model-predicted probabilities. To evaluate local independence, we also compared sample polychoric correlations with model-estimated correlations using normalized residuals, computed as the standardized difference between sample and model-estimated correlations (Bollen, 1989).
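For readers who want to see how internal consistency is computed, the following is a minimal sketch of Cronbach's α using sample variances. The toy responses are invented for illustration and are not SAGES data; the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Variance of the summed score across respondents.
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented responses: 6 respondents, 3 dichotomous items.
toy = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(cronbach_alpha(toy), 2))
```

The statistic rises as items covary more strongly relative to their individual variances, which is why it is read as a summary of internal consistency rather than of precision at any particular score level.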
Validity of the physical functioning measure
We examined convergent validity by correlating our summary physical functioning measure with performance-based time to complete a 3.5-m walk and self-reported energy expenditure from the Minnesota Leisure Time Activities questionnaire (Pereira et al., 1997). We examined divergent validity by correlating the physical functioning measure with the Modified Mini-Mental State Examination (3MS), a cognitive screening test (Teng & Chui, 1987). We expected physical functioning to be moderately to highly correlated with mobility and energy expenditure and modestly correlated with cognitive status (Cohen, 1988). To determine predictive criterion validity of the physical functioning measure, we examined its ability to predict hospital length of stay using Poisson regression and risk of discharge to a rehabilitation facility using logistic regression. We expected better physical functioning to be associated with shorter hospital stays and lower risk of discharge to a rehabilitation facility.
Results
The study sample was mostly White (95%), married (62%), female (55%), and highly educated (mean 15 years; Table 2). Of the 300 participants, 48% had more than one comorbidity and 85% were scheduled for major elective orthopedic surgery. Most (61%) reported no difficulty in any ADL or IADL item, while 89% of the sample reported some limitations in the physical functioning items from the MOS SF-12. The mean level of functioning on the MOS SF-12 physical component (mean = 35.6) was worse than the average in national norms (Table 2), which is consistent with a sample composed predominantly of orthopedic surgery patients (84%) over 70 years of age. These patients often have substantial functional limitations due to underlying orthopedic problems.
Participant Characteristics in the SAGES Cohort (N = 300).
Note. SD = standard deviation; ADL = activities of daily living; IADL = instrumental activities of daily living; MOS SF-12 = Medical Outcomes Study Short Form 12-item questionnaire; EPESE = established populations for epidemiologic studies of the elderly.
Estimation of Physical Functioning
Parallel analysis suggested strong evidence for a unidimensional factor underlying the 15 physical functioning indicators (Figure 1). Figure 1 plots observed eigenvalues (connected black dots) and the eigenvalues expected based on random reshuffling of the data. The first observed eigenvalue is above that expected by chance, while the rest fall within or below the 90% confidence region, implying that unidimensionality is sufficiently met. The eigenvalue of the first factor was 8.3, while all other eigenvalues were less than 2.0, which is below what would be expected by chance given the permutation distribution of random eigenvalues. Standardized residuals comparing empirical and model-predicted probabilities were negligible (z < 1.7) for every item. The proportion of variance explained was above 50% for most items (Supplemental Table 1). When we computed normalized residuals to evaluate local independence, only 1 of 105 normalized residual correlations had an absolute value greater than 2.0 (results available upon request), so local independence was sufficiently met. Thus, we considered the fit of the model to the data acceptable.

Scree plot from parallel analysis: Results from SAGES (N = 300). Observed eigenvalues (connected black dots) and the eigenvalues expected based on random resampling of the original data. Random data were based on 55 reshufflings of existing data. SAGES = Successful AGing after Elective Surgery.
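The reshuffling procedure behind a parallel analysis can be sketched as below. This is an illustration under several assumptions rather than the analysis reported here: it uses Pearson correlations on simulated continuous data (the study's categorical items would call for polychoric correlations), a 95th-percentile cutoff, and a hypothetical function name. Only the default of 55 reshufflings is taken from the caption.

```python
import numpy as np


def parallel_analysis(data, n_shuffles=55, percentile=95, seed=0):
    """Compare observed eigenvalues with those from column-wise reshuffled data.

    data: array of shape (n_persons, n_items).
    Returns observed eigenvalues, the reshuffled cutoff per eigenvalue,
    and the number of leading eigenvalues that exceed the cutoff.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)
    observed = np.sort(np.linalg.eigvalsh(corr))[::-1]

    random_eigs = np.empty((n_shuffles, data.shape[1]))
    for s in range(n_shuffles):
        # Shuffle each column independently to break between-item correlation
        # while preserving every item's marginal distribution.
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        eigs = np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))
        random_eigs[s] = np.sort(eigs)[::-1]

    cutoff = np.percentile(random_eigs, percentile, axis=0)
    n_factors = 0
    for obs, cut in zip(observed, cutoff):
        if obs <= cut:
            break
        n_factors += 1
    return observed, cutoff, n_factors


# Simulated one-factor data (not SAGES data): 300 persons, 6 items.
sim_rng = np.random.default_rng(1)
theta = sim_rng.normal(size=300)
items = np.column_stack([0.8 * theta + 0.6 * sim_rng.normal(size=300) for _ in range(6)])
observed, cutoff, n_factors = parallel_analysis(items)
print(n_factors)
```

Shuffling within columns keeps each item's response distribution intact, so any eigenvalue that exceeds the reshuffled cutoff reflects shared variance among items rather than the items' marginal distributions.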
Results of the IRT analysis are shown in Figure 2. Overall model fit was acceptable (root mean square error of approximation = 0.06; comparative fit index = 0.96). The polychoric correlation matrix is provided in Supplemental Table 1. All items were highly correlated with the underlying physical functioning trait, as indicated by factor loadings (Figure 2). Most item location thresholds were at the more disabled end of the physical functioning continuum. ADL and IADL items had overlapping locations; while ADLs tended to provide information at the more impaired end of the functional ability continuum, several IADLs, namely meal preparation and transportation, were also informative at the impaired end.

Measurement model for the physical functioning measure: Results from SAGES (N = 300). Structural equation model summarizing the IRT graded response model with 15 ordinal dependent variables. Factor loadings quantify the correlation between underlying physical functioning and each item. Item thresholds, on an N(0,1) scale, depict the point along the spectrum of physical functioning at which the item has the best discrimination. To scale the physical functioning measure to the metric set for the NIH PROMIS initiative, item factor loadings and threshold parameters for two MOS SF-12 items (limitations in moderate activities and in climbing flights of stairs) were fixed to the values in the PROMIS normative sample. Numbers in parentheses denote standard errors of factor loadings and thresholds. In the figure, item slopes and threshold parameters are reported on a standardized metric, so that parameters are interpretable in correlation and z-score units, respectively. ADL = activities of daily living; IADL = instrumental activities of daily living; SAGES = successful aging after elective surgery; IRT = item response theory; NIH PROMIS = National Institutes of Health Patient Reported Outcomes Measurement Information System; MOS SF-12 = Medical Outcomes Study Short Form-12.
The physical functioning measure was scored such that high values indicate less disability. Latent traits typically have a mean of 0 and variance of 1 as a condition of model identification, but PROMIS has transformed these normal scores into T scores (mean of 50 and standard deviation [SD] of 10) to facilitate interpretability. A score of 50 represents an average score in the NIH PROMIS normative sample, and we implemented the same transformation in our data. The mean level of the physical functioning factor in SAGES was 42.7 (SD = 5.3), suggesting that the SAGES sample, which is elderly and awaiting orthopedic surgery, is less physically functional on average than the normative sample by 0.7 SD. There is a ceiling effect in the less impaired range of physical functioning, affecting 11% of the sample (Figure 3, top panel).
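The T-score transformation described above is a linear rescaling of the standardized latent score. A minimal sketch, with hypothetical function names, that also reproduces the 0.7 SD difference from the reported SAGES mean:

```python
def to_t_score(z):
    """Convert a standardized score (mean 0, SD 1) to a T score (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


def sd_units_from_norm(t_score):
    """Distance of a T score from the normative mean, in normative SD units."""
    return (t_score - 50.0) / 10.0


print(to_t_score(0.0))                      # 50.0, the normative mean
print(round(sd_units_from_norm(42.7), 2))   # -0.73, i.e. about 0.7 SD below the norm
```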

Distribution of physical functioning in the SAGES study (N = 300). Distribution of the physical functioning score with overlaid normal distribution for factor scores derived using all items, ADL/IADL items only, and MOS SF-12 items only. A score of 50 is the mean level of physical functioning in the NIH/PROMIS normative sample. Higher scores indicate less impairment. ADL = activities of daily living; IADL = instrumental activities of daily living; SAGES = successful aging after elective surgery; MOS SF-12 = Medical Outcomes Study Short Form-12.
Also shown in Figure 3 are distributions of physical functioning factors that used only the ADL/IADL items (second panel) and MOS SF-12 items (third panel). By using only the ADL/IADL items, 21% of the sample (N = 64 participants) were at the ceiling compared to 11% for the expanded measure. Using only the MOS SF-12 items, the ceiling was unaffected, but a sizable floor effect of 16% (N = 48) emerged in the sample.
Psychometric Properties of the Physical Functioning Measure
Internal consistency reliability (Cronbach’s α) of the 15 items comprising the physical functioning measure was .83. Analysis of the differential amount of information in the scale over the range of physical functioning, another indication of the measure’s reliability, revealed reliabilities above .95 between scores of 30 and 50, which is between 2 SD below the PROMIS norm and the PROMIS norm (Figure 4). This range included 87.6% of the sample.

Precision of the physical functioning score over the range of physical functioning: Results from SAGES (N = 300). The information of the physical functioning score is plotted over the range of functional ability for factor scores derived using all items, ADL/IADL items only, and MOS SF-12 items only. A score of 50 is the mean level of physical functioning in the NIH/PROMIS normative sample. The shape is consistent with a score optimized for the study of between-persons differences and longitudinal change among persons with below PROMIS average physical functioning (the vertical line at a score of 50 indicates the mean of the PROMIS normative sample). The horizontal line at a reliability of 0.95 indicates excellent reliability across most of the observed score range. Reliability = 1 − 1/Information = 1 − (standard error of measurement)^2. ADL = activities of daily living; IADL = instrumental activities of daily living; SAGES = successful aging after elective surgery; MOS SF-12 = Medical Outcomes Study Short Form-12; PROMIS = Patient Reported Outcomes Measurement Information System.
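The relationship between information, standard error of measurement, and reliability stated in the caption can be made concrete with a short sketch. It assumes a latent trait scaled to unit variance, and the function names are hypothetical.

```python
import math


def sem_from_information(information):
    """Standard error of measurement on the standardized latent metric."""
    return 1.0 / math.sqrt(information)


def reliability_from_information(information):
    """Reliability = 1 - 1/Information = 1 - SEM^2 (latent variance fixed at 1)."""
    return 1.0 - 1.0 / information


def information_for_reliability(reliability):
    """Information needed to reach a target reliability."""
    return 1.0 / (1.0 - reliability)


# A reliability of .95 corresponds to information of about 20.
print(round(information_for_reliability(0.95), 6))
print(round(reliability_from_information(20.0), 2))
```

Under this assumption the 0.95 reference line in the figure corresponds to a test information value of roughly 20, or a standard error of about 0.22 SD on the standardized metric.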
Validity of the Physical Functioning Measure
The physical functioning measure was moderately correlated with the timed walk (r = .52) and energy expenditure (r = .40), providing evidence of convergent validity. The correlation was low for cognitive function measured by the 3MS (r = .14), providing evidence of divergent validity. Corresponding correlations using a physical functioning factor based only on ADL/IADL items were similar, but a factor constructed only from MOS SF-12 physical health items was less correlated than the other factors. For example, the correlation of our expanded range physical functioning scale and gait speed is .52, whereas restricting to the MOS SF-12 physical functioning items, the correlation is .46. Overall differences were small but highlight the importance of including the ADL/IADL items for the criteria we have identified (Supplemental Table 2).
We examined predictive criterion validity of the physical functioning measure using rehabilitation facility placement and hospital length of stay. As shown in Table 3, the likelihood of these clinical outcomes increases linearly with worse functional ability, consistent with a dose–response relationship. Among low functioning patients in the study (those with preoperative physical functioning scores < 35, more than 1.5 SD below the PROMIS average), 65% (95% CI [40%, 84%]) were placed in a rehabilitation facility, while the proportions among patients with average (scores of 35–50) and high (scores of 50+) levels of physical functioning were 63% (95% CI [56%, 69%]) and 36% (95% CI [21%, 55%]), respectively. Overall, the odds of discharge to a rehabilitation facility were 62% lower for each half SD higher (less impaired) physical functioning score (95% CI [34%, 78%]). Models were adjusted for age, sex, race, years of education, number of comorbidities, and 3MS score to account for potential confounding of the association between the physical functioning measure and clinical outcomes.
Predictive Validity of the Physical Functioning Measure (N = 300).
Note. Higher (less impaired) physical functioning was associated with lower odds of discharge to a rehabilitation facility (p < .001) and a shorter length of stay (p = .02). Estimates are adjusted for age, sex, race, years of education, number of comorbidities, and 3MS score. We selected thresholds of 35 and 50, corresponding to 1.5 SD below the PROMIS mean and the PROMIS mean of 50 itself, to divide the sample into three groups. a. Includes acute, subacute, and chronic care rehabilitation facilities.
A similar dose–response relationship was observed between the factor score and hospital length of stay. Adjusting for covariates, the mean length of stay among patients with low (scores < 35), average (scores of 35–50), and high (scores of 50+) levels of physical functioning was 6.6 days, 5.2 days, and 5.0 days, respectively (Table 3). In multivariable analyses, a half SD higher (less impaired) physical functioning score was associated with a 13% reduction in the daily risk of remaining hospitalized (incidence rate ratio = 0.87, 95% CI [0.79, 0.97]).
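The percentage reductions quoted for the two outcomes follow directly from the reported ratios. A small worked sketch, using the odds ratio and incidence rate ratio given in the text (the function name is hypothetical):

```python
def percent_lower(ratio):
    """Percentage reduction implied by a ratio below 1 (odds ratio or rate ratio)."""
    return (1.0 - ratio) * 100.0


# Odds ratio for discharge to a rehabilitation facility, per half SD higher score.
or_estimate, or_lower, or_upper = 0.38, 0.22, 0.66
print(round(percent_lower(or_estimate)))                           # 62
print(round(percent_lower(or_upper)), round(percent_lower(or_lower)))  # 34 78

# Incidence rate ratio for hospital length of stay, per half SD higher score.
print(round(percent_lower(0.87)))                                  # 13
```

Note that the confidence limits swap order when converted: the upper limit of the ratio gives the smaller percentage reduction.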
In a sensitivity analysis, we compared the predictive criterion validity of our physical functioning factor with that of scores composed of only ADL/IADL items and only MOS SF-12 items. Although the ability of these scores to predict future placement in a rehabilitation facility remained comparable to that of the full scale, neither of the two component scales showed a dose–response relationship with mean hospital length of stay.
Discussion
We developed a summary measure of physical functioning using items from ADL, IADL, and MOS SF-12 questionnaires. We used publicly available measures from the NIH PROMIS physical functioning item bank to calibrate our measure to the PROMIS normative sample. This feature enabled us to describe physical functioning in a selected sample on a population-based scale. Higher scores are associated with a shorter length of hospital stay and lower risk of discharge to a rehabilitation facility. The measure using these familiar items is internally consistent and provides a reliable measure of functional ability between 2.0 SD below (more impaired) and the mean of the PROMIS metric. Although the reliable range of 2 SD is robust, the restriction of that range to scores at or below the PROMIS average reflects the ceiling effect observed in the sample. We observed a ceiling because most of the items in the physical functioning measure were developed for use in a more impaired population than the one recruited in SAGES. Underscoring the utility of the expanded physical functioning measure, ceiling and floor effects were more prominent when we constructed scales using only ADL/IADL and MOS SF-12 items, respectively.
This study demonstrates that the challenge of comparing findings across studies using different measures of physical functioning can be handled, to some degree, analytically. We used a surgical sample that allowed us to examine predictive criterion validity using discharge to a rehabilitation facility and hospital length of stay. Our approach can be used by other researchers to directly compare physical functioning in existing studies with findings from new studies using the NIH PROMIS item bank. Although more overlap is better, at least one physical functioning item shared with the PROMIS item bank is enough to apply this methodology (Jones & Fonda, 2004). Because the PROMIS physical functioning item bank was constructed using existing questionnaires, our approach may be generalizable to other studies (Rose et al., 2008). The novelty in our approach is the external scaling of items from multiple measures of physical functioning to the PROMIS normative sample, which will maximize interpretability of our results and enhance comparisons across studies. The SAGES data had available ADL, IADL, and MOS SF-12 items, but other studies may use other available items as long as some questions are in common with PROMIS measures.
The major advantage of our study is that we implemented a novel approach to score a study-specific outcome measure in a manner consistent with the PROMIS metric. Using IRT to calibrate our measure may not appreciably improve the SAGES study. However, it is an important advantage to be able to compare physical functioning in our study with the normative PROMIS metric, thus enabling a more informed understanding of the generalizability and representativeness of future study samples that use PROMIS. More broadly, calibration in IRT depends on many factors, including content span of available items, precision over the range of observed ability in a sample, and differential item measurement. With respect to item content, we included all available ADL/IADL items and relevant items from the MOS SF-12, which we believe represent the construct of interest. We established our link using the PROMIS physical functioning item bank, which includes item content that overlaps both the ADL/IADL items included and the MOS SF-12. More items, and more overlap with the PROMIS item bank, would be preferable. Regarding measurement precision across the range of observed physical functioning, the measure provides acceptable precision where most of the sample in SAGES performs (Figures 3 and 4). Other studies may include different types of people, but the approach we took can include other items measuring physical functioning.
Our externally scaled measure of physical functioning that combined ADL, IADL, and MOS SF-12 items demonstrated smaller floor and ceiling effects than factors using ADL/IADL or MOS SF-12 items alone. Whether such a score will lead to superior measures of association in all cases needs further investigation. Explicit advantages of our approach include cross-validation and pooled analyses of multiple studies to address substantive research questions. Scale choice for a physical functioning outcome in an individual study is dependent on local factors and individual preferences. However, future advancements in research involving physical functioning could be accelerated if findings are presented on a scale common across studies, as we have demonstrated here. Regarding potential heterogeneity in the physical functioning measure, we used item parameters from PROMIS items with identical question prompts to MOS SF-12 items in our study. Parameters for the other ADL/IADL items were determined empirically. Because they are determined empirically, the items only contribute to our proposed score to the extent that they share variance with the PROMIS/SF physical functioning dependency construct.
Several caveats should be mentioned. Conceptually, the dimensionality of the PROMIS physical functioning item bank has been a source of debate in the field (Hays et al., 2007; Martin et al., 2007; Raczek et al., 1998; Rose et al., 2008; Wolfe, Michaud, & Pincus, 2004). Some argue that physical functioning is a heterogeneous concept that inherently reflects not only domains reflecting muscle strength and coordination but also cognitive flexibility and the social and environmental context of the activity being performed. Previous empirical research also suggests items assessing upper-body and lower-body functions may be different or that mobility is distinct from self-care. In our study, we did not have items that permitted such distinctions. Rose and colleagues (2014) previously provided sufficient evidence for unidimensionality using PROMIS physical functioning data. Those findings are consistent with those from our data. Next, our sample included a large number (11%) of participants without any self-reported difficulty or help needed. This proportion suggests our measure does not discriminate well among participants at higher levels of physical functioning, resulting in an observed ceiling in the score distribution. However, this is a substantial improvement over ceiling effects present in other measures: 61% of the sample would have been at the ceiling had we only considered simple sums of IADL and ADL items. Fieo, Austin, Starr, and Deary (2011) noted that future work is needed to expand the range of the instruments that assess physical functioning for community-living older adults by including items with more sensitivity to milder disability. Lawton and Brody (1969) proposed that social activities and behaviors lie on the same continuous trait of everyday physical functioning and could provide more information in the range of physical functioning occupied by community-living older adults (Fieo, Manly, Schupf, & Stern, 2013).
Another way to expand the measurable range of physical functioning might be to include more response options for ADL/IADL items, such as no difficulty, some difficulty, a lot of difficulty, and cannot do without help. However, expanded measurement of IADLs and ADLs into the more impaired range would not reduce the ceiling among community-living participants such as those in the present study. A third limitation is that our sample size was too small to evaluate differential item functioning. Individual items should measure functional ability in the same way among different subgroups of individuals, such as those defined by characteristics like age or sex (Paz, Spritzer, Morales, & Hays, 2013; Thissen, Steinberg, & Wainer, 1988). The power to detect a minimum clinically relevant amount of bias (odds ratio of 2.0; Cole, Kawachi, Maller, & Berkman, 2000) in our sample was only 72%, and the policy of the Educational Testing Service is not to examine item bias in samples of fewer than 700 participants (Zwick, 2012). Although replication of our study in a larger sample is needed, our resulting factor demonstrated acceptable measurement precision and convergent and predictive criterion validity. Fourth, we acknowledge that the available sample size in our study is not large for factor analysis, although in this context Comrey and Lee (1992) suggested N = 300 is “good” and Baker (1962) reported in results from a Monte Carlo study that a sample size of N = 120 is sufficient to estimate item parameters with variances close to those provided by asymptotic formulas. Fifth, the response distribution on the ADL and IADL items is skewed. This may have limited our ability to identify multidimensionality in the items. We note that previous studies of ADL/IADL functioning have demonstrated unidimensionality of items related to those we used (Asberg & Sonn, 1989; Kempen & Suurmeijer, 1990; LaPlante, 2010).
Using a more impaired sample in which items were less skewed, Spector and Fleishman (1998) concluded that a single dimension explains most of the variance among ADL/IADL items. Finally, calibrating physical functioning between PROMIS and the SAGES study using our approach relies on having items in common with PROMIS. There are two items in common between SAGES and PROMIS; although having more common items would be preferable, we are limited by the constraints of existing data (Wang, 2004).
Advantages of the IRT-derived physical functioning measure are its higher sensitivity to differences in physical functioning in a broad range of older adults compared to scales for just ADL or IADL, allowance for different weighting for each item in the scale, and characterization of the scale’s precision over the observed range of physical functioning. Further, physical functioning derived with IRT can be treated as an interval scale, making it ideally suited for studying longitudinal change. Most importantly, advantages of the PROMIS initiative can be applied to existing resources to facilitate common measures with which to compare physical functioning across studies.
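The item weighting and score-dependent precision mentioned above can be written out for the simplest case. The expressions below assume a two-parameter logistic model with dichotomous items, which may differ from the model fitted in this study; the symbols a_i (discrimination), b_i (location), and θ (latent physical functioning) are notation introduced here for illustration.

```latex
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}, \qquad
I(\theta) = \sum_i a_i^2 \, P_i(\theta)\,\bigl[1 - P_i(\theta)\bigr], \qquad
SE(\theta) = \frac{1}{\sqrt{I(\theta)}}
```

Under these assumptions, each item contributes information in proportion to its squared discrimination, and that contribution peaks where θ is near the item's location. Precision therefore varies across the range of functioning and is lowest where few items are located, which is consistent with a scale that is less precise near its ceiling.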
Conclusion
We derived a summary measure from widely used physical functioning measures using published data from the NIH PROMIS physical functioning item bank to calibrate the measure to the NIH PROMIS normative sample. This linking enabled us to describe physical functioning in our study on a nationally representative scale. The measure of physical functioning using familiar items was internally consistent, provided reliable measures of functional ability across low and average levels of functioning, and demonstrated no floor effect and less of a ceiling effect than other common approaches to creating physical functioning composites. The measure demonstrated predictive criterion validity: Less impaired scores on the physical functioning measure were associated with lower risk of discharge to a rehabilitation facility and shorter length of hospital stay. These outcomes demonstrated a dose–response relationship. Our approach holds the potential for broad applicability to directly compare physical functioning in new and existing studies when overlapping items with the NIH PROMIS item bank are present. Importantly, these methods can facilitate interpretation and synthesis of findings across existing and future research studies.
Footnotes
Authors’ Note
Dr. Inouye holds the Milton and Shirley F. Levy Family Chair. The contents do not necessarily represent views of the funding entities. Funders had no roles in the design and conduct of the study, data collection, management, analysis, interpretation of the data, or manuscript preparation, review, or approval.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Grant nos. P01AG031720 (SKI), K07AG041835 (SKI), and R03AG045494 (ALG). Dr. Gross was supported in part by Grant no. T32AG023480.
References
