Abstract
Introduction
Using a sum score based on a three-graded response scale for the activities of daily living staircase has previously been found to increase the statistical power compared to dichotomized responses when assessing longitudinal changes in activities of daily living. We aimed to investigate if the statistical power could be further increased by using a four-graded scale.
Methods
We used data from two previous studies on community-living people to calculate sum scores based on a dichotomized (independent/dependent), a three-graded (independent/partly dependent/dependent), and a four-graded (independent without difficulty/independent with difficulty/partly dependent/dependent) response scale for the activities of daily living staircase. In total, 1818 paired observations (baseline to follow-up) from 482 people were included. Statistical power was estimated by simulation for the entire material as well as stratified by follow-up time and by baseline activities of daily living.
Results
The four-graded scale provided the highest statistical power, particularly for shorter follow-up times and low and high baseline activities of daily living, but had similar statistical power to the three-graded scale for longer follow-up times and medium baseline activities of daily living.
Conclusion
Adding a second level to “independent” in the activities of daily living staircase improved the detection of changes over time.
Introduction
Within occupational therapy practice, assessment of the client’s performance of activities of daily living (ADL) very often serves as the basis for planning, conducting, and evaluating evidence-based interventions (Nielsen and Waehrens, 2015; Wæhrens, 2010; Wheeler et al., 2018). One example of this is home-based interventions, where ADL is a commonly investigated outcome, at least for the older population (Ekstam et al., 2014; Fänge and Iwarsson, 2005; Stark et al., 2017; Szanton et al., 2014). Changes in ADL can serve as a measure to predict effects of possible policy changes (Slaug et al., 2017).
ADL include activities on a personal level (P-ADL), for example feeding and dressing, as well as on an instrumental level (I-ADL), for example cooking and cleaning. Since Katz et al. (1963) introduced an ADL index, more than 50 different ADL measurement instruments have been developed (Hopman-Rock et al., 2018). One is the ADL staircase, a Swedish instrument designed to measure dependence on another person in ADL. The ADL staircase contains the five P-ADL items (feeding, transfer, using the toilet, dressing, and bathing) from Katz's ADL Index (Katz et al., 1963) as well as four I-ADL items (cooking, transportation, shopping, and cleaning). Data are collected by means of a combination of interview and observation, that is, both self-report and professional evaluation are used. Three response alternatives are available: independent, partly dependent, and dependent, with dependent denoting dependence on another person. According to the manual, analysis of data should be based on a dichotomized scale (independent/dependent). The ADL staircase has undergone reliability and validity testing, and has been found to be useful in both practice and research (Hulter-Åsberg and Sonn, 1989; Iwarsson and Isacsson, 1997; Sonn and Hulter-Åsberg, 1991).
The ADL staircase is based on a cumulative hierarchy of items. Although such an approach is particularly beneficial in healthcare and rehabilitation, it has drawbacks for statistical inference, such as responses not conforming to the hierarchical order, resulting in missing data. To address this, Iwarsson and Lanke (2004) investigated alternative ways to transform the ADL staircase response pattern into a single number. They found that using the sum of the dichotomized items in the ADL staircase had several advantages compared to using the original version. For example, besides allowing for inclusion of data that would be excluded in the original version, the sum score provided lower p-values. Building upon this, Fänge et al. (2004) investigated the possibility of using all available data from the ADL staircase when calculating the sum score, that is, using a three-graded scale rather than a dichotomized scale. They concluded that, compared to using the full data, dichotomizing the items in the ADL staircase reduced the statistical power by 13–24% when assessing changes in ADL dependence over time. More recently, some studies have added a fourth grade, independent with difficulty, to the scale (Iwarsson et al., 2009; Slaug et al., 2017). By discriminating between independence with and without difficulty, Iwarsson et al. (2009) detected variations in five national samples, thus showing that a fourth grade of the scale provided a more diverse and information-rich picture of dependence and independence. More precisely, lower proportions of independence with difficulty were identified in basic P-ADL, and higher proportions in more complex I-ADL (Iwarsson et al., 2009). Slaug et al. (2017) used the four-graded scale to identify the group most at risk of becoming dependent in ADL.
However, to the best of our knowledge, it has not been investigated whether these results extend to an even finer scale, that is, whether a finer scale for assessing ADL dependence with the ADL staircase further improves the statistical power when investigating longitudinal changes. Such knowledge would potentially be useful for both practice and research within occupational therapy. A finer-graded scale could be more sensitive in detecting clinically relevant changes in ADL for individual occupational therapy clients as well as in demonstrating the effects of interventions within research. Accordingly, the aim of the present study was to investigate the number of scale grades in the ADL staircase that would optimally detect changes in ADL over time.
Method
Datasets
A secondary analysis was conducted using data collected from two previous longitudinal studies, which are described below. We included all participants with complete ADL data (no items missing) on at least two measurement occasions. We used all available sets of paired data from all included individuals. Thus, a person with complete ADL data from three occasions would contribute the pairs from occasion 1–2, occasion 1–3, and occasion 2–3. For each pair, the first time point is considered as baseline and the second as follow-up. Thus, one time point may be baseline in one data pair and follow-up in another.
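The pairing rule described above can be illustrated with a minimal sketch. The function name and the use of occasion numbers as stand-ins for measurement records are illustrative assumptions, not part of the original analysis (which was performed in SPSS).

```python
from itertools import combinations


def paired_observations(occasions):
    """Return every (baseline, follow-up) pair from a list of measurement
    occasions that is already sorted in chronological order."""
    # combinations() keeps the input order, so the earlier occasion
    # always comes first and acts as baseline.
    return list(combinations(occasions, 2))


# A person with complete ADL data on three occasions contributes three pairs.
print(paired_observations([1, 2, 3]))  # [(1, 2), (1, 3), (2, 3)]
```

With complete data on all four occasions, the same rule yields six pairs per person.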
The Research Strategy for Housing Adaptation (ReSHA) trial (Ekstam et al., 2014) is a longitudinal study including people aged 25–99 years, living in ordinary housing and applying for housing adaptation grants in three municipalities in southern Sweden (n = 241). ADL data were collected at home visits by experienced occupational therapists on four occasions between 2013 and 2015. This study contributed 1001 sets of paired data from 168 people. The mean time between measurements was 350 days (range 33–1251). The study was approved by the Regional Ethical Review Board in Lund, Sweden (2012/566).
The Enabling Autonomy, Participation, and Well-Being in Old Age: The Home Environment as a Determinant for Healthy Aging (ENABLE-AGE) Study included people aged 80–89 years and living alone in ordinary housing in three urban municipalities in southern Sweden (n = 397) (Iwarsson et al., 2007). Participants were randomly drawn from official registers, stratified by age (80–84, 85–89) and sex to guarantee sufficient sampling of people 85+ (about 50%) and of men (about 25%). Data were collected using a combination of interviews and observations, which were performed by trained raters in the participants’ homes. There were four occasions of data collection between 2002 and 2011. The ADL data used in the current study include 817 paired observations from 314 people in the study sample who took part on at least two occasions. The mean time between measurements was 1343 days (range 282–3294). The study was approved by the Regional Ethical Review Board in Lund, Sweden (2002/324).
ADL staircase
In both studies (ReSHA and ENABLE-AGE), the ADL staircase was used in the modified version (Iwarsson et al., 2009), that is, including a fourth response option for each item such that those who were assessed to be independent were also asked if they were independent with or without difficulties. Thus, each ADL item was rated as either: (a) independent without difficulties; (b) independent with difficulties; (c) partly dependent; or (d) dependent, where dependence denoted dependence on another person. Items included were feeding, transfer, toileting, dressing, bathing, cooking, transportation, shopping, and cleaning. Transportation was assessed for the type of transportation most prevalently used, either public transportation or transportation by car (driving or being driven).
For each person and each time point, we calculated sum scores in three different ways for all ADL items as well as for the P-ADL (feeding, transfer, toileting, dressing, bathing) and I-ADL (cooking, transportation, shopping, cleaning) separately.
Dichotomized item scores: independent [0]/dependent [1] (Hulter-Åsberg and Sonn, 1989).
Three-graded scaling: independent [0]/partly dependent [1]/dependent [2] (Fänge et al., 2004).
Four-graded scaling: independent without difficulties [0]/independent with difficulties [1]/partly dependent [2]/dependent [3] (Iwarsson et al., 2009).
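As an illustration of the three scoring schemes, the sketch below derives all three sum scores from one set of four-graded item ratings. The function names and the example ratings are hypothetical. The text does not state how "partly dependent" is handled when dichotomizing, so counting it as dependent is an assumption made here for illustration only.

```python
# Four-graded item codes as given in the text:
# 0 = independent without difficulties, 1 = independent with difficulties,
# 2 = partly dependent, 3 = dependent.
P_ADL = ("feeding", "transfer", "toileting", "dressing", "bathing")
I_ADL = ("cooking", "transportation", "shopping", "cleaning")


def three_graded(code):
    # Both levels of "independent" collapse to 0.
    return max(code - 1, 0)


def dichotomized(code):
    # ASSUMPTION: partly dependent counts as dependent (not stated in the text).
    return 1 if code >= 2 else 0


def sum_scores(ratings, items=P_ADL + I_ADL):
    """Sum scores on all three scales for the chosen items."""
    codes = [ratings[item] for item in items]
    return {
        "four_graded": sum(codes),
        "three_graded": sum(three_graded(c) for c in codes),
        "dichotomized": sum(dichotomized(c) for c in codes),
    }


# Hypothetical ratings for one person at one time point.
example = {
    "feeding": 0, "transfer": 1, "toileting": 0, "dressing": 1, "bathing": 2,
    "cooking": 1, "transportation": 2, "shopping": 3, "cleaning": 3,
}
print(sum_scores(example))         # total ADL
print(sum_scores(example, P_ADL))  # P-ADL only
print(sum_scores(example, I_ADL))  # I-ADL only
```

In this example, the two P-ADL items rated "independent with difficulties" contribute to the four-graded sum score only, which is the extra information the fourth grade carries.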
Effect modifiers
For each set of paired data, we calculated the total ADL sum score at baseline (the first time point in each set of paired data) using the four-graded scale. This baseline ADL sum score was then categorized into less than 5 (n = 624), 5–10 (n = 442), and greater than 10 (n = 752). Moreover, for each set of paired observations, we determined the time between the two measurements (follow-up time). This was categorized into less than 6 months (n = 295), 6–12 months (n = 610), 1–2 years (n = 304), and more than 2 years (n = 609). Both baseline ADL and follow-up time were used to perform stratified analyses.
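The categorization can be sketched as follows. The cut-offs for the baseline score follow the text. The day counts used for the follow-up categories, and the choice of which category a boundary value falls into, are assumptions, since the text reports the categories in months and years only.

```python
def baseline_category(four_graded_total):
    """Baseline ADL stratum from the four-graded total sum score (0-27)."""
    if four_graded_total < 5:
        return "<5"
    if four_graded_total <= 10:
        return "5-10"
    return ">10"


# ASSUMPTION: limits expressed in days; boundary values are assigned to
# the lower category. Neither detail is given in the text.
SIX_MONTHS, ONE_YEAR, TWO_YEARS = 183, 365, 730


def follow_up_category(days):
    """Follow-up time stratum from the number of days between measurements."""
    if days < SIX_MONTHS:
        return "<6 months"
    if days <= ONE_YEAR:
        return "6-12 months"
    if days <= TWO_YEARS:
        return "1-2 years"
    return ">2 years"


print(baseline_category(7), follow_up_category(350))
```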
Statistics
To estimate statistical power, we randomly selected 100 paired observations from the data and analyzed if the change from baseline to follow-up differed from zero. For each ADL score (total, I-ADL, P-ADL) and statistical test (t-test, Wilcoxon signed rank test), this was repeated 1000 times, and the percentage of p-values less than or equal to .05 was used as an estimator of statistical power.
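The resampling logic can be sketched as below. This is not the SPSS procedure used in the study: the sketch draws simple random samples without replacement, covers only a parametric test, and, to stay within the Python standard library, replaces the t distribution with a normal approximation (close for samples of 100, but an assumption). All names and the example data are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_p_value(changes):
    """Two-sided p-value for 'mean change = 0'.
    ASSUMPTION: normal approximation to the paired t-test."""
    sd = stdev(changes)
    if sd == 0:
        # All changes identical: no evidence if they are all zero,
        # otherwise the test statistic is unbounded.
        return 1.0 if mean(changes) == 0 else 0.0
    t = mean(changes) / (sd / sqrt(len(changes)))
    return 2 * (1 - NormalDist().cdf(abs(t)))


def estimate_power(changes, sample_size=100, repetitions=1000,
                   alpha=0.05, seed=1):
    """Share of random samples in which the change is significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = rng.sample(changes, sample_size)  # without replacement
        if paired_p_value(sample) <= alpha:
            hits += 1
    return hits / repetitions


# Hypothetical changes in sum score (follow-up minus baseline).
no_change = [0] * 500
print(estimate_power(no_change))  # 0.0, since nothing can be detected
```

Running the same function on the change scores from each of the three scales would give one power estimate per scale, which is the comparison the study reports.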
Power calculations are only relevant if there indeed is an effect to be detected. Thus, we needed a data set in which a certain degree of deterioration was present. To achieve this, we split the paired observations into two strata. In the first stratum, we included those with improvement on at least one of the ADL sum scores (dichotomized, three-graded, or four-graded), whereas the remaining observations were included in the second stratum. Randomization was performed by assigning each set of paired data a random number between 0 and 1, using the uniform random number generator in SPSS. The random numbers for the sets with improvement in at least one ADL sum score (stratum one) were then divided by two. The 100 sets of paired data with the lowest random numbers were then selected for analysis. Once selected, data sets from both strata were treated the same in the analyses.
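The selection mechanism described above can be sketched as follows. The study used the SPSS random number generator; the Python generator, the function name, and the example data are stand-ins. Halving the random numbers in stratum one shifts them toward zero, so those pairs are more likely to be among the 100 lowest.

```python
import random


def select_pairs(in_stratum_one, n_select=100, seed=None):
    """Return the indices of the selected paired observations.

    in_stratum_one: one boolean per pair, True if the pair belongs
    to stratum one as defined in the text.
    """
    rng = random.Random(seed)
    keys = []
    for index, flag in enumerate(in_stratum_one):
        u = rng.random()  # uniform on [0, 1)
        if flag:
            u /= 2        # stratum one: uniform on [0, 0.5)
        keys.append((u, index))
    keys.sort()
    return [index for _, index in keys[:n_select]]


# Hypothetical data: 2000 pairs in each stratum.
flags = [True] * 2000 + [False] * 2000
chosen = select_pairs(flags, n_select=300, seed=0)
share = sum(flags[i] for i in chosen) / len(chosen)
print(f"share of selected pairs from stratum one: {share:.2f}")
```

When the two strata are equally large and only a small fraction is selected, about two thirds of the selected pairs can be expected to come from stratum one.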
Analyses were performed for the whole material as well as stratified for baseline ADL and follow-up time. All analyses were performed in IBM SPSS Statistics 25.0.
Results
The mean sum scores for the different types of ADL (total sum score, sum score for P-ADL, and sum score for I-ADL) scales (dichotomized, three-graded, and four-graded), and categories (total and stratified by baseline ADL and follow-up time) are presented in Table 1.
Table 1. Mean of the different sum scores of the activities of daily living (ADL) staircase for total ADL, and ADL on personal (P-ADL) and instrumental (I-ADL) levels, at different time points (baseline and follow-up).
Using all available data, the three-graded scale performed best for total ADL and I-ADL, whereas the four-graded scale performed best for P-ADL (Figure 1). When stratifying on follow-up time, the four-graded scale produced the highest statistical power for times up to 2 years, whereas the three-graded scale performed better for longer follow-up times (Figure 2). The four-graded scale also performed better when restricting the analyses to those with baseline ADL less than 5 or greater than 10, whereas the three-graded scale produced the highest statistical power for those with baseline ADL between 5 and 10 (Figure 3).

Figure 1. Estimated statistical power for change in sum score of the activities of daily living (ADL) staircase based on four-graded (white), three-graded (gray), and dichotomous (black) scales, respectively, in 1818 paired data sets from 482 people.

Figure 2. Estimated statistical power for change in sum score of the activities of daily living (ADL) staircase based on four-graded (white), three-graded (gray), and dichotomous (black) scales, respectively, and stratified by follow-up time, in 1818 paired data sets from 482 people.

Figure 3. Estimated statistical power for change in sum score of the activities of daily living (ADL) staircase based on four-graded (white), three-graded (gray), and dichotomous (black) scales, respectively, and stratified by baseline ADL on the four-graded scale, in 1818 paired data sets from 482 people.
Discussion and implications
Although the three-graded scale performed better on ADL overall, we were able to identify situations in which the four-graded scale was preferred. These included assessment of changes in P-ADL, short follow-up time, and longitudinal assessment of people with low dependence at baseline.
The four-graded scale outperformed both the three-graded and the dichotomized scale for follow-up times less than 2 years and performed similarly to the three-graded scale for longer follow-up times. This may be interpreted to mean that all three sum scores are good at detecting long-term decline in ADL, but the four-graded scale is able to identify minor changes that occur over shorter follow-up periods.
When stratifying by baseline ADL, we found the highest statistical power for the four-graded scale for people with either low (ADL < 5) or high (ADL > 10) dependence at baseline, but not for those with moderate (ADL 5–10) baseline dependence. The four-graded scale performed better among those with low baseline ADL scores, that is, those who may be interpreted as being healthier. In such populations, people can be expected to move between the two levels of “independent” (with and without difficulty). Thus, allowing for variations of “independent” results in a larger variation in the data. In populations with high dependence already at baseline, the four-graded scale performed only slightly better than the three-graded scale. This is likely because most of the changes involve moving from independence (regardless of with or without difficulty) to partly dependent or from partly dependent to dependent. In these situations, the change in sum scores would be equal for the three- and four-graded scales.
When assessing P-ADL and I-ADL separately, the four-graded scale performed best for P-ADL, whereas the three-graded scale performed best for I-ADL. One potential explanation for this could be that, compared to the items in P-ADL, those in I-ADL are more complex and dependent on contextual aspects. For example, dressing (P-ADL) depends largely on the individual’s capacity, whereas shopping (I-ADL) also depends on access to a store, interactions with other people such as store personnel, and the outdoor environment when traveling to and from the store. Thus, the performance of I-ADL activities places higher demands on the skills of the individual. It may be reasoned that due to the complexity of these activities, people are likely to ask for help when the task becomes difficult, whereas for P-ADL people may be more inclined to continue doing tasks themselves even though they are difficult to perform. If this is true, there would be more variation within “independent” for P-ADL than for I-ADL, which would explain the higher statistical power found for the four-graded scale for P-ADL but not I-ADL.
Both clinically and in research, it is important to have precise and sensitive instruments to evaluate change either on an individual client level or at a group level (Wæhrens, 2010). The last decade’s extensive research literature on instrument development testifies to this endeavor (Cordier et al., 2016; Yuen and Austin, 2014). Instruments for evaluating activities of daily living, occupational performance, and participation are core areas (Fisher, 2005; Søndergaard and Fisher, 2012), but instruments focusing on environmental issues are also important in occupational therapy (Iwarsson et al., 2012).
A strength of the present study is the use of data from actual persons rather than simulated data. However, a drawback is that, since the four-graded ADL staircase has been used only sparsely in longitudinal studies, we were restricted to data from selected populations. Whereas the ENABLE-AGE cohort comprises healthy older people, the ReSHA data were collected from people of all ages in need of housing adaptations. Thus, it may be suspected that we have an overrepresentation of very dependent and very independent people in the combined data set. We tried to overcome this by stratifying by baseline ADL. However, further studies are needed to evaluate the four-graded ADL staircase in a more mixed population.
In the present study, we used two alternative statistical methods to analyze change in ADL: one parametric (the t-test) and one non-parametric (the Wilcoxon signed rank test). We chose these in order to use all information available, that is, to consider not only whether a change occurred but also the magnitude of the change. Thus, the higher sensitivity of the four-graded scale found in the simulations relies on the use of a statistical method that utilizes all the information contained in the scale. That the t-test and the Wilcoxon signed rank test produced the same results suggests that even though the sum score of the ADL staircase is not inherently a continuous variable, the choice of a non-parametric test in the analyses does not seem crucial.
In summary, we investigated a version of the ADL staircase that included both objective aspects in terms of dependence/independence, and subjective aspects such as difficulty (Iwarsson et al., 2009). Such instruments have been found to provide complementary information and lead to a better understanding of the continuum of disability (Gill et al., 1998; Iwarsson et al., 2009) but have not been assessed regarding the possibility to detect longitudinal changes. By applying the fourth grade developed by Iwarsson et al. (2009) to the ADL staircase, the rating scale shifts from focusing on only one construct, dependency (objective), to also including an additional construct, difficulty (subjective). ADL instruments comprising response alternatives using both of these constructs have been reported (Barile et al., 2012; Schmucker et al., 2017), with difficulty indicating the step before/lower dependence on another person. Even though measurement instruments may include overlapping concepts (Altman, 2014), advantages and disadvantages of including an additional construct should be properly considered before the adaptation of instruments.
Conclusion
The addition of a fourth grade in the ADL staircase appears to make the instrument more sensitive to detecting gradual changes in ADL over time. Hence, it can be recommended particularly for studies with a shorter follow-up period. Additionally, it could be useful for early detection of declining independence, allowing for preventive intervention before a person becomes dependent in ADL.
Key finding
Adding a fourth grade to the ADL staircase could increase the chances of detecting changes in ADL over time. The effect was most evident for shorter follow-up times.
What the study has added
Even though instruments measuring ADL using both objective and subjective aspects have been used previously, they have not been assessed regarding the possibility to detect longitudinal changes.
Footnotes
Research ethics
Ethical approval was obtained from the Regional Ethical Review Board in Lund, Sweden (2012/566 and 2002/324).
Consent
Participation in both studies was voluntary and all participants gave their written informed consent for inclusion before participating.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The ReSHA study was funded by the Swedish Research Council FORMAS (Agneta Malmgren Fänge, Lisa Ekstam, and Anna Axmon) and the Faculty of Medicine at Lund University (Agneta Malmgren Fänge and Lisa Ekstam). The ENABLE-AGE project was funded by the European Commission (QLRT-2001-00334). Funding for additional follow-ups of the Swedish ENABLE-AGE sub-sample was received from the Swedish Research Council for Health, Working Life and Welfare (Forte), the Swedish Research Council, and the Ribbing Foundation, Lund, Sweden.
Contributorship
Agneta Malmgren Fänge initiated and received funding for the ReSHA study. She and Lisa Ekstam applied for ethical approval and contributed to the development of the data. Björn Slaug was responsible for data management in the ENABLE-AGE study. Anna Axmon was responsible for the data management for the ReSHA data, as well as for the combined data and the statistical analyses. Anna Axmon wrote the first draft of the manuscript. All authors contributed to the interpretation of data and revisions of the manuscript. All authors approved the final version of the manuscript. Both studies were accomplished within the context of the Centre for Ageing and Supportive Environments (CASE) at Lund University.
