Abstract
Background:
For research purposes, there is a need for tools to assess an individual’s level of cognitive function. For survey-based investigations in nursing home contexts, proxy ratings also allow the assessment of individuals with severe cognitive impairment.
Objective:
The aim of this study was to describe the feasibility and psychometric properties of Gottfries’ cognitive scale when used in a nursing home context for proxy rating of cognitive function.
Method:
The psychometric properties of Gottfries’ cognitive scale were investigated in a sample of 8,492 nursing home residents in Västerbotten County, Sweden, using item response theory and classical scale theory-based approaches.
Results:
Cognitive function could be scored in 97.0% of the assessed individuals. The scale had a negligible floor effect, contained items with a large spread in difficulties, appeared linear, and distributed the assessed individuals evenly over the scale. Internal consistency (Cronbach’s alpha) was 0.967, and an exploratory factor analysis revealed three factors of the scale – interpreted to represent orientation to time, to place, and to person.
Conclusion:
Gottfries’ cognitive scale is a feasible tool for grading cognitive function among nursing home residents using staff proxy ratings. The scale has excellent psychometric properties with a very high internal consistency, a favorable distribution of item difficulties producing an almost rectangular distribution of scores, and a negligible floor effect. The scale thus can be recommended for use in survey-based investigations in nursing home contexts.
INTRODUCTION
Major neurocognitive disorders and cognitive impairment are common among older individuals [1, 2]. Both in the clinic and for research purposes, there is a need for tools to screen for possible cognitive impairment and to assess the individual’s level of cognitive functioning. There are several scales available today, with the Mini-Mental State Examination (MMSE) [3] and Clinical Dementia Rating (CDR) [4] being among the most commonly used for both of these purposes.
However, for some research purposes, e.g., survey-based investigations, it might be a disadvantage that many currently used assessment scales require active participation by the individual being rated. This requirement also contributes to a large floor effect in nursing home populations for scales such as the MMSE: a significant proportion of the residents have cognitive impairment so pronounced that they do not understand and cannot follow instructions, or they have severely impaired language function, and thus get scores of zero [5]. Proxy rating, on the other hand, means that, for example, a staff member or relative rates the cognitive function of an individual based on their knowledge and prior observations, without active participation of the individual being rated.
In 1968, Swedish researcher and psychiatrist Carl-Gerhard Gottfries and psychiatrist Ingrid Gottfries (brother and sister), both at that time at St. Lars Hospital in Lund, published a cognitive scale of 27 items for proxy rating of the orientation abilities of an individual [6–10]. Today we call the scale the Gottfries’ cognitive scale [10], and it is important to note that this scale should not be confused with the Gottfries-Bråne-Steen scale [11]. Gottfries’ cognitive scale is available in a recent English translation, published as an appendix to [10], and is free to use.
Gottfries’ cognitive scale is currently in use in several large-scale data collections in Sweden, not least a recurrent investigation of all nursing home residents in the county of Västerbotten, Sweden, conducted every 6th or 7th year since 1975 [2, 12–15], and in a national inventory of nursing homes in Sweden [16, 17]. The scale has also been used in several smaller research projects over the years. The scale has been tested for validity against the MMSE cut-off for cognitive impairment and for inter- and intra-rater test-retest reliability [8], but more elaborate psychometric analyses have not been published. Using data from the three latest rounds of the recurrent surveys of nursing homes in Västerbotten, Sweden [2, 13], the aim of this study was to describe the feasibility and psychometric properties of Gottfries’ cognitive scale when used in a nursing home context for proxy rating of cognitive function. The analysis was based on item-response theory (IRT) and classical scale theory approaches.
METHODS
Material
These analyses are based on three cross-sectional questionnaire surveys performed in 2000, 2007, and 2013 in the county of Västerbotten, Sweden. Results from these studies have been reported previously [2, 19]. The surveys included all those living in nursing homes in the county in May of the respective year, and the number of eligible people was 4,357 in 2000, 3,578 in 2007, and 3,210 in 2013. The response rates were 87.3%, 85.8%, and 70.5%, corresponding to 3,804, 3,070, and 2,262 rated individuals, respectively. People in geriatric or psychogeriatric hospital wards (originally included in the 2000 and 2007 data collections) and those younger than 65 years or for whom no age was registered were excluded; thus 3,537, 2,820, and 2,135 individuals, respectively, were included, giving a total of 8,492 individuals for the present analyses. There were 7,159 individuals with complete ratings on the Gottfries’ cognitive scale and 8,012 individuals for whom it could be determined whether or not they had cognitive impairment. The mean age was 84.2±7.0 years, and 5,755 (67.8%) were women. There were 5,458/8,012 (68.1%) individuals with cognitive impairment according to Gottfries’ cognitive scale ratings.
The Regional Ethical Review Board in Umeå, Sweden, approved the studies (registration numbers 00-170, 07-028M, and 2012-646-31M).
Procedures
The survey forms were sent to all nursing homes in the county. The instructions given stated that the staff member who knew each resident best should complete the assessment scales based on observations of the resident’s state during the preceding week. A short written set of instructions about how to carry out the assessments was included, and the staff were informed that members of the research team could be contacted by telephone to answer questions or provide additional guidance.
Assessments
Assessments were made using the Multi-Dimensional Dementia Assessment Scale (MDDAS) [8]. This scale includes assessments of functioning in activities of daily living (ADL), cognition, and behavioral and psychological symptoms as well as a registration of current drug prescriptions. The MDDAS has good inter- and intra-rater reliability [8].
An ADL score on the MDDAS was calculated based on assessment of dependence in dressing, hygiene, eating, and bladder and bowel control. All ADL categories were scored 1–5, except for bladder control, which was scored 0–4. Thus the ADL score varied from 4 to 24, where a higher score indicated greater ADL independence [8, 10].
Cognitive function was measured using a scale developed by Gottfries and Gottfries [6, 8–10]. A recent English translation is published as an appendix to [10]. This scale consists of 27 dichotomous items (questions answered by ticking boxes for “Yes” or “No”) related to various aspects of orientation. A one-sentence instruction was given directly above the scale, stating “NB! Answer to every question (even if you are unsure about it)”. A score of less than 24 is considered indicative of cognitive impairment; this cut-off corresponds to the usual 24/30 MMSE [3] cut-off with a sensitivity of 90% and a specificity of 91% [9].
Statistics
Score distribution was explored by plotting the frequency distribution of individuals receiving each score. Floor and ceiling effects were estimated as the difference between the number of individuals receiving a score of 0 and 27, respectively, and the mean number of individuals receiving a score of 1–26 in relation to the total number of individuals with complete scores.
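The floor and ceiling definition above can be illustrated with a minimal Python sketch. The function name `floor_ceiling_effects` and the example counts are hypothetical and are not study data; the analyses themselves were run in SPSS.

```python
def floor_ceiling_effects(counts):
    """Return (floor, ceiling) effects as proportions of all complete scores.

    counts[s] is the number of individuals with sum score s (s = 0..27).
    Each effect is the excess at the extreme score over the mean count of
    the interior scores (1..26), divided by the total number of individuals.
    """
    if len(counts) < 3:
        raise ValueError("need at least three score levels")
    total = sum(counts)
    interior = counts[1:-1]                      # scores 1..26
    interior_mean = sum(interior) / len(interior)
    floor = (counts[0] - interior_mean) / total
    ceiling = (counts[-1] - interior_mean) / total
    return floor, ceiling


# Hypothetical distribution: flat interior with an excess at both ends.
example_counts = [150] + [100] * 26 + [350]
floor, ceiling = floor_ceiling_effects(example_counts)
print(f"floor effect: {floor:.1%}, ceiling effect: {ceiling:.1%}")
```

A perfectly rectangular distribution gives zero for both effects under this definition.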
An exploratory factor analysis (principal component analysis) extracting factors with eigenvalues of 1.0 or above, using Varimax rotation, was used to determine the factor structure of the scale. Pearson’s correlations between each factor and the sum score of the total scale were calculated.
A 2-parameter IRT-based analysis was performed using simple logistic regressions (with the score of the particular item as the dependent variable and the full scale sum score as the independent variable) to estimate the parameters difficulty (threshold, calculated as −(constant beta/item beta)) and discrimination (steepness, calculated as item beta) for each item of the scale. Parameters from the logistic regressions were used to draw item characteristic curves as y = 1/(1 + e^(−(constant beta + item beta × x))) for each item and to plot the difficulty vs. discrimination for each item.
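As an illustration of this per-item procedure, the sketch below fits a one-predictor logistic regression with Newton-Raphson in plain Python and derives difficulty and discrimination from the two coefficients. The names `fit_item`, `difficulty`, and `icc` are hypothetical, and the fitting routine is an assumption standing in for the SPSS logistic regression used in the study.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_item(total_scores, item_scores, max_iter=100, tol=1e-10):
    """Fit P(item = 1) = sigmoid(b0 + b1 * total) by Newton-Raphson.

    Returns (b0, b1): the constant beta and the item beta.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(total_scores, item_scores):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p                 # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                    # negative Hessian entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("singular Hessian; data may be separable")
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def difficulty(b0, b1):
    """Threshold: the total score at which the item curve crosses 0.5."""
    return -b0 / b1


def icc(x, b0, b1):
    """Item characteristic curve evaluated at total score x."""
    return sigmoid(b0 + b1 * x)
```

For an item with constant beta −6.0 and item beta 0.5, for example, the difficulty is 12 and the discrimination is 0.5.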
The feasibility of the scale was evaluated as the proportion of rated individuals with answers to all 27 items and the proportion of individuals that could be scored using imputation of different numbers of items. The risk of having a missing value for a particular item was explored in relation to the total score (calculated using mean value imputation). For those with missing items, the Pearson’s correlation between the difficulty (as determined with the IRT analysis) of the missing item and the total score (using mean value imputation) was calculated.
Internal consistency was estimated using Cronbach’s alpha for the full scale. Item-to-total correlations were calculated using Pearson’s correlation.
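For reference, Cronbach’s alpha can be computed from complete item ratings as in the following minimal sketch. The function name `cronbach_alpha` and the data layout are assumptions; this is the standard variance-based formula and not the SPSS procedure itself.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: list of per-individual lists of item scores, all of equal length.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    if len(rows) < 2:
        raise ValueError("need at least two individuals")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Items that always agree give an alpha of 1, while uncorrelated items give an alpha near 0.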
The Pearson’s correlation between the score on Gottfries’ cognitive scale and a five-item ADL index ranging from 4 to 24 points was calculated. No measure of cognitive function other than Gottfries’ cognitive scale was available in the material, which limited the possibilities for concurrent validity analyses. However, because ADL capacity generally worsens as cognitive impairment develops, this correlation could to some extent indicate the validity of the scale.
The IRT analysis was used to select four items for an optimized short screening scale. This short screening scale was evaluated for sensitivity and specificity to the full-scale cut-off score of 24/27, and a ROC curve was drawn.
Statistical calculations were performed using IBM® SPSS® Statistics Version 23.0 for Mac. Figures were drawn using Microsoft® Excel® 2011 for Mac, Version 14.7.1.
RESULTS
Score distribution and floor and ceiling effects
The distribution of sum scores on Gottfries’ cognitive scale is shown in Fig. 1A. The number of individuals receiving each score between 1 and 26 ranged from 136 to 295 with a mean of 198.8. A score of 0 was received by 254 individuals, indicating a negligible floor effect of (254–198.8)/7,159 = 0.8% in this sample. A score of 27 was received by 1,736 individuals indicating a ceiling effect of (1,736–198.8)/7,159 = 21.5% in this sample.

Fig. 1. Item and scale characteristics of the Gottfries’ cognitive scale. A) Distribution of scores with the number of individuals receiving each score 0–27. B) Item characteristic curves for the 27 items calculated using logistic regressions. The difficulty measure (see Fig. 1C and Table 1) corresponds to the score when a particular item curve crosses the 0.5-line. The x-axis denotes the total score of the Gottfries’ cognitive scale. C) Discrimination (y-axis) in relation to difficulty (x-axis) for the 27 items. D) Mean item characteristic curves for the three factors as determined by the exploratory factor analysis. From left to right: orientation to person, orientation to place, and orientation to time. E) Linearity of the total scale: Sum of the 27 individual item characteristic curves. F) Correlation between Gottfries’ cognitive scale (x-axis) and independence in activities of daily living (y-axis, scores 4–24). Pearson’s correlation 0.682 (p < 0.001).
Factor structure
The results of the exploratory factor analysis are presented in Table 1. Three factors were extracted, accounting for 28.9%, 21.1%, and 18.3%, respectively, of the variance in the scores, for a total of 68.3% of the variance explained. The three factors were interpreted and named as orientation to time, to place, and to person, respectively. However, as seen below, the factor separation was probably more related to difficulty than to domain, and some more difficult place-related items were for example included in the first factor. The correlation between the sum scores of the three factors and the total score is also shown in Table 1.
Table 1. Exploratory factor analysis of Gottfries’ cognitive scale. N = 7,159
a Pearson’s correlation between the scores of factors 1 and 2: 0.828; 1 and 3: 0.612; and 2 and 3: 0.725.
IRT analysis
Although the factor analysis revealed three different factors of the scale, the overall cognitive function was deemed likely to be an underlying common trait, and IRT analysis using the full scale was therefore still considered appropriate. Using logistic regression, the difficulty and discrimination parameters and the item characteristic curve were determined for each item in the scale. The item characteristic curves are illustrated in Fig. 1B, and the difficulty and discrimination parameters of each item are presented in Table 2 and plotted against each other in Fig. 1C. The analysis showed that Gottfries’ cognitive scale contains items with a broad range of difficulties distributed over the scale. Importantly, the scale contains items with both very high and very low difficulty, and, as seen in Table 2 and Fig. 1C, also includes items with relatively high discrimination among those with the lowest and highest difficulties. By calculating the sum of all 27 item characteristic curves, the linearity of the full scale was determined as shown in Fig. 1E.
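The linearity check described above, summing the item characteristic curves, can be sketched as follows. The item parameters are hypothetical (27 evenly spaced difficulties with a common discrimination) and are not the fitted values from this study.

```python
import math


def icc(x, b0, b1):
    """Item characteristic curve: probability of a "Yes" at total score x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def expected_total(x, item_params):
    """Sum of all item characteristic curves at total score x."""
    return sum(icc(x, b0, b1) for b0, b1 in item_params)


# Hypothetical parameters: difficulties 0.5, 1.5, ..., 26.5 and discrimination 0.6.
# The constant beta follows from difficulty = -b0 / b1.
item_params = [(-0.6 * (d + 0.5), 0.6) for d in range(27)]
curve = [expected_total(x, item_params) for x in range(28)]
```

With evenly spread difficulties the summed curve rises almost linearly across the score range, whereas difficulties clustered around one level would give an S-shaped curve.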
Table 2. Item response theory analysis of Gottfries’ cognitive scale: difficulty and discrimination parameters. N = 7,159
The three factors of the scale differed in their mean item thresholds, as can be seen in Fig. 1D. The factors had mean difficulties of about 19.5 for factor 1 (orientation to time), 12.5 for factor 2 (orientation to place), and 4.5 for factor 3 (orientation to person).
Feasibility
There were 328/8,492 (3.9%) individuals for whom none of the items were answered. For a majority of these 328 survey forms, there was very little information filled in; for example, the ADL index of five items was not completed for 263/328 (80.2%) of these individuals. These missing values were therefore not regarded as related specifically to the Gottfries’ cognitive scale and were not considered when calculating feasibility measures. Of those with at least one item of the Gottfries’ cognitive scale completed, 7,159/8,164 (87.7%) completed all 27 items and 1,005 (12.3%) had one or more missing items. The frequency distribution of the numbers of missing values is shown in Fig. 2C. Most of the assessments with missing values had only a few items missing (764/1,005 [76.0%] had 1–3 missing values).

Fig. 2. Characteristics of a suggested 4-item short screening scale and analysis of missing values. A) Characteristic curves for three different cut-offs (from left to right: 2, 1, and 0 negative answers allowed, respectively) of the suggested 4-item short screening scale. The vertical line illustrates the 24/27 cut-off for cognitive impairment in the full scale. B) ROC analysis for the suggested 4-item short screening scale. C) Number of individuals with different numbers of missing values. The 7,159 individuals with no missing values and the 328 individuals with all items missing are not shown. D) Correlation between the difficulty of an item (x-axis) and mean total score (y-axis, calculated using mean value imputation) for individuals with a missing value of that particular item on Gottfries’ cognitive scale. Pearson’s correlation 0.917, p < 0.001.
It is sometimes possible to determine whether an individual has cognitive impairment according to the scale (cut-off: a score below 24 of 27) even with some missing items. An individual with at least 4 negative answers cannot reach a score of 24 and can conclusively be placed in the group with cognitive impairment, whereas an individual with at least 24 positive answers can conclusively be placed in the group without cognitive impairment, irrespective of whether the remaining items are answered. Therefore, it could be determined for 8,012/8,164 (98.1%) of the individuals whether they had cognitive impairment according to Gottfries’ cognitive scale.
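The reasoning behind classification with missing items can be expressed as bounds on the achievable score. The following is a minimal sketch; the function name `classify` and the coding of answers (1 = Yes, 0 = No, `None` = missing) are assumptions for illustration.

```python
CUTOFF = 24    # a score below 24 of 27 indicates cognitive impairment (from the text)
N_ITEMS = 27


def classify(answers):
    """Classify one individual from possibly incomplete answers.

    answers: list of 27 values, 1 = Yes, 0 = No, None = missing.
    """
    if len(answers) != N_ITEMS:
        raise ValueError("expected 27 answers")
    positives = sum(1 for a in answers if a == 1)
    missing = sum(1 for a in answers if a is None)
    max_possible = positives + missing   # if every missing item were "Yes"
    if max_possible < CUTOFF:
        return "impaired"                # cannot reach the cut-off
    if positives >= CUTOFF:
        return "not impaired"            # already at or above the cut-off
    return "undetermined"
```

An individual is left undetermined only when the missing items could move the score across the cut-off.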
Imputation strategies could be used for those with a few missing items (for example, 1–3) in the scale to allow for grading of cognitive function of more individuals in the material. Imputation of up to three items increased the number of individuals with cognitive grading to 7,923/8,164 (97.0%), while imputation of only one item or up to two items gave 7,611 (93.2%) and 7,809 (95.7%) individuals, respectively. For a suggested imputation strategy, see below.
Missing values
The number of missing values, among those for whom at least one item was answered, ranged for the 27 items from 26 (0.3%, item 1) to 219 (2.7%, item 4). There was no correlation between the total score (using mean value imputation for those with 1–3 missing items and including individuals with scores of 1–26) and the risk of having at least one missing item (β = −0.002, p = 0.906). However, there was a strong correlation (β = 0.917, p < 0.001) between the mean total score (based on the other scored items using mean value imputation) for those with a particular item missing and the difficulty of this item as determined with the IRT analysis, see Fig. 2D. This means that missing values were more likely for items with a threshold near the individual’s cognitive level, i.e., when it might be hard to tell how an individual should be scored. Together, these findings suggest that an optimized imputation strategy would be to impute 0.5 for each missing item (for up to three missing items).
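The suggested strategy, imputing 0.5 for each of up to three missing items, could be sketched as below. The function name `imputed_score` and the coding of answers (1 = Yes, 0 = No, `None` = missing) are assumptions for illustration.

```python
def imputed_score(answers, max_missing=3, fill=0.5):
    """Sum score with a fixed value imputed for each missing item.

    answers: list of item values, 1 = Yes, 0 = No, None = missing.
    Returns None when more than max_missing items are missing.
    """
    missing = sum(1 for a in answers if a is None)
    if missing > max_missing:
        return None
    return sum(fill if a is None else a for a in answers)
```

For example, 20 positive answers, 5 negative answers, and 2 missing items give an imputed score of 21.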
Internal consistency
Cronbach’s alpha for the scale was 0.967, i.e., excellent internal consistency.
Item-to-total correlations
Item-to-total correlations for each of the 27 items are shown in Table 1. Correlations ranged from 0.450 to 0.816. Items 1 and 13, which were also the two items with the lowest difficulty, had the lowest correlation to the full scale and were the only two items with correlations below 0.5. Four items had correlations between 0.5 and 0.7, and 21 items had correlations above 0.7.
Correlation with the five-item ADL index
The correlation with a five-item ADL index ranging from 4 to 24 points was calculated. With increasing cognitive impairment in major neurocognitive disorder, ADL functions are gradually decreased or lost, and this correlation could therefore be seen as an indication of the validity of the scale. The two scales had a significant positive correlation of 0.682 (p < 0.001). As can be seen in Fig. 1F, individuals with more pronounced cognitive impairment also had worse ADL function, hence supporting the validity of Gottfries’ cognitive scale.
Short screening using four items
Based on the IRT analysis above, the four items with the highest thresholds (Table 2: items 16, 17, 18, 19 – all within the orientation to time domain) could be used for screening for cognitive impairment, i.e., to identify individuals with a total score of less than 24 indicating cognitive impairment. Out of the 8,164 individuals for whom at least one item on the Gottfries’ cognitive scale was answered, 97.5% had answered all four of these questions. Figure 2A shows the proportion of the sample rated with 2/4, 1/4, and 0/4 negative answers to these four items for different total scores of the Gottfries’ cognitive scale. The sensitivity and specificity for detecting cognitive impairment, allowing for different numbers of negative items, are shown in Table 3. A ROC curve for the 4-item short screening scale is shown in Fig. 2B.
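The evaluation of the short screen against the full-scale cut-off could be computed as in the sketch below. It assumes complete 27-item ratings coded 1 = Yes and 0 = No; the function names are hypothetical, and the rule that a screen is positive when the number of negative answers exceeds the allowed number is an interpretation of the text.

```python
SCREEN_ITEMS = (16, 17, 18, 19)   # 1-based item numbers named in the text
CUTOFF = 24                       # full-scale score below 24 = cognitive impairment


def screen_positive(answers, negatives_allowed=0):
    """True if the 4-item screen flags possible cognitive impairment."""
    negatives = sum(1 for i in SCREEN_ITEMS if answers[i - 1] == 0)
    return negatives > negatives_allowed


def sensitivity_specificity(records, negatives_allowed=0):
    """Sensitivity and specificity of the screen versus the full-scale cut-off."""
    tp = fp = tn = fn = 0
    for answers in records:
        impaired = sum(answers) < CUTOFF
        flagged = screen_positive(answers, negatives_allowed)
        if impaired and flagged:
            tp += 1
        elif impaired:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both impaired and unimpaired individuals")
    return tp / (tp + fn), tn / (tn + fp)
```

Repeating the calculation for 0, 1, and 2 allowed negative answers gives the operating points of a ROC curve for the screen.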
Table 3. Sensitivity and specificity of the suggested 4-item short screening scale
The suggested 4-item short screening scale comprises items 16, 17, 18, and 19; see Table 1.
DISCUSSION
Gottfries’ cognitive scale was used for staff proxy rating of cognitive function among nursing home residents. The present analysis of 8,492 individuals demonstrated several desirable properties of the scale for this purpose.
The scale appeared feasible in this context; 98.1% of the individuals in the nursing homes could be classified as having or not having cognitive impairment, and 87.7% had complete assessments. An easy-to-use imputation strategy could increase the proportion of individuals with a cognitive grading to 97.0% of the assessed population. These are impressive figures, taking into account that the surveys were filled in by staff members without any training in the use of the scale. Staff members with different professional backgrounds filled in the survey, and more than 90% were licensed practical nurses or nursing assistants [20]. Having scales with few incomplete ratings is important for the quality of survey-based research. The instruction to the staff members that items should be answered even if they were unsure might have contributed to the high proportion of completed scales. However, the extent to which this “forced-choice” instruction has affected the validity and reliability of the assessments is not known. We would argue that a best guess by the proxy rater is in general better than any imputation strategy applied at a later stage, but this property of the scale must be taken into account when interpreting the results.
The 27-item scale produces a fine-grained grading of cognitive function, which has previously allowed us to analyze the often non-linear relationships between, for example, behavioral and psychological symptoms and different levels of cognitive function [17, 21–23]. The IRT analyses presented here showed that the scale contains items with difficulties spread out over the scale. The sum of the item characteristic curves was nearly linear, and this is probably the reason why the individuals are spread so evenly over the scale. In a previous comparison, we found that Gottfries’ cognitive scale distributed the individuals more evenly over the scale than the MMSE [5], and this was probably a result of its linearity.
One important difference between the Gottfries’ cognitive scale and many other cognitive measures is that Gottfries’ cognitive scale does not seek to cover many different cognitive domains, but focuses on only one – orientation. Most cognitive scales have one or a few items from each of the many different cognitive domains. While this might be valuable when seeking to differentiate between different major neurocognitive disorders, it is not necessarily an advantage when it comes to grading cognitive function. The scales might become more heterogeneous and less consistent, and many items might, in fact, be of about the same difficulty in relation to the global cognitive level.
The purpose of a scale must also be taken into account. A scale developed as a screening tool for early cognitive symptoms will probably be a poorer instrument for grading different levels of cognitive impairment because it will probably contain many items with difficulties around the border between cognitive impairment and no cognitive impairment. However, because most screening scales have not been tested for item difficulties during development, many of them appear to have a quite large spread in item difficulties anyway and therefore might also be satisfactory for grading of function. On the other hand, if IRT analysis were to be used to choose items optimized for detecting the presence of cognitive impairment, the screening tools might become more powerful for this purpose. A very short screening of only four items gave a sensitivity of 97.8% and a specificity of 92.5% for cognitive impairment measured using the full 27-item scale in this study. These promising results, however, must be confirmed in investigations including unrelated measures and formal diagnoses. Still, the inclusion of items with low difficulty in general does not improve the properties of a screening tool. For example, the Dementia Screening Scale [24] has several items related to orientation to person and to place, and the MMSE [3] contains place-orientation items as well as even more low-difficulty items such as the naming of objects. Based on the results of this study, we suspect that most individuals who are oriented to time will also be oriented to place and person, while those not oriented to person or place will not be oriented to time either. Items have different difficulties, and there is also a specific hierarchy among them. Low-difficulty items will therefore contribute little meaningful information when it comes to screening for suspected cognitive impairment. IRT analysis thus appears to be a useful analysis tool for understanding such properties of a scale.
The validity of the cognitive grading was further indicated by the good correlation with a measure of function in ADL, which is known to be increasingly affected as a major neurocognitive disorder progresses. This supports the idea that it might be useful to measure one central cognitive domain with many items of different difficulties instead of trying to cover many domains. To some extent at least, the functions in most cognitive domains are correlated on a population level as well as among individuals with varying degrees of cognitive impairment. Still, the restriction to one cognitive domain means that the scale is probably less accurate for individuals with good orientation but significant decline in other cognitive domains, and this limitation must be taken into account when interpreting the results of a survey using the scale. However, the scale is not intended for diagnostic purposes where more thorough and multi-dimensional assessments are needed.
Among individuals living in nursing homes are those with very severe cognitive impairment because individuals usually remain in these settings until their death in end-stage major neurocognitive disorder. At these stages, where the most basic brain functions are also affected, individuals cannot be assessed with any scale that requires their active participation. This produces a floor effect in such scales, among them the MMSE. In a nursing home population, a substantial proportion of individuals will get a score of 0 on the MMSE [5], even if there are still differences in cognitive function among them. Because the Gottfries’ cognitive scale is completed by staff members based on their knowledge about, and previous observations of, the resident, all individuals can be assessed irrespective of their cognitive function. Combined with the fact that the scale also contains items with very low difficulty, this results in a negligible floor effect, which is a great advantage in this context.
An important strength of the analysis presented here is the large number of assessments in an unselected sample of nursing home residents. However, the fact that each individual was tested only once and the lack of diagnoses of major neurocognitive disorders and of other cognitive testing limit the possibilities for a more elaborate validity and reliability analysis. Therefore, future testing of the Gottfries’ cognitive scale for validity against other scales for measuring cognitive impairment and against clinical major neurocognitive disorder diagnoses is warranted, as well as further reliability testing including inter- and intra-rater test-retest reliability.
Conclusion
Gottfries’ cognitive scale is a feasible tool for grading cognitive function among nursing home residents using staff proxy ratings. The scale has excellent psychometric properties with a very high internal consistency, a favorable distribution of item difficulties producing an almost rectangular distribution of scores, and a negligible floor effect. The scale thus can be recommended for use in survey-based investigations in nursing home contexts.
ACKNOWLEDGMENTS
The authors express their gratitude to Professor Carl-Gerhard Gottfries for the original development of the scale and for reading and verifying the description of the development of the scale. The data collections were supported financially by grants from the Swedish Dementia Association, the Swedish Foundation for Health Care Sciences and Allergy Research, the Field Research Centre for the Elderly in Västerbotten, the Detlof Foundation, Umeå University Foundation for Medical Research, Swedish Brain Power, and the County Council of Västerbotten.
