Abstract
Background
Current estimates of dementia and Alzheimer's disease incidence and prevalence are required to understand the health needs of the elderly.
Objective
We used two Australian cohort studies, administrative datasets, and data linkage techniques to estimate dementia rates in Australia.
Methods
The study used the Australian Longitudinal Study on Women's Health and the Health in Men Cohort Study. Records of dementia were obtained from linked sources and incidence and prevalence estimates were produced. Capture-recapture methods were used to estimate numbers of dementia cases not identified through data linkage.
Results
There were 3399 (28.5%) men and 3767 (34.8%) women with dementia identified from any source. Rates of dementia incidence and prevalence were similar between sexes but were higher in men once estimates of unidentified cases were included.
Conclusions
Cohort studies and linked administrative data can be used together to produce current estimates of dementia prevalence and incidence comparable to other population estimates.
Introduction
As life expectancy increases and the population ages, we would expect the number of people living with dementia to increase. In this setting it is important to have access to accurate health statistics to support the planning and provision of health and care services.
Traditionally, to estimate dementia prevalence and incidence, population-based surveys including cognitive testing and informant interviews regarding the course of cognitive and functional decline are conducted. However, these studies are costly and have been hampered by the inability to obtain acceptable response fractions in surveys in many parts of the world even prior to the COVID-19 pandemic. 1 People with cognitive impairment are less likely to respond to such surveys, and people with more severe disease have often moved to residential aged care facilities (RACFs) which may also reduce response rates.2,3 It has been reported that at least 30% of all Australians with dementia currently live in RACFs. 4 For these reasons alternative approaches to identify people diagnosed and living with dementia are required.
Administrative datasets are a potential source of information on rates of dementia prevalence and incidence. In Australia, for example, these include hospital and emergency department records, pharmaceutical prescriptions, aged care assessments and death certificates. However, each of these data sources has limitations for determining dementia rates. As the purpose of these datasets is often associated with funding, changes in funding policies may affect the overall case ascertainment. Undercounting is more plausible than overcounting in most data sets. 5 Uncertainty about the accuracy of each data source is a barrier in their use to calculate the overall prevalence and incidence of dementia. For example, changes in the degree of undercounting over time would limit the use of these datasets to identify real secular changes in the number of people with dementia that may result from specific preventative activities or risk factor reductions.
One method to overcome the limitations of administrative datasets is to use techniques to estimate the undercounting of cases and hence improve the accuracy of dementia prevalence and incidence data. We describe the linkage of administrative datasets to data from population-based cohort studies. Using the Health in Men cohort study (HIMS) 6 and the Australian Longitudinal Study on Women's Health (ALSWH), 7 we apply capture-recapture methods to estimate the prevalence and incidence of dementia. 8 We also compare these estimates with others obtained by different methods and from different countries.
Methods
There are several different criteria used to define dementia. The World Health Organization's 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10) defines dementia as a syndrome due to disease of the brain that is usually chronic or progressive in nature characterized by impairment of multiple higher cortical functions, including memory, and accompanied by deterioration in function, emotional control, social behavior, or motivation. 9 Common specific types of dementia listed include dementia in Alzheimer's disease (AD), vascular dementia, dementia in Lewy body disease, dementia in Parkinson's disease, and dementia in Pick's disease. The ICD-10 classification system includes the category of ‘Dementia Not Otherwise Specified (DNOS)’, allowing for the diagnosis of dementia without requiring clinicians to specify a type of dementia. In this study we used different administrative records and self-reported survey data to identify records of dementia.
Data sources: Administrative data
Hospital data
In Australia, the National Hospital Morbidity Dataset (NHMD) and State-based hospital data sources comprise episode-level records from admitted patient data collection systems in public and private hospitals. 10 In some States, data from emergency department (ED) attendance and mental health services are also available; for example, these records are included in the Western Australian Data Linkage System. 11
Pharmaceutical benefits scheme
The Pharmaceutical Benefits Scheme (PBS) dataset includes prescriptions for dementia-specific medications which can only be prescribed on specialist authority to receive a government sponsored subsidy. 12
Aged care assessments
In Australia, many people with dementia use government subsidized aged care services, and two administrative datasets provided information about dementia diagnoses. 13 The Aged Care Assessment Program (ACAP) minimum data set contains information relevant to assessing older people's needs for home care support or for admission to residential care. The Aged Care Funding Instrument (ACFI) is an assessment used to determine the level of Australian government subsidies for permanent RACF residents. Data contained within the ACFI include up to three mental and behavioral health conditions and up to three other health conditions. There are no specific requirements for the person within a RACF, or someone acting on their behalf, to complete or update the ACFI.
Causes of death
The National Mortality Database (NMD) records deaths in Australia since 1964. 14 This database includes multiple causes of death, as well as sex, age at death, and date of death.
Self-reported survey data
The HIMS and ALSWH also collected self-reported data on dementia. Unlike the administrative records, these data were not based on a medical diagnosis but were collected from survey responses or contact follow-up from either a participant or a proxy assisting the participant.
The specific diagnosis codes and question items used to identify dementia records in each study are documented in Supplemental Table 1. These data sources were linked to identify whether a participant had a record of dementia during the follow-up period. In addition, the patterns of overlap between the different sources were assessed to try to estimate the level of underreporting of dementia.
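As an illustration of how overlap between sources can be tabulated, the following minimal Python sketch counts participants by the combination of sources in which a dementia record appears. The source names and participant IDs are hypothetical and are not taken from either cohort; the sketch only shows the bookkeeping, not the linkage itself.

```python
from collections import Counter


def capture_histories(sources):
    """Tabulate overlap patterns between data sources.

    `sources` maps a source name to the set of participant IDs that have
    a dementia record in that source. Returns the set of IDs identified
    by any source and a Counter keyed by the tuple of sources that
    recorded each participant.
    """
    names = sorted(sources)
    any_source = set().union(*sources.values())
    patterns = Counter(
        tuple(name for name in names if pid in sources[name])
        for pid in any_source
    )
    return any_source, patterns


# Hypothetical participant IDs, for illustration only.
example = {
    "aged_care": {1, 2, 3, 5},
    "hospital": {2, 3, 4},
    "death": {3},
}
identified, patterns = capture_histories(example)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```

Each key of `patterns` is one overlap pattern (for example, recorded in aged care data only), which is the kind of tabulation a capture-recapture analysis takes as input.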
Data sources: Cohorts
HIMS and ALSWH cohorts were used as starting points to assess rates of dementia. The full details of these cohorts have been published elsewhere.6,7 Briefly, HIMS commenced with participants recruited for a randomized controlled trial of screening for abdominal aortic aneurysm in Perth, Western Australia. In 1996–1998, using the electoral roll, the study recruited men aged 65 to 79 years who were not in long-stay institutional accommodation. Of those randomized to an invitation to be screened, 1846 were ineligible, 5303 did not respond or refused, and 12,203 were screened. A number of participants died between screening and start of follow-up, and analysis was limited to those who had reached the age of 65 at start of follow-up, reducing the number included in these analyses to 11,923. In ALSWH three cohorts of women were randomly selected in 1996 from the electronic records of Medicare, the universal health insurance scheme. In the oldest cohort, born in 1921–1926 and aged 70 to 75 years at baseline, 39,000 women were invited to participate; of these 1100 were not contactable and 2366 were ineligible. Of the remaining 35,534 women, 12,432 responded.
The HIMS had access to the following data sources to identify records of dementia: HIMS (self-report) questionnaires, WA cancer registration records, hospital admitted patient data, ED attendance records, cause-of-death data from electronic death certificate data for WA, ACAP and ACFI, and access to mental health services data. The statistical assumptions underlying the capture-recapture methods require data from different sources to be independent, so, because of substantial overlap in cases, we combined the information from hospital admissions and ED attendances (henceforth called hospital data), and from ACAP and ACFI assessments (called Aged Care data). In addition, information from the cancer registry was essentially a duplication of that received from death registrations for cancer patients. Therefore, five source categories of data were included in the HIMS capture-recapture analyses.
For ALSWH we used five data sources: hospital admitted patient and emergency department data, PBS data on dementia-specific medications, aged care data (from ACAP and ACFI), self-reported survey data, including free text descriptions and information provided by proxies, and cause of death data. 15
Based on the date ranges of available datasets the ALSWH analysis was undertaken between the dates 1 January 2002 and 30 June 2016. The HIMS analysis was undertaken for dates between 1 April 1996 and 31 December 2016.
Data linkage
For the HIMS cohort, data linkage was performed using the Western Australian Data Linkage System. 11 This allowed linkage to the WA Hospital Morbidity, Mortality, and ACAP datasets. Additional linkage through the Australian Institute of Health and Welfare National Aged Care Data Clearinghouse 13 provided access to ACFI data for all men admitted to residential care.
For the ALSWH cohort, the Medicare personal identification number used in the sample creation enabled deterministic record linkage between the survey and the PBS data. 12 For women in this age group most prescriptions are subsidized, so the medication records are likely to be complete. The aged-care records were linked through both name-based and key-based linkage techniques, and these linkages to the ALSWH data were estimated to have a sensitivity over 94% and a positive predictive value above 96% (Australian Institute of Health and Welfare (AIHW) communication). The cause of death 16 data were linked using probabilistic matching methods based on the name, date of birth, and gender of participants. Hospital admissions datasets 10 were also linked using probabilistic matching.
Analyses
The linked and self-reported data sources were used to identify the total number of participants with a dementia record (from any of the available data sources), and to assess the overlap between these sources. Based on these patterns of overlapping sources capture-recapture analysis was used to estimate the number of people with dementia not identified from the linked sources.8,17 Use of the method in this context has been previously described 8 and is detailed fully in the Supplemental Methods section.
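To illustrate the general idea behind capture-recapture, the sketch below uses the simplest two-source case with Chapman's bias-corrected estimator. This is a deliberate simplification and not the procedure used in this study, which draws on five source categories and is described in the Supplemental Methods. The function name and the counts are hypothetical, and the estimator assumes the two sources identify cases independently.

```python
def chapman_estimate(n_a, n_b, n_both):
    """Two-source capture-recapture using Chapman's estimator.

    n_a, n_b : number of cases recorded in source A and in source B
    n_both   : number of cases recorded in both sources
    Returns (estimated total cases, estimated cases missed by both sources).
    Assumes the two sources capture cases independently.
    """
    estimated_total = (n_a + 1) * (n_b + 1) / (n_both + 1) - 1
    observed = n_a + n_b - n_both  # cases seen in at least one source
    return estimated_total, estimated_total - observed


# Hypothetical counts, for illustration only.
total, unidentified = chapman_estimate(n_a=9, n_b=9, n_both=4)
print(total, unidentified)
```

The smaller the overlap between the two sources relative to their sizes, the larger the estimated number of cases missed by both.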
Incidence and prevalence rates of dementia were calculated based on the observed numbers of cohort members identified from the linked data as having dementia, and also after including the estimated number with dementia not identified in the records. To estimate the ages of the unidentified cases, diagnoses of dementia were randomly assigned to an identical number of cohort participants still alive in each age group at the relevant time period and who did not have a record of dementia from any source. 18 This random allocation was run 20 times, based on previous estimates of a 20% undercount of dementia cases. 19 The average rates of incidence were calculated as the number of new dementia records which occurred each year divided by the person years to diagnosis, death, or censoring for each interval. Prevalence rates were calculated as the number of persons living with dementia divided by the total number of persons alive at any time during each age period of interest. Smoothed versions of the capture-recapture prevalence and incidence rates were plotted based on predictions from a Poisson model of rates by single year of age, with a quadratic polynomial in age to allow a non-linear (curved) association.
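The incidence and prevalence definitions above can be sketched in a few lines of Python. The function names, the per-1000 person-years scaling, and the three example participants are assumptions made for illustration; ages are in years and the records are hypothetical.

```python
def person_years_at_risk(entry, censor, diagnosis=None, death=None):
    """Years contributed from entry to the earliest of diagnosis, death, or censoring."""
    end = min(t for t in (diagnosis, death, censor) if t is not None)
    return max(end - entry, 0.0)


def incidence_rate(new_cases, person_years, per=1000):
    """New dementia records per `per` person-years at risk."""
    return per * new_cases / person_years


def prevalence_percent(with_dementia, alive):
    """Persons living with dementia per 100 persons alive in the period."""
    return 100 * with_dementia / alive


# Hypothetical participants followed from age 80 to a censoring age of 85.
cohort = [
    {"entry": 80.0, "censor": 85.0, "diagnosis": 82.0},  # diagnosed at 82
    {"entry": 80.0, "censor": 85.0, "death": 83.0},      # died at 83, no record
    {"entry": 80.0, "censor": 85.0},                     # censored at 85
]
total_py = sum(person_years_at_risk(**p) for p in cohort)
new_cases = sum("diagnosis" in p for p in cohort)
rate = incidence_rate(new_cases, total_py)
print(total_py, new_cases, rate)
```

Follow-up stops at diagnosis for incidence because a participant with a dementia record is no longer at risk of a first record.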
Comparison with other studies
The prevalence and incidence rates were compared with those from other studies, including a large Australian study based on administrative data 20 for New South Wales, and worldwide meta-analyses.21,22 Five-yearly age-specific incidence and prevalence estimates from ALSWH and HIMS were combined using the 2010 Australian population age-sex distribution. 23 The age-sex specific prevalence estimates from Cao 22 were combined using WHO 2020 age-sex population data from high income countries. 24
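Combining sex-specific rates with a reference population distribution amounts to a population-weighted average within each age group. The sketch below shows that arithmetic under stated assumptions: the function name is hypothetical, and the rates and population counts are made-up values, not the 2010 Australian or WHO 2020 figures.

```python
def pooled_rate(rates, population):
    """Population-weighted average of stratum-specific rates.

    rates      : mapping of stratum (e.g. sex) to its rate
    population : mapping of the same strata to reference population counts
    """
    total_population = sum(population[stratum] for stratum in rates)
    weighted_sum = sum(rates[stratum] * population[stratum] for stratum in rates)
    return weighted_sum / total_population


# Hypothetical rates (per 100 persons) and reference population for one age group.
rates_80_84 = {"men": 10.0, "women": 20.0}
population_80_84 = {"men": 400, "women": 600}
combined = pooled_rate(rates_80_84, population_80_84)
print(combined)
```

Because women outnumber men in the reference population at older ages, the pooled rate sits closer to the female rate than a simple average would.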
Results
In this analysis the HIMS cohort included 11,923 men and had a follow-up period of 20.25 years. The ALSWH cohort included 10,835 women followed for an average period of 10.6 years. There were 3399 men with dementia identified from any source in the HIMS cohort (28.5%). In the ALSWH cohort 3767 dementia cases were identified (34.8%). Table 1 shows the number of cases of dementia identified from each source. In both studies the most common source of dementia diagnosis was the aged-care data (HIMS 77.5% and ALSWH 78.5%), followed by the hospital admissions data (68.2% and 64.5%) (Table 1). Average age at diagnosis varied by data source and sex and was between 83 and 86 years, with the exception of HIMS records from mental health services (mean age 80.5).
Table 1. Number of dementia records identified from each source.
Hospital and ED data combined: 2318 (68.2%; 83.5 (5.3)).
Death certificate and cancer data combined: 1147 (33.7%; 83.1 (5.3)).
Most cases were identified from multiple sources (see example data in Supplemental Table 2).
Using capture-recapture methodology, we estimated the potential undercounting by age group. The estimated number of unidentified dementia records was 1318 in the HIMS cohort and 400 in ALSWH, indicating undercounts of 38.8% and 10.6% respectively (Table 2). The estimates from the separate analyses used to produce the numbers in Table 2 are provided in Supplemental Tables 5 to 9.
Table 2. Numbers of observed cases and estimated numbers of unidentified cases by age for the HIMS and ALSWH cohorts.
The estimates of unidentified cases in the ALSWH study were relatively consistent across ages 75 to 89, between 7.1% and 8.9%, with a larger estimate of the undercount in the 90+ age group of 24.5%. The undercount estimates were larger and more variable in the HIMS study, ranging from 24.0% to 56.9%. The exclusion of outlying estimates above the 90th percentile only marginally altered the estimates produced, with the exception of the 75 to 79 age category from the HIMS study where the undercount estimate was reduced from 79.6% to 56.9% (from 473 to 338 records) (Table 2 and Supplemental Table 4).
We used these data to generate the incidence and prevalence rate estimates of dementia by age for men and women, using the observed numbers and numbers adjusted for undercounting (Figures 1 and 2). The estimates of incidence for the HIMS and ALSWH studies are very similar, although above 90 years of age the predicted incidence was higher for women than men. The prevalence estimates are initially lower in the ALSWH cohort as no women had dementia at the beginning of the follow-up period. In both men and women, the prevalence of dementia increased steadily with age. Although the observed prevalence rates were similar between men and women, the estimated prevalence was higher for men due to larger estimates of unidentified cases from the capture-recapture analyses.
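The cohort-level undercount percentages can be reproduced from the reported counts. In the short check below, the undercount is taken as the estimated unidentified cases divided by the observed cases; this definition is inferred from the reported figures rather than stated explicitly, and the variable names are illustrative.

```python
# Observed and estimated unidentified dementia cases, as reported in the text.
cohorts = {
    "HIMS": {"observed": 3399, "unidentified": 1318},
    "ALSWH": {"observed": 3767, "unidentified": 400},
}


def undercount_percent(observed, unidentified):
    """Unidentified cases as a percentage of observed cases (assumed definition)."""
    return 100 * unidentified / observed


summary = {
    name: {
        "undercount_pct": round(undercount_percent(**counts), 1),
        "adjusted_total": counts["observed"] + counts["unidentified"],
    }
    for name, counts in cohorts.items()
}
print(summary)
```

The adjusted totals are the observed cases plus the estimated unidentified cases, which are the numerators used for the capture-recapture adjusted rates.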

Figure 1. Incidence of dementia from the HIMS and ALSWH cohorts. Dotted points are observed rates; solid lines (HIMS from age 65 and ALSWH from age 75) are smoothed estimates after capture-recapture.

Figure 2. Prevalence of dementia from the HIMS and ALSWH cohorts. Dotted points are observed rates; solid lines (HIMS from age 65 and ALSWH from age 75) are smoothed estimates after capture-recapture.
Figures 3 and 4 show how the estimates of incidence and prevalence from the HIMS and ALSWH cohorts compared with another Australian study and two international studies. The incidence estimates presented are broadly consistent with other studies, 20 but the estimates for ages 90+ from Prince et al., based on high income countries, are the highest of the studies presented (Figure 3). 21 Above the age of 80, the prevalence estimates (per 100 persons, %) produced from HIMS and ALSWH are higher than the age-specific rates from the meta-analysis figures produced for Australia 21 and the international estimates produced by Cao (Figure 4). 22

Figure 3. Dementia incidence estimates by study. M&F: male and female; CC: capture-recapture; HIMS and ALSWH rates pooled using 2010 Australian age-sex population distribution.

Figure 4. Dementia prevalence estimates by study. M&F: male and female; CC: capture-recapture; HIMS and ALSWH rates pooled using 2010 Australian age-sex population distribution; Cao male and female rates pooled using 2020 WHO High Income Country age-sex population distribution.
Discussion
We describe the use of two separate population-based cohorts in determining incidence and prevalence of dementia. By using linked administrative data there was no requirement for the presence of dementia to be detected by continued examination of cohort participants. Each individual administrative dataset only detected a proportion of the cases of dementia, but a combination of all data sources allowed us to estimate the number of dementia cases which occurred in each age group. Further, by using analytical techniques we were able to calculate the extent of undercounting, which ranged between 11% and 39%, and these adjustments to our calculations produced incidence and prevalence estimates broadly similar to those found previously through synthesis of estimates from the literature or data linkage.20–22 However, the dementia prevalence estimates obtained were higher than recent Australian estimates from the 2021 Census 4 and the NPS MedicineWise Survey, 25 which highlights the range in prevalence estimates obtained from different sources with different methodologies and the uncertainty regarding the level of prevalence at the oldest ages (above 90). 26
A recent meta-analysis has shown age specific incidence and prevalence rates of dementia to be higher in women than men. 27 This finding was not replicated in our analyses, where rates based on identified cases were similar between men and women, but prevalence rates which included estimates of unidentified dementia were higher in men. This higher estimate of dementia prevalence in men was an unexpected finding which is likely due to differences in the cohort characteristics of the HIMS and ALSWH studies, the differences in the sources of linked administrative datasets used in each study, and differences in the overlapping patterns of dementia record identification from linked data sources in each cohort.
There have been considerable barriers in the direct estimation of dementia incidence and prevalence. General population sampling is becoming more difficult with evidence of diminished response fractions in older people with frailty 28 and specifically for those with cognitive impairment. 1 Concerns have also been raised about the healthy volunteer and other biases associated with use of large-scale databases. 29 Using administrative datasets together with cohort study data we were able to determine the potential undercounting of cases of dementia within these datasets and thus correct for this bias. It is clear that some methodological issues remain. In both the HIMS and the ALSWH cohorts the observed incidence and prevalence of dementia rise consistently up to age 88 (see Figures 1 and 2). Above this age, rises in observed incidence and prevalence slow in both the male and female cohorts; however, these rates at the oldest ages are based on smaller numbers due to mortality and participants reaching the censoring date (end of follow-up). This raises the question as to whether cognitive disability in very advanced old age may not be attributed to dementia but ascribed to the presence of ‘old age’, a diagnostic process that has been criticized. 30 Alternatively, the unexpectedly low rates of dementia in the oldest ages may be artefactual due to the delay in diagnosis with the aged care datasets, an advanced age that the youngest members of the cohorts may not have reached. Also, there are concerns that population estimates at these oldest ages are inaccurate. 31
There are other limitations in this approach. These administrative data are not collected in a standardized fashion to make a diagnosis of dementia. Many of the diagnoses are made by non-medical practitioners, often relying on diagnoses made by other health professionals using a wide variety of diagnostic methods. Even for AD dementia-specific medications, prescriptions may be dispensed for individuals with AD but without dementia, according to criteria originally meant for research purposes only. 32 The diagnostic strategies in these diverse settings may change over time, resulting in instability in the degree of undercounting and thus distortion of the estimates. The use of self-reported survey data to identify some cases of dementia meant that we included a source which did not require a standardized medical diagnosis; however, this was one of five sources used, it identified a smaller number of records than the aged-care, hospital, and death certificate sources, and it was very rarely the sole source of a dementia record. In addition, the linked data analysis has not presented information on subtype, severity and duration of dementia. Information on dementia subtype was available from selected sources (specifically hospital, aged care and cause of death data). However, further work is required to understand how subtype coding practices and quality may vary between sources.
The use of administrative data to estimate dementia rates for population-based cohorts has promise. This allows observation of diagnoses with little financial cost and no participant burden. Having identified records of dementia through administrative sources, cohort studies may be able to explore dementia risk factors 33 together with detailed examination and quantification of likely biases. Such studies may allow the estimation of population prevalence and incidence of dementia.
Supplemental Material
sj-docx-1-alz-10.1177_13872877241291139 - Supplemental material for Estimating the rates of dementia using administrative data linked to cohort studies
Supplemental material, sj-docx-1-alz-10.1177_13872877241291139 for Estimating the rates of dementia using administrative data linked to cohort studies by Michael Waller, Leon Flicker, Patrick Fitzgerald, Osvaldo P Almeida, Kaarin J Anstey and Annette J Dobson in Journal of Alzheimer's Disease
Footnotes
Acknowledgments
The authors acknowledge the Departments of Health and Veterans’ Affairs, and Medicare Australia for providing the aged care data, and Pharmaceutical Benefits Scheme data, and the Australian Institute of Health and Welfare as the integrating authority. They acknowledge the assistance of the Data Linkage Unit at the Australian Institute of Health and Welfare for undertaking the data linkage to the National Death Index.
They also acknowledge: The Centre for Health Record Linkage (CHeReL), NSW Ministry of Health and ACT Health, for the NSW Admitted Patients, Emergency Department; and the ACT Admitted Patient Care and Emergency Department Data Collections; Queensland Health as the source for Queensland Hospital Admitted Patient, and Emergency Data Collections; and the Statistical Analysis and Linkage Unit (Queensland Health) for the provision of data linkage; The Department of Health Western Australia, including Data Linkage Services WA and the Data Custodians of the WA Hospital Morbidity and Emergency Department Data Collections; SA NT DataLink, SA Health, and Northern Territory Department of Health, for the SA Public Hospital Separations, SA Public Hospital Emergency Department, NT Public Hospital Inpatient Activity and NT Public Hospital Emergency Department Data Collections; The Department of Health Tasmania, and the Tasmanian Data Linkage Unit, for the Public Hospital Admitted Patient Episodes and Tasmanian Emergency Department Presentations; Victorian Department of Health as the source of the Victorian Admitted Episodes Dataset and the Victorian Emergency Minimum Dataset; and the Centre for Victorian Data Linkage (Victorian Department of Health) for the provision of data linkage.
The research on which this paper is based was conducted as part of the Australian Longitudinal Study on Women's Health by the University of Queensland and the University of Newcastle. We are grateful to the Australian Government Department of Health for funding and to the women who provided the survey data.
Author contributions
Michael Waller (Data curation; Formal analysis; Methodology; Writing – original draft); Leon Flicker (Conceptualization; Writing – original draft; Writing – review & editing); Patrick Fitzgerald (Data curation; Formal analysis; Methodology; Writing – original draft; Writing – review & editing); Osvaldo P Almeida (Conceptualization; Investigation; Writing – review & editing); Kaarin J Anstey (Investigation; Writing – review & editing); Annette J Dobson (Conceptualization; Investigation; Methodology; Supervision; Writing – review & editing).
Funding
The authors received support from the Australian National Health and Medical Research Council Grant Boosting Dementia Research Grants Project no. 1171319. Leon Flicker is in receipt of a Practitioner Fellowship funded by the Medical Research Future Fund APP 1155669. Kaarin Anstey is funded by ARC Fellowship FL190100011.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
The HIMS data supporting the findings of this study are available on request from the HIMS investigators. The data are not publicly available due to privacy and ethical restrictions. The ALSWH data used in the study can be accessed via a request to the ALSWH data access committee. There are restrictions on how the linked datasets can be accessed due to ALSWH's responsibilities to the linked data custodians.
Supplemental material
Supplemental material for this article is available online.