Abstract
Background:
Many health systems are interested in increasing the number of uncomplicated and typical dementia diagnoses that are made in primary care, but the comparative accuracy of tests is unknown.
Objective:
To calculate the diagnostic accuracy of brief cognitive tests in primary care.
Methods:
We did a diagnostic test accuracy study in general practice in people aged 70 years or over who had consulted their GP with cognitive symptoms but had no prior diagnosis of dementia. The reference standard was specialist assessment according to ICD-10, with adjudication of difficult cases. We assessed 16 index tests at a research clinic and additionally analyzed the referring GPs’ clinical judgement.
Results:
The 240 participants had a median age of 80 years; 126 were men and 132 had dementia. Sensitivity of individual tests at the recommended thresholds ranged from 56% for GP judgement (specificity 89%) to 100% for MoCA (specificity 16%). Specificity of individual tests ranged from 4% for Sniffin’ sticks (sensitivity 100%) to 91% for Timed Up and Go (sensitivity 23%). The 95th centile of test duration in people with dementia ranged from 3 minutes for 6CIT and Time and Change to 16 minutes for MoCA. Combining tests with GP judgement increased test specificity and decreased sensitivity: e.g., MoCA with GP judgement had specificity 87% and sensitivity 55%.
Conclusions:
Using GP judgement to inform the selection of tests was an efficient strategy. Using IQCODE in people whom GPs judge as having dementia and 6CIT in people whom GPs judge as having no dementia would be a time-efficient and accurate diagnostic assessment.
The original protocol for the study is available at https://bmcfampract.biomedcentral.com/articles/10.1186/s12875-016-0475-2
INTRODUCTION
Enhancing the role of general practitioners (GPs) in making a diagnosis of dementia in uncomplicated cases is a priority in the UK [1], but a barrier for GPs is choosing between the wide variety of brief cognitive assessments (BCAs) that are available [2, 3]. There is little evidence to help GPs choose between tests; indeed, national guidelines recommend different tests [4].
GPs use time as a diagnostic test [5], and dementia is a progressive disorder that can be difficult to diagnose early in the disease process. However, some people wait a long time for a diagnosis after presenting with symptoms, largely because of practical and logistical difficulties in accessing specialist expertise [6]. A formal diagnosis of dementia made by GPs in uncomplicated cases can reduce anxiety and avoid unnecessary waiting and investigations for alternative diagnoses [7]; such a diagnosis is also often needed to access additional social care and support (although this reflects system design rather than necessity). Most people who are consulting about memory problems would want to know if they had dementia [8]. There may also be benefits to explicitly recognizing more advanced dementia in people who have lost capacity and insight, as this can prompt a holistic re-evaluation of care goals and avoid unnecessary tests and treatments [9]. There are currently no widely available disease-modifying treatments for dementia, and a focus on highly sensitive tests that do not “miss” any cases of dementia in primary care may result in an overwhelmed secondary care service with unclear benefits for individual patients and families [10].
The available evidence typically has limited applicability for GPs evaluating people with possible dementia in primary care. Firstly, current studies often report diagnostic accuracy in an asymptomatic population and are therefore more applicable to screening or case finding [11–13] than to supporting a diagnosis in those presenting with clinical symptoms, who have a higher pre-test probability of disease. Secondly, studies often investigate accuracy for the target condition of all cognitive impairment [14], rather than specifically dementia, typically with the implicit assumption that all people with cognitive impairment, including both those with dementia and those with mild cognitive impairment (MCI), require specialist referral. Thirdly, studies typically investigate a single test in isolation rather than making direct comparisons between tests in the same study [15], allowing only indirect comparisons. Fourthly, studies often investigate the accuracy of tests in (high-prevalence) specialist settings rather than in the (relatively low-prevalence) general practice population, yet test accuracy is related to prevalence and the spectrum of disease severity [16]. Finally, existing studies tend to evaluate the accuracy of tests alone rather than in combination with GP judgement, which is what happens in “real world” clinical practice.
To address the limited applicability of existing studies to primary care, we conducted a study in people with symptoms of possible dementia who had consulted about these with their GP. The aim of this study was to quantify the test accuracy of a range of non-specialist candidate tests, suitable for use in a GP clinic for the evaluation of cognitive symptoms, alone and in combination with GP judgement, compared to a reference standard specialist assessment according to ICD-10 criteria.
METHODS
Population
We recruited participants consecutively from 21 participating GP clinics, out of a total of 82 practices in the Bristol, North Somerset, and South Gloucestershire area, between March 2015 and May 2017. Research clinics took place in four participating GP clinics. A minimum sample size of 200 was needed, based on a specificity of 95% in prior studies and a 75% prevalence of dementia in local memory clinic data [17].
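The text does not state the precision target behind the minimum sample size of 200. One common approach, shown here only as a sketch, is Buderer’s formula for estimating specificity: size the group without dementia for a chosen confidence-interval half-width, then inflate by 1/(1 - prevalence), because only people without dementia inform specificity. The 95% specificity and 75% prevalence come from the text; the ±6 percentage-point half-width and the function name are our own assumptions, not the study’s.

```python
import math
from statistics import NormalDist


def buderer_n_for_specificity(spec: float, prevalence: float,
                              half_width: float, alpha: float = 0.05) -> int:
    """Total participants needed to estimate specificity to +/- half_width.

    Only people without the target condition inform specificity, so that
    group is sized first and then inflated by 1 / (1 - prevalence).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.96 for a 95% CI
    n_without_condition = z ** 2 * spec * (1 - spec) / half_width ** 2
    return math.ceil(n_without_condition / (1 - prevalence))


# Specificity and prevalence from the text; the 6-point half-width is hypothetical.
n_total = buderer_n_for_specificity(spec=0.95, prevalence=0.75, half_width=0.06)
print(n_total)  # 203
```

Under this assumed precision the result lands close to the stated minimum of 200; a narrower half-width would require a larger sample.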
Participants were people with cognitive symptoms but no prior diagnosis of dementia, who were aged at least 70 years and had been referred by their GP to this research study. The type of cognitive symptom was not specified; such symptoms generally include disturbance in memory, language, executive function, behavior, or visuospatial skills [18]. Symptoms were required to be present for at least six months and could be reported by the person themselves, a family member, a professional, or another person; there was no severity threshold. Symptom duration was determined from the clinical history. Cognitive symptoms did not need to be the main focus of the consultation with the GP: an enquiry about cognition could be initiated if there was a perception of a problem.
People were excluded if they had a known neurological disorder (i.e., Parkinson’s disease, multiple sclerosis, learning disability, Huntington’s disease), were registered blind or profoundly deaf (i.e., unable to use a telephone), had a psychiatric disorder requiring current secondary care input, or if cognitive symptoms were either rapidly progressive or coincident with neurological disturbance. People with cognitive problems so advanced that they were unable to consent were excluded.
GPs were encouraged to refer a consecutive series of eligible patients with cognitive symptoms. The research team took written consent from all participants. An accompanying informant was mandatory, and informants also gave written informed consent to participate. All participants were offered free accessible transport and translation services. All methods were carried out in accordance with relevant guidelines and regulations, including the Declaration of Helsinki. The National Research Ethics Service Committee London–Bromley (reference 14/LO/2025) gave a favorable ethical opinion on 25 November 2014. NHS Research and Development approvals were granted by Avon Primary Care Research Collaboration on behalf of Bristol, North Somerset and South Gloucestershire clinical commissioning groups. The University of Bristol acted as Sponsor.
GP judgement
The referring GP recorded their clinical judgement using an electronic referral form during a consultation with their patient. GP judgement was operationalized as a response of normal, cognitive impairment not dementia (CIND), or dementia to the question “Is your gut feeling that this person has...”. GPs were not specially trained, were not required to arrange any test, and could refer people simultaneously or subsequently to NHS services. We specifically instructed GPs that they did not need to use any prior cognitive test (we did not mention the index tests by name because we judged, on balance, that doing so could increase the risk of their being used). The study team contacted the practice at least three times to obtain any missing referral data.
Index tests
The index test battery was selected following a review of the literature and on the basis of the following criteria: not copyrighted, and either previously evaluated in primary care or not previously evaluated in primary care but of interest [19]. The index assessment included eight brief cognitive assessments, three physical tests, and five informant evaluations. All index tests were performed by the same examiner (STC), a medical doctor undertaking postgraduate training in general practice and a PhD in diagnostic tests. We administered the index tests, including clock scoring, according to the original test authors’ instructions. We used prespecified test thresholds, which are provided in Supplementary Table 1. We calculated the time taken for each test in minutes from its start time to its end time. The eight brief cognitive assessments were the Memory Alteration Test (M@T) [11]; Eurotest [20]; Phototest [21]; Scenery Picture Memory Test (SPMT) [22]; Six item cognitive impairment test (6CIT) [16]; General Practitioner Assessment of Cognition (GPCOG) [23]; Mini-cog [24]; and Time and Change (T&C) [25]. The three physical tests were Timed Up and Go (TUG) [26, 27]; Extra-Pyramidal Signs Scale (EPSS) [28]; and Sniffin’ sticks [29]. The five informant questionnaires were the Pfeffer Functional Activities Questionnaire (FAQ) [30]; Lawton instrumental activities of daily living (IADL) [31]; Katz activities of daily living (ADL) [32]; 8-item AD8 [33]; and short form Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE) [34]. Where possible, items were not repeated; e.g., the 6CIT [16] and the GPCOG [23] both require the recall of a 5-item name and address, so, to avoid burdening and potentially confusing participants, this item was done once and then scored separately for each test.
The Montreal Cognitive Assessment (MoCA) [14] was initially not included in the index battery because it was originally designed to diagnose or identify MCI, had been advocated for use in secondary care [35], and had not been investigated in primary care [36]. We revised the protocol in light of subsequent policy changes in 2015 that encouraged GPs to diagnose dementia in typical situations without referral to a specialist [7], with the MoCA as the preferred instrument. We replaced the M@T with the MoCA because we judged that including both would be overly burdensome for participants and add little value. Sniffin’ sticks had to be imported on special order, so we added them at a later stage to avoid delaying the start of recruitment while waiting for this single test.
With the exception of Sniffin’ sticks, we randomly assigned the order of the index tests in the battery for each participant to avoid order effects on test accuracy. The examiner offered each participant the chance to undertake every test in the battery but was responsive to participants who appeared to be becoming tired or distressed.
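A per-participant random test order can be sketched as below. This is an illustration only: the battery list is a subset, the seeding by participant ID is a hypothetical choice for reproducibility, and the text does not say where Sniffin’ sticks sat in the sequence, so they are simply left out of the shuffle.

```python
import random

# Illustrative subset of the battery (MoCA shown in place of M@T, per the protocol change).
BATTERY = ["MoCA", "Eurotest", "Phototest", "SPMT", "6CIT", "GPCOG",
           "Mini-cog", "T&C", "TUG", "EPSS", "Sniffin’ sticks"]


def randomized_order(tests, participant_id, excluded=("Sniffin’ sticks",)):
    """Return a participant-specific random order of the tests.

    Tests in `excluded` are kept out of the shuffle. Seeding a private
    generator with the participant ID (hypothetical) makes each allocation
    reproducible for audit without affecting other participants.
    """
    rng = random.Random(participant_id)
    to_shuffle = [t for t in tests if t not in excluded]
    rng.shuffle(to_shuffle)
    return to_shuffle


order = randomized_order(BATTERY, participant_id=17)
print(order)
```

Using a separate `random.Random` instance per participant avoids any dependence on how many participants were allocated before.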
Reference standard
At the research clinic, a single specialist physician (JH) with more than 20 years’ experience in the field of dementia conducted a standardized assessment, including the Addenbrooke’s Cognitive Examination (ACE) III [37], Brief Assessment Schedule Depression Cards (BASDEC) [38], and the informant-completed Bristol Activities of Daily Living (BADL) Questionnaire [39], lasting approximately 60 min. The specialist was not aware of other test results, such as the clinical judgement of the referring GP, research clinic index tests, or any investigations. We randomly allocated participants to see the specialist before or after the index tests. All participants were offered, and encouraged to take, a gentle break of 10–20 min for a drink and snack between sessions. A second specialist, who had access to the primary care medical record for six months after the research clinic, as well as all information available to the first specialist, adjudicated borderline cases. The reference standard was an integrated expert assessment according to ICD-10 criteria [40] for each individual patient, and specific test thresholds were not used; people with CIND were included in the “normal” group for evaluation of test accuracy, since we were specifically interested in test accuracy for dementia. We used the term CIND in the reference standard (while also classifying MCI) for consistency with GP judgement, which used CIND because GPs are generally unfamiliar with MCI criteria. Study data were electronically entered and managed using REDCap (Research Electronic Data Capture) hosted at the University of Bristol [41].
Statistical methods
We calculated a potentially eligible population by indirectly standardizing the population at risk of dementia in recruiting practices, based on age-specific incidence of dementia [42] and GP list size [43]. To investigate a possible effect of assessment order on test scores, we used a regression model with total ACE-III score as the dependent variable and randomly allocated assessment order (index battery first or specialist assessment first) as a categorical independent variable. We calculated the median and range of ACE-III total scores for people whom GPs classified as normal but who in fact had dementia, compared with people whom GPs correctly classified as having dementia. We classified test duration by the 95th centile of test duration, the time within which clinicians could expect 95% of people to complete the test. We calculated measures of test accuracy (sensitivity, specificity, likelihood ratios, predictive values) together with 95% confidence intervals. To determine the effect of combining tests with GP judgement, we calculated the accuracy of each test combined with GP judgement: the combined test was positive if both GP judgement and the other test were positive, and negative if either or both were negative. We also calculated diagnostic accuracy under “stratified sequential testing”, in which clinical judgement determines which further test is done, using cross-tabulations restricted by GP judgement. We evaluated how GP judgement influenced the discrimination of the index tests by calculating the AUROC (area under the ROC curve) stratified by GP judgement.
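The accuracy measures listed above all derive from a 2 × 2 table of index test result against the binary reference standard, in which CIND and normal are both counted as “not dementia”. The sketch below assumes Wilson score intervals; the text does not name the interval method, and Stata’s `ci proportions` defaults to exact (Clopper–Pearson) intervals, so published bounds may differ slightly. Function names and counts are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_ci(k, n, level=0.95):
    """Wilson score interval for the proportion k/n (None if n == 0)."""
    if n == 0:
        return None
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def is_dementia(reference_category):
    """Collapse the reference standard: CIND and normal both count as not dementia."""
    return reference_category == "dementia"


def two_by_two(test_positive, reference_categories):
    """Tally TP, FP, FN, TN from per-person test results and reference categories."""
    tp = fp = fn = tn = 0
    for positive, ref in zip(test_positive, reference_categories):
        dementia = is_dementia(ref)
        if positive and dementia:
            tp += 1
        elif positive:
            fp += 1
        elif dementia:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Point estimates, with Wilson intervals for the four proportions."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": (sens, wilson_ci(tp, tp + fn)),
        "specificity": (spec, wilson_ci(tn, tn + fp)),
        "LR+": sens / (1 - spec) if spec < 1 else math.inf,
        "LR-": (1 - sens) / spec if spec > 0 else math.inf,
        "PPV": (tp / (tp + fp) if tp + fp else None, wilson_ci(tp, tp + fp)),
        "NPV": (tn / (tn + fn) if tn + fn else None, wilson_ci(tn, tn + fn)),
    }


# Hypothetical counts, not study data
result = accuracy(tp=80, fp=10, fn=20, tn=90)
print(result["sensitivity"], result["LR+"])
```

Returning `None` when a denominator is zero mirrors the “not computable” cells that can arise once the data are restricted by GP judgement.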
In an exploratory analysis, we used a bootstrapping procedure with 1000 replications to compare the number of people misclassified, as either false positives or false negatives, by different combinations of cognitive tests, using either an unstratified approach (one cognitive test) or a stratified approach depending on GP judgement. The test combinations for this exploratory procedure were the three tests with the greatest number of true positives and true negatives in each stratum (unstratified by GP judgement, GP judgement dementia, GP judgement not dementia).
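A minimal sketch of the bootstrap comparison follows, assuming participants are resampled with replacement and a percentile interval is taken from the 1000 replications; the text does not specify the interval type. Each person has a reference result, a prediction from a single unstratified test, and a prediction from a stratified strategy in which GP judgement selects which test result is used. All function names are hypothetical.

```python
import random


def stratified_prediction(gp_dementia, result_if_gp_dementia, result_if_not):
    """Per person, use one test's result if the GP judged dementia, otherwise another's."""
    return [a if gp else b
            for gp, a, b in zip(gp_dementia, result_if_gp_dementia, result_if_not)]


def n_misclassified(pred, truth):
    """Count false positives plus false negatives."""
    return sum(p != t for p, t in zip(pred, truth))


def bootstrap_difference(truth, pred_a, pred_b, reps=1000, seed=2024):
    """Mean and 95% percentile interval of misclassified(A) - misclassified(B)."""
    rng = random.Random(seed)
    n = len(truth)
    diffs = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]   # resample people with replacement
        t = [truth[i] for i in idx]
        diffs.append(n_misclassified([pred_a[i] for i in idx], t)
                     - n_misclassified([pred_b[i] for i in idx], t))
    diffs.sort()
    lower = diffs[(25 * reps) // 1000]               # 2.5th percentile
    upper = diffs[(975 * reps) // 1000 - 1]          # 97.5th percentile
    return sum(diffs) / reps, (lower, upper)
```

A positive mean difference would favor strategy B (fewer misclassifications); an interval spanning zero matches the text’s caution that differences may be consistent with chance.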
In practice there is a finite amount of resource available to assess people with symptoms of dementia, and longer assessments mean that fewer people can be evaluated. To account for the trade-off between test duration and accuracy, we derived the number of people that a full-time (37.5 hours a week) NHS memory clinician could classify in a population of up to 1000 people, using the (simplifying but implausible) assumption that all working time was spent administering cognitive assessments. To derive these figures, we used the sensitivity and specificity of each index test stratified by GP judgement, together with the 95th centile of test duration, also stratified by GP judgement (to account for test duration being longer in people whom GPs suspected of having dementia). For informant-completed tests, which can be completed independently of a clinician, we used the time for the shortest-duration brief cognitive assessment, since a practical implementation would be for an informant to complete their questions while a clinician evaluated the patient.
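The throughput reasoning can be sketched as follows. This is an illustration under the text’s own simplifying assumption (all 37.5 hours spent testing), not the authors’ code: each person is assumed to take the C95 time for their GP-judgement stratum, and the expected number correctly classified uses each stratum’s prevalence, sensitivity, and specificity. The stratum shares follow the 358/642 split reported in the natural frequency results; every other input is hypothetical.

```python
MINUTES_PER_WEEK = 37.5 * 60   # full-time week, all spent testing (the text's assumption)


def people_per_week(c95_minutes):
    """Whole assessments completed if every assessment takes its C95 time."""
    return int(MINUTES_PER_WEEK // c95_minutes)


def weekly_yield(strata, minutes=MINUTES_PER_WEEK):
    """Expected people assessed and correctly classified per week.

    Each stratum (a GP-judgement group) gives its share of patients, C95 in
    minutes, prevalence of dementia, and the test's sensitivity and specificity.
    """
    mean_minutes = sum(s["share"] * s["c95"] for s in strata)
    people = minutes / mean_minutes
    correct = sum(
        people * s["share"]
        * (s["prevalence"] * s["sens"] + (1 - s["prevalence"]) * s["spec"])
        for s in strata
    )
    return people, correct


# Hypothetical accuracy and C95 values per stratum
strata = [
    {"share": 0.358, "c95": 5, "prevalence": 0.86, "sens": 0.70, "spec": 0.80},
    {"share": 0.642, "c95": 3, "prevalence": 0.38, "sens": 0.60, "spec": 0.85},
]
people, correct = weekly_yield(strata)
```

For example, a test with a C95 of 3 min allows 750 assessments in such a week, whereas one with a C95 of 15 min allows 150.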

Figure 1. STARDdem flowchart for inclusion of participants in the study. †One person had to withdraw part way through the reference test because they were acutely ill. ‡Dementia according to ICD-10 [40]. §Of 61 with CIND, 59 met criteria for Petersen MCI [44], 1 had an affective disorder, and 1 had a brain injury. ¶One person met criteria for ICD-10 dementia and also had features of normal pressure hydrocephalus; expert review endorsed a reference standard diagnosis of dementia. People with advanced cognitive problems who could not consent were not eligible and were not referred by GPs.
All analyses were done using Stata version 15.
RESULTS
Participants
Recruitment is described in detail in a separate paper [19]. Figure 1 shows that GPs referred 456 people, of whom 240 (53%) participated and had available data, 45 (10%) were ineligible, and 155 (34%) declined. Of the 240 participants, 47 were normal, 61 had CIND (59 of whom had Petersen MCI), and 132 had dementia. The median age overall was 80 years (IQR 75 to 85 years), and the median ACE-III total score was 75 (IQR 65 to 87); the median age of leaving education was 15 years (IQR 15 to 16 years), and the median time since symptom onset was 24 months (IQR 12 to 36 months). Using indirectly standardized rates in the recruiting practices, we estimate that around 1,735 people would have been potentially eligible during the recruitment period, of whom GPs referred 456 and we saw 241.
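The estimate of potentially eligible people comes from indirect standardization: reference age-specific incidence rates are applied to the age structure of the recruiting practices’ lists and summed over age bands. The sketch below uses hypothetical age bands, list sizes, and rates, and an approximate 26-month recruitment window; it shows the arithmetic only and does not reproduce the study’s inputs.

```python
def expected_cases(list_size_by_band, incidence_by_band, years):
    """Indirectly standardized expected count: sum over age bands of
    registered people x reference annual incidence x years of recruitment."""
    return sum(list_size_by_band[band] * incidence_by_band[band] * years
               for band in list_size_by_band)


# Hypothetical list sizes and incidence rates (per person-year), not study inputs
list_sizes = {"70-74": 4000, "75-79": 3000, "80-84": 2000, "85+": 1500}
incidence = {"70-74": 0.01, "75-79": 0.02, "80-84": 0.04, "85+": 0.07}
recruitment_years = 26 / 12   # roughly March 2015 to May 2017
expected = expected_cases(list_sizes, incidence, recruitment_years)
```

The approach assumes the practices’ age-specific incidence matches the reference rates, so the result is an approximate denominator rather than a count.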
GPs judged 34 people as being normal, 120 as having CIND, and 86 as having dementia. People whom GPs judged as having dementia had an ACE-III total score IQR of 60 to 74, with a 90th centile of 81/100 and a highest score of 95/100, compared with published ACE-III thresholds of ≤82 for dementia. Six people with dementia were classified as normal by their referring GP; these people had a median ACE-III total score of 72 (range 69 to 82 points) and a median age of 82 years (range 79 to 86 years). In contrast, the 73 people with dementia whom their GPs classified as having dementia had a median ACE-III total score of 65 (range 26 to 92 points) and a median age of 82 years (range 71 to 94 years).
Table 1. Characteristics of participants by cognitive category according to the reference standard
Dementia according to ICD-10 [40]. ACE-III, Addenbrooke’s Cognitive Examination III; CIND, cognitive impairment, not dementia.
Table 1 presents the characteristics of participants. Of 240 participants, 53% were men, the median age was 80 years, the median symptom duration was 24 months, and the median ACE-III score was 70 out of 100, compared with published ACE-III thresholds of <82 for dementia and <88 for MCI. The median age of leaving education was 15 years (range 13 to 19 years). Table 1 also provides a 3 × 3 cross-tabulation of GP judgement against the reference standard, showing that referring GPs judged that 86 patients (36%) had dementia. In a regression model with total ACE-III score as the dependent variable and test order (index test or reference test first) as the independent variable, people whom the specialist assessed first scored 2.4 points more on the ACE-III (95% CI –1.3 to 6.1) than those who underwent the index test battery first; after adjusting for age, sex, and cognitive category, they scored 1.9 points more (95% CI –0.8 to 4.5).
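With a single binary predictor, the ordinary least-squares coefficient for assessment order equals the difference in mean ACE-III score between the two randomized groups, which is why the unadjusted 2.4-point estimate can be read directly as a mean difference. The minimal sketch below uses hypothetical scores; the adjusted model (age, sex, cognitive category) needs a multivariable fit and is not shown.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# 1 = specialist assessment first, 0 = index battery first (hypothetical data)
specialist_first = [1, 1, 1, 0, 0, 0]
ace_iii_total = [80, 76, 78, 75, 74, 76]

slope = ols_slope(specialist_first, ace_iii_total)
group_difference = (
    mean(s for o, s in zip(specialist_first, ace_iii_total) if o == 1)
    - mean(s for o, s in zip(specialist_first, ace_iii_total) if o == 0)
)
print(slope, group_difference)
```

Because assessment order was randomized, this difference estimates the effect of order itself rather than of participant characteristics.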
Test characteristics
There was wide variability in test duration (Fig. 2), with the 6CIT having the shortest median duration (1 min) and MoCA the longest (11 min). The range of completion times also differed between tests, being much greater for MoCA (range 7–22 min) than for Phototest (range 1–5 min). Classified by the 95th centile of test duration (C95), the short-duration (<5 min) tests were: EPSS (C95 2 min), T&C (C95 3 min), 6CIT (C95 3 min), Phototest (C95 4 min), and GPCOG (C95 4 min). The medium-duration (≥5 and ≤10 min) tests were: TUG (C95 5 min), Sniffin’ sticks (C95 6 min), SPMT (C95 9 min), and Eurotest (C95 10 min). The long-duration (>10 min) tests were: M@T (C95 11 min) and MoCA (C95 15 min).
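The C95 classification above can be reproduced from the reported values with a small sketch. The nearest-rank centile used here is an assumption: the text does not say which centile definition was used, and interpolated definitions common in statistics packages can give slightly different values. The function names are hypothetical.

```python
# 95th centiles of test duration (minutes) as reported above
REPORTED_C95 = {"EPSS": 2, "T&C": 3, "6CIT": 3, "Phototest": 4, "GPCOG": 4,
                "TUG": 5, "Sniffin’ sticks": 6, "SPMT": 9, "Eurotest": 10,
                "M@T": 11, "MoCA": 15}


def c95(durations):
    """Nearest-rank 95th centile of a list of test durations."""
    ordered = sorted(durations)
    rank = -(-95 * len(ordered) // 100)   # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]


def duration_band(c95_minutes):
    """Short (<5 min), medium (>=5 and <=10 min), or long (>10 min)."""
    if c95_minutes < 5:
        return "short"
    if c95_minutes <= 10:
        return "medium"
    return "long"


bands = {test: duration_band(minutes) for test, minutes in REPORTED_C95.items()}
print(bands)
```

Using the 95th centile rather than the median reflects the question a busy clinician asks: how long to book so that almost everyone can finish.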

Figure 2. Duration of index tests, all cognition categories. The box shows the median (darker middle line) and the quartiles (box edges); the whiskers enclose the lower (quartile 1 − 1.5 × interquartile range) and upper (quartile 3 + 1.5 × interquartile range) adjacent values, and the dots mark the outlying values.
Supplementary Table 1 shows the characteristics of the brief cognitive assessments and physical tests in terms of differential performance by cognitive status and test duration. Performance on every test except Sniffin’ sticks was worse for people with dementia than for people who were cognitively normal, and tests took longer for people with dementia or CIND than for people who were normal, though for many tests this could have been due to chance (that is, confidence intervals overlapped).
Table 2. Accuracy of brief cognitive tests for the diagnosis of dementia
*The combined test was positive if both GP judgement and the other test were positive, and negative if either or both were negative. CI, confidence interval; 6CIT, Six item cognitive impairment test; EPSS, Extra-Pyramidal Signs Scale; FAQ, Functional Activities Questionnaire; GPCOG, General Practitioner Assessment of Cognition; IQCODE, Informant Questionnaire on Cognitive Decline in the Elderly; Katz ADL, Katz Activities of Daily Living; Lawton IADL, Lawton Instrumental Activities of Daily Living; M@T, Memory Alteration Test; MoCA, Montreal Cognitive Assessment; SPMT, Scenery Picture Memory Test; T&C, Time & Change; TUG, Timed Up and Go.
Diagnostic accuracy
Table 2 indicates that some tests with high sensitivity are better for ruling out dementia, whilst others with high specificity are better for ruling in the diagnosis. The tests with the highest sensitivity were MoCA at a threshold of 26 (sensitivity 100%; 95% CI 97% to 100%) and Sniffin’ sticks at a threshold of 11 (sensitivity 100%; 95% CI 96% to 100%). In contrast, the tests with the highest specificities were FAQ (specificity 97%; 95% CI 92% to 99%), T&C (specificity 97%; 95% CI 91% to 99%), and Katz ADL (specificity 95%; 95% CI 90% to 98%). GP judgement had modest sensitivity (56%; 95% CI 47% to 65%) but was the third most specific test (89%; 95% CI 81% to 94%).
For many brief cognitive assessments, combining the test with GP judgement reduced sensitivity and increased specificity. In contrast, informant measures were less affected by combination with GP judgement, although the sensitivity of both AD8 and IQCODE was much lower when combined with GP judgement than when these tests were used alone. Among the combined tests, the highest sensitivity was for GP+IQCODE (sensitivity 56%; 95% CI 47% to 64%), and the highest specificity was for GP+T&C (specificity 100%; 95% CI 97% to 100%).
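The direction of these changes follows from the combination rule: a person is combined-positive only if GP judgement and the test are both positive, so the combined test can never call more people positive than the test alone. Sensitivity therefore cannot rise and specificity cannot fall. The minimal sketch below uses hypothetical data and assumes that only a GP judgement of dementia (not CIND) counts as positive.

```python
def combine_and(gp_dementia, test_positive):
    """Combined positive only when GP judgement AND the test are both positive."""
    return [g and t for g, t in zip(gp_dementia, test_positive)]


def sens_spec(pred, truth):
    """Sensitivity and specificity of predictions against the reference standard."""
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum((not p) and (not t) for p, t in zip(pred, truth))
    n_dementia = sum(truth)
    return tp / n_dementia, tn / (len(truth) - n_dementia)


# Hypothetical: first 6 people have dementia, last 4 do not
truth = [True] * 6 + [False] * 4
gp = [True, True, True, False, False, True, False, True, False, False]
test = [True, True, False, True, True, True, True, False, True, False]

print(sens_spec(test, truth))                     # test alone
print(sens_spec(combine_and(gp, test), truth))    # GP judgement AND test
```

In this toy example, sensitivity falls from 5/6 to 3/6 and specificity rises from 2/4 to 4/4, the same pattern described for many brief cognitive assessments.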
Impact of GP Judgement on test performance
Table 3 shows how GP judgement affects the discrimination of the index tests, quantified using the AUROC. SPMT had the highest AUROC overall (AUROC 0.7753; 95% CI 0.7220 to 0.8285), and its AUROC was similar whether GPs judged people as having dementia (AUROC 0.7725; 95% CI 0.6283 to 0.9168) or not (AUROC 0.7148; 95% CI 0.6402 to 0.7894). AUROC was generally higher in people whom GPs classified as not having dementia than in those classified as having dementia. However, the converse was true for T&C, SPMT, M@T, and Eurotest, suggesting that these tests may be more useful in people whom GPs judge as having dementia than in people who GPs think do not have dementia.
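Stratified AUROC can be sketched with the Mann–Whitney formulation: within each GP-judgement stratum, the AUROC is the probability that a randomly chosen person with dementia scores as more impaired than a randomly chosen person without, counting ties as one half. The sketch assumes higher scores mean more impairment (negate scores for tests such as MoCA, where lower is worse); the record layout and names are hypothetical.

```python
def auroc(scores_with, scores_without):
    """Mann-Whitney AUROC (higher score = more impaired); None if not computable."""
    if not scores_with or not scores_without:
        return None
    wins = 0.0
    for d in scores_with:
        for n in scores_without:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(scores_with) * len(scores_without))


def auroc_by_gp_judgement(records):
    """records: (gp_judged_dementia, reference_dementia, score) tuples."""
    out = {}
    for stratum in (True, False):
        rows = [r for r in records if r[0] == stratum]
        out[stratum] = auroc([s for _, ref, s in rows if ref],
                             [s for _, ref, s in rows if not ref])
    return out
```

Restricting to a stratum narrows the spectrum of severity within it, which is one reason why discrimination can differ between people whom GPs judge as having dementia and those they do not.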
Table 3. Discrimination of tests, by GP judgement
PPV, positive predictive value; NPV, negative predictive value. See Table 2 for test abbreviations. Missing values (.) could not be computed; e.g., no person whom GPs judged as having dementia was test-normal on MoCA.
Table 3 also shows the PPV (positive predictive value) and NPV (negative predictive value) for each test stratified by GP judgement. The predictive values depend on GP judgement because it influences the prevalence of disease in each stratum. PPVs were higher when tests were restricted to people whom GPs judged as having dementia, and NPVs were higher when tests were restricted to people whom GPs judged as not having dementia, probably because of prevalence effects. PPV for M@T was 100% (95% CI 75% to 100%) in people classified by GPs as having dementia, but only 50% (95% CI 7% to 93%) in people classified as not having dementia. In contrast, NPV for a normal Eurotest was 77% (95% CI 68% to 85%) in people classed as not having dementia, but only 36% (95% CI 18% to 58%) in people whom GPs thought had dementia.
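The prevalence effect can be made concrete with Bayes’ theorem. From the natural frequency figures in this paper, about 308 of the 358 people whom GPs judged as having dementia had dementia (86%), against about 242 of the 642 others (38%). The sketch applies the same hypothetical test (sensitivity and specificity both 80%) in each stratum; only the prevalences come from the text.

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity, and pre-test prevalence (Bayes' theorem)."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    fn = (1 - sens) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test in each GP-judgement stratum
ppv_gp_dem, npv_gp_dem = predictive_values(0.80, 0.80, prevalence=308 / 358)
ppv_gp_not, npv_gp_not = predictive_values(0.80, 0.80, prevalence=242 / 642)
# PPV is about 0.96 vs 0.71; NPV is about 0.39 vs 0.87
```

An identical test therefore looks better at ruling in within the GP “dementia” stratum and better at ruling out within the “not dementia” stratum, without any change in its sensitivity or specificity.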
Natural frequency classification
Supplementary Table 2 shows the natural frequency classification of people in a hypothetical population of 1000 people. Based on our data, 550 of the 1000 people would have dementia, and GPs would classify 358 of the 1000 as having dementia, being correct in 308 of the 358. Without taking GP judgement into account, SPMT is the test with the best classification: just over half (526) would test positive and potentially need referral, of whom 426 would be true positives (TP) and 100 false positives (FP); there would also be 351 true negatives (TN) and 125 false negatives (FN). Taking GP judgement into account, M@T is the test with the best classification in the 358 people whom GPs classify as having dementia, whereas Eurotest has the best classification in the 642 people classified as not having dementia. Using M@T in the 358 classed by GPs as having dementia and Eurotest in the 642 classed by GPs as not having dementia results in a total of 414 TP and 75 FP, with 376 TN and 138 FN.
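Natural frequencies for stratified sequential testing can be sketched by applying each stratum’s test to that stratum and adding up the four cells. Stratum sizes and prevalences below follow the text (358 and 642 people; 308 and 242 with dementia); the per-stratum sensitivities and specificities are hypothetical, and the function names are our own. Because each cell is rounded separately, totals can be off by one, as in the reported figures.

```python
def natural_frequencies(n, prevalence, sens, spec):
    """Expected TP/FP/TN/FN (rounded) when a test is applied to n people."""
    with_dementia = n * prevalence
    without = n - with_dementia
    return {
        "TP": round(with_dementia * sens),
        "FN": round(with_dementia * (1 - sens)),
        "TN": round(without * spec),
        "FP": round(without * (1 - spec)),
    }


def pool(*strata):
    """Stratified sequential testing: add the counts from each GP-judgement stratum."""
    return {k: sum(s[k] for s in strata) for k in ("TP", "FP", "TN", "FN")}


# Stratum sizes and prevalences from the text; accuracy values hypothetical
gp_dementia = natural_frequencies(358, 308 / 358, sens=0.70, spec=0.80)
gp_not_dementia = natural_frequencies(642, 242 / 642, sens=0.60, spec=0.85)
overall = pool(gp_dementia, gp_not_dementia)
print(overall)
```

Comparing `overall` with the counts from a single unstratified test shows whether stratification trades false positives for false negatives or reduces both.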
Supplementary Table 3 gives the results of the exploratory bootstrapping procedure and indicates that, for the tests under evaluation, the stratified approach generally had fewer false classifications, though these differences could still be consistent with chance. With SPMT or 6CIT as the unstratified test, the reduction in false classifications is attributable to fewer false negatives, whereas with GPCOG the stratified approach gives fewer false positives. With GPCOG (and to a lesser extent SPMT) as the unstratified comparison, the stratified approach involves a trade-off between FP and FN, whereas with 6CIT the trade-off is less clear, and stratification typically leads to fewer FP and FN.
Supplementary Table 4 presents a STARD checklist. Supplementary Table 5 presents a cross-tabulation of each index test against cognitive category.
DISCUSSION
This is the most comprehensive evaluation of a wide variety of tests for the diagnosis of dementia in people aged at least 70 years presenting to their GP with cognitive symptoms. We investigated the accuracy of eight BCAs, three physical tests, and five informant measures, both alone and combined with GP judgement. Combining tests with GP judgement altered test accuracy, typically increasing specificity and reducing sensitivity.
There are several methodological strengths: patient selection is applicable to clinical practice, we verified all cases against the same reference standard, and there was adjudication of the reference standard for uncertain cases. We randomized the order of the index battery to minimize order effects. However, there are important limitations. We did not include people who were unable to attend with an informant, or people with severe cognitive impairment who were unable to consent, so our findings cannot easily be generalized to them, especially regarding test duration and informant tests. There was no evidence of bias due to selective recruitment by GPs or due to selective participation by cognitive status, but any systematic bias in recruitment would limit the generalizability of our findings to the people who were excluded [19]. We do not know whether GPs used any index tests prior to referral, but based on previous studies, clinical judgement is likely to be based on rules of thumb [45] rather than formal tests [46], and information on referral forms indicated that judgement was informed by “face to face presentation”. The study had insufficient power to detect statistically significant differences in accuracy between tests, and the confidence intervals for tests overlap. Test accuracy may vary between generations, for example for items that ask about prime ministers or currency, which are subject to change. A further important limitation is that, despite the provision of translation services, participants were largely white, native English speakers.
Most comparable studies have reported the accuracy of single tests. For example, MoCA at a threshold of 26 was reported to have a sensitivity of around 94% and a specificity of at most 60%, but this was based on studies in secondary care, which are likely to have a more severe spectrum of disease [36]. IQCODE at a threshold of 3.2 has been reported to have sensitivity 100% and specificity 76% [47], and Eurotest at a threshold of 21 has been reported to have sensitivity 91% and specificity 82% [20]. These results are comparable with ours. In a multi-test primary care study that included 47 people with dementia out of a total of 141 people, Eurotest took an average of 7 min and Phototest an average of 3 min in people with dementia, and both tests had comparable accuracy [48], which is consistent with our findings.
While we have tested only a few comparisons with our exploratory bootstrapping procedure, our data suggest that stratification by GP judgement may help to reduce incorrect diagnostic classifications, but the interplay between judgement and the test will determine the impact. Although it is not possible to make firm recommendations about tests, we believe that BCAs such as M@T, Eurotest, 6CIT, and GPCOG may be particularly useful to consider for further investigation or use in practice; the last two may be particularly useful where time is highly valued. These four tests have high predictive values in this setting, and our results suggest that they may be particularly useful when GP judgement is used to stratify people for further testing, but this requires confirmation in a future study.
One important implication is that brief cognitive assessments vary substantially in duration when performed in this setting. Clinicians who are short of time may prefer to become familiar with the use (and limitations) of a test that they can reliably complete in less than 5 min in 95% of people, such as GPCOG, which when combined with clinical judgement has high specificity but only modest sensitivity. A second implication is that GP judgement could inform the selection of subsequent tests, because GP judgement has an important impact on prevalence and therefore on the predictive values (and diagnostic accuracy) of tests. The most time-efficient diagnostic procedure that retains high accuracy would be to stratify by GP judgement and use IQCODE in people whom GPs judge as having dementia and 6CIT in people whom GPs judge as having no dementia.
Further research in primary care to investigate the accuracy of tests for dementia in combination with GP judgement and with each other is important to help refine our results and reduce the uncertainty in the estimates. Future research should attempt to identify the most discriminative tests for distinguishing dementia and normal cognition from MCI in people whom GPs judge to have cognitive impairment but not dementia, and should also investigate the effect of GP stratification on the diagnostic accuracy of particular tests, with particular attention to the time taken to complete tests. We believe that it may be helpful to focus future research on tests such as M@T, Eurotest, 6CIT, and GPCOG, for the reasons discussed above.
Footnotes
ACKNOWLEDGMENTS
The authors thank the participants and the staff at participating practices, without whom this work would not have been possible. The staff at the West of England Clinical Research Network arranged for redaction, collection, and transport of medical records from general practices. Dr. Judy Haworth made an invaluable contribution to the paper but sadly died before publication. Part of this work was completed as part of a PhD thesis: Approaches to diagnosing dementia syndrome in general practice: Determining the value of clinical judgement and tests Creavin, S. T. (Author). 29 Sep 2020.
FUNDING
The Wellcome Trust (Fellowship 108804/Z/15/z; £321,248), Avon Primary Care Research Collaboration (£19,705), The Claire Wand fund (£5040), and the National Institute for Health Research School for Primary Care Research (£9,971). The Western Clinical Research Networks approved an application for service support costs to cover practices’ expenses for room hire in GP clinics and for GPs referring people to the study. YBS is partially supported by the NIHR Applied Research Collaboration West (NIHR ARC West). The views expressed in this article are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. This research was funded in whole, or in part, by the Wellcome Trust [108804/Z/15/z]. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
DATA AVAILABILITY
The datasets generated and analyzed during the current study are not yet publicly available because the funder-approved, prespecified data management plan stated that we would embargo access for five years, but they are available from the corresponding author on reasonable request. Data will be available from the data.bris repository after the five-year embargo period. Statistical code is available on request from the authors.
