Clock Drawing Test: Types of Errors and Accuracy in Early Cognitive Screening

Abstract

Background:

The Clock Drawing Test (CDT) is a commonly used screening tool for cognitive disorders, known for its ease of administration and scoring. Despite its frequent use by clinicians, the CDT is criticized for its poor predictive value in mild cases of impairment.

Objective:

To evaluate the CDT as a screening tool for the early stage of cognitive impairment in biomarker-verified Alzheimer’s disease (AD) and depressive disorder (DD).

Methods:

We analyzed the CDTs of 172 patients with verified AD, 70 patients with DD in whom neurodegenerative disorder was excluded using cerebrospinal fluid biomarkers, and 58 healthy older adults. The CDT was scored using semi-quantitative (Shulman) and itemized (adapted from Mendez) criteria.

Results:

Logistic regression showed that for both DD and AD patients with high Mini-Mental State Examination (MMSE) scores (27 and above) the significant predicting variable is uneven number spacing. As MMSE deteriorates (24-26 points), an additional error of setting clock hands is predictive of the disease. In the low MMSE condition, CDT showed an acceptable discrimination for AD (AUC itemized 0.740, Shulman 0.741) and DD (AUC itemized 0.827, Shulman 0.739) using both scoring methods. In the high MMSE condition, discrimination rates were acceptable using itemized scoring but poor using Shulman scoring for both AD (AUC itemized 0.707, Shulman 0.677) and DD (AUC itemized 0.755, Shulman 0.667) groups.

Conclusion:

Ideally, a modern diagnostic process should take place before cognitive performance drops beneath the healthy range. This makes the CDT of little use when screening patients with very mild cognitive deficits.

Keywords

Alzheimer’s disease, dementia, depressive disorder, mental status and dementia tests, screening

INTRODUCTION

With ever better therapeutic options for the treatment of Alzheimer’s disease (AD) on the horizon, procedures to identify patients with mild cognitive deficits become increasingly important. The therapies currently in development target the pathophysiology of AD rather than being symptomatic treatments. Thus, screening tests need to be evaluated for their ability to characterize mild neuropsychological deficits with regard to their etiology. In addition, for use in general practice settings, screening tests need to be administered and scored quickly.

The Clock Drawing Test (CDT) is one of the most widely used screening tests for cognitive disorders. The CDT was first validated by Shulman in 1986 [1] and at the time belonged to a range of drawing tests that were used to investigate cognition, such as cube-drawing [2], house-drawing [3], or tree-lined-avenue-drawing [4]. The popularity of the CDT can be attributed to its easy administration: patients are instructed to draw the face of a clock with the hands indicating a particular time. A widely cited review of the CDT [5] states that the test has high levels of sensitivity and specificity as well as concurrent and predictive validity. However, many studies on validity and reliability derived their data from comparisons of patients with well-established AD and healthy controls [6–14]. This leaves open the question of whether the CDT is equally suitable for early disease detection.

With new treatment techniques emphasizing the importance of early diagnosis for successful treatment, screening tools need to keep up. This does not seem to be the case for the CDT. However it is scored, the CDT seems to be only modestly successful in identifying cases of mild dementia [15, 16] and could not discriminate at all between cases with very mild dementia of Alzheimer’s type and healthy controls [17]. Studies using the CDT to screen for mild cognitive impairment reported mixed results using several scoring types [18, 19].

CDT scoring systems can be broadly divided into qualitative, semi-quantitative, and quantitative approaches [20]. Qualitative analyses of the clock are the most subjective and describe typical errors in the drawing by considering the clock as a whole [21–24]. Quantitative approaches, on the other hand, are represented by numerical scales with objective and fast scoring that focuses on one aspect of the clock at a time [12, 25–27]. A compromise between the two systems, the semi-quantitative approach, uses a numerical scale to characterize a subjectively evaluated clock drawing [1, 28]. The semi-quantitative method proposed by Shulman is the one used most widely today [29]. To improve its accuracy, some researchers have suggested using the CDT in conjunction with the Mini-Mental State Examination (MMSE) [30], verbal fluency, and informant reports. This has become a point of criticism: if a screening tool cannot be used in isolation, can it really be regarded as a stand-alone test [31]? To increase the ability to distinguish different patterns of cognitive deficits, quantitative analyses such as the one proposed by Mendez [12] have been suggested, in the hope that their itemized nature will be more helpful in describing profiles for different types of dementia [21, 32].

While the CDT is mostly used to screen for cognitive deficits in patients with suspected dementia, it has also been used to detect cognitive deficits in depressive disorders (DD) [33]. Depression and dementia of Alzheimer’s type share a number of symptoms in the initial stages of the disease, such as apathy, loss of interest, and decreased cognition [34]. Classical cognitive impairments in AD are progressively worsening episodic memory function and spatial orientation [35]. In addition to these initial symptoms, executive function, memory, planning, and attention become impaired as the disease progresses [35]. Patients with DD seem to suffer particularly from decreased executive function as well as reduced verbal learning capacity; the latter is especially evident in late-onset DD [36, 37]. Deficits in verbal learning are reported to occur due to poor consolidation as well as poor recall among DD patients [38–40]. Morphologically, both conditions present with atrophy in temporal and frontal structures and with white matter lesions [41]. It comes as no surprise that they are hard to tell apart in the initial stages. Despite the initial similarity in cognitive and affective domains, the treatment path for these two conditions is different and warrants an early and accurate diagnosis.

Using biomarkers, the diagnosis of AD can be verified and AD can be ruled out in patients with depressive symptoms [38]. The authors are not aware of any study looking at the characteristics of the CDT in patient groups with verified AD and verified DD. The current study aimed to analyze two different scoring systems of the CDT with regard to their ability to distinguish between patients with verified very early AD and patients with DD, in whom AD pathology was ruled out.

MATERIALS AND METHODS

This is a retrospective observational study. We used in- and outpatient records from the gerontopsychiatric services of Ulm University at Bezirkskrankenhaus Günzburg from 2014 to 2018. The study received approval from the ethics committee of Ulm University (289/18). It was conducted in accordance with the ethical standards of the University of Ulm and the guidelines outlined in the Declaration of Helsinki [42].

Study sample

Our study sample was selected from records of 3,758 in- and outpatients who were referred to the Geriatric Psychiatry services of Ulm University at Günzburg hospital between 2014 and 2018. Exclusion criteria were age under 60, an MMSE score under 24, and psychiatric diagnoses other than AD and DD. All participants received a detailed neuropsychological evaluation consisting of verbal memory (measured by the California Verbal Learning Test [43], including five learning trials, immediate and delayed recall (without and with cues), and recognition), verbal and visual span forward and backward [44], Trail Making Test A and B [45], and semantic (category animals) and phonemic (words beginning with the letters P and S) fluency [46]. All AD and DD patients met the respective diagnostic criteria of mild AD or DD according to the 10th revision of the International Classification of Diseases (ICD-10; [47]). Diagnoses were backed up by taking medical history, exploring the current symptoms, and performing a physical examination. AD was additionally verified using cerebrospinal fluid (CSF) biomarkers (amyloid-β 1–42 (Aβ1–42) < 550 pg/ml, total tau > 300 pg/ml, or phospho-tau > 61 pg/ml). DD was additionally verified by including only patients with biomarkers not suggestive of AD pathology (Aβ1–42 > 550 pg/ml, total tau < 300 pg/ml, phospho-tau < 61 pg/ml). Patients whose subjective cognitive complaints could not be verified through a neuropsychological examination and who presented with an inconspicuous medical history, a normal physical examination, and no indication of gradual cognitive decline were classified as healthy controls. The final sample analyzed in this study consisted of 58 participants in the control group (CG), 70 persons with DD, and 172 persons with AD. The demographics and average scores of the neuropsychological evaluation for all groups are presented in Table 1.
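The CSF cut-offs above can be written as a small decision rule. The following is only a sketch: the function name and the three labels are hypothetical, and it assumes that a single abnormal marker is enough to suggest AD pathology, whereas all three markers must be unremarkable to rule it out. The text does not state how values lying exactly on a cut-off were handled, so the sketch returns a separate label for that case.

```python
# Cut-offs as stated in the text (pg/ml).
ABETA42_CUTOFF = 550.0
TOTAL_TAU_CUTOFF = 300.0
PHOSPHO_TAU_CUTOFF = 61.0


def csf_profile(abeta42, total_tau, phospho_tau):
    """Label a CSF profile; the function name and labels are hypothetical."""
    # Assumption: one abnormal marker is enough to suggest AD pathology.
    suggests_ad = (
        abeta42 < ABETA42_CUTOFF
        or total_tau > TOTAL_TAU_CUTOFF
        or phospho_tau > PHOSPHO_TAU_CUTOFF
    )
    # All three markers must lie on the unremarkable side to rule AD out.
    rules_out_ad = (
        abeta42 > ABETA42_CUTOFF
        and total_tau < TOTAL_TAU_CUTOFF
        and phospho_tau < PHOSPHO_TAU_CUTOFF
    )
    if suggests_ad:
        return "AD-suggestive"
    if rules_out_ad:
        return "not suggestive of AD"
    # Reached only when at least one value sits exactly on its cut-off.
    return "indeterminate"


print(csf_profile(400.0, 450.0, 80.0))
print(csf_profile(800.0, 200.0, 40.0))
```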

Table 1

Participant demographic data. Results are displayed as Mean±Standard Deviation

	CG	DD	AD
N	58	70	172
Male/Female	33/25	31/39	69/103
Age	66.29±8.90	73.60±7.60	77.60±6.76
Education Years	11.50±3.54	8.97±1.95	9.56±2.75
MMSE	29.52±0.60	27.06±2.01	26.51±1.87
GDS	2.10±1.86	6.90±3.73	4.81±3.69
CVLT 1	4.78±1.64	3.48±1.42	2.81±1.48
CVLT 5	11.86±3.34	7.65±3.05	5.62±2.38
CVLT total recall	44.52±10.1	29.66±9.92	23.07±8.05
CVLT immediate recall	9.54±3.44	4.63±3.01	2.14±2.32
CVLT immediate cued recall	12.16±2.86	7.6±3.16	4.98±2.72
CVLT delayed recall	10.37±3.46	4.99±3.23	2.07±2.65
CVLT delayed cued recall	11.95±2.88	7.18±3.41	4.17±3.04
CVLT recognition	15.33±0.85	13.63±2.17	13.38±2.72
CVLT false positives	0.55±0.86	4.26±4.89	8.61±6.43
Digit span forward	7.79±1.64	6.62±2.07	6.77±1.85
Digit span backward	6.12±1.84	4.46±1.82	4.19±1.52
Visual span forward	6.95±1.32	6.24±1.59	5.69±1.42
Visual span backward	6.44±1.36	5.04±2.01	4.67±1.86
TMT-A	43.36±25.56	69.07±36.70	87.16±40.18
TMT-B	100.49±45.59	175.06±77.60	213.35±67.32
Semantic Fluency	21.62±5.35	15.72±5.66	13.68±4.86
Phonemic fluency P	9.57±3.63	5.91±3.91	5.47±3.41
Phonemic fluency S	12.52±4.71	8.01±4.24	7.67±4.06

AD, Alzheimer’s disease; CVLT, California Verbal Learning Test; DD, Depressive Disorder; CG, Control group; GDS, Geriatric Depression Scale; MMSE, Mini-Mental State Examination; TMT, Trail Making Test.

Materials

Mini-Mental State Examination [30]: The MMSE is a widely used instrument that gives an overview of global cognitive functioning. It comprises questions on orientation, registration, short-term memory, language use, comprehension, and basic motor skills. The score ranges from 0 to 30, with a score below 24 indicating cognitive impairment.

Geriatric Depression Scale [48]: The short version of the Geriatric Depression Scale is a 15-item questionnaire to assess symptoms of depression. Participants are asked to answer each item with yes or no. One point is given for each answer compatible with the symptoms of depressive disorder. Scores of 5 and above are graded by severity of depression: 5–8 mild, 9–11 moderate, and 12–15 severe.

Clock Drawing Test: In the CDT, participants were presented with an A4 sheet with a circle on it and asked to complete the face of the clock with the hands indicating the time “ten past eleven”. The present study used two separate scoring methods.

Itemized: This scoring method was derived from the original scoring of separate error types proposed by Mendez [12]. In the original system, the clock is scored by individually assessing 20 items grouped into three major components: general impression, clock numbers, and clock hands. A few modifications were made in the current study. Firstly, as proposed by Nakashima [49], all items were renamed to reflect the error type each item represents. One point was given for an error in each item. In cases where one of the major components of the clock was not present, all items in this group were considered errors. Secondly, we decided to split items 4 (a “2” is present and is pointed out to indicate the time) and 9 (an “11” is present and is pointed out to indicate the time) of the original Mendez system to separate the error of a missing number from the error of the clock hands not pointing to that particular number. Finally, as the template used for the CDT already contained a circle, we removed item 3 (there is a closed figure without gaps) from the original Mendez scoring. The final scoring system used in this study consisted of 21 items (Table 2).

Table 2

Error types in the Clock Drawing Test

Item No.	General Impression
1	No attempt to indicate time
2	Extra symbols or marks
	Numbers
3	Number 2 does not exist
4	Numbers spaced unevenly
5	Two or more clock quadrants contain inappropriate numbers
6	Numbers counterclockwise
7	Numbers outside of circle
8	Number 11 does not exist
9	Missing numbers
10	Repeated numbers
11	Substitution of numbers
12	Number beyond 12
13	Uneven distance from edge
14	Six or less of the same type symbols
	Clock hands
15	Shifted clock center
16	Clock hands are same length
17	More or less than 2 clock hands
18	Hands cross the circle
19	None of the hands attempt to indicate time
20	Hand does not point at number 2
21	Hand does not point at number 11
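The itemized scheme of Table 2 can be sketched in a few lines. The grouping of item numbers follows Table 2, and the rule that a missing component marks all of its items as errors follows the description above. The function name, the dictionary layout, and the summed total are assumptions made for illustration only; the analyses in this study use the individual items rather than a sum.

```python
# Item numbers per component, as listed in Table 2.
ITEM_GROUPS = {
    "general": range(1, 3),    # items 1-2
    "numbers": range(3, 15),   # items 3-14
    "hands": range(15, 22),    # items 15-21
}
ALL_ITEMS = range(1, 22)


def score_itemized(errors, missing_components=()):
    """Return (total_errors, per_item) for one clock drawing.

    errors: item numbers a rater marked as erroneous.
    missing_components: components absent from the drawing; every item
    in such a component is counted as an error.
    """
    flagged = set(errors)
    for component in missing_components:
        flagged.update(ITEM_GROUPS[component])
    unknown = flagged - set(ALL_ITEMS)
    if unknown:
        raise ValueError(f"unknown item numbers: {sorted(unknown)}")
    # One point per erroneous item, zero otherwise.
    per_item = {item: int(item in flagged) for item in ALL_ITEMS}
    return sum(per_item.values()), per_item


total, per_item = score_itemized({4, 20})
print(total, per_item[4], per_item[15])
```

For a drawing without any clock hands, `score_itemized({4}, missing_components=("hands",))` counts item 4 plus all seven clock-hand items.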

Shulman [29]: This scoring method consists of a hierarchical scale on which the clock is analyzed as a whole. The original method uses scores from 5 to 0, with lower scores indicating greater severity. In this study we used a German adaptation with scores from 1 to 6, where a higher score indicates greater severity: 1 = a perfect clock; 2 = minor visuospatial errors; 3 = acceptable visuospatial organization, but incorrect time; 4 = moderate visuospatial disorganization of numbers; 5 = severe visuospatial disorganization; 6 = no reasonable attempt to draw a clock.
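The relation between the two Shulman scales can be illustrated as follows. This sketch assumes that the German 1 to 6 scale is simply the reversed original 5 to 0 scale (6 minus the original score), which matches the anchors given above but is not spelled out in the text. The function name is hypothetical.

```python
def to_german_shulman(original_score):
    """Map an original Shulman score (5 = best, 0 = worst) to the
    1-6 scale used here (1 = perfect clock, 6 = no reasonable attempt).

    Assumption: the adaptation is a plain reversal of the scale.
    """
    if original_score not in range(0, 6):
        raise ValueError("original Shulman scores range from 0 to 5")
    return 6 - original_score


print([to_german_shulman(s) for s in (5, 4, 3, 2, 1, 0)])
```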

Statistical analyses

All statistical analyses were carried out using the statistics program SPSS (SPSS 25.0 for Windows, Armonk, NY, 2017). Binomial logistic regression was used to predict the probability that participants fall into the control or disease (AD and DD) group based on the errors they made in the CDT. Probabilities calculated in the regression models of the itemized CDT scoring method, as well as the Shulman method scores, were used to produce ROC curves to compare their overall discriminatory ability across different conditions.
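The two steps described above, turning item errors into a predicted probability and summarizing discrimination as the area under the ROC curve, can be sketched without any statistics package. This is an illustration only and not the SPSS procedure used in the study. The function names and all coefficients and probabilities in the example are hypothetical. The AUC is computed with the rank-based identity that it equals the probability that a randomly chosen patient receives a higher predicted probability than a randomly chosen control, with ties counted as one half.

```python
import math


def predicted_probability(intercept, coefficients, item_errors):
    """Logistic model: P(disease) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    logit = intercept + sum(b * x for b, x in zip(coefficients, item_errors))
    return 1.0 / (1.0 + math.exp(-logit))


def roc_auc(patient_probs, control_probs):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in patient_probs:
        for c in control_probs:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5  # ties count as half a win
    return wins / (len(patient_probs) * len(control_probs))


# Hypothetical coefficients for two items; x = 1 marks an error on that item.
print(predicted_probability(-1.0, [1.5, 0.8], [1, 0]))

# Hypothetical predicted probabilities for three patients and three controls.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4]))
```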

RESULTS

Figure 1 represents percentages of items that were scored as correct using Itemized scoring criteria (Fig. 1A) and percentage of scores using Shulman criteria (Fig. 1B) in each group. Note that for better graphical representation Fig. 1A represents not errors (as used in the subsequent analyses) but the correct scores.

Fig. 1

A) Percentage of correct items using itemized scoring (see Table 2 for item descriptions). B) Percentage of achieved scores per group using Shulman scoring. AD, Alzheimer’s disease; CG, control group; DD, depressive disorder.

Itemized scoring

Separate binomial logistic regression analyses were performed to ascertain the effects of CDT items on the likelihood that participants have DD and AD, respectively. CDT items on which at least half of the participants in the respective group made an error were used for the analysis. For both groups, these were items 4, 15, 16, and 20. We first analyzed the complete sample and later divided each sample into subgroups with high MMSE scores (values of 27 and above) and low MMSE scores (values between 24 and 26). The demographic data of the participants in the high and low MMSE conditions can be found in Table 3.
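The item selection and the MMSE split described above amount to two simple rules, sketched here under stated assumptions: the function names are hypothetical, each participant is represented by the set of item numbers scored as errors, and the example records are invented for illustration.

```python
def mmse_subgroup(mmse):
    """Assign the MMSE subgroup used in the analysis."""
    if mmse >= 27:
        return "high"
    if 24 <= mmse <= 26:
        return "low"
    return "excluded"  # an MMSE score under 24 was an exclusion criterion


def frequent_error_items(records, threshold=0.5):
    """Return the items on which at least `threshold` of participants erred.

    records: one set of erroneous item numbers per participant.
    """
    n = len(records)
    if n == 0:
        return []
    counts = {}
    for errors in records:
        for item in errors:
            counts[item] = counts.get(item, 0) + 1
    return sorted(item for item, c in counts.items() if c / n >= threshold)


# Invented records for four participants.
records = [{4, 15}, {4, 15}, {20}, {16}]
print(frequent_error_items(records))
print([mmse_subgroup(s) for s in (29, 26, 24, 23)])
```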

Table 3

Demographic data of AD and DD groups divided by MMSE scores. Results are displayed as Mean±Standard Deviation

	DD		AD
MMSE	High	Low	High	Low
N	40	30	85	87
Male/Female	19/21	12/18	40/45	29/58
Age (M±SD)	72.35±7.42	75.27±7.65	77.20±7.16	78.00±6.34
Education Years (M±SD)	9.18±2.26	8.68±1.42	9.96±2.98	9.17±2.46
MMSE (M±SD)	28.57±1.04	25.03±0.89	28.14±1.06	24.91±0.82
GDS (M±SD)	6.68±4.05	7.17±3.35	5.05±3.75	4.49±3.64
CVLT 1	3.55±1.47	3.38±1.37	2.96±1.55	2.67±1.40
CVLT 5	8.7±3.07	6.14±2.34	6.28±2.69	4.98±1.83
CVLT total recall	32.33±10.33	25.86±8.02	25.39±9.03	20.83±6.24
CVLT immediate recall	5.45±3.36	3.46±1.95	2.82±2.67	1.49±1.70
CVLT immediate cued	8.63±3.37	6.14±2.16	5.76±3.14	4.23±1.99
CVLT delayed recall	5.8±3.59	3.82±2.20	2.87±3.08	1.31±1.88
CVLT delayed cued	8.25±3.71	5.64±2.20	5.12±3.41	3.26±2.31
CVLT recognition	13.72±2.36	13.5±1.88	13.48±2.83	13.28±2.62
CVLT false positives	3.62±5.00	5.18±4.7	7.99±6.75	9.21±6.08
Digit span forward	7.22±2.12	5.75±1.69	6.92±1.55	6.62±2.10
Digit span backward	4.9±1.99	3.82±1.36	4.33±1.51	4.05±1.52
Visual span forward	6.48±1.52	5.89±1.66	5.96±1.38	5.42±1.42
Visual span backward	5.05±2.26	5.04±1.62	4.98±1.77	4.36±1.90
TMT-A	63.47±34.93	76.79±38.28	82.54±36.31	91.68±43.37
TMT-B	164.11±74.29	190.65±80.97	210.63±71.81	216.24±62.64
Semantic Fluency	16.87±6.62	14.17±3.59	14.95±4.75	12.48±4.68
Phonemic fluency P	6.9±4.35	4.59±2.78	6.12±3.39	4.85±3.33
Phonemic fluency S	8.95±4.82	6.71±2.88	8.38±4.09	6.99±3.94

AD, Alzheimer’s disease; CG, Control group; CVLT, California Verbal Learning Test; DD, Depressive Disorder; GDS, Geriatric Depression Scale; MMSE, Mini-Mental State Examination; TMT, Trail Making Test.

The logistic regression model for the complete DD group was statistically significant and explained 30% of the variance (Nagelkerke R²). Item 4 (uneven distribution of numbers along the face of the clock) and item 20 (hand of the clock not pointing to number 2) were significant predictors and increased the likelihood of DD. The model for the high MMSE DD group was significant and explained 28% of the variance (Nagelkerke R²), with item 4 being a significant predictor. The logistic regression model for the low MMSE DD group was likewise statistically significant and explained 40% of the variance (Nagelkerke R²), with items 4 and 20 being significant predictors for DD (Table 4).

Table 4

Logistic regression predicting likelihood of depressive disorder and Alzheimer’s disease based on Clock Drawing Test

	χ²	df	p	Explained variance (Nagelkerke R²)
	CG versus DD
Itemized*
All MMSE	27.343	4	<0.001	30%
Low MMSE	23.132	4	<0.001	40%
High MMSE	19.741	4	<0.001	28%
Shulman Score
All MMSE	20	1	<0.001	19%
Low MMSE	19.666	1	<0.001	28%
High MMSE	9.842	1	0.002	13%
	CG versus AD
Itemized*
All MMSE	20.161	4	<0.001	17%
Low MMSE	17.871	4	0.001	22%
High MMSE	15.637	4	0.004	18%
Shulman Score
All MMSE	49.682	1	<0.001	29%
Low MMSE	50.896	1	<0.001	40%
High MMSE	30.221	1	<0.001	26%

AD, Alzheimer’s disease; CG, control group; DD, depressive disorder; MMSE, Mini-Mental State Examination. *Error Items 4 (numbers spaced unevenly), 15 (shifted clock center), 16 (clock hands are same length), 20 (hand does not point at number 2).
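For orientation, the two summary statistics in Table 4 can be expressed in terms of model log-likelihoods. The sketch below uses the standard definitions: the model χ² is twice the gain in log-likelihood over the intercept-only model, and Nagelkerke R² is the Cox and Snell R² divided by its maximum attainable value. The function names and the log-likelihood values in the example are hypothetical and are not taken from the study.

```python
import math


def model_chi_square(ll_null, ll_model):
    """Likelihood-ratio chi-square of the fitted model against the null model."""
    return 2.0 * (ll_model - ll_null)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R² from the null and fitted log-likelihoods and sample size n."""
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for a sample of 120 participants.
print(model_chi_square(-82.0, -70.0))
print(round(nagelkerke_r2(-82.0, -70.0, 120), 3))
```

A model that does not improve on the intercept-only model yields 0, and a model that predicts every case perfectly (log-likelihood of 0) yields 1.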

Similarly, the logistic regression model for AD was statistically significant, χ²(4) = 20.161, p < 0.001. The model explained 17% of the variance (Nagelkerke R²) and correctly classified 69.3% of cases. Item 4 was the only significant predictor in the model, with errors in it increasing the likelihood of AD. The model for the high MMSE AD group was significant and explained 18% of the variance (Nagelkerke R²), with item 4 being a significant predictor. The logistic regression model for the low MMSE AD group was likewise statistically significant and explained 22% of the variance (Nagelkerke R²), with items 4 and 20 being significant predictors for AD (Table 4).

Finally, to test the suitability of the CDT for differential diagnostics, we analyzed how well the CDT with itemized scoring can differentiate between AD and DD patients (all MMSE values). The test showed 100% specificity and 0% sensitivity, classifying all patients into the AD group.

Shulman scoring

Logistic regression analysis of the Shulman scores returned a statistically significant model for the DD group, which explained 19% of the variance. Models for the high and low MMSE DD groups were likewise significant and explained 13% and 28% of the variance, respectively. Similarly, the logistic regression models for AD were statistically significant; the model explained 29% of the variance for the complete AD group, 26% for the high MMSE AD group, and 40% for the low MMSE AD group (Table 4). Lastly, we looked into how well the CDT can discriminate between AD and DD patients using Shulman scoring (all MMSE values). As with itemized scoring, all patients were classified into the AD group, resulting in 100% specificity and 0% sensitivity.

The values of accuracy, sensitivity, and specificity for AD and DD category predictions using itemized and Shulman scoring are displayed in Table 5.

Table 5

Classification of cases using itemized (adapted from Mendez) and Shulman scoring, divided by MMSE

	all MMSE Values	MMSE 24-26	MMSE 27-30
	CG versus DD
Itemized*
Accuracy	73.3%	80.6%	74.4%
Specificity	77.4%	92.5%	88.7%
Sensitivity	69.2%	47.4%	51.5%
Shulman Score
Accuracy	71.1%	73.9%	65.3%
Specificity	69.0%	81.0%	81.0%
Sensitivity	72.9%	60.0%	42.5%
	CG versus AD
Itemized*
Accuracy	69.3%	68.7%	68.2%
Specificity	52.8%	77.4%	66.0%
Sensitivity	78.0%	58.7%	70.4%
Shulman Score
Accuracy	76.5%	78.6%	71.3%
Specificity	69.0%	69.0%	69.0%
Sensitivity	79.1%	85.1%	72.9%

AD, Alzheimer’s disease; CG, control group; DD, depressive disorder; MMSE, Mini-Mental State Examination. * Error Items 4 (numbers spaced unevenly), 15 (shifted clock center), 16 (clock hands are same length), 20 (hand does not point at number 2).
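The three measures in Table 5 follow from a 2×2 classification table in the usual way, as the sketch below shows. The source does not report the underlying counts. The counts in the example are hypothetical values that we chose because they are consistent with the group sizes (172 AD, 58 CG) and with the percentages reported for Shulman scoring, CG versus AD, all MMSE values.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts.

    tp/fn: patients classified as diseased / as healthy.
    tn/fp: controls classified as healthy / as diseased.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts, not reported in the source: 136 of 172 AD patients
# and 40 of 58 controls classified correctly.
acc, sens, spec = classification_metrics(tp=136, fn=36, tn=40, fp=18)
print(f"accuracy {acc:.1%}, sensitivity {sens:.1%}, specificity {spec:.1%}")
```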

The probabilities of the logistic regression models of both DD and AD for high and low MMSE scores were used to produce ROC curves and compare CDT results scored using itemized and Shulman criteria (Figs. 2 and 3). The criteria proposed by Hosmer [50] were used as guidelines in the AUC analysis (AUC < 0.7 = poor discrimination; 0.7 ≤ AUC < 0.8 = acceptable discrimination; 0.8 ≤ AUC < 0.9 = excellent discrimination; AUC ≥ 0.9 = outstanding discrimination). The results are displayed in Table 6 and show acceptable or excellent discrimination for the itemized scoring method across groups, whereas Shulman scoring shows poor discrimination in the groups with high MMSE values.

Fig. 2

Discriminating ability of Shulman and itemized criteria illustrated by ROC curves for high and low MMSE in Alzheimer’s disease. See Table 6 for numerical values of area under the curve.

Fig. 3

Discriminating ability of Shulman and itemized criteria illustrated by ROC curves for high and low MMSE in Depressive Disorder. See Table 6 for numerical values of area under the curve.

Table 6

Area under the curve assessment for high and low MMSE values in AD and DD

Group	MMSE	AUC	95% CI		Assessment
			Lower	Upper
	Depressive Disorder
Itemized*	High	0.755	0.645	0.864	acceptable
	Low	0.827	0.713	0.941	excellent
Shulman	High	0.667	0.546	0.788	poor
	Low	0.739	0.605	0.872	acceptable
	Alzheimer’s Disease
Itemized*	High	0.707	0.608	0.806	acceptable
	Low	0.740	0.643	0.837	acceptable
Shulman	High	0.677	0.575	0.779	poor
	Low	0.741	0.641	0.842	acceptable

AUC, area under the curve; CI, confidence interval; *Error Items 4 (numbers spaced unevenly), 15 (shifted clock center), 16 (clock hands are same length), 20 (hand does not point at number 2).
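The assessment column of Table 6 applies the Hosmer thresholds quoted above. A minimal sketch of that mapping, with a hypothetical function name, is given below; the AUC values in the example are the point estimates from Table 6.

```python
def hosmer_category(auc):
    """Classify an AUC value according to the Hosmer guideline thresholds."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "poor"


# Point estimates from Table 6 (depressive disorder rows).
for label, auc in [
    ("Itemized, high MMSE", 0.755),
    ("Itemized, low MMSE", 0.827),
    ("Shulman, high MMSE", 0.667),
    ("Shulman, low MMSE", 0.739),
]:
    print(label, hosmer_category(auc))
```

The categories refer to the point estimates only; several of the confidence intervals in Table 6 extend below 0.7.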

DISCUSSION

Inspection of the graph depicting the frequency of the separate error types in the clock-drawing test shows that the lines representing the different participant groups follow a similar pattern and differ only in the frequency of the errors. Patients with both DD and AD seem to be susceptible to two types of error, visual-spatial and executive. Both groups seem to have difficulty spacing the numbers evenly and determining the clock center; they also disregard the length difference between the clock hands and fail to point a hand at the number 2 to indicate 10 minutes.

Although it was not the focus of this paper, it is worth pointing out the potential importance of treating the nonexistence of numbers 2 or 11 and the failure of the clock hands to point to these particular numbers as separate errors. Nonexistence of number 2 (item 3) or number 11 (item 8) followed a similar pattern to nonexistence of other numbers (item 9). This type of item error is generally associated with visuospatial difficulty and deficits in the right parietal lobe [49, 52]. On the other hand, failure to set the clock hands points towards dysfunction of the frontal lobe and could reflect difficulties in abstract thinking [52, 53].

Logistic regression analysis identified two items as significant predictors. In the low MMSE group, where MMSE values ranged from 24 to 26, uneven spacing of numbers and clock hands not pointing to number 2 were significant predictors for both AD and DD. In the high MMSE group, with MMSE scores of 27 and above, only uneven spacing predicted the disease condition.

Uneven spacing of numbers reflects a broader deficit in visuospatial planning, which has been described in AD [21, 54] and affects as many as 43% of patients [55]. Among other deficits in constructing representations of visual scenes and objects, AD patients show deficits in spatial coherence [56]. The remembered or imagined objects lack spatial integration and are fragmented or misplaced. These reports closely resemble the clocks drawn by AD participants with regard to the spacing of the numbers on the face of the clock. Neuroanatomical studies have reported that this type of error was associated with impairments in the frontal lobe [49] as well as the nondominant right parietal lobe [52]. Successful visual planning of a clock face additionally requires successful communication between the frontal and parietal lobes [57], which coordinate visuospatial understanding of the clock. Frontoparietal circuits seem to be adversely affected by depression, resulting in problems with cognitive flexibility and cognition, especially in integrating information into coherent mental representations [58], such as the face of a clock.

Incorrect placement of the clock hands was an error more descriptive of a higher-level impairment as measured by the MMSE. Other studies found that this error is directly related to executive functioning and was able to discriminate early dementia of Alzheimer’s type from other forms of dementia [59] and from healthy controls [60]. Most of the participants in the current and similar studies placed the minute hand on 10 instead of 2 to indicate the time “ten past eleven” [49, 60]. Correct placement of the clock hands requires a transformation of the verbal time indication into its correct representation on a clock. This is facilitated by semantic memory, where the knowledge about conceptual time representation and clock functionality is stored. Access to semantic storage is impaired in the early stages of AD [21, 61–64].

The CDT is mostly known and used for dementia screening; literature on its diagnostic value in DD is therefore relatively scarce and mostly describes the CDT as a tool to detect underlying dementia [65, 66]. Later studies showed that even among patients with depression without concomitant diagnoses, executive dysfunction correlates with lower CDT scores [67]. Clock setting in particular discriminates between patients with DD and healthy controls [68]. In line with our results, Klein and colleagues [69] reported that the overall CDT score correlated significantly with semantic memory impairment relating to minute hand functionality. Additionally, individuals with late-onset depression performed significantly worse than individuals with early-onset depression in tasks concerning minute hand placement and digit arrangement on the clock face. The current study builds on these results, suggesting that the deficits in question are directly related to depression, since neurodegenerative disease was excluded in the current sample using biomarkers.

Although AD and DD are distinct disorders, they share many cognitive deficits, such as those in memory, attention, and visuospatial and executive functions [38]. This phenotypical similarity is well reflected in the findings of the current study, as both disease groups showed the same types of errors. Given the blurred boundaries between AD and DD for most cognitive deficits, the diagnostic process should rely on a more detailed evaluation of a telltale sign of AD: episodic memory. Hodges and colleagues [35] have reported that AD follows a sequential pattern of deficits, which starts with episodic memory and is followed later by deficits in attention, working memory, and executive functions. DD lacks such a clear-cut pattern of deficits and may involve some or all of the AD deficits to varying degrees and with differing times of onset. This leaves early episodic memory deficits as the most reliable criterion in differential diagnostics. Admittedly, a thorough evaluation of episodic memory is a considerably lengthier process that requires a trained neuropsychologist. It could, however, give a much clearer picture of the symptomatic constellation of the particular patient and help decide on the most efficient further steps for each individual.

Owing to its quick and easy administration, the appeal of the CDT remains high. It is arguably a valuable addition to the GP’s assessment package as a rough verification of subjective complaints [66]. However, it should be kept in mind that impaired performance in the CDT does not necessarily point to a specific etiology of the complaints. Similarly, an inconspicuous result does not guarantee the absence of illness. If used for initial screening or as a follow-up tool, the CDT should be applied with its limitations in mind. As evident from the ROC curves, itemized quantitative scoring is more valuable than semi-quantitative Shulman scoring in detecting cognitive deficits. If used at all, the Shulman scoring method should only be applied to individuals with at least moderate cognitive deficits. If cognitive deficits are mild or have not been previously assessed, it is recommended to use only itemized scoring for initial screening. When communicating CDT results, the types of error should be included in the report, as these potentially indicate the severity of cognitive decline.

The current study has potential limitations. As the design of the study was retrospective, we collected and scored clock drawings readily available in the hospital’s medical record archives and did not administer the CDT ourselves. The number of previous depressive episodes in the DD group could not be accurately determined and was not accounted for in this study. The number of depressive episodes could potentially have had an impact on the severity of cognitive impairment in this group [70]. Although the classical paper-and-pencil administration of the CDT is predominantly in use by clinicians, a digital version of the CDT [71, 72] has been gaining momentum in the research community as a tool to evaluate neuropsychological deficits [73–77]. Compared to the classical administration, it offers additional graphomotor and latency parameters, which are hard to capture otherwise [78, 79]. As the digital CDT requires additional hardware for its administration, it might not be practical for many clinicians. An alternative can be offered by automated algorithm-based scoring [80–82]. Future studies should look into the additional value for early differential diagnostics provided by digital administration and machine-learning scoring of the CDT.

Conclusion

Analysis of clocks drawn by patients with verified AD and patients with DD without concomitant neurodegenerative disease showed that the CDT is a poor choice for an early and adequate diagnostic process. The type of scoring played a crucial role in the test’s accuracy. Itemized error-by-error scoring showed better discriminability than the semi-quantitative method. Two errors stood out as particularly common for both AD and depression: uneven number spacing was the most typical of very mild cognitive impairment, followed by errors in setting the clock hands as cognition worsens. Overall, the CDT offers some insights into visual-spatial capabilities; however, it is time to let the test go as a screening tool.

Footnotes

ACKNOWLEDGMENTS

KS, FG, and CL were responsible for data acquisition. KS and MR were responsible for data interpretation, drafting and revising the manuscript. All authors approved the final version of the manuscript.

FUNDING

The authors have no funding to report.

CONFLICT OF INTEREST

The authors have no conflict of interest to report.

DATA AVAILABILITY

The data supporting the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

References

1. Shulman, Shedletsky, Silver (1986) The challenge of time: Clock-drawing and cognitive function in the elderly. Int J Geriatr Psychiatry 1, 135–140.

2. Rosselli, Ardila (2003) The impact of culture and education on non-verbal neuropsychological measurements: A critical review. Brain Cogn 52, 326–333.

3. Moore, Wyke (1984) Drawing disability in patients with senile dementia. Psychol Med 14, 97–105.

4. Rennert (1971) Der Allee-Zeichentest: Vorläufige Mitteilung einer Methode zur Erfassung von Raumabbildungsanomalien bei psychisch Kranken [The tree-lined avenue drawing test. Preliminary report on a method to diagnose space drawing abnormalities in mental patients]. Psychiatr Neurol Med Psychol 23, 601–609.

5. Shulman (2000) Clock-drawing: Is it the ideal cognitive screening test? Int J Geriatr Psychiatry 15, 548–561.

6. Brodaty, Moore (1997) The Clock Drawing Test for dementia of the Alzheimer’s type: A comparison of three scoring methods in a memory disorders clinic. Int J Geriatr Psychiatry 12, 619–627.

7. Dastor, Schwartz, Kurzman (1991) Clock-drawing: An assessment technique in dementia. J Clin Exp Gerontol 13, 69–85.

8. Gruber, Varner, Chen Y-W, Lesser (1997) A comparison of the clock drawing test and the Pfeiffer Short Portable Mental Status Questionnaire in a geropsychiatry clinic. Int J Geriatr Psychiatry 12, 526–532.

9. Lam LCW, Chiu HFK, Ng, Chan, Li, Wong (1998) Clock-face drawing, reading and setting tests in the screening of dementia in Chinese elderly adults. J Gerontol B Psychol Sci Soc Sci 53B, 353–357.

10. Libon, Swenson, Barnoski, Sands (1993) Clock drawing as an assessment tool for dementia. Arch Clin Neuropsychol 8, 405–415.

11. Manos, Wu (1994) The Ten Point Clock Test: A quick screen and grading method for cognitive impairment in medical and surgical patients. Int J Psychiatry Med 24, 229–244.

12. Mendez, Ala, Underwood (1992) Development of scoring criteria for the Clock Drawing Task in Alzheimer’s disease. J Am Geriatr Soc 40, 1095–1099.

13. Sunderland, Hill, Mellow, Lawlor, Gundersheimer, Newhouse, Grafman (1989) Clock drawing in Alzheimer’s disease. A novel measure of dementia severity. J Am Geriatr Soc 37, 725–729.

14. Todd, Dammers, Adams, Todd, Morrison (1995) An examination of a proposed scoring procedure for the clock drawing test: Reliability and predictive validity of the clock scoring system (CSS). Am J Alzheimers Dis Other Demen 10, 22–26.

15. Seigerschmidt, Mösch, Siemen, Förstl, Bickel (2002) The clock drawing test and questionable dementia: Reliability and validity. Int J Geriatr Psychiatry 17, 1048–1054.

16. Storey, Rowland JTJ, Basic, Conforti (2001) A comparison of five clock scoring methods using ROC (receiver operating characteristic) curve analysis. Int J Geriatr Psychiatry 16, 394–399.

17. Powlishta, Drasvon, Stanford, Carr, Tsering, Miller, Morris (2002) The clock drawing test is a poor screen for very mild dementia. Neurology 59, 898–903.

18. Ehreke, Luppa, König H-H, Riedel-Heller (2010) Is the Clock Drawing Test a screening tool for the diagnosis of mild cognitive impairment?: A systematic review. Int Psychogeriatr 22, 56–63.

19. Petrazzuoli, Vestberg, Midlöv, Thulesius, Stomrud, Palmqvist (2020) Brief cognitive tests used in primary care cannot accurately differentiate mild cognitive impairment from subjective cognitive decline. J Alzheimers Dis 75, 1191–1201.

20. Spenciere, Alves, Charchat-Fichman (2017) Scoring systems for the Clock Drawing Test: A historical review. Dement Neuropsychol 11, 6–14.

21. Rouleau, Salmon, Butters, Kennedy, McGuire (1992) Quantitative and qualitative analyses of clock drawings in Alzheimer’s and Huntington’s disease. Brain Cogn 18, 70–87.

22. Cahn, Salmon, Monsch, Butters, Wiederholt, Corey-Bloom, Barrett-Connor (1996) Screening for dementia of the Alzheimer type in the community: The utility of the clock drawing test. Arch Clin Neuropsychol 11, 529–539.

23. Leyhe, Saur, Eschweiler, Milian (2009) Clock test deficits are associated with semantic memory impairment in Alzheimer disease. J Geriatr Psychiatry Neurol 22, 235–245.

24. Parsey, Schmitter-Edgecombe (2011) Quantitative and qualitative analyses of the Clock Drawing Test in mild cognitive impairment and Alzheimer disease: Evaluation of a modified scoring system. J Geriatr Psychiatry Neurol 24, 108–118.

25. Tuokko, Hadjistavropoulos, Miller, Beattie (1992) The Clock Test: A sensitive measure to differentiate normal elderly from those with Alzheimer disease. J Am Geriatr Soc 40, 579–584.

26. Babins, Slater M-E, Whitehead, Chertkow (2008) Can an 18-point clock-drawing scoring system predict dementia in elderly individuals with mild cognitive impairment? J Clin Exp Neuropsychol 30, 173–186.

27. Lessig, Scanlan, Nazemi, Borson (2008) Time that tells: Critical clock-drawing errors for dementia screening. Int Psychogeriatr 20, 416.

28. Wolf-Klein, Silverstone, Levy, Brod, Breuer (1989) Screening for Alzheimer’s disease by clock drawing. J Am Geriatr Soc 37, 730–734.

29. Shulman, PushkarGold, Cohen, Zucchero (1993) Clock-drawing and dementia in the community: A longitudinal study. Int J Geriatr Psychiatry 8, 487–496.

30. Folstein, Folstein, McHugh (1975) “Mini-mental state”. J Psychiatr Res 12, 189–198.

31. Philpot (2004) The clock-drawing test: A critique. Int Psychogeriatr 16, 251–256.

32. Kitabayashi, Ueda, Narumoto, Nakamura, Kita, Fukui (2001) Qualitative analyses of clock drawings in Alzheimer’s disease and vascular dementia. Psychiatry Clin Neurosci 55, 485–491.

33. Milian, Leiherr A-M, Straten, Müller, Leyhe, Eschweiler (2013) The Mini-Cog, Clock Drawing Test, and the Mini-Mental State Examination in a German Memory Clinic: Specificity of separation dementia from depression. Int Psychogeriatr 25, 96–104.

34. Hussain, Kumar, Khan, Gordon, Khan (2020) Similarities between depression and neurodegenerative diseases: Pathophysiology, challenges in diagnosis and treatment options. Cureus 12, e11613.

35. Hodges (2000) Memory in the dementias. In The Oxford Handbook of Memory, Tulving E, Craik FIM, eds., Oxford University Press, New York, pp. 441–459.

36. Rapp, Dahlman, Sano, Grossman, Haroutunian, Gorman (2005) Neuropsychological differences between late-onset and recurrent geriatric major depression. Am J Psychiatry 162, 691–698.

37. Mackin, Nelson, Delucchi, Raue, Satre, Kiosses, Alexopoulos, Arean (2014) Association of age at depression onset with cognitive functioning in individuals with late-life depression and executive dysfunction. Am J Geriatr Psychiatry 22, 1633–1641.

38. Lanza, Sejunaite, Steindel, Scholz, Riepe (2020) Cognitive profiles in persons with depressive disorder and Alzheimer’s disease. Brain Commun 2, 1961.

39. Lanza, Sejunaite, Steindel, Scholz, Riepe, Ginsberg (2020) On the conundrum of cognitive impairment due to depressive disorder in older patients. PLoS One 15, e0231111.

40. Elderkin-Thompson, Moody, Knowlton, Hellemann, Kumar (2011) Explicit and implicit memory in late-life depression. Am J Geriatr Psychiatry 19, 364–373.

41. Brommelhoff, Sultzer (2015) Brain structure and function related to depression in Alzheimer’s disease: Contributions from neuroimaging research. Int Psychogeriatr 45, 689–703.

42. World Medical Association (2013) World Medical Association Declaration of Helsinki. JAMA 310, 2191.

43. Niemann, Sturm, Thöne-Otto AIT, Willmes (2008) CVLT California Verbal Learning Test. German adaptation. Manual, Pearson Assessment, Frankfurt.

44. Härting, Markowitsch, Neufeld, Calabrese, Deisinger, Kessler (2000) Wechsler Gedächtnis Test - Revidierte Fassung: Deutsche Adaptation der revidierten Fassung der Wechsler-Memory-Scale, Huber, Göttingen.

45. Reitan, Wolfson (1985) The Halstead-Reitan Neuropsychological Test Battery: Theory and Clinical Interpretation, Neuropsychology Press, Tucson, AZ.

46. Aschenbrenner, Tucha, Lange (2000) RWT: Regensburger Wortflüssigkeits-Test, Hogrefe Verlag, Göttingen.

47. World Health Organization (WHO) (1993) The ICD-10 classification of mental and behavioural disorders, World Health Organization.

48. Burke, Roccaforte, Wengel (1991) The short form of the Geriatric Depression Scale: A comparison with the 30-item form. J Geriatr Psychiatry Neurol 4, 173–178.

49. Nakashima, Umegaki, Makino, Kato, Abe, Suzuki, Kuzuya (2016) Neuroanatomical correlates of error types on the Clock Drawing Test in Alzheimer’s disease patients. Geriatr Gerontol Int 16, 777–784.

50. Hosmer Jr., Lemeshow, Sturdivant (2013) Applied logistic regression, 398, John Wiley & Sons.

51. Schotten de, Urbanski, Duffau, Volle, Lévy, Dubois, Bartolomeo (2005) Direct evidence for a parietal-frontal pathway subserving spatial awareness in humans. Science 309, 2226–2228.

52. Tranel, Rudrauf, Vianna EPM, Damasio (2008) Does the Clock Drawing Test have focal neuroanatomical correlates? Neuropsychology 22, 553–562.

53. Hodges (2013) Distributed cognitive functions. In Cognitive Assessment for Clinicians, Hodges JR, ed., 2nd Edition, Oxford University Press, pp. 1–28.

54. Salimi, Irish, Foxe, Hodges, Piguet, Burrell (2019) Visuospatial dysfunction in Alzheimer’s disease and behavioural variant frontotemporal dementia. J Neurol Sci 402, 74–80.

55. Mendez, Mendez, Martin, Smyth, Whitehouse (1990) Complex visual disturbances in Alzheimer’s disease. Neurology 40, 439.

56. Irish, Halena, Kamminga, Tu, Hornberger, Hodges (2015) Scene construction impairments in Alzheimer’s disease - A unique role for the posterior cingulate cortex. Cortex 73, 10–23.

57. Eknoyan, Hurley, Taber (2012) The Clock Drawing Task: Common errors and functional neuroanatomy. J Neuropsychiatry Clin Neurosci 24, 260–265.

58. Brzezicka (2013) Integrative deficits in depression and in negative mood states as a result of fronto-parietal network dysfunctions. Acta Neurobiol Exp 73, 313–325.

59. Barrows, Barsuglia, Paholpak, Eknoyan, Sabodash, Lee, Mendez (2015) Executive abilities as reflected by clock hand placement. J Geriatr Psychiatry Neurol 28, 239–248.

60. Leyhe, Milian, Müller, Eschweiler, Saur (2009) The minute hand phenomenon in the clock test of patients with early Alzheimer disease. J Geriatr Psychiatry Neurol 22, 119–129.

61. Adlam A-LR, Bozeat, Arnold, Watson, Hodges (2006) Semantic knowledge in mild cognitive impairment and mild Alzheimer’s disease. Cortex 42, 675–684.

62. Dudas, Clague, Thompson, Graham, Hodges (2005) Episodic and semantic memory in mild cognitive impairment. Neuropsychologia 43, 1266–1276.

63. Vogel, Gade, Stokholm, Waldemar (2005) Semantic memory impairment in the earliest phases of Alzheimer’s disease. Dement Geriatr Cogn Disord 19, 75–81.

64. Umegaki, Suzuki, Komiya, Watanabe, Yamada, Nagae, Kuzuya, Loewenstein (2021) Frequencies and neuropsychological characteristics of errors in the Clock Drawing Test. J Alzheimers Dis 82, 1291–1300.

65. Herrmann, Kidron, Shulman, Kaplan, Binns, Leach, Freedman (1998) Clock tests in depression, Alzheimer’s disease, and elderly controls. Int J Psychiatry Med 28, 437–447.

66. Kirby, Denihan, Bruce, Coakley, Lawlor (2001) The clock drawing test in primary care: Sensitivity in dementia detection and specificity against normal and depressed elderly. Int J Geriatr Psychiatry 16, 935–940.

67. Woo BKP, Rice, Legendre, Salmon, Jeste, Sewell (2004) The Clock Drawing Test as a measure of executive dysfunction in elderly depressed patients. J Geriatr Psychiatry Neurol 17, 190–194.

68. Bodner, Delazer, Kemmler, Gurka, Marksteiner, Fleischhacker (2004) Clock drawing, clock reading, clock setting, and judgment of clock faces in elderly people with dementia and depression. J Am Geriatr Soc 52, 1146–1150.

69. Klein, Saur, Müller, Leyhe (2015) Comparison of clock test deficits between elderly patients with early and late onset depression. J Geriatr Psychiatry Neurol 28, 231–238.

70. Beblo, Sinnamon, Baune (2011) Specifying the neuropsychology of affective disorders: Clinical, demographic and neurobiological factors. Neuropsychol Rev 21, 337–359.

71. Davis, Penney, Pittman, Libon, Swenson, Kaplan (2010) The Digital Clock Drawing Test (dCDT)-III: Clinician reliability for a new quantitative system. 39th Annual Meeting of the International Neuropsychological Society, Boston, MA.

72. Penney, Davis, Libon, Lamar, Price, Swenson (2011) The Digital Clock Drawing Test (dCDT)-II: A new computerized quantitative system. 40th Annual Meeting of the International Neuropsychological Society, Boston, MA.

73. Dion, Arias, Amini, Davis, Penney, Libon, Price (2020) Cognitive correlates of digital clock drawing metrics in older adults with and without mild cognitive impairment. J Alzheimers Dis 75, 73–83.

74. Buckley, Atkins, Fortunato, Silbert, Scott, Evered (2021) A novel digital clock drawing test as a screening tool for perioperative neurocognitive disorders: A feasibility study. Acta Anaesthesiol Scand 65, 473–480.

75. Davoudi, Dion, Amini, Libon, Tighe, Price, Rashidi (2020) Phenotyping cognitive impairment using graphomotor and latency features in digital clock drawing test. Annu Int Conf IEEE Eng Med Biol Soc 2020, 5657–5660.

76. Dion, Frank, Crowley, Hizel, Rodriguez, Tanner, Libon, Price (2021) Parkinson’s disease cognitive phenotypes show unique clock drawing features when measured with digital technology. J Parkinsons Dis 11, 779–791.

77. Yuan, Libon, Karjadi, Ang AFA, Devine, Auerbach, Au, Lin (2021) Association between the digital clock drawing test and neuropsychological test performance: Large community-based prospective cohort (Framingham Heart Study). J Med Internet Res 23, e27407.

78. Davis, Libon, Au, Pitman, Penney (2014) THink: Inferring cognitive status from subtle behaviors. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2898–2905.

79. Piers, Devlin, Ning, Liu, Wasserman, Massaro, Lamar, Price, Swenson, Davis, Penney, Au, Libon (2017) Age and graphomotor decision making assessed with the digital clock drawing test: The Framingham Heart Study. J Alzheimers Dis 60, 1611–1620.

80. Binaco, Calzaretto, Epifano, McGuire, Umer, Emrani, Wasserman, Libon, Polikar (2020) Machine learning analysis of digital clock drawing test performance for differential classification of mild cognitive impairment subtypes versus Alzheimer’s disease. J Int Neuropsychol Soc 26, 690–700.

81. Chen, Stromer, Alabdalrahim, Schwab, Weih, Maier (2020) Automatic dementia screening and scoring by applying deep learning on clock-drawing tests. Sci Rep 10, 1778.

82. Souillard-Mandar, Davis, Rudin, Au, Libon, Swenson, Price, Lamar, Penney (2016) Learning classification models of cognitive conditions from subtle behaviors in the digital Clock Drawing Test. Mach Learn 102, 393–441.