The Role of Cue-Based Strategies in Skilled Diagnosis Among Pathologists

Abstract

Objective

This research was designed to test whether behavioral indicators of pathology-related cue utilization were associated with performance on a diagnostic task.

Background

Across many domains, including pathology, successful diagnosis depends on pattern recognition that is supported by associations in memory in the form of cues. Previous studies have focused on the specific information or knowledge on which medical image expertise relies. The target in this study is the more general ability to identify and interpret relevant information.

Method

Data were collected from 54 histopathologists in both conference and online settings. The participants completed a pathology edition of the Expert Intensive Skills Evaluation 2.0 (EXPERTise 2.0) to establish behavioral indicators of context-related cue utilization. They also completed a separate diagnostic task designed to examine related diagnostic skills.

Results

Behavioral indicators of higher or lower cue utilization were based on the participants’ performance across five tasks. Accounting for the number of cases reported per year, higher cue utilization was associated with greater accuracy on the diagnostic task. A post hoc analysis suggested that higher cue utilization may be associated with a greater capacity to recognize low prevalence cases.

Conclusion

This study provides support for the role of cue utilization in the development and maintenance of skilled diagnosis amongst pathologists.

Application

Pathologist training needs to be structured to ensure that learners have the opportunity to form cue-based strategies and associations in memory, especially for less commonly seen diseases.

Keywords

medical image perception, cue utilization, cognitive load, prevalence

Histopathology is a subspecialty of medical pathology that involves examining tissue to distinguish normal features and normal variants from clinically significant pathological processes. Correct interpretation can require integration of morphological findings with clinical history. Pathologists visually inspect histopathological slides using a light microscope, allowing for the interpretation and classification of diseases. Typically, tissue biopsies or resections referred to histopathologists from a variety of medical specialists have already been subjected to diagnostic tests to confirm pathology, thereby creating an environment where disease prevalence tends to be high. The diagnosis rendered by the histopathologist is considered the gold standard and is often the starting point for determining the patient’s treatment and prognosis. This is especially critical for breast cancer, where early diagnosis results in a 97% survival rate at 5 years or more (Australian Institute of Health and Welfare, 2009).

Errors within diagnostic medicine more broadly are estimated at >10% (Goldman et al., 1983; Hoff et al., 2012; Kirch & Schafii, 1996; Shojania et al., 2003). In pathology, errors in cancer diagnosis are reported at up to 12% (Raab et al., 2005). Discrepancies in breast cancer staging among breast histopathologists have been reported to reach 40% (Elmore et al., 2015). Elmore et al. (2015) conducted a large study in the United States with 115 practicing pathologists and showed a 24.7% disagreement rate among pathologists interpreting breast biopsies. This rate was higher for denser breasts, and among pathologists who interpreted lower weekly case volumes or worked in smaller practices or nonacademic settings.

Unfortunately, errors that occur in pathology and diagnostic medicine can of course have grave consequences. For example, the under-interpretation of an atypical cancer may delay the required treatments (false negatives or misses) and conversely an overdiagnosis of normal tissue may lead to unnecessary invasive treatments (false positives or false alarms). These issues are complicated as the base rate for histopathology is nonnormal. In practice, there are programs in place that record a pathologist’s performance and provide the necessary feedback. For example, the Royal College of Pathologists of Australasia (RCPA) offers a Quality Assurance Program (QAP) in which pathologists must participate to maintain their registration. Given that the pathology diagnosis is final and definitive, it is important to explain the underlying processes involved in these crucial, diagnostic decisions and therefore, how errors might be prevented.

Nodine and Krupinski (1998) and, more recently, Drew et al. (2013) proposed that successful diagnosis results when medical image specialists apply their fine-tuned perceptual and cognitive skills to rapidly process a layout/scene globally to reach their decision. Consistent with this account, abnormalities in a display or image can be detected rapidly, following a brief glance (Brennan et al., 2018; Carmody et al., 1981; Carrigan et al., 2018; Charness et al., 1996; Evans et al., 2013). Amongst cytologists, performance is above chance at detecting abnormalities in micrographs of cervical smears presented for 250 ms (Evans et al., 2013). Similarly, expert radiologists and pathologists fixate more rapidly than trainees on an abnormality and do so using fewer visual saccades (Krupinski et al., 2013; Kundel & La Follette, 1972; Kundel & Nodine, 1975; Kundel et al., 1978).

Krupinski et al. (2013) characterized the scanning patterns of resident pathologists (pathologists in training) throughout their training and demonstrated their search patterns changed from a less efficient strategy (scanning around the entire visual field) to a more efficient strategy (targeted search) with experience. Further, visual expertise is associated with less time fixating on diagnostically irrelevant and nondiagnostic regions (Brunyé et al., 2014; Krupinski et al., 2013). These capabilities suggest that experts extract global properties of an image rapidly and develop a finely tuned perceptual representation that almost instantly supports the relationship between visual stimuli and a diagnosis. This ability is also likely supported by patterns in memory gained through past experience that are triggered when presented with the stimulus (Brunyé et al., 2014).

According to Nodine and Mello-Thoms (2010), when an expert considers a case, “features” that are extracted during the initial glance are compared against a template or “pattern” in memory. This process tends to be rapid, nonconscious, and domain-specific, and is dependent upon a repertoire of feature–event/object associations in memory that serve as “cues” (Brunswik, 1955; Croskerry, 2009; Klein, 1989; Wiggins, 2014, 2021; Wiggins, Brouwers, et al., 2014). Carrigan et al. (2019) demonstrated that compared with nonradiologists, radiologists perceive features that are especially diagnostic to the case (subtle nodule in a chest radiograph) as more salient, lending support to the notion that feature-based cues are integral for successful performance.

Norman (2005) demonstrated that the greater and more diverse an expert’s memory for patterns, the faster their capacity to match a visual percept against a stored pattern. For example, a pathologist reviewing a breast biopsy slide might associate the destruction of normal, lobular architecture of the breast glands (feature) with the likelihood of cancer (event). As additional cases are reviewed, a repertoire of feature–event associations is accrued, enabling more precise discrimination in future encounters. As these associations become integrated to form patterns, cognitive processes become automated, reducing the demand on cognitive resources, including working memory, while maintaining accuracy and efficiency (e.g., see Curby et al., 2009 and Curby & Gauthier, 2007, for evidence of domain-specific visual working memory capabilities afforded by expertise).

Indirect evidence of the role of cue utilization in supporting diagnostic performance is evident in experts’ extraction of a limited number of diagnostic features and their relatively rapid and efficient formulation of diagnoses in comparison to nonexperts (e.g., Carrigan et al., 2018, 2019; Brennan et al., 2018; Evans et al., 2013; Krupinski et al., 2013; Kundel & La Follette, 1972). However, the strength of these associations in memory is likely to depend on the frequency with which they are activated. For example, patterns of features that are reviewed less frequently in practice may decay through inactivity, whereas a lack of exposure to specific cases may result in patterns of features that are imprecise and/or liable to inappropriate activation, resulting in diagnostic errors. Since the activation of patterns is reliant upon cues in memory, differences in the capacity to utilize cues within a given context are likely to be an important cause of differences in diagnostic accuracy.

Ideally, assessments of cue-based associations in memory would involve the identification of a universal catalog of features that could constitute a benchmark for a “test of existence.” However, the specific feature–event associations that experts acquire are shaped by their individual experience. Moreover, visual features may be more or less salient based on individual differences in visual detection skills, so that, despite similar levels of experience, experts are unlikely to share identical cue-based relationships in memory.

Despite differences in the specific feature–event/object relationships in memory, it remains possible for practitioners to achieve similar levels of performance, particularly in the context of diagnosis. Characterizing the nature of this behavior allows more general inferences to be drawn concerning the application or utilization of cue-based associations from memory. Relatively higher cue utilization can be inferred based on the capacity of a practitioner to: (1) rapidly identify areas of concern, (2) accurately recognize classes or categories of features, (3) differentiate associations between features more strongly and more rapidly, (4) discriminate relevant from less relevant features during problem-solving, and (5) demonstrate a more explicit prioritization of the acquisition of features during problem orientation (Wiggins, 2014, 2021).

Comparative behavioral assessments of cue utilization have successfully differentiated diagnostic performance across a range of nonmedical contexts including rail control (Sturman et al., 2019), aviation piloting (Schriver et al., 2008; Wiggins, Azar, et al., 2014), air-traffic control (Falkland & Wiggins, 2019), water safety (Wiggins et al., 2019), software engineering (Loveday et al., 2014), and electricity network power system control (Loveday, Wiggins, Harris, et al., 2013). In medical domains, including radiology (Carrigan et al., 2020), pediatrics (Loveday, Wiggins, Searle, et al., 2013), and anesthesiology (Crane et al., 2018), behavior reflecting higher cue utilization is associated with greater levels of accuracy in both direct and indirect measures of diagnostic performance.

The primary goal of the present study was to test the contribution of cue utilization in the diagnostic performance of histopathologists, accounting for self-reported experience. In the case of histopathology, diagnosis requires that a clinician match features associated with a pattern of cells against a repertoire of features in memory. These features include tissue architecture; cell arrangement; and cytoplasmic and nuclear shape, size, color, and density.

A greater and more discriminating repertoire of feature–event associations, reflecting a greater capacity for the utilization of cues, is likely to enable a more accurate and rapid diagnosis. Therefore, it was hypothesized that relatively higher cue utilization in the context of histopathology would be associated with relatively greater accuracy on a simulated diagnostic task, independent of experience.

Method

Participants

Pilot data were collected from eight histopathologists across a range of experience. Experimental data were then collected from 54 participants (32 female) who participated in a conference setting (n = 35) or online (n = 19) and who were familiar with reporting histopathological specimens. The mean age of histopathologists in the experimental group was 46 years (SD = 12 years). Forty participants were qualified pathologists and 14 were residents (pathologists in training). Participants reported an average experience as a pathologist of 14 years (SD = 12, range 1–52), having recently read between zero (one was retired) and 14,000 cases per year (M = 3663, SD = 3554). Three participants were left-handed, all reported normal or corrected-to-normal vision, and all were naïve to the purposes of the experiment. In return for participation, participants were offered the chance to win an iPad.

Measures

The participants completed a 45-min task which included: (1) a demographic survey, (2) an assessment of cue utilization using the Expert Intensive Skills Evaluation platform (EXPERTise 2.0; Wiggins et al., 2015), and (3) a computer-based static image recognition task that provided an independent measure of diagnostic performance.

Demographic Survey: Covariates

The participants were asked to indicate their age, sex, handedness, qualifications, their number of years of experience in pathology, the number of cases performed per day, the number of cases performed per year, and how frequently they played video games or a musical instrument. If they played a musical instrument, they were asked to indicate their perceived level of proficiency. These data were recorded, as there is evidence to suggest that video game players perform at a higher level on attentional tasks (e.g., Cain et al., 2014), and that playing a musical instrument may enhance visuo-spatial abilities (Boyd et al., 2008). Using five-point Likert scales, participants were also asked about their energy levels at time of experimentation, their confidence in their role, the extent to which they considered that “pattern recognition is an inherent skill which cannot be taught,” and their self-rated performance as a pathologist.

Cue Utilization

Cue utilization was assessed using the EXPERTise 2.0 platform (Wiggins et al., 2015). EXPERTise 2.0 is an online assessment tool that incorporates five tasks that are designed to assess behavior that corresponds to the utilization of cues in a specific domain. Guided by subject-matter experts, stimuli are developed for the various tasks that represent a specific domain or context.

In the feature identification task (FIT), participants are asked to identify, as quickly as possible, an abnormality or area of concern. Participants responded to one practice and 15 randomized trials, 10 of which incorporated a single area of abnormality whereas the remainder were normal. The anatomical location of the abnormal images comprised the breast (2), lung (2), gastrointestinal tract (4), liver (2), and salivary gland (1). The response was made under free-viewing conditions and no feedback was provided. Notably, no assumptions are made about what specific features experts should use to select a region. In the FIT, higher cue utilization is generally associated with lower mean response latency, reflecting the rapid and targeted identification of an area, rather than an exhaustive search (Loveday, Wiggins, Harris, et al., 2013) (Figure 1).

Figure 1

Example image from the first of five tasks (the Feature Identification Task) from the pathology edition of EXPERTise 2.0. Panel (a) depicts an image presented to the participants. Panel (b) depicts the image with the abnormal area outlined in black, representing gastric adenocarcinoma.

In the feature recognition task (FRT), participants are asked to classify stimuli following a short exposure. One practice and 20 randomized, experimental scenarios were displayed for 4 s. Each scenario contained an abnormality, in response to which participants were asked to select one of five multiple-choice options to categorize the abnormality (Benign Neoplasm, Malignant Neoplasm, Reactive/Inflammatory Process, Developmental, Metabolic). No feedback was provided. In the FRT, higher cue utilization is generally associated with greater accuracy (Wiggins & O’Hare, 2003).

In the feature association task (FAT), participants are asked to assess the relatedness of pairs of pathological, text-based, task-related stimuli using a six-point Likert Scale. The relatedness of the texts within each trial varied and was reviewed by a subject-matter expert. Over one practice trial and 15 randomized experimental trials, the two terms were presented sequentially for 1500 ms after which participants were asked to rate the relatedness of the features on a six-point Likert scale (from 1 = Extremely unrelated to 6 = Extremely related). For example, “Granulation Tissue” (feature) might be followed by “Tuberculosis” (event). Seven pairs were less likely to be related (e.g., “Amyloid” [feature] followed by “Grass” [event]), and eight were more likely to be related in practice (e.g., “Bubbly Cytoplasm” [feature] followed by “Chordoma” [event]). In the FAT, higher cue utilization is typically associated with greater variance in the perceived relatedness of the terms as a function of response latency (Morrison et al., 2013).

In the feature discrimination task (FDT), participants are presented with a short, written description of a problem-oriented scenario. On the basis of this information, they are asked to select a response from four possible options based on their typical response (i.e., “What would be your first response in this situation?”). Having selected a response, participants are presented with a list of features that were incorporated in the scenario and are asked to rate, with reference to the decision, the perceived importance of the features using a 10-point Likert scale (from 1 = Not important at all to 10 = Extremely important). In the FDT, higher cue utilization is associated with greater variance across feature-relevance ratings (Pauley et al., 2009; Weiss & Shanteau, 2003), where participants are more likely to select either Not important at all (1) or Extremely important (10).

For the feature prioritization task (FPT), participants are required to prioritize information to solve a histopathology-related problem. The histopathologists were presented with incomplete, descriptive scenarios (“You are doing routine reporting. As quickly as possible, access the information below that you feel is necessary to decide on your response.”), and were provided a list of 14 information screens (feature cues), varying in relevance and presented randomly in drop-down tabs from which feature-related information could be accessed (e.g., “Clinical impression,” “Low power microscopic image”). Participants with higher cue utilization tend to access the information in a less sequential manner, reflecting their ability to prioritize the information perceived as important to their goal. Meanwhile, participants with lower levels of cue utilization tend to access the submenus in a more sequential manner, accessing features as they are listed (Wiggins & O’Hare, 1995). Higher cue utilization is associated with a lower frequency of pairs of features accessed in sequence, calculated as a proportion of the total frequency of pairs of features accessed during the scenario (Wiggins & O’Hare, 1995; Table 1).

Table 1

Summary of Five Tasks Within EXPERTise 2.0

Task	Cognitive Process Examined	Task Description	Measure	Validity/Reliability
FIT	Identification of predictive features	Identify, as quickly as possible the area of concern.	Response latency	Loveday, Wiggins, Harris, et al. (2013) Wiggins, Azar, et al. (2014)
FRT	Identification of predictive features	Select the category of abnormality, displayed for 4 s.	Accuracy	Loveday, Wiggins, Searle, et al. (2013)
FAT	Feature–event relationships in memory	Rate the strength of perceived associations between the feature and event	Variance divided by response latency	Morrison et al. (2013)
FDT	Discrimination between predictive features	Rate the relative importance of features during a task-related problem-solving process.	Variance	Pauley et al. (2009)
FPT	Prioritization of feature–event relationships	Acquire task-related information to solve a problem-solving process.	Ratio of sequential to nonsequential menus accessed	Wiggins and O’Hare (1995) Wiggins et al. (2002)

Note. EXPERTise = EXPERT Intensive Skills Evaluation; FAT = feature association task; FIT = feature identification task; FDT = feature discrimination task; FPT = feature prioritization task; FRT = feature recognition task.
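The raw measures listed in Table 1 for the FAT, FDT, and FPT can be illustrated with a minimal Python sketch. It is not the EXPERTise 2.0 implementation: the function names are hypothetical, population variance is assumed, the FAT score is assumed to be the rating variance divided by the mean response latency, and the FPT score is computed as the proportion of consecutive accesses that follow the listed order, as described in the text.

```python
from statistics import mean, pvariance


def fat_score(ratings, latencies_ms):
    """Variance of relatedness ratings divided by mean latency (assumed form)."""
    return pvariance(ratings) / mean(latencies_ms)


def fdt_score(importance_ratings):
    """Variance across the 1-10 feature-importance ratings."""
    return pvariance(importance_ratings)


def fpt_score(access_order):
    """Proportion of consecutive accesses that move from item i to item i + 1.

    access_order holds the listed positions (1-based) of the information
    screens in the order in which the participant opened them.
    """
    pairs = list(zip(access_order, access_order[1:]))
    if not pairs:
        return 0.0
    sequential = sum(1 for first, second in pairs if second == first + 1)
    return sequential / len(pairs)


# Hypothetical participants: one reads the screens top to bottom, one jumps
# straight to the screens judged most relevant.
print(fpt_score([1, 2, 3, 4]))   # fully sequential access
print(fpt_score([5, 1, 9, 2]))   # no sequential pairs
print(fdt_score([1, 10, 1, 10]), fdt_score([5, 6, 5, 6]))
```

Under these assumptions, a lower FPT score and a higher FDT or FAT score point toward relatively higher cue utilization, which matches the direction of the measures described above.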

The test–retest reliability of each of the tasks that comprise EXPERTise 2.0 has been demonstrated in the context of audiology (Watkinson et al., 2018), whereas that of classifications based on performance aggregated across the five tasks has been demonstrated in the context of electrical transmission power control (Loveday, Wiggins, Searle, et al., 2013). The construct validity of the EXPERTise 2.0 classifications has been demonstrated in radiology (Carrigan et al., 2020), transmission and distribution power control (Loveday, Wiggins, Harris, et al., 2013; Sturman et al., 2019), software engineering (Loveday et al., 2014), and aviation (Wiggins, Brouwers, et al., 2014), whereas predictive validity has been demonstrated in audiology (Watkinson et al., 2018).

Diagnostic Performance

Diagnostic performance was assessed using a static image recognition task. The stimuli consisted of one practice and 31 test images from various anatomical locations (e.g., breast, kidney, skin). Each image included five unique multiple-choice responses developed by two subject-matter experts. Four of the images demonstrated pathologies not included in the multiple-choice options, where the correct answer was “none of the above.” The participants were instructed: “As quickly as possible, please select the most correct answer from the options below.” The de-identified images were provided by the subject-matter experts (Figure 2).

Figure 2

Exemplar of an image from the static image recognition stimuli set. Answer = C.

Procedure

The research was conducted in accordance with the American Psychological Association Code of Ethics and was approved by Macquarie University Human Research Ethics Committee (Medical Sciences). Informed consent was obtained from all participants. EXPERTise 2.0 and the static image recognition tasks were counterbalanced across participants. All of the tasks within EXPERTise 2.0 were blocked in a set order, and the scenarios were randomized within task. After completing a series of demographic questions, each task began with a practice trial followed by the EXPERTise 2.0 or static image recognition tasks.

Results

Preliminary Analysis

All statistical analyses were performed using IBM Statistical Software for the Social Sciences (SPSS Version 25). Outliers defined as ±2 SD were removed from the EXPERTise raw data (four participants) leaving 50 participants for the main analysis. As the participants’ raw scores for each of the five EXPERTise 2.0 tasks were on different scales, these were standardized (z scores) and then aggregated across the five tasks. The dependent variable for the image interpretation task was accuracy (% correct).
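The standardization and outlier screening described here can be sketched as follows. This is an illustration only: the analyses were run in SPSS, the latencies below are invented, the sample standard deviation is assumed, and the screening is assumed to be applied per task to scores more than 2 SD from the mean.

```python
from statistics import mean, stdev


def z_scores(raw):
    """Standardize raw task scores to mean 0 and (sample) SD 1."""
    m, s = mean(raw), stdev(raw)
    return [(x - m) / s for x in raw]


def flag_outliers(raw, cutoff=2.0):
    """Mark scores lying more than `cutoff` SDs from the mean."""
    return [abs(z) > cutoff for z in z_scores(raw)]


# Hypothetical FIT response latencies in seconds for seven participants.
fit_latency = [4.1, 3.8, 4.4, 3.9, 4.2, 4.0, 9.5]
print(flag_outliers(fit_latency))  # only the 9.5 s latency is flagged
```

Standardizing each task separately places latency, accuracy, and variance measures on a common scale before they are combined or clustered.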

Nineteen participants completed the study online at their convenience. The study instructions included requests to perform the study on a laptop or desktop (no tablets or cell phones), and at a time when they were least likely to be interrupted. An independent samples t-test was performed and showed no differences in performance between those participants who completed the study in a conference setting or online; t(52) = −.3, p = .77.

Demographic Survey: Covariates

A series of correlations were conducted between the demographic variables and accuracy on the static image recognition task. As expected, positive Pearson’s correlations were evident between accuracy on the static image recognition task and both the number of cases performed per day; r(52) = .39, p = .004, and the number of cases performed per year; r(52) = .37, p = .006. This suggests that the diagnostic performance measure has construct validity. A positive Spearman’s correlation was evident between self-rated confidence in their role as a pathologist and performance on the image interpretation task; r_s(52) = .34, p = .01. There was no correlation between self-reported years of experience and performance on the static image recognition task; r(52) = .23, p = .097. There were no other statistically significant correlations evident (p > .05). Since level of confidence is a subjective measure, it was not included in the main analysis. As the number of cases per day and per year are essentially the same measure, only the number of cases per year was included as a potential explanatory variable (covariate) in the main analysis.

Cue Utilization

Cue utilization was established based on the participant’s performance across the five tasks included in EXPERTise 2.0. Previous research involving the five EXPERTise tasks suggests that performance can be discriminated at two levels (higher, lower) that reflect differences in behavior during tasks that demand the utilization of cues (e.g., Brouwers et al., 2017; Carrigan et al., 2020; Falkland & Wiggins, 2019). Consequently, a k-means cluster analysis was employed to delineate two groups based on the standardized scores for the five EXPERTise tasks.

An inspection of the centroids that emerged following the cluster analysis was used to distinguish higher from lower performance and ensure that this pattern was uniform across the five tasks. Consistent with the hypothesis, the cluster analysis in the present study yielded a pattern of centroids that uniformly distinguished higher from lower performance. In comparison to lower cue utilization, higher cue utilization is characterized by faster responses in the FIT, greater accuracy in the FRT, greater variance as a proportion of response latency in the FAT, greater variance in the FDT, and a lower proportion of features accessed in the sequence in which they were presented in the FPT.

Eighteen participants comprised a group whose behavior was consistent with lower cue utilization, whereas 32 participants comprised a group whose behavior reflected higher cue utilization (Table 2). This delineation is intended to take into account nonlinear changes that occur in the rate at which cue-based associations are identified and retained in memory. There were proportionally more trainees assigned to the lower cue utilization group (Table 3). Four participants did not complete all of the EXPERTise 2.0 tasks, so were not assigned a cluster membership. Notably, cue utilization in this case is a relative, not absolute, measure. Therefore, those participants assigned to Cluster 1 showed patterns consistent with relatively lower cue utilization, whereas those participants assigned to Cluster 2 showed patterns consistent with relatively higher cue utilization.

Table 2

Centroids for the Standardized Scores for Each of the EXPERTise 2.0 Five Tasks Distributed Across the Two Groups That Were Delineated by a K-Means Cluster Analysis

	Cluster 1: Lower Cue Utilization Group (n = 18)	Cluster 2: Higher Cue Utilization Group (n = 32)
Task	Centroid	Centroid
Feature identification	.92616	−.51780
Feature recognition	−.80742	.50503
Feature association	−.72342	.39166
Feature discrimination	−.11779	.11648
Feature prioritization	.23245	−.06519
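As an illustration of how the centroids in Table 2 separate the two groups, the sketch below assigns a standardized five-task profile to the nearest centroid by Euclidean distance, which is the assignment step of a k-means procedure. The centroid values are taken from Table 2; the two participant profiles and the function name are hypothetical, and the SPSS clustering used in the study is not reproduced here.

```python
from math import dist

# Centroids from Table 2, ordered FIT, FRT, FAT, FDT, FPT.
CENTROIDS = {
    "lower": (0.92616, -0.80742, -0.72342, -0.11779, 0.23245),
    "higher": (-0.51780, 0.50503, 0.39166, 0.11648, -0.06519),
}


def assign_cluster(z_profile):
    """Return the label of the centroid closest to a z-score profile."""
    return min(CENTROIDS, key=lambda label: dist(z_profile, CENTROIDS[label]))


# Hypothetical profiles: fast and accurate versus slow and less accurate.
print(assign_cluster((-0.4, 0.6, 0.2, 0.0, -0.1)))
print(assign_cluster((1.0, -0.5, -0.9, 0.1, 0.3)))
```

Because FIT latency and the FPT ratio are lower for higher cue utilization, their centroids carry the opposite sign to those of the FRT, FAT, and FDT within each cluster.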

Table 3

Frequency of Trainee and Qualified Pathologists Per Cluster Group

	Cluster 1: Lower Cue Utilization Group (n = 18)	Cluster 2: Higher Cue Utilization Group (n = 32)
Trainees	7	4
Qualified	11	28
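The proportions implied by Table 3 can be checked with a short calculation. The counts come from the table; the dictionary layout and function name are illustrative only.

```python
# Counts from Table 3.
TABLE_3 = {
    "lower": {"trainees": 7, "qualified": 11},
    "higher": {"trainees": 4, "qualified": 28},
}


def trainee_share(group):
    """Fraction of a cluster made up of trainees."""
    counts = TABLE_3[group]
    return counts["trainees"] / (counts["trainees"] + counts["qualified"])


for group in TABLE_3:
    print(group, f"{trainee_share(group):.1%}")
```

Trainees make up 7 of 18 participants in the lower cue utilization group and 4 of 32 in the higher group.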

Diagnostic Performance

Diagnostic performance was measured as percentage accuracy on the static image recognition task; mean accuracy = 66.55% (SD = 12.4). Single-sample t-tests on mean accuracy relative to chance, where chance was 20% (one in five possible responses), showed that pathologists performed above chance on the image interpretation task; t(53) = 27.59, p < .0001.
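The comparison against chance can be reproduced from the summary statistics with the standard one-sample t formula, t = (M − chance) / (SD / √n). The sketch below is a check on the arithmetic rather than a rerun of the SPSS analysis; the function name is hypothetical.

```python
from math import sqrt


def one_sample_t(sample_mean, sample_sd, n, reference):
    """t statistic for a sample mean against a fixed reference value."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - reference) / standard_error


# Reported values: M = 66.55, SD = 12.4, n = 54, chance = 20%.
t_value = one_sample_t(66.55, 12.4, 54, 20.0)
print(round(t_value, 2), "with df =", 54 - 1)
```

With these inputs the statistic rounds to 27.59 on 53 degrees of freedom, in line with the reported result.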

To address our first aim, a between-groups analysis of covariance (ANCOVA) was performed, incorporating two levels of cue utilization (higher/lower) as the independent variable, and the number of cases per year included as a covariate, to test its contribution to diagnostic performance; F(1,47) = 8.4, p = .006, $η_{p}^{2}$ = .14.¹ An inspection of the means indicated that, controlling for cases per year, those histopathologists with higher cue utilization (M = 71.57, SD = 9.21) performed more accurately on the static image recognition task compared with pathologists with lower cue utilization (M = 60.93, SD = 13.32; Figure 3).

Figure 3

Percentage accuracy mean scores on the static image recognition task, distributed across cue utilization, controlling for cases read per year, for 50 histopathologists. Error bars represent 95% confidence intervals. Note. Chance is 20%.

An independent samples t-test, incorporating cue utilization as the grouping variable, failed to reveal a statistically significant difference between the groups in the number of cases read per year, t(48) = −1.7, p = .1. This suggests that cue utilization and cases per year read by pathologists contribute independently to the difference in performance on the diagnostic task.

Although the histopathologists performed above chance on the static image recognition task, their performance as a group was far from ceiling (M = 66.55%). It is possible that these results reflect that the sample included pathologists in training (14 of the 54 participants) who are yet to become familiar with the pathologies included in the image set. Alternatively, though the stimuli were carefully selected by two subject-matter experts, it may be the case that the performance of participants reflects the inherent variability in real-world images, and possibly within the image set. These factors include the difference in prevalence of specific types of cases in different practice settings.

In a post hoc, exploratory analysis, we first tested whether disease frequency and the rate at which histopathologists were likely to have encountered similar cases previously in their practice may be related to their performance on the static image recognition task. Of the cases in the set, 12 of 31 cases were classified by a subject-matter expert, and cross-checked with available epidemiology data, as “rare,” whereas 19 of 31 cases were considered “common.” The case with the lowest accuracy across the cohort was a rarely encountered disease occurring in a skin biopsy specimen, where six of the 50 pathologists (5/6 with higher cue utilization and 1/6 with lower cue utilization) diagnosed the case accurately. For this condition, there is a possibility that the low accuracy resulted from the relative unfamiliarity of the case.

A mixed-repeated ANCOVA testing the accuracy data from the static image recognition task, with disease frequency (common/rare) as a within-subjects variable, cue utilization (higher/lower) comprising a between-subjects variable, and the estimated number of cases per year as a covariate, revealed a main effect for disease frequency, F(1,47) = 66.30, p < .0001, $η_{p}^{2}$ = .59. An inspection of the means indicated that, consistent with expectations, accuracy was greater for cases that were classified as common (M = 74.60, SD = 11.29) rather than rare (M = 53.50, SD = 15.9).

A main effect was also evident for cue utilization, F(1,47) = 8.1, p = .007, $η_{p}^{2}$ = .15, and an inspection of the means revealed that those participants with higher cue utilization recorded greater accuracy for both the more common cases (M = 78.84, SD = 13.65) and the rarer cases (M = 58.85, SD = 19.23) compared with histopathologists with lower cue utilization (common: M = 70.37, SD = 18.3; rare: M = 48.16, SD = 25.81), controlling for the number of cases per year (Figure 4).
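For readers who wish to relate the reported F ratios to their effect sizes, partial eta squared for a single effect can be recovered from F and its degrees of freedom using the standard identity η²p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the two main effects reported above; it is a consistency check under that assumption, not output from the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its dfs."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# Main effect of disease frequency: F(1,47) = 66.30.
print(round(partial_eta_squared(66.30, 1, 47), 2))
# Main effect of cue utilization: F(1,47) = 8.1.
print(round(partial_eta_squared(8.1, 1, 47), 2))
```

These return .59 and .15 respectively, matching the values reported for the two main effects.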

Figure 4

Percentage accuracy mean scores on the static image recognition task distributed across cue utilization, for the common and rare cases, controlling for cases read per year, for 50 histopathologists. Error bars represent 95% confidence intervals.

For a small number of cases (4), the correct response was “none of the above,” since the pathology depicted was not listed as one of the other alternatives. Therefore, we explored whether the relationship between cue utilization and accuracy differed, depending upon the presence or absence of a pathology listed as an alternative. A mixed-repeated ANCOVA, incorporating the presence or absence of listed pathology (pathology listed/pathology not-listed) as a within-groups variable, two levels of cue utilization (higher/lower) as a between-subjects variable, and the estimated number of cases per year as a covariate, tested whether there were differences in the accuracy of responses.

The results revealed a statistically significant interaction between cue utilization and the presence or absence of a listed pathology, F(1,47) = 7.42, p = .009, $η_{p}^{2}$ = .14. Post hoc contrasts (Bonferroni corrected) indicated that those participants with lower cue utilization (M = 52.54, SD = 38.53) were more likely to record a false positive, rather than correctly respond with “none of the above,” compared with those participants with higher cue utilization (M = 79.04, SD = 28.7). (Note: this analysis is exploratory, as there was a disproportionate number of not-listed cases [four] compared with the listed cases [27]; Figure 5).
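One reason for caution with the not-listed condition is the coarseness of a score built on only four cases. As a rough illustration (not part of the original analysis), the Python sketch below enumerates the percentage scores a single participant could obtain on the four not-listed cases and on the 27 listed cases. The helper name `possible_scores` is hypothetical.

```python
def possible_scores(n_cases: int) -> list[float]:
    """All percentage-correct values attainable on n_cases scored right/wrong."""
    return [100 * k / n_cases for k in range(n_cases + 1)]


not_listed = possible_scores(4)   # four "none of the above" cases
listed = possible_scores(27)      # 27 cases with the pathology listed

print(not_listed)                 # [0.0, 25.0, 50.0, 75.0, 100.0]
print(len(listed))                # 28
print(round(listed[1], 2))        # 3.7 (one listed case is worth about 3.7 points)
```

With only five attainable values, a single response shifts a participant's not-listed score by 25 percentage points, which may contribute to the large standard deviations reported for that condition.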

Figure 5

Percentage mean scores on the static image recognition task, distributed across cue utilization, for the pathology listed and not-listed cases, controlling for cases read per year, for 50 histopathologists. Error bars represent 95% confidence intervals.

In combination, the data suggest that differences in cue utilization contribute to performance on a diagnostic task, over and above the estimated number of cases read per year, and that these results may, in part, be driven by disease prevalence, both in our image set and epidemiologically, along with the ability to recognize when a pathology is present or not present in an image.

Discussion

The overall aim of this study was to investigate whether diagnostic assessments of cue utilization differentiate performance in the context of histopathology, accounting for self-reported experience (number of pathology cases interpreted per year). Consistent with the hypothesis, higher cue utilization was associated with greater accuracy on a simulated diagnostic interpretation task requiring evaluation of tissue with significant abnormalities. A positive relationship was also evident between the estimated number of cases read per year and diagnostic accuracy. However, no relationship was evident between the estimated number of cases read per year and cue utilization. Consistent with previous research, this suggests that cue utilization and experience, in the form of the self-reported estimates of the number of cases read per year, contribute independently to diagnostic accuracy (Carrigan et al., 2020; Crane et al., 2018; Loveday, Wiggins, Searle, et al., 2013; Sturman et al., 2019).

This study provides evidence for differences in diagnostic performance based on cue utilization, independent of the number of cases read per year. However, there are other factors to consider that also contribute to expertise, including personality and genetic and developmental factors (Hambrick et al., 2016), that were not evaluated in the current study. Further, there is evidence to suggest that some pathologists have an inherent domain-general ability for visual recognition, independent of experience (e.g., Trueblood et al., 2018). It is also likely that in practice, if the disease frequency was inconsistent, “experience” would begin to affect diagnostic outcomes and accuracy, independent of the number of cases read per year. Future research examining these issues is warranted, particularly comparing extreme values of cue utilization, where differences and variability in performance across the tasks can be maximized, and equal sample sizes can be assured. Importantly, the findings in the present study are consistent with Carrigan et al. (2020), who showed that among radiologists, cue utilization was positively associated with performance on a diagnostic task, independent of experience.

Although performance on the diagnostic task was above chance, it should be noted that overall accuracy was 66.55%. This is unsurprising as none of the participants reported being accustomed to diagnostic reporting from histopathology images on screens. The simulated diagnostic task presented a target tissue abnormality in a setting that was unfamiliar, rather than the normal pathological examination using an optical light microscope to view multiple fields at multiple magnifications. Although many laboratories are moving from optical microscopy to digital whole slide images, this was not the case for the current sample. Furthermore, the experimental setting lacked the opportunity for corroboration from colleagues or other resources such as textbooks or online images that would normally be available in routine practice when diagnostic uncertainty is encountered. Although presenting its own challenges, our diagnostic setting reproduces the essence of the routine practice setting. Unfortunately, the routine practice setting is not yet amenable to the study of activation of highly-refined cue-based associations in memory and how these associations are able to better discriminate differences in capability.

The post hoc analysis suggested that the types of cases and the prevalence of cases to which pathologists are exposed may be associated with differences in accuracy. Specifically, in comparison to participants with lower cue utilization, participants with higher cue utilization demonstrated relatively greater accuracy for less common pathologies, and where the pathology was not listed as a specific option for selection (i.e., the correct option was “none of the above”). This has important training implications given that discordance errors are higher for histopathologists who interpret lower weekly case volumes or who work in smaller practices or nonacademic settings (Elmore et al., 2015).

Since improvements in cue utilization are thought to occur through the internalization of feature–event associations in memory, learning opportunities are necessary that enable, as quickly as possible, the extraction of features and their association with specific pathologies in memory (Wiggins, 2014). In the absence of supervised operational exposure, structured simulation, such as reviewing and testing using image libraries with feedback, offers an alternative with the potential to facilitate the acquisition of cue-based associations more quickly and efficiently than might be afforded through work-based exposure. With further research, evaluations based on normative assessments of performance could be employed to ensure successive improvements in cue utilization.

Opportunities for future research also lie in examining whether cue utilization moderates performance differently depending upon the prevalence of cases. In the current study, our exploratory findings suggested that disease frequency or prevalence was related to diagnostic performance. Low target prevalence (i.e., rare or uncommon targets, few cases are truly abnormal) generally results in elevated miss rates (Evans et al., 2011; Mitroff & Biggs, 2014; Wolfe et al., 2005). Evans et al. (2011) demonstrated this “low prevalence effect” in a cytology screening context where in the United States, the epidemiological prevalence of cervical cancer is around 1% (Benard et al., 2004). By contrast, disease prevalence in the context of histopathology tends to be higher.

Typically, in a histopathology setting, prior to the evaluation by a histopathologist, the patient and tissue samples have already been subjected to diagnostic testing and procedures such as mammograms, prostate-specific antigen (PSA) testing for prostate cancer, biopsy of polyps during routine colonoscopies, or surgical excision. Further, specimens are accompanied by clinical information and surgical reports. This is likely to create an environment that potentially primes the confirmation of an abnormality. Testing the existence of “abnormality priming” is necessary to consider the effects of prevalence on pathologists’ cue utilization on a longer-term basis.

Conclusion

This study was designed to test the role of cue utilization in the diagnostic performance of histopathologists. Accounting for the self-reported number of cases reported per year, higher cue utilization was associated with higher accuracy on an independent, simulated diagnostic task. This effect was greatest where the pathology depicted on the image was not listed as a potential classification option and “none of the above” was the accurate response.

Despite the limitations associated with the use of a simulated task that potentially lacks ecological validity, the results raise important issues concerning the relationship between motivation and cue utilization, particularly in response to challenging tasks. The outcomes also highlight the value of an approach that does not assume that experts use the same cues or features during problem resolution. Our exploratory image analyses showed that factors such as disease prevalence are also likely to influence accuracy. Crucially, the outcomes of the study have important implications for the development and maintenance of skills amongst pathologists: exposure sufficient to acquire and utilize cue-based strategies appears to be associated with the highest level of performance on the diagnostic task.

Key Points

In pathology, skilled diagnosis involves accurate, reliable, and timely responses.

Pathological cue utilization is positively associated with performance on a diagnostic task, independent of the number of cases reported per year.

A post hoc analysis suggested that higher cue utilization may be associated with a greater capacity to recognize rarer and low prevalence cases.

Targeted training that encourages the acquisition and utilization of cue-based strategies is recommended.

Footnotes

Acknowledgments

For data collection, we acknowledge the support of The Royal College of Pathologists of Australasia and the Royal College of Pathologists of Australasia Quality Assurance Program. This work is supported by an ARC Discovery Project grant awarded to M.W.W. (CI), K.M.C., A.G., and T.P. (DP5056000).

Author Biographies

Ann J. Carrigan is a postdoctoral researcher at Macquarie University, Australia, and works in radiology as a medical sonographer. Her research focuses on medical image perception and expertise in diagnostic medicine.

Amanda Charlton is an anatomical pathologist at Auckland City Hospital, New Zealand; her clinical work involves making diagnoses on tissue biopsies. Her interests include image perception, medical education, artificial intelligence, and bias in diagnostic pathology.

Elliott Foucar is an adjunct professor in the Department of Pathology, University of New Mexico School of Medicine, USA. His major areas of interest are general diagnostic surgical pathology, dermatopathology, cytopathology, and autopsy pathology.

Mark W. Wiggins is a professor of organizational psychology at Macquarie University, Australia. His research focuses on skill acquisition and expertise in advanced technology environments.

Andrew Georgiou is a professor of diagnostic informatics at Macquarie University, Australia. His research focuses on diagnostic and organizational communication within the healthcare system.

Thomas J. Palmeri is a professor of psychology, ophthalmology, and visual sciences at Vanderbilt University, USA. His research focuses on perceptual categorization, category learning, visual learning, visual memory, perceptual expertise, object and face recognition, automaticity, perceptual decision making, mathematical, computational, and neural modeling.

Kim M. Curby is an associate professor of cognitive psychology at Macquarie University, Australia. Her research focuses on the cognitive mechanisms and neural substrates that underlie expert performance in visually based domains.

References

Australian Institute of Health and Welfare. (2009). Australia’s welfare 2009. AIHW.

Benard, V. B., Eheman, C. R., Lawson, H. W., Blackman, D. K., Anderson, Helsel, Thames, S. F., & Lee, N. C. (2004). Cervical screening in the National Breast and Cervical Cancer Early Detection Program, 1995-2001. Obstetrics & Gynecology, 103, 564–571. doi:10.1097/01.AOG.0000115510.81613.f0 (PMID: 14990422)

Boyd, Jung, Van Sickle, Schwesinger, Michalek, & Bingener. (2008). Music experience influences laparoscopic skills performance. JSLS: Journal of the Society of Laparoendoscopic Surgeons, 12, 292. (PMID: 18765055)

Brennan, P. C., Gandomkar, Ekpo, E. U., Tapia, Trieu, P. D., Lewis, S. J., Wolfe, J. M., & Evans, K. K. (2018). Radiologists can detect the ‘gist’ of breast cancer before any overt signs of cancer appear. Scientific Reports, 8, 8717. doi:10.1038/s41598-018-26100-5 (PMID: 29880817)

Brouwers, Wiggins, M. W., Griffin, Helton, W. S., & O’Hare. (2017). The role of cue utilization in reducing the workload in a train control task. Ergonomics, 60, 1–55.

Brunswik. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62, 193–217. doi:10.1037/h0047470 (PMID: 14371898)

Brunyé, T. T., Carney, P. A., Allison, K. H., Shapiro, L. G., Weaver, D. L., & Elmore, J. G. (2014). Eye movements as an index of pathologist visual expertise: A pilot study. PloS One, 9, e103447. doi:10.1371/journal.pone.0103447 (PMID: 25084012)

Cain, M. S., Prinzmetal, Shimamura, A. P., & Landau, A. N. (2014). Improved control of exogenous attention in action video game players. Frontiers in Psychology, 5, 69. doi:10.3389/fpsyg.2014.00069 (PMID: 24575061)

Carmody, D. P., Nodine, C. F., & Kundel, H. L. (1981). Finding lung nodules with and without comparative visual scanning. Perception & Psychophysics, 29, 594–598. doi:10.3758/bf03207377 (PMID: 7279589)

Carrigan, A. J., Curby, K. M., Moerel, & Rich, A. N. (2019). Exploring the effect of context and expertise on attention: Is attention shifted by information in medical images? Attention, Perception & Psychophysics, 81, 1283–1296. doi:10.3758/s13414-019-01695-7 (PMID: 30825115)

Carrigan, A. J., Magnussen, Georgiou, Curby, K. M., Palmeri, T. J., & Wiggins, M. W. (2020). Differentiating experience from cue utilization in radiological assessments. Human Factors, 18720820902576. doi:10.1177/0018720820902576 (PMID: 32150500)

Carrigan, A. J., Wardle, S. G., & Rich, A. N. (2018). Finding cancer in mammograms: If you know it’s there, do you know where? Cognitive Research: Principles and Implications, 3, 10. doi:10.1186/s41235-018-0096-5 (PMID: 29707615)

Charness, Krampe, & Mayr. (1996). The role of practice and coaching in entrepreneurial skill domains: An international comparison of life-span chess skill acquisition. In K. A. Ericsson (Ed.), The road to excellence: The acquisition of expert performance in the arts and sciences, sports, and games (pp. 51–80). Lawrence Erlbaum Associates, Inc.

Crane, M. F., Brouwers, Wiggins, M. W., Loveday, Forrest, Tan, S. G. M., & Cyna, A. M. (2018). “Experience isn’t everything”: How emotion affects the relationship between experience and cue utilization. Human Factors, 60, 685–698. doi:10.1177/0018720818765800 (PMID: 29617150)

Croskerry. (2009). A universal model of diagnostic reasoning. Academic Medicine, 84, 1022–1028. doi:10.1097/ACM.0b013e3181ace703 (PMID: 19638766)

Curby, K. M., & Gauthier. (2007). A visual short-term memory advantage for faces. Psychonomic Bulletin & Review, 14, 620–628. doi:10.3758/BF03196811 (PMID: 17972723)

Curby, K. M., Glazek, & Gauthier. (2009). A visual short-term memory advantage for objects of expertise. Journal of Experimental Psychology: Human Perception and Performance, 35, 94–107. doi:10.1037/0096-1523.35.1.94 (PMID: 19170473)

Drew, Evans, Võ, M. L. H., Jacobson, F. L., & Wolfe, J. M. (2013). Informatics in radiology: What can you see in a single glance and how might this guide visual search in medical images? RadioGraphics, 33, 263–274. doi:10.1148/rg.331125023 (PMID: 23104971)

Elmore, J. G., Longton, G. M., Carney, P. A., Geller, B. M., Onega, Tosteson, A. N. A., Nelson, H. D., Pepe, M. S., Allison, K. H., Schnitt, S. J., O’Malley, F. P., & Weaver, D. L. (2015). Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA, 313, 1122–1132. doi:10.1001/jama.2015.1405 (PMID: 25781441)

Evans, K. K., Georgian-Smith, Tambouret, Birdwell, R. L., & Wolfe, J. M. (2013). The GIST of the abnormal: Above-chance medical decision making in the blink of an eye. Psychonomic Bulletin & Review, 20, 1170–1175. doi:10.3758/s13423-013-0459-3 (PMID: 23771399)

Evans, K. K., Tambouret, R. H., Evered, Wilbur, D. C., & Wolfe, J. M. (2011). Prevalence of abnormalities influences cytologists’ error rates in screening for cervical cancer. Archives of Pathology & Laboratory Medicine, 135, 1557–1560. doi:10.5858/arpa.2010-0739-OA (PMID: 22129183)

Falkland, E. C., & Wiggins, M. W. (2019). Cross-task cue utilisation and situational awareness in simulated air traffic control. Applied Ergonomics, 74, 24–30. doi:10.1016/j.apergo.2018.07.015 (PMID: 30487105)

Goldman, Sayson, Robbins, Cohn, L. H., Bettmann, & Weisberg. (1983). The value of the autopsy in three medical eras. New England Journal of Medicine, 308, 1000–1005. doi:10.1056/NEJM198304283081704 (PMID: 6835306)

Hambrick, D. Z., Macnamara, B. N., Campitelli, Ullén, & Mosing, M. A. (2016). Beyond born versus made: A new look at expertise. In Psychology of learning and motivation (Vol. 64, pp. 1–55). Academic Press.

Hoff, S. R., Abrahamsen, A.-L., Samset, J. H., Vigeland, Klepp, & Hofvind. (2012). Breast cancer: missed interval and screening-detected cancer at full-field digital mammography and screen-film mammography-- results from a retrospective review. Radiology, 264, 378–386. doi:10.1148/radiol.12112074 (PMID: 22700555)

Kirch & Schafii. (1996). Misdiagnosis at a university hospital in 4 medical eras report on 400 cases. Medicine, 75, 29–40. doi:10.1097/00005792-199601000-00004

Klein. (1989). Recognition-primed decisions (RPD). Advances in Man-Machine Systems, 5, 47–92.

Krupinski, E. A., Graham, A. R., & Weinstein, R. S. (2013). Characterizing the development of visual search expertise in pathology residents viewing whole slide images. Human Pathology, 44, 357–364. doi:10.1016/j.humpath.2012.05.024 (PMID: 22835956)

Kundel, H. L., & La Follette, P. S. (1972). Visual search patterns and experience with radiological images. Radiology, 103, 523–528. doi:10.1148/103.3.523 (PMID: 5022947)

Kundel, H. L., & Nodine, C. F. (1975). Interpreting chest radiographs without visual search. Radiology, 116, 527–532. doi:10.1148/116.3.527 (PMID: 125436)

Kundel, H. L., Nodine, C. F., & Carmody. (1978). Visual scanning, pattern recognition and decision-making in pulmonary nodule detection. Investigative Radiology, 13, 175–181. doi:10.1097/00004424-197805000-00001 (PMID: 711391)

Loveday, Wiggins, M. W., Harris, J. M., O’Hare, & Smith. (2013). An objective approach to identifying diagnostic expertise among power system controllers. Human Factors, 55, 90–107. doi:10.1177/0018720812450911 (PMID: 23516796)

Loveday, Wiggins, M. W., & Searle, B. J. (2014). Cue utilization and broad indicators of workplace expertise. Journal of Cognitive Engineering and Decision Making, 8, 98–113.

Loveday, Wiggins, M. W., Searle, B. J., Festa, & Schell. (2013). The capability of static and dynamic features to distinguish competent from genuinely expert practitioners in pediatric diagnosis. Human Factors, 55, 125–137. doi:10.1177/0018720812448475 (PMID: 23516798)

Mitroff, S. R., & Biggs, A. T. (2014). The ultra-rare-item effect: Visual search for exceedingly rare items is highly susceptible to error. Psychological Science, 25, 284–289. doi:10.1177/0956797613504221 (PMID: 24270463)

Morrison, B. W., Wiggins, M. W., Bond, N. W., & Tyler, M. D. (2013). Measuring relative cue strength as a means of validating an inventory of expert offender profiling cues. Journal of Cognitive Engineering and Decision Making, 7, 211–226.

Nodine, C. F., & Krupinski, E. A. (1998). Perceptual skill, radiology expertise, and visual test performance with NINA and WALDO. Academic Radiology, 5, 603–612. doi:10.1016/S1076-6332(98)80295-X (PMID: 9750889)

Nodine, C. F., & Mello-Thoms. (2010). The role of expertise in radiologic image interpretation. In Samei & E. A. Krupinski (Eds.), The handbook of medical image perception and techniques (pp. 139–156). Cambridge University Press.

Norman. (2005). Research in clinical reasoning: Past history and current trends. Medical Education, 39, 418–427. doi:10.1111/j.1365-2929.2005.02127.x (PMID: 15813765)

Pauley, O’Hare, & Wiggins. (2009). Measuring expertise in weather-related aeronautical risk perception: The validity of the Cochran–Weiss–Shanteau (CWS) index. The International Journal of Aviation Psychology, 19, 201–216. doi:10.1080/10508410902979993

Raab, S. S., Grzybicki, D. M., Janosky, J. E., Zarbo, R. J., Meier, F. A., Jensen, & Geyer, S. J. (2005). Clinical impact and frequency of anatomic pathology errors in cancer diagnoses. Interdisciplinary International Journal of the American Cancer Society, 104, 2205–2213.

Schriver, A. T., Morrow, D. G., Wickens, C. D., & Talleur, D. A. (2008). Expertise differences in attentional strategies related to pilot decision making. Human Factors, 50, 864–878. doi:10.1518/001872008X374974 (PMID: 19292010)

Shojania, K. G., Burton, E. C., McDonald, K. M., & Goldman. (2003). Changes in rates of autopsy-detected diagnostic errors over time: A systematic review. JAMA, 289, 2849–2856. doi:10.1001/jama.289.21.2849 (PMID: 12783916)

Sturman, Wiggins, M. W., Auton, J. C., & Helton, W. S. (2019). Cue utilisation predicts control room operators’ performance in a sustained visual search task. Ergonomics, 1–13.

Trueblood, J. S., Holmes, W. R., Seegmiller, A. C., Douds, Compton, Szentirmai, Woodruff, Huang, Stratton, & Eichbaum. (2018). The impact of speed and bias on the cognitive processes of experts and novices in medical image decision-making. Cognitive Research: Principles and Implications, 3, 28.

Wiggins, M. W. (2014). The role of cue utilization and adaptive interface design in the management of skilled performance in operations control. Theoretical Issues in Ergonomics Science, 15, 283–292.

Wiggins, M. W. (2021). A behaviour-based approach to the assessment of cue utilisation: Implications for situation assessment and performance. Theoretical Issues in Ergonomics Science, 22, 46–62. doi:10.1080/1463922X.2020.1758828

Wiggins, M. W., Azar, Hawken, Loveday, & Newman. (2014). Cue-utilisation typologies and pilots’ pre-flight and in-flight weather decision-making. Safety Science, 65, 118–124. doi:10.1016/j.ssci.2014.01.006

Wiggins, M. W., Brouwers, Davies, & Loveday. (2014). Trait-based cue utilization and initial skill acquisition: Implications for models of the progression to expertise. Frontiers in Psychology, 5, 541. doi:10.3389/fpsyg.2014.00541 (PMID: 24917844)

Wiggins, M. W., Griffin, & Brouwers. (2019). The potential role of context-related exposure in explaining differences in water safety cue utilization. Human Factors, 61, 825–838. doi:10.1177/0018720818814299 (PMID: 30601676)

Wiggins, M. W., Loveday, & Auton. (2015). EXPERT Intensive Skills Evaluation (EXPERTise 2.0) Test. Macquarie University.

Watkinson, Bristow, Auton, McMahon, C. M., & Wiggins, M. W. (2018). Postgraduate training in audiology improves clinicians’ audiology-related cue utilisation. International Journal of Audiology, 57, 681–687. doi:10.1080/14992027.2018.1476782 (PMID: 29801417)

Weiss, D. J., & Shanteau. (2003). Empirical assessment of expertise. Human Factors, 45, 104–116. doi:10.1518/hfes.45.1.104.27233 (PMID: 12916584)

Wiggins & O’Hare. (2003). Weatherwise: Evaluation of a cue-based training approach for the recognition of deteriorating weather conditions during flight. Human Factors, 45, 337–345. doi:10.1518/hfes.45.2.337.27246 (PMID: 14529203)

Wiggins, M. W., & O’Hare. (1995). Expertise in aeronautical weather-related decision making: A cross-sectional analysis of general aviation pilots. Journal of Experimental Psychology, 1, 305–320.

Wiggins, M. W., Stevens, Howard, Henley, & O’Hare. (2002). Expert, intermediate and novice performance during simulated pre-flight decision-making. Australian Journal of Psychology, 54, 162–167.

Wolfe, J. M., Horowitz, T. S., & Kenner, N. M. (2005). Cognitive psychology: Rare items often missed in visual searches. Nature, 435, 439. doi:10.1038/435439a (PMID: 15917795)