Abstract
Objective
This research was designed to test whether behavioral indicators of pathology-related cue utilization were associated with performance on a diagnostic task.
Background
Across many domains, including pathology, successful diagnosis depends on pattern recognition that is supported by associations in memory in the form of cues. Previous studies have focused on the specific information or knowledge on which medical image expertise relies. The target in this study is the more general ability to identify and interpret relevant information.
Method
Data were collected from 54 histopathologists in both conference and online settings. The participants completed a pathology edition of the Expert Intensive Skills Evaluation 2.0 (EXPERTise 2.0) to establish behavioral indicators of context-related cue utilization. They also completed a separate diagnostic task designed to examine related diagnostic skills.
Results
Behavioral indicators of higher or lower cue utilization were based on the participants’ performance across five tasks. Accounting for the number of cases reported per year, higher cue utilization was associated with greater accuracy on the diagnostic task. A post hoc analysis suggested that higher cue utilization may be associated with a greater capacity to recognize low prevalence cases.
Conclusion
This study provides support for the role of cue utilization in the development and maintenance of skilled diagnosis amongst pathologists.
Application
Pathologist training needs to be structured to ensure that learners have the opportunity to form cue-based strategies and associations in memory, especially for less commonly seen diseases.
Histopathology is a medical pathology subspecialty, the role of which involves the examination of tissue to correctly identify normal features, normal variants, and clinically significant pathological processes. Correct interpretation can require integration of morphological findings with clinical history. Pathologists visually inspect histopathological slides using a light microscope, allowing for the interpretation and classification of diseases. Typically, tissue biopsies or resections referred to histopathologists from a variety of medical specialists have already been subjected to diagnostic tests to confirm pathology, thereby creating an environment where disease prevalence tends to be high. The diagnosis rendered by the histopathologist is considered the gold standard and often the starting point for determining the patient’s treatment and prognosis. This is especially critical for breast cancer, where early diagnosis results in a survival rate of 97% at 5 years or more (Australian Institute of Health and Welfare, 2009).
Errors within diagnostic medicine more broadly are estimated at >10% (Goldman et al., 1983; Hoff et al., 2012; Kirch & Schafii, 1996; Shojania et al., 2003). In pathology, errors in cancer diagnosis are reported at up to 12% (Raab et al., 2005). Discrepancy in breast cancer staging among breast histopathologists has been reported to be as high as 40% (Elmore et al., 2015). Elmore et al. (2015) conducted a large study in the United States with 115 practicing pathologists and showed a 24.7% disagreement rate among pathologists interpreting breast biopsies. This rate was higher for denser breasts, and among pathologists who interpreted lower weekly case volumes or who worked in smaller practices or nonacademic settings.
Unfortunately, errors that occur in pathology and diagnostic medicine can of course have grave consequences. For example, the under-interpretation of an atypical cancer may delay the required treatments (false negatives or misses) and conversely an overdiagnosis of normal tissue may lead to unnecessary invasive treatments (false positives or false alarms). These issues are complicated as the base rate for histopathology is nonnormal. In practice, there are programs in place that record a pathologist’s performance and provide the necessary feedback. For example, the Royal College of Pathologists of Australasia (RCPA) offers a Quality Assurance Program (QAP) in which pathologists must participate to maintain their registration. Given that the pathology diagnosis is final and definitive, it is important to explain the underlying processes involved in these crucial, diagnostic decisions and therefore, how errors might be prevented.
Nodine and Krupinski (1998) and, more recently, Drew et al. (2013) proposed that successful diagnosis results when medical image specialists apply their fine-tuned perceptual and cognitive skills to rapidly process a layout/scene globally to reach their decision. Consistent with this account, abnormalities in a display or image can be detected rapidly, following a brief glance (Brennan et al., 2018; Carmody et al., 1981; Carrigan et al., 2018; Charness et al., 1996; Evans et al., 2013). Amongst cytologists, performance is above chance at detecting abnormalities in micrographs of cervical smears presented for 250 ms (Evans et al., 2013). Similarly, expert radiologists and pathologists fixate more rapidly than trainees on an abnormality and do so using fewer visual saccades (Krupinski et al., 2013; Kundel & La Follette, 1972; Kundel & Nodine, 1975; Kundel et al., 1978).
Krupinski et al. (2013) characterized the scanning patterns of resident pathologists (pathologists in training) throughout their training and demonstrated their search patterns changed from a less efficient strategy (scanning around the entire visual field) to a more efficient strategy (targeted search) with experience. Further, visual expertise is associated with less time fixating on diagnostically irrelevant and nondiagnostic regions (Brunyé et al., 2014; Krupinski et al., 2013). These capabilities suggest that experts extract global properties of an image rapidly and develop a finely tuned perceptual representation that almost instantly supports the relationship between visual stimuli and a diagnosis. This ability is also likely supported by patterns in memory gained through past experience that are triggered when presented with the stimulus (Brunyé et al., 2014).
According to Nodine and Mello-Thoms (2010), when an expert considers a case, “features” that are extracted during the initial glance are compared against a template or “pattern” in memory. This process tends to be rapid, nonconscious, and domain-specific, and is dependent upon a repertoire of feature–event/object associations in memory that serve as “cues” (Brunswik, 1955; Croskerry, 2009; Klein, 1989; Wiggins, 2014, 2021; Wiggins, Brouwers, et al., 2014). Carrigan et al. (2019) demonstrated that compared with nonradiologists, radiologists perceive features that are especially diagnostic to the case (subtle nodule in a chest radiograph) as more salient, lending support to the notion that feature-based cues are integral for successful performance.
Norman (2005) demonstrated that the greater and more diverse an expert’s memory for patterns, the faster their capacity to match a visual percept against a stored pattern. For example, a pathologist reviewing a breast biopsy slide might associate the destruction of normal, lobular architecture of the breast glands (feature) with the likelihood of cancer (event). As additional cases are reviewed, a repertoire of feature–event associations is accrued, enabling more precise discrimination in future encounters. As these associations become integrated to form patterns, cognitive processes become automated, reducing the demand on cognitive resources, including working memory, while maintaining accuracy and efficiency (e.g., see Curby et al., 2009 and Curby & Gauthier, 2007, for evidence of domain-specific visual working memory capabilities afforded by expertise).
Indirect evidence of the role of cue utilization in supporting diagnostic performance is evident in experts’ extraction of a limited number of diagnostic features and their relatively rapid and efficient formulation of diagnoses in comparison to nonexperts (e.g., Carrigan et al., 2018, 2019; Brennan et al., 2018; Evans et al., 2013; Krupinski et al., 2013; Kundel & La Follette, 1972). However, the strength of these associations in memory is likely to depend on the frequency with which they are activated. For example, patterns of features that are reviewed less frequently in practice may decay through inactivity, whereas a lack of exposure to specific cases may result in patterns of features that are imprecise and/or liable to inappropriate activation, resulting in diagnostic errors. Since the activation of patterns is reliant upon cues in memory, differences in the capacity to utilize cues within a given context are likely to be an important cause of differences in diagnostic accuracy.
Ideally, assessments of cue-based associations in memory would involve the identification of a universal catalog of features that could constitute a benchmark for a “test of existence.” However, the specific feature–event associations that experts acquire are shaped by their individual experience. Moreover, visual features may be more or less salient based on individual differences in visual detection skills, so that, despite similar levels of experience, experts are unlikely to share identical cue-based relationships in memory.
Despite differences in the specific feature–event/object relationships in memory, it remains possible for practitioners to achieve similar levels of performance, particularly in the context of diagnosis. Characterizing the nature of this behavior allows more general inferences to be drawn concerning the application or utilization of cue-based associations from memory. Relatively higher cue utilization can be inferred based on the capacity of a practitioner to: (1) rapidly identify areas of concern, (2) accurately recognize classes or categories of features, (3) differentiate associations between features to a greater extent and more rapidly, (4) discriminate relevant from less relevant features during problem-solving, and (5) demonstrate a more explicit prioritization of the acquisition of features during problem orientation (Wiggins, 2014, 2021).
Comparative behavioral assessments of cue utilization have successfully differentiated diagnostic performance across a range of nonmedical contexts including rail control (Sturman et al., 2019), aviation piloting (Schriver et al., 2008; Wiggins, Azar, et al., 2014), air-traffic control (Falkland & Wiggins, 2019), water safety (Wiggins et al., 2019), software engineering (Loveday et al., 2014), and electricity network power system control (Loveday, Wiggins, Harris, et al., 2013). In medical domains, including radiology (Carrigan et al., 2020), pediatrics (Loveday, Wiggins, Searle, et al., 2013), and anesthesiology (Crane et al., 2018), behavior reflecting higher cue utilization is associated with greater levels of accuracy in both direct and indirect measures of diagnostic performance.
The primary goal of the present study was to test the contribution of cue utilization in the diagnostic performance of histopathologists, accounting for self-reported experience. In the case of histopathology, diagnosis requires that a clinician match features associated with a pattern of cells against a repertoire of features in memory. These features include tissue architecture; cell arrangement; and cytoplasmic and nuclear shape, size, color, and density.
A greater and more discriminating repertoire of feature–event associations, reflecting a greater capacity for the utilization of cues, is likely to enable a more accurate and rapid diagnosis. Therefore, it was hypothesized that relatively higher cue utilization in the context of histopathology would be associated with relatively greater accuracy on a simulated diagnostic task, independent of experience.
Method
Participants
Pilot data were collected from eight histopathologists across a range of experience. Experimental data were then collected from 54 participants (32 female) who participated in a conference setting (n = 35) or online (n = 19) and who were familiar with reporting histopathological specimens. The mean age of histopathologists in the experimental group was 46 years (SD = 12 years). Forty participants were qualified pathologists and 14 were residents (pathologists in training). Participants reported an average experience as a pathologist of 14 years (SD = 12, range 1–52), having recently read between zero (one was retired) and 14,000 cases per year (M = 3663, SD = 3554). Three participants were left-handed, all reported normal or corrected-to-normal vision, and all were naïve to the purposes of the experiment. In return for participation, participants were offered the chance to win an iPad.
Measures
The participants completed a 45-min task which included: (1) a demographic survey, (2) an assessment of cue utilization using the Expert Intensive Skills Evaluation platform (EXPERTise 2.0; Wiggins et al., 2015), and (3) a computer-based static image recognition task that provided an independent measure of diagnostic performance.
Demographic Survey: Covariates
The participants were asked to indicate their age, sex, handedness, qualifications, their number of years of experience in pathology, the number of cases performed per day, the number of cases performed per year, and how frequently they played video games or a musical instrument. If they played a musical instrument, they were asked to indicate their perceived level of proficiency. These data were recorded, as there is evidence to suggest that video game players perform at a higher level on attentional tasks (e.g., Cain et al., 2014), and that playing a musical instrument may enhance visuo-spatial abilities (Boyd et al., 2008). Using five-point Likert scales, participants were also asked about their energy levels at time of experimentation, their confidence in their role, the extent to which they considered that “pattern recognition is an inherent skill which cannot be taught,” and their self-rated performance as a pathologist.
Cue Utilization
Cue utilization was assessed using the EXPERTise 2.0 platform (Wiggins et al., 2015). EXPERTise 2.0 is an online assessment tool that incorporates five tasks that are designed to assess behavior that corresponds to the utilization of cues in a specific domain. Guided by subject-matter experts, stimuli are developed for the various tasks that represent a specific domain or context.
In the feature identification task (FIT), participants are asked to identify, as quickly as possible, an abnormality or area of concern. Participants responded to one practice and 15 randomized trials, 10 of which incorporated a single area of abnormality whereas the remainder were normal. The anatomical location of the abnormal images comprised the breast (2), lung (2), gastrointestinal tract (4), liver (2), and salivary gland (1). The response was made under free-viewing conditions and no feedback was provided. Notably, no assumptions are made about what specific features experts should use to select a region. In the FIT, higher cue utilization is generally associated with lower mean response latency, reflecting the rapid and targeted identification of an area, rather than an exhaustive search (Loveday, Wiggins, Harris, et al., 2013) (Figure 1).

Example image from the first of five tasks (the Feature Identification Task) from the pathology edition of EXPERTise 2.0. Panel (a) depicts an image presented to the participants. Panel (b) depicts the image with the abnormal area outlined in black, representing gastric adenocarcinoma.
In the feature recognition task (FRT), participants are asked to classify stimuli following a short exposure. One practice and 20 randomized, experimental scenarios were displayed for 4 s. Each scenario contained an abnormality, in response to which participants were asked to select one from five multiple-choice options to categorize the abnormality (Benign Neoplasm, Malignant Neoplasm, Reactive/Inflammatory Process, Developmental, Metabolic). No feedback was provided. In the FRT, higher cue utilization is generally associated with greater accuracy (Wiggins & O’Hare, 2003).
In the feature association task (FAT), participants are asked to assess the relatedness of pairs of pathological, text-based, task-related stimuli using a six-point Likert Scale. The relatedness of the texts within each trial varied and was reviewed by a subject-matter expert. Over one practice trial and 15 randomized experimental trials, the two terms were presented sequentially for 1500 ms after which participants were asked to rate the relatedness of the features on a six-point Likert scale (from 1 = Extremely unrelated to 6 = Extremely related). For example, “Granulation Tissue” (feature) might be followed by “Tuberculosis” (event). Seven pairs were less likely to be related (e.g., “Amyloid” [feature] followed by “Grass” [event]), and eight were more likely to be related in practice (e.g., “Bubbly Cytoplasm” [feature] followed by “Chordoma” [event]). In the FAT, higher cue utilization is typically associated with greater variance in the perceived relatedness of the terms as a function of response latency (Morrison et al., 2013).
In the feature discrimination task (FDT), participants are presented with a short, written description of a problem-oriented scenario. On the basis of this information, they are asked to select a response from four possible options based on their typical response (i.e., “What would be your first response in this situation?”). Having selected a response, participants are presented with a list of features that were incorporated in the scenario and are asked to rate, with reference to the decision, the perceived importance of the features using a 10-point Likert scale (from 1 = Not important at all to 10 = Extremely important). In the FDT, higher cue utilization is associated with greater variance across feature-relevance ratings (Pauley et al., 2009; Weiss & Shanteau, 2003), where participants are more likely to select either Not important at all (1) or Extremely important (10).
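As a minimal illustration of the FDT measure, the sketch below computes the variance of importance ratings for two hypothetical participants. The ratings are invented for illustration, and the use of the sample variance (rather than the population variance) is an assumption; the scoring rules used within EXPERTise 2.0 may differ.

```python
from statistics import variance

# Hypothetical 10-point importance ratings (1 = Not important at all,
# 10 = Extremely important) for six features in one scenario.
# These values are illustrative only and are not data from the study.
discriminating = [1, 10, 2, 10, 1, 9]    # ratings pushed toward the extremes
undifferentiated = [5, 6, 5, 6, 6, 5]    # ratings clustered mid-scale

# Sample variance (n - 1 denominator); this choice is an assumption.
var_discriminating = variance(discriminating)
var_undifferentiated = variance(undifferentiated)

print(var_discriminating, var_undifferentiated)
```

Under this scoring, the first participant, who separates relevant from less relevant features more sharply, yields the larger variance and would be interpreted as showing relatively higher cue utilization on this task.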
For the feature prioritization task (FPT), participants are required to prioritize information to solve a histopathology-related problem. The histopathologists were presented with incomplete, descriptive scenarios (“You are doing routine reporting. As quickly as possible, access the information below that you feel is necessary to decide on your response.”), and were provided a list of 14 information screens (feature cues), varying in relevance and presented randomly in drop-down tabs from which feature-related information could be accessed (e.g., “Clinical impression,” “Low power microscopic image”). Participants with higher cue utilization tend to access the information in a less sequential manner, reflecting their ability to prioritize the information perceived as important to their goal. Meanwhile, participants with lower levels of cue utilization tend to access the submenus in a more sequential manner, accessing features as they are listed (Wiggins & O’Hare, 1995). Higher cue utilization is associated with a lower frequency of pairs of features accessed in sequence, calculated as a proportion of the total frequency of pairs of features accessed during the scenario (Wiggins & O’Hare, 1995; Table 1).
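The FPT ratio described above can be sketched as follows. The function name and the definition of a pair "accessed in sequence" (the next screen opened is the one listed immediately below the previous one) are assumptions made for illustration; they are not taken from the EXPERTise 2.0 implementation.

```python
def sequential_pair_ratio(access_order):
    """Proportion of consecutive accesses that follow the listed order.

    access_order holds the 1-based list positions of the information
    screens in the order in which a participant opened them.
    """
    pairs = list(zip(access_order, access_order[1:]))
    if not pairs:
        return 0.0
    # Assumed definition: a pair is "in sequence" when the second screen
    # is listed directly after the first.
    in_sequence = sum(1 for first, second in pairs if second == first + 1)
    return in_sequence / len(pairs)


# Hypothetical access orders (not study data).
listed_order = sequential_pair_ratio([1, 2, 3, 4, 5])   # works down the list
targeted = sequential_pair_ratio([7, 2, 12, 3, 9])      # jumps to selected screens
mixed = sequential_pair_ratio([1, 2, 5, 6, 3])          # two of four pairs in sequence
```

A lower ratio corresponds to a less sequential, more prioritized pattern of information acquisition, which the text associates with higher cue utilization.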
Summary of Five Tasks Within EXPERTise 2.0
Note. EXPERTise = EXPERT Intensive Skills Evaluation; FAT = feature association task; FIT = feature identification task; FDT = feature discrimination task; FPT = feature prioritization task; FRT = feature recognition task.
The test–retest reliability of each of the tasks that comprise EXPERTise 2.0 has been demonstrated in the context of audiology (Watkinson et al., 2018), whereas that of classifications based on performance aggregated across the five tasks has been demonstrated in the context of electrical transmission power control (Loveday, Wiggins, Searle, et al., 2013). The construct validity of the EXPERTise 2.0 classifications has been demonstrated in radiology (Carrigan et al., 2020), transmission and distribution power control (Loveday, Wiggins, Harris, et al., 2013; Sturman et al., 2019), software engineering (Loveday et al., 2014), and aviation (Wiggins, Brouwers, et al., 2014), whereas predictive validity has been demonstrated in audiology (Watkinson et al., 2018).
Diagnostic Performance
Diagnostic performance was assessed using a static image recognition task. The stimuli consisted of one practice and 31 test images from various anatomical locations (e.g., breast, kidney, skin). Each image included five unique multiple-choice responses developed by two subject-matter experts. Four of the images demonstrated pathologies not included in the multiple-choice options, where the correct answer was “none of the above.” The participants were instructed: “As quickly as possible, please select the most correct answer from the options below.” The de-identified images were provided by the subject-matter experts (Figure 2).

Exemplar of an image from the static image recognition stimuli set. Answer = C.
Procedure
The research was conducted in accordance with the American Psychological Association Code of Ethics and was approved by Macquarie University Human Research Ethics Committee (Medical Sciences). Informed consent was obtained from all participants. EXPERTise 2.0 and the static image recognition tasks were counterbalanced across participants. All of the tasks within EXPERTise 2.0 were blocked in a set order, and the scenarios were randomized within task. After completing a series of demographic questions, participants began each task with a practice trial, followed by the experimental trials of the EXPERTise 2.0 or static image recognition task.
Results
Preliminary Analysis
All statistical analyses were performed using IBM SPSS Statistics (Version 25). Outliers defined as ±2 SD were removed from the EXPERTise raw data (four participants), leaving 50 participants for the main analysis. As the participants’ raw scores for each of the five EXPERTise 2.0 tasks were on different scales, these were standardized (z scores) and then aggregated across the five tasks. The dependent variable for the image interpretation task was accuracy (% correct).
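The standardization step can be illustrated with a short sketch. The raw scores below are hypothetical, the function name is illustrative, and the use of the sample standard deviation is an assumption; the analysis reported here was run in SPSS, not with this code.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize raw scores to mean 0 and (sample) SD 1."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical raw scores for six participants on two tasks that use
# different scales (milliseconds vs. proportion correct).
fit_latency_ms = [4200, 5100, 3900, 8800, 4600, 5000]
frt_accuracy = [0.80, 0.65, 0.90, 0.40, 0.75, 0.70]

z_fit = z_scores(fit_latency_ms)
z_frt = z_scores(frt_accuracy)

# After standardization, both tasks are expressed in SD units, so a value
# beyond +/-2 on either task can be flagged with the same rule.
flagged = [i for i, z in enumerate(z_fit) if abs(z) > 2]
```

Once each task is expressed in standard deviation units, scores from tasks measured in milliseconds, proportions, or variances can be compared or combined on a common scale.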
Nineteen participants completed the study online at their convenience. The study instructions included requests to perform the study on a laptop or desktop (no tablets or cell phones), and at a time when they were least likely to be interrupted. An independent samples t-test was performed and showed no differences in performance between those participants who completed the study in a conference setting or online; t(52) = −.3, p = .77.
Demographic Survey: Covariates
A series of correlations was conducted between the demographic variables and accuracy on the static image recognition task. As expected, positive Pearson’s correlations were evident between accuracy on the static image recognition task and both the number of cases performed per day, r(52) = .39, p = .004, and the number of cases performed per year, r(52) = .37, p = .006. This suggests that the diagnostic performance measure has construct validity. A positive Spearman’s correlation was evident between self-rated confidence in their role as a pathologist and performance on the image interpretation task, rs(52) = .34, p = .01. There was no correlation between self-reported years of experience and performance on the static image recognition task, r(52) = .23, p = .097. There were no other statistically significant correlations evident (p > .05). Since level of confidence is a subjective measure, it was not included in the main analysis. As the number of cases per day and per year are essentially the same measure, only the number of cases per year was included as a potential explanatory variable (covariate) in the main analysis.
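For readers who wish to relate the reported Pearson correlations to their significance tests, a correlation r with df degrees of freedom corresponds to the statistic t = r√(df / (1 − r²)). The sketch below applies this standard conversion to the two reported coefficients; the function name is illustrative, and the inputs are the rounded values from the text, so the results are approximate.

```python
from math import sqrt


def r_to_t(r, df):
    """Convert a Pearson correlation to its t statistic (df = n - 2)."""
    return r * sqrt(df / (1 - r ** 2))


# Rounded coefficients as reported in the text, each with df = 52.
t_cases_per_day = r_to_t(0.39, 52)
t_cases_per_year = r_to_t(0.37, 52)

print(round(t_cases_per_day, 2), round(t_cases_per_year, 2))
```

Both values exceed the two-tailed critical value of approximately 2.0 for 52 degrees of freedom at α = .05, which is consistent with the reported p values.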
Cue Utilization
Cue utilization was established based on the participant’s performance across the five tasks included in EXPERTise 2.0. Previous research involving the five EXPERTise tasks suggests that performance can be discriminated at two levels (higher, lower) that reflect differences in behavior during tasks that demand the utilization of cues (e.g., Brouwers et al., 2017; Carrigan et al., 2020; Falkland & Wiggins, 2019). Consequently, a k-means cluster analysis was employed to delineate two groups based on the standardized scores for the five EXPERTise tasks.
An inspection of the centroids that emerged following the cluster analysis was used to distinguish higher from lower performance and to ensure that this pattern was uniform across the five tasks. Consistent with the hypothesis, the cluster analysis in the present study uniformly reflected a pattern of centroids that distinguished higher from lower performance. In comparison to lower cue utilization, higher cue utilization is characterized by faster responses in the FIT, greater accuracy in the FRT, greater variance as a proportion of response latency in the FAT, greater variance in the FDT, and a lower proportion of features accessed in the sequence in which they were presented in the FPT.
Eighteen participants comprised a group whose behavior was consistent with lower cue utilization, whereas 32 participants comprised a group whose behavior reflected higher cue utilization (Table 2). This delineation is intended to take into account nonlinear changes that occur in the rate at which cue-based associations are identified and retained in memory. There were proportionally more trainees assigned to the lower cue utilization group (Table 3). Four participants did not complete all of the EXPERTise 2.0 tasks, so were not assigned a cluster membership. Notably, cue utilization, in this case, is a relative, not absolute, measure. Therefore, those participants assigned to Cluster 1 showed patterns consistent with relatively lower cue utilization, whereas those participants assigned to Cluster 2 showed patterns consistent with relatively higher cue utilization.
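The two-group delineation and the subsequent inspection of centroids can be sketched with a basic k-means procedure (Lloyd's algorithm). This is an illustration only: the z scores are hypothetical, the deterministic choice of initial centroids is an assumption, and the SPSS procedure used in the study may initialize and iterate differently.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_cluster_kmeans(points, max_iter=100):
    """Basic k-means with k = 2; returns (labels, centroids)."""
    # Assumption: the first and last participants seed the two centroids.
    centroids = [list(points[0]), list(points[-1])]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if squared_distance(p, centroids[0]) <= squared_distance(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical z scores per participant, ordered FIT, FRT, FAT, FDT, FPT.
z = [
    (-1.0, 0.9, 0.8, 1.1, -0.9),
    (-0.8, 1.1, 0.7, 0.9, -1.0),
    (-0.9, 0.8, 1.0, 0.8, -0.8),
    (1.0, -0.9, -0.8, -1.0, 0.9),
    (0.9, -1.0, -0.9, -0.9, 1.0),
    (0.8, -0.9, -0.8, -0.9, 0.8),
]
labels, centroids = two_cluster_kmeans(z)

# Centroid inspection: the higher cue utilization cluster should show a
# lower FIT latency and FPT ratio, and higher FRT, FAT, and FDT scores.
higher = 0 if centroids[0][0] < centroids[1][0] else 1
```

Because cluster labels are arbitrary, the interpretation of a cluster as "higher" or "lower" rests on inspecting the centroids against the expected direction of each task, as described above.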
Centroids for the Standardized Scores for Each of the EXPERTise 2.0 Five Tasks Distributed Across the Two Groups That Were Delineated by a K-Means Cluster Analysis
Frequency of Trainee and Qualified Pathologists Per Cluster Group
Diagnostic Performance
Diagnostic performance was measured as percentage accuracy on the static image recognition task; mean accuracy = 66.55% (SD = 12.4). A single-sample t-test on mean accuracy relative to chance, where chance was 20% (one in five possible responses), showed that pathologists performed above chance on the image interpretation task, t(53) = 27.59, p < .0001.
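The comparison against chance can be checked from the summary statistics alone, since the single-sample statistic is t = (M − chance) / (SD / √n). The sketch below uses the rounded values reported in the text with n = 54, so the result is approximate; the function name is illustrative.

```python
from math import sqrt


def one_sample_t(sample_mean, sample_sd, n, reference):
    """t statistic for a sample mean tested against a fixed reference value."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - reference) / standard_error


# Reported summary values: M = 66.55%, SD = 12.4, n = 54, chance = 20%.
t = one_sample_t(66.55, 12.4, 54, 20.0)
df = 54 - 1

print(f"t({df}) = {t:.2f}")
```

With these inputs, the statistic agrees with the reported value to two decimal places.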
To address our first aim, a between-groups analysis of covariance (ANCOVA) was performed, incorporating two levels of cue utilization (higher/lower) as the independent variable, and the number of cases per year included as a covariate, to test its contribution to diagnostic performance; F(1,47) = 8.4, p = .006,

Percentage accuracy mean scores on the static image recognition task, distributed across cue utilization, controlling for cases read per year, for 50 histopathologists. Error bars represent 95% confidence intervals. Note. Chance is 20%.
An independent samples t-test, incorporating cue utilization as the grouping variable, failed to reveal a statistically significant difference between the groups in the number of cases read per year, t(48) = −1.7, p = .1. This suggests that cue utilization and cases per year read by pathologists contribute independently to the difference in performance on the diagnostic task.
Although the histopathologists performed above chance on the static image recognition task, their performance as a group was far from ceiling (M = 66.55%). It is possible that these results reflect the inclusion in the sample of pathologists in training (14 of 54) who are yet to become familiar with the pathologies included in the image set. Alternatively, though the stimuli were carefully selected by two subject-matter experts, it may be the case that the performance of participants reflects the inherent variability in real-world images, and possibly within the image set. These factors include the difference in prevalence of specific types of cases in different practice settings.
In a post hoc, exploratory analysis, we first tested whether disease frequency and the rate at which histopathologists were likely to have encountered similar cases previously in their practice may be related to their performance on the static image recognition task. Of the cases in the set, 12 of 31 cases were classified by a subject-matter expert, and cross-checked with available epidemiology data, as “rare,” whereas 19 of 31 cases were considered “common.” The case with the lowest accuracy across the cohort was a rarely encountered disease occurring in a skin biopsy specimen, where six of the 50 pathologists (5/6 with higher cue utilization and 1/6 with lower cue utilization) diagnosed the case accurately. For this condition, there is a possibility that the low accuracy resulted from the relative unfamiliarity of the case.
A mixed-repeated ANCOVA testing the accuracy data from the static image recognition task, with disease frequency (common/rare) as a within-subjects variable, cue utilization (higher/lower) comprising a between-subjects variable, and the estimated number of cases per year as a covariate, revealed a main effect for disease frequency, F(1,47) = 66.30, p < .0001,
A main effect was also evident for cue utilization, F(1,47) = 8.1, p = .007,

Percentage accuracy mean scores on the static image recognition task distributed across cue utilization, for the common and rare cases, controlling for cases read per year, for 50 histopathologists. Error bars represent 95% confidence intervals.
For a small number of images (4), the correct response was “none of the above,” since the pathology depicted was not listed as one of the other alternatives. Therefore, we explored whether the relationship between cue utilization and accuracy differed, depending upon the presence or absence of a pathology listed as an alternative. A mixed-repeated ANCOVA, incorporating the presence or absence of listed pathology (pathology listed/pathology not-listed) as a within-groups variable, two levels of cue utilization (higher/lower) as a between-subjects variable, and the estimated number of cases per year as a covariate, tested whether there were differences in the accuracy of responses.
The results revealed a statistically significant interaction between cue utilization and the presence or absence of a listed pathology, F(1,47) = 7.42, p = .009,

Percentage mean scores on the static image recognition task, distributed across cue utilization, for the pathology listed and not-listed cases, controlling for cases read per year, for 50 histopathologists. Error bars represent 95% confidence intervals.
In combination, the data suggest that differences in cue utilization contribute to performance on a diagnostic task, over and above the estimated number of cases read per year, and that these results may, in part, be driven by disease prevalence, both in our image set and epidemiologically, along with the ability to recognize when a pathology is present or not present in an image.
Discussion
The overall aim of this study was to investigate whether diagnostic assessments of cue utilization differentiate performance in the context of histopathology, accounting for self-reported experience (number of pathology cases interpreted per year). Consistent with the hypothesis, higher cue utilization was associated with greater accuracy on a simulated diagnostic interpretation task requiring evaluation of tissue with significant abnormalities. A positive relationship was also evident between the estimated number of cases read per year and diagnostic accuracy. However, no relationship was evident between the estimated number of cases read per year and cue utilization. Consistent with previous research, this suggests that cue utilization and experience, in the form of the self-reported estimates of the number of cases read per year, contribute independently to diagnostic accuracy (Carrigan et al., 2020; Crane et al., 2018; Loveday, Wiggins, Searle, et al., 2013; Sturman et al., 2019).
This study provides evidence for differences in diagnostic performance based on cue utilization, independent of the number of cases read per year. However, other factors that also contribute to expertise, including personality and genetic and developmental factors (Hambrick et al., 2016), were not evaluated in the current study. Further, there is evidence to suggest that some pathologists have an inherent domain-general ability for visual recognition, independent of experience (e.g., Trueblood et al., 2018). It is also likely that in practice, if the disease frequency were inconsistent, “experience” would begin to affect diagnostic outcomes and accuracy, independent of the number of cases read per year. Future research examining these issues is warranted, particularly comparing extreme values of cue utilization, where differences and variability in performance across the tasks can be maximized, and equal sample sizes can be assured. Importantly, the findings in the present study are consistent with Carrigan et al. (2020), who showed that among radiologists, cue utilization was positively associated with performance on a diagnostic task, independent of experience.
Although performance on the diagnostic task was above chance, it should be noted that overall accuracy was 66.55%. This is unsurprising as none of the participants reported being accustomed to diagnostic reporting from histopathology images on screens. The simulated diagnostic task presented a target tissue abnormality in a setting that was unfamiliar, rather than the normal pathological examination using an optical light microscope to view multiple fields at multiple magnifications. Although many laboratories are moving from optical microscopy to digital whole slide images, this was not the case for the current sample. Furthermore, the experimental setting lacked the opportunity for corroboration from colleagues or other resources such as textbooks or online images that would normally be available in routine practice when diagnostic uncertainty is encountered. Although presenting its own challenges, our diagnostic setting reproduces the essence of the routine practice setting. Unfortunately, the routine practice setting is not yet amenable to the study of activation of highly-refined cue-based associations in memory and how these associations are able to better discriminate differences in capability.
The post hoc analysis suggested that the types of cases and the prevalence of cases to which pathologists are exposed may be associated with differences in accuracy. Specifically, in comparison to participants with lower cue utilization, participants with higher cue utilization demonstrated relatively greater accuracy for less common pathologies, and where the pathology was not listed as a specific option for selection (i.e., the correct option was “none of the above”). This has important training implications given that discordance errors are higher for histopathologists who interpret lower weekly case volumes or who work in smaller practices or nonacademic settings (Elmore et al., 2015).
Since improvements in cue utilization are thought to occur through the internalization of feature–event associations in memory, learning opportunities are necessary that enable, as quickly as possible, the extraction of features and their association with specific pathologies in memory (Wiggins, 2014). In the absence of supervised operational exposure, structured simulation, such as reviewing and testing using image libraries with feedback, offers an alternative with the potential to facilitate the acquisition of cue-based associations faster and more efficiently than might be afforded through work-based exposure. With further research, evaluations based on normative assessments of performance could be employed to ensure successive improvements in cue utilization.
Opportunities for future research also lie in examining whether cue utilization moderates performance differently depending upon the prevalence of cases. In the current study, our exploratory findings suggested that disease frequency or prevalence was related to diagnostic performance. Low target prevalence (i.e., rare or uncommon targets, where few cases are truly abnormal) generally results in elevated miss rates (Evans et al., 2011; Mitroff & Biggs, 2014; Wolfe et al., 2005). Evans et al. (2011) demonstrated this “low prevalence effect” in a cytology screening context; in the United States, the epidemiological prevalence of cervical cancer is around 1% (Benard et al., 2004). By contrast, disease prevalence in the context of histopathology tends to be higher.
Typically, in a breast histopathology setting, prior to the evaluation by a histopathologist, the patient and tissue samples have already been subjected to diagnostic testing and procedures such as mammograms, prostate-specific antigen (PSA) testing for prostate cancer, polyps biopsied during routine colonoscopies, or surgical excision. Further, specimens are accompanied by clinical information and surgical reports. This is likely to create an environment that potentially primes the confirmation of an abnormality. Testing the existence of “abnormality priming” is necessary to consider the effects of prevalence on pathologists’ cue utilization on a longer-term basis.
Conclusion
This study was designed to test the role of cue utilization in the diagnostic performance of histopathologists. Accounting for the self-reported number of cases reported per year, higher cue utilization was associated with higher accuracy on an independent, simulated diagnostic task. This effect was greatest where the pathology depicted on the image was not listed as a potential classification option and “none of the above” was the accurate response.
Despite the limitations associated with the use of a simulated task that potentially lacks ecological validity, the results raise important issues concerning the relationship between motivation and cue utilization, particularly in response to challenging tasks. The outcomes also highlight the value of an approach that does not assume that experts use the same cues or features during problem resolution. Our exploratory image analyses showed that factors such as disease prevalence are also likely to influence accuracy. Crucially, the outcomes of the study have important implications for the development and maintenance of skills amongst pathologists: exposure sufficient to acquire and utilize cue-based strategies appears to be associated with the highest level of performance on the diagnostic task.
Key Points
In pathology, accurate diagnosis involves accurate, reliable and timely responses.
Pathological cue utilization is positively associated with performance on a diagnostic task, independent of the number of cases reported per year.
A post hoc analysis suggested that higher cue utilization may be associated with a greater capacity to recognize rarer and low prevalence cases.
Targeted training that encourages the acquisition and utilization of cue-based strategies is recommended.
Footnotes
Acknowledgments
For data collection, we acknowledge the support of The Royal College of Pathologists of Australasia and the Royal College of Pathologists of Australasia Quality Assurance Program. This work is supported by an ARC Discovery Project grant awarded to M.W.W. (CI), K.M.C., A.G., and T.P. (DP5056000).
Author Biographies
Ann J. Carrigan is a postdoctoral researcher at Macquarie University, Australia, and works in radiology as a medical sonographer. Her research focuses on medical image perception and expertise in diagnostic medicine.
Amanda Charlton is an anatomical pathologist at Auckland City Hospital, New Zealand; her clinical work involves making diagnoses on tissue biopsies. Her interests include image perception, medical education, artificial intelligence, and bias in diagnostic pathology.
Elliott Foucar is an adjunct professor in the Department of Pathology, University of New Mexico School of Medicine, USA. His major areas of interest are general diagnostic surgical pathology, dermatopathology, cytopathology, and autopsy pathology.
Mark W. Wiggins is a professor of organizational psychology at Macquarie University, Australia. His research focuses on skill acquisition and expertise in advanced technology environments.
Andrew Georgiou is a professor of diagnostic informatics at Macquarie University, Australia. His research focuses on diagnostic and organizational communication within the healthcare system.
Thomas J. Palmeri is a professor of psychology, ophthalmology, and visual sciences at Vanderbilt University, USA. His research focuses on perceptual categorization, category learning, visual learning, visual memory, perceptual expertise, object and face recognition, automaticity, perceptual decision making, and mathematical, computational, and neural modeling.
Kim M. Curby is an associate professor of cognitive psychology at Macquarie University, Australia. Her research focuses on the cognitive mechanisms and neural substrates that underlie expert performance in visually based domains.
