Abstract
Background:
The ability to understand and make use of object-scene relationships is critical for object and scene recognition.
Objective:
The current study assessed whether patients with mild cognitive impairment (MCI), possibly in the preclinical phase of Alzheimer’s disease, exhibited impairment in processing contextual information in scene and object recognition.
Methods:
In Experiment 1, subjects viewed images of foreground objects in either semantically consistent or inconsistent scenes under no time pressure, and they verbally reported the names of the foreground objects and backgrounds. Experiment 2 replicated Experiment 1, except that subjects were required to name the scene first. Experiment 3 examined object and scene recognition accuracy baselines, recognition difficulty, familiarity with objects/scenes, and object-scene consistency judgements.
Results:
There were contextual consistency effects on scene recognition for MCI and healthy subjects, regardless of response sequence. Scenes were recognized more accurately under the consistent condition than the inconsistent condition. Additionally, MCI patients were more susceptible to incongruent contextual information, possibly due to inhibitory deficits or over-dependence on semantic knowledge. However, no significant differences between MCI and healthy subjects were observed in consistency judgement, recognition accuracy, recognition difficulty and familiarity rating, suggesting no significant impairment in object and scene knowledge among MCI subjects.
Conclusions:
The study indicates that MCI patients retain relatively intact contextual processing ability but may exhibit inhibitory deficits or over-reliance on semantic knowledge.
Keywords
INTRODUCTION
In the real world, objects are usually embedded in particular scenes, so object-scene relationships have a critical impact on object and scene recognition. For example, a traffic light is recognized faster when it appears in the street than in an apartment, and an air conditioner unit is identified faster when it is seen in an apartment than in the street. Recent studies have shown that the ability to process contextual information and object-scene relationships is impaired by neurodegenerative diseases such as Alzheimer’s disease (AD) [1–4] due to dysfunctions of the brain network related to processing object-context relationships [5–7]. However, it remains unclear how early in the course of the disease the deficit appears. Therefore, in this study we define contextual information as semantic knowledge of object-scene associations and focus on individuals with mild cognitive impairment (MCI) who may be in the preclinical phase of AD [8–10]. Our objective is to examine the alterations in their abilities to make use of contextual information in object and scene recognition.
Contextual information in object and scene recognition
A great deal of research has demonstrated that objects and scenes are processed interactively, so that their recognition is modulated by the semantic consistency between them [11–22]. Previous studies have shown that both objects and scenes are processed faster and more accurately when they are consistent than when they are inconsistent in an image [11, 12, 22–24]. Neuropsychological studies have also discovered increased brain activity in contextually inconsistent conditions, indicating that there is an enhanced processing cost associated with violations of object-scene consistency [25–29]. According to the perceptual schema model and the object selection model, the observed scene consistency effects are driven by the formulation of context-based expectations [14, 21, 24, 30]. However, the two models diverge in their views regarding the levels at which context-based predictions impact object recognition, as well as the bidirectional nature of the interactions between object and scene recognition [14, 21, 24, 30].
The perceptual schema model proposes that semantic expectations sensitize the initial object perceptual analysis [15–17]. In recognition, people can quickly extract scene gist after scanning an image and construct predictions about possible objects that may appear in a given situation, which speeds up the initial perceptual analysis of scene-consistent objects [31]. However, as semantic predictions are believed to be generated prior to the construction of visual descriptions for objects, object recognition can benefit from the presence of a consistent scene, while a semantically consistent foreground object has little influence on scene recognition.
In contrast, the object selection model (priming model) posits that semantic prediction affects object recognition at the percept-to-representation matching stage, after the generation of visual representations for objects [14, 18–20, 30]. According to this model, semantic consistency between object and scene lowers concept activation thresholds when visual descriptions are matched to stored memory representations. As a result, less perceptual evidence is required to achieve final object identification. Moreover, this model proposes a bidirectional interplay between objects and scenes. It is suggested that partial recognition of an object also activates the probable scenes in which it may appear, which lowers the criterion at the matching stage in scene recognition.
The functional isolation model, in contrast, postulates that objects and scenes are processed independently, and the observed contextual consistency effects are attributed to guessing errors or response bias at the post-perceptual stage [32, 33]. For example, Hollingworth and Henderson [32] discovered that the contextual consistency effect diminished after controlling for response bias. Moreover, brain research has revealed separate neural pathways for object and scene processing. Repetitive transcranial magnetic stimulation (TMS) to the left lateral occipital cortex or right occipital face area has been shown to impede object identification but improve scene recognition [34, 35]. This model highlights the independence of object and scene recognition, primarily driven by bottom-up sensory processing [33]. Consequently, the functional isolation model predicts relatively little influence from contextual information through top-down modulation.
Despite the competing theories and mixed evidence regarding the contextual consistency effect, most studies in this field have primarily focused on young healthy populations [11–36], with relatively limited attention given to senior individuals [37], especially those with cognitive impairments or memory disorders [2, 38, 39]. Hence, it remains unclear whether individuals with MCI also demonstrate a contextual consistency effect.
Alzheimer’s disease and the ability to process contextual information
Impairments in object and scene processing are widely documented in AD patients [40–46]. For example, AD patients have been shown to have deficits in object-related tasks, such as word-picture matching [46, 47], category fluency [48–50], object categorization [51], and object naming [46, 52, 53], as well as scene tasks, such as scene categorization [36], scene discrimination [54], scene construction [55], scene search [38, 41], and scene familiarity judgement [1]. However, only a few studies have investigated scene consistency effects or contextual influences in mild and moderate AD, and a consensus has yet to be reached [2, 3, 38, 39].
Most researchers suggest that the ability to utilize contextual information is compromised in mild and moderate AD [2, 3, 39]. For instance, Boucart et al. [2] assessed object categorization for animals in either their natural environment or on a gray background. They discovered that both healthy and AD seniors categorized objects more accurately in their natural environment than on a gray background, but the contextual facilitation effects were statistically significant only for the healthy subjects. In addition, Daffner et al. [39] reported that AD patients spent less time and made fewer saccades towards inconsistent objects and scenes. Moreover, Lenoble et al. [3] assessed the ability to process scene consistency in moderate AD patients using a saccadic choice task. In the experiment, participants viewed objects in semantically consistent or inconsistent scenes. The results showed that AD patients judged contextually consistent images with good accuracy; however, they were significantly slower than the healthy controls, and the accuracy of their first saccade toward contextually consistent images was only about chance level (Experiment 3). These findings suggest that individuals with AD exhibit deficits in understanding and in making use of contextual consistency information.
In contrast, an eye-tracking study by Ramzaoui et al. [38] argued against this view. In the study, subjects viewed a picture cue or a word cue of a target object, and they were then exposed to a contextually consistent or inconsistent image containing the target object. Subjects needed to fixate the target object in the scene and press a button as soon as they had fixated it. The results showed a similar contextual benefit for both the AD group and healthy controls: there was a higher proportion of first saccades directed to the target when the target was semantically consistent with the cue. The result suggests a preserved top-down guidance of object-scene association in AD patients. However, it should be noted that, in Ramzaoui et al.’s [38] study, the semantic consistency effect in AD patients was only observed during the initial scene inspection (first saccades), and diminished during subsequent scanning and verification phases, for both AD patients and controls. Thus, the study conducted by Ramzaoui et al. [38] may have failed to observe a deficit in processing object-scene semantic associations, as the semantic consistency effect was absent overall, except during the search initiation phase.
As there is little consensus as to impairment in contextual information processing in moderate and mild AD patients, more studies should be carried out in this area to resolve current controversies. In addition, previous studies in this area have only focused on clinically diagnosed AD patients, without considering MCI patients who are at the transitional stage between normal aging and clinically identifiable AD [8–10]. At this stage, MCI patients manifest subtle semantic memory deficits [56–58] and show corresponding brain dysfunction [59–62]. Therefore, it is possible that MCI patients exhibit deficits in processing contextual information and in understanding object-scene relationships. In addition, previous studies of scene consistency effects in AD patients have only focused on the contextual influence on object recognition, with little attention to its influence on scene recognition [2, 3, 38, 39]. Hence, it is worthwhile to examine whether semantic consistency affects scene recognition in MCI patients.
To address the aforementioned questions, the present study sought to assess whether and how contextual consistency affects both object and scene recognition in MCI patients. To do this, we presented both MCI patients and healthy controls with objects in either contextually congruent or incongruent scenes (see Fig. 1), and they were required to report the names of the foreground object and background scene of each image (Exp.1 and Exp.2). In addition, to construct a baseline for object and scene recognition, a questionnaire was designed to examine object and scene recognition accuracy when no contextual information was provided (Exp.3). Ratings of recognition difficulty, familiarity with the objects and scenes, and judgements of their semantic consistency were also collected. Following previous studies, we expected to observe a general group difference in both object and scene recognition accuracy, reflecting a decline in semantic memory in MCI patients. Moreover, we also expected a decrease in contextual influence in both object and scene recognition in MCI patients, which would indicate the deterioration of object-scene relationship knowledge.

Fig. 1. Examples of consistent scenes (A) and inconsistent scenes (B).
EXPERIMENT 1
Method
Participants
A total of 63 participants were recruited from the Second Hospital of Shandong University, with 34 participants in the control group and 24 in the MCI group. In addition, 5 participants diagnosed with mild dementia were tested but removed from the data analysis. Subjects in the MCI and control groups were randomly allocated to two experimental lists. There were 15 subjects in list 1 and 19 in list 2 for the control group, and there were 12 participants in each list for the MCI group. This study was approved by the ethics committee of the Second Hospital of Shandong University, and all subjects provided informed consent prior to their participation. They were given small gifts as participation rewards.
Inclusion criteria for MCI and control groups
All subjects were native Chinese speakers above 60 years old with more than 5 years of education and without a history of neurological or psychiatric diseases. In addition, the two groups were matched in age, gender, and years of education (see Table 1).
Table 1. Means, Standard Deviations, and Significance Levels of Demographic Information and Scores on Neuropsychological Tests of Experiment 1
As the Montreal Cognitive Assessment (MoCA) is widely used and has been suggested to be superior to the Mini-Mental State Examination (MMSE) for detecting MCI [63–66], the diagnoses of probable MCI and healthy control were based on the results of a 30-point Chinese version of MoCA. According to the optimal cut-off scores proposed by Huang et al. [67], the MoCA cut-off scores for detecting MCI in the three groups (low education, ≤ 6 years; middle education, 7–12 years; high education, > 12 years) were ≤ 19, ≤ 22, and ≤ 24, respectively, and the cut-off scores for detecting mild dementia were ≤ 13, ≤ 15, and ≤ 16, respectively.
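The education-adjusted cut-offs above can be read as a simple decision rule. The following is a minimal sketch of that rule, not the authors' screening procedure: the function names are hypothetical, and it assumes that the mild dementia criterion is checked before the MCI criterion and that education is given in whole or fractional years.

```python
# Cut-off scores quoted in the text; a score at or below the value meets the criterion.
MCI_CUTOFF = {"low": 19, "middle": 22, "high": 24}
DEMENTIA_CUTOFF = {"low": 13, "middle": 15, "high": 16}


def education_band(years_of_education):
    """Map years of education to the three bands described in the text."""
    if years_of_education <= 6:
        return "low"
    if years_of_education <= 12:
        return "middle"
    return "high"


def classify(moca_score, years_of_education):
    """Return 'mild dementia', 'MCI', or 'control' for one participant.

    Assumption: the stricter (dementia) cut-off is tested first, so a score
    below both cut-offs is labelled 'mild dementia' rather than 'MCI'.
    """
    band = education_band(years_of_education)
    if moca_score <= DEMENTIA_CUTOFF[band]:
        return "mild dementia"
    if moca_score <= MCI_CUTOFF[band]:
        return "MCI"
    return "control"


# Illustrative (made-up) participants: (MoCA score, years of education)
for score, years in [(23, 10), (22, 10), (15, 10), (24, 16)]:
    print(score, years, classify(score, years))
```

Under these assumptions, a participant with 10 years of education would be treated as a control at a MoCA score of 23 and as probable MCI at 22.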
The sample size of this study was determined based on a power analysis of a similar behavioral study (Cohen’s d = 0.5) [21]. Both studies examined the contextual facilitation effect on recognition accuracy with a similar number of items. Our aim was to achieve a power greater than 80% with an alpha of 0.05 (two-tailed). According to Lenth [68], to achieve this power with the required effect size, a minimum of 55 subjects is needed in a repeated-measures design. Therefore, we aimed to recruit about 55 participants.
Materials
Seventy-two color images constructed from 36 foreground objects and 36 background scenes were used as experimental stimuli. Each image consisted of a foreground object presented with a background setting, and the two were either semantically consistent (e.g., a hammer in a factory) or inconsistent (e.g., a skipping rope in a factory) (see Fig. 1). Following Davenport and Potter’s [11] experimental design, foreground objects were grouped into pairs. By exchanging the consistent background settings of the objects in a pair, we constructed the inconsistent scenes. For example, “factory” is both the consistent scene for “hammer” and the inconsistent scene for “skipping rope”, and “playground” is the consistent scene for “skipping rope” and the inconsistent scene for “hammer”. Hence, half of the experimental images were in the scene consistent condition and the other half were in the scene inconsistent condition.
All the foreground objects were tools, such as a hammer and skipping rope, which were displayed in red-framed boxes (227×227 pixels) in the center of the image. The boxes covered 16% of the background scene (567×567 pixels). All background scenes were chosen and edited to avoid the inclusion of foreground objects or similar types. Additionally, to rule out the influence of visual similarities between objects and scenes, the objects and scenes were selected from separate source images and then combined into one image using Adobe Photoshop CS 9.0. Objects and scenes were selected from identification flashcards for children [69] or taken by the experimenters.
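As a quick check on the reported geometry, the proportion of the scene covered by the red-framed box follows directly from the two pixel dimensions given above. The sketch below assumes that both the box and the scene are square, as the stated dimensions imply; the function name is illustrative.

```python
BOX_SIDE_PX = 227    # side of the red-framed object box, from the text
SCENE_SIDE_PX = 567  # side of the background scene, from the text


def coverage_fraction(box_side, scene_side):
    """Fraction of the scene area occupied by a square box."""
    return (box_side ** 2) / (scene_side ** 2)


fraction = coverage_fraction(BOX_SIDE_PX, SCENE_SIDE_PX)
print(f"{fraction:.1%}")  # prints 16.0%
```

The result agrees with the 16% coverage stated in the text.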
Design and procedure
We applied a counter-balanced Latin square design in this study to allocate items. All 72 experimental images were assigned to two lists, and the images in each list were presented in a randomized order. No filler was used in the experiment. Each list contained all 36 foreground objects and all 36 backgrounds, with equal numbers of images in the consistent and inconsistent conditions. Therefore, each participant saw each of the 36 scenes and 36 objects exactly once, in either a semantically consistent or an inconsistent combination.
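One way to satisfy the constraints described above (two lists, every object and scene exactly once per list, half of the images consistent) is to alternate the condition of each object pair across lists. The sketch below is an illustration under that assumption and is not taken from the authors' materials; the object and scene labels are hypothetical placeholders and the function names are invented for the example.

```python
import random


def build_lists(pairs):
    """Split the 72 images into two counterbalanced lists.

    pairs: list of ((object_a, scene_a), (object_b, scene_b)) tuples, where
    each object is semantically consistent with its own scene.
    """
    list_1, list_2 = [], []
    for index, ((obj_a, scene_a), (obj_b, scene_b)) in enumerate(pairs):
        consistent = [(obj_a, scene_a, "consistent"),
                      (obj_b, scene_b, "consistent")]
        # Inconsistent images are made by swapping the scenes within a pair.
        inconsistent = [(obj_a, scene_b, "inconsistent"),
                        (obj_b, scene_a, "inconsistent")]
        if index % 2 == 0:
            list_1 += consistent
            list_2 += inconsistent
        else:
            list_1 += inconsistent
            list_2 += consistent
    return list_1, list_2


def presentation_order(trials, seed=None):
    """Return a shuffled copy of one list for a single participant."""
    rng = random.Random(seed)
    shuffled = list(trials)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical placeholder labels for 18 object pairs (36 objects, 36 scenes).
pairs = [((f"object_{2 * i}", f"scene_{2 * i}"),
          (f"object_{2 * i + 1}", f"scene_{2 * i + 1}")) for i in range(18)]
list_1, list_2 = build_lists(pairs)
print(len(list_1), len(list_2))
```

With 18 pairs, each list holds 36 images (18 consistent, 18 inconsistent), and the two lists together cover all 72 images without overlap.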
In the study, participants completed the MoCA and MMSE tests and had their eyesight corrected to normal prior to the experiment. We first screened participants for visual acuity, and participants who did not meet the criteria were provided with corrective glasses. This ensured that they had normal vision and could accurately perceive the stimuli presented in the study. Next, after a brief introduction to the experimental task, they viewed the experimental images one after another in random order. Upon viewing each image, they needed to verbally report the names of the foreground object and background scene, and their voices were recorded. It should be noted that subjects could take as long as they wanted to view the images and organize their answers. Each experimental image remained on the screen until subjects finished their responses. Moreover, if subjects were unable to provide a name for a foreground object or background scene, they were encouraged to describe the functions of the object or scene or their experiences with the object or scene. If they were unable to recognize the object or scene, they reported “I don’t know” as their answer. The experiment took approximately 45 minutes, and subjects were allowed to have a rest after finishing the MoCA and MMSE tests.
All pictures were presented using PsychoPy 3.0 [70]. They were displayed on a Windows computer with a 400-MHz processor. The resolution of the monitor was 1920×1080 pixels, with a refresh rate of 75 Hz. The computer screen was placed at a viewing distance of approximately 60 cm.
Data treatment and statistical analysis
Two coders blind to the experimental condition and the experiment design transcribed and scored all the responses. Their disagreements in rating were resolved by a third coder. The inter-rater agreement, measured by Cohen’s kappa, was 0.92, indicating a relatively high agreement between the two coders. As we are interested in the ability to recognize objects and scenes, following Sastyin et al.’s [22] study, if an answer was one of the names or a synonym of the foreground object or scene, it was scored as 1 (correct). In addition, if a subject failed to provide a correct name for an object or scene but could still accurately describe its function, the answer was also scored as 1 (e.g., describing a “spoon” as “the thing for drinking soup”). However, if participants reported that they did not know the answer or their answer was incorrect (e.g., “bottle” for “spoon”), it was scored as 0 (incorrect). An answer that was correct only at a general level of description (e.g., “vehicle” for “motor tricycle”) was also scored as 0.
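For readers unfamiliar with the agreement statistic reported above, Cohen's kappa corrects the raw proportion of agreement between two coders for the agreement expected by chance. The sketch below implements the standard formula on made-up 0/1 scores; the ratings are hypothetical and are not the study's data.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two coders who rated the same responses."""
    ratings_a, ratings_b = list(ratings_a), list(ratings_b)
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both coders must rate the same non-empty set of responses.")
    n = len(ratings_a)
    # Observed agreement: proportion of responses given the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the coders' marginal proportions, per category.
    categories = set(ratings_a) | set(ratings_b)
    expected = sum((ratings_a.count(c) / n) * (ratings_b.count(c) / n)
                   for c in categories)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores (1 = correct, 0 = incorrect) for ten responses.
coder_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
coder_2 = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

In this toy example the coders agree on 9 of 10 responses, but kappa is lower than 0.9 because part of that agreement is expected by chance.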
Results
We carried out a by-subject ANOVA on the percentage of recognition accuracy, with scene consistency (Consistent versus Inconsistent), group (MCI versus Control), and item type (Object recognition versus Scene recognition) as independent variables (see Table 2 and Fig. 3). The results revealed significant main effects of consistency and item type, and a marginally significant main effect of group. These effects reflect higher recognition accuracy in the consistent condition than in the inconsistent condition, higher accuracy for object recognition than for scene recognition, and marginally higher accuracy in the healthy control group than in the MCI group.
Table 2. ANOVA Summary of Experiment 1: by-subject analysis for consistency × group × item type for percentage of recognition accuracy
There was also a significant interaction between consistency and item type. A further paired t-test suggested a consistency effect for scene recognition (t(57) = 7.43, p < 0.001), but not for object recognition (p > 0.05). In addition, the interaction between item type and group was also significant. A further independent t-test suggested significantly higher recognition accuracy among the healthy controls than among MCI patients for scene recognition (t(114) = −2.74, p = 0.008), but not for object recognition (t(114) = −0.48, p = 0.63). Additionally, the results showed a significant interaction between group and consistency, and independent t-tests revealed a significant group difference in the inconsistent condition (t(114) = −2.51, p = 0.014), but not in the consistent condition (t(114) = −0.98, p = 0.33). Moreover, there was a three-way interaction between item type, group, and consistency. The results of independent t-tests and effects plots revealed a group difference for the scene recognition task under the scene inconsistent condition (t(56) = 3.17, p = 0.002), but this difference was smaller or absent in the consistent condition and for object recognition (ps > 0.05).
Following Davenport and Potter [11], by-item analyses were performed separately for scene and object recognition, with the contextual condition and group as independent variables. Similar to the findings of the by-subject analyses, the by-item analysis on scene recognition revealed a main effect of consistency (F(1,70) = 25.09, p < 0.001, ηp² = 0.26), a main effect of group (F(1,70) = 5.08, p = 0.027, ηp² = 0.07), and a marginally significant interaction between consistency and group (F(1,70) = 3.38, p = 0.07, ηp² = 0.046). In contrast, all the main effects and interactions were nonsignificant for object recognition (ps > 0.05).
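The partial eta squared values reported with each F statistic can be recovered from the F value and its degrees of freedom, using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the three by-item scene recognition effects reported above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values for scene recognition in the by-item analysis, all with df = (1, 70).
reported = [("consistency", 25.09), ("group", 5.08), ("consistency x group", 3.38)]
for label, f_value in reported:
    print(label, round(partial_eta_squared(f_value, 1, 70), 3))
# prints 0.264, 0.068, and 0.046 for the three effects
```

These values round to the 0.26, 0.07, and 0.046 given in the text, so the reported effect sizes are consistent with the reported F statistics.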
Discussion
The results of Experiment 1 show a three-way interaction between group, consistency, and item type: a group difference only existed for scene recognition when the scene was semantically inconsistent with its foreground object. This finding may indicate that MCI patients were more likely to be distracted by inconsistent contextual information in scene identification, which led them to have lower scene recognition accuracy than healthy controls. However, it is also possible that MCI patients benefited from congruent contextual information in scene recognition, which minimized the difference between the MCI and control groups in scene recognition accuracy. Therefore, we address these two possibilities in Experiment 3 by testing recognition accuracy for objects and scenes when they are presented alone with no contextual information.
In addition, as this experiment reveals a three-way interaction, the lower-order two-way interactions and main effects cannot, strictly speaking, be interpreted on their own. Nonetheless, based on further analysis and effects plots, we believe the two-way interaction between consistency and item type remains meaningful in that contextual consistency effects were present in scene recognition for both the MCI (by-item: t(35) = 4.25, p < 0.001; by-subject: t(23) = 6.54, p < 0.001) and control groups (by-item: t(35) = 2.68, p = 0.01; by-subject: t(33) = 4.69, p < 0.001), while they were absent for object recognition (ps > 0.05). In this way, the findings may suggest that contextual effects vary as a function of target type.
The asymmetric contextual effects may be attributed to the response sequence: subjects tended to report the name of the foreground object first and then respond to the background scene. In our experiment, the foreground objects were presented in a red-framed box at the center of each experimental image, making them more salient. As a result, subjects tended to prioritize responding to the foreground objects and initially disregarded the backgrounds, although they could respond in either order. Consequently, while subjects recognized the foreground objects, the probable scenes in which the objects may appear would be automatically activated and interfere with the subsequent scene recognition. To examine this possibility, we coded the original data and discovered that around 95% of trials involved reporting the foreground object before the background scene. To test the influence of response sequence on contextual consistency, we specifically instructed subjects to report the backgrounds before the foreground objects in Experiment 2.
Additionally, the differential contextual consistency effects for object and scene recognition observed in Experiment 1 could potentially be explained by varying levels of impairment in object and scene knowledge. Therefore, to gain a comprehensive understanding of possible impairments of object and scene knowledge among individuals with MCI, Experiment 3 was conducted to investigate object and scene familiarity, recognition accuracy, and recognition difficulty when objects and scenes were presented in isolation. In addition, the explicit judgment of contextual consistency was also examined in Experiment 3.
EXPERIMENT 2
Methods
Participants
A total of 59 participants were recruited for this experiment, 4 of whom were tested but excluded from the data analysis due to a diagnosis of mild dementia. The subjects in this experiment were carefully matched in terms of age, education, and MoCA and MMSE scores to the subjects who participated in Experiment 1. In addition, 15 healthy subjects and 13 MCI patients were allocated to list 1, and the remaining subjects were assigned to list 2. The demographics and neuropsychological information of the MCI and control groups were matched to ensure comparability (see Table 3).
Table 3. Means, Standard Deviations, and Significance Levels of Demographic Information and Scores on Neuropsychological Tests of Experiment 2
Inclusion criteria
The inclusion criteria were the same as those used in Experiment 1.
Materials, procedure, data treatment and statistical analysis
Materials and procedure were identical to those used in Experiment 1, except that subjects were instructed to report the background scene before the foreground object. Data treatment was similar to that of Experiment 1. The inter-coder reliability reached 97%.
Results
A by-subject ANOVA was carried out on recognition accuracy, with scene consistency (Consistent versus Inconsistent), group (MCI versus Control), and item type (Object recognition versus Scene recognition) as the independent variables (see Table 4 and Fig. 3). The results showed significant main effects of consistency, item type, and group. These effects reflect lower recognition accuracy in the inconsistent condition than in the consistent condition, lower accuracy for scene recognition than for object recognition, and lower accuracy for the MCI patients than for the healthy controls.
Table 4. ANOVA Summary of Experiment 2: by-subject analysis for consistency × group × item type for percentage of recognition accuracy
There was also a significant interaction between consistency and group. Further t-tests revealed a significant difference in recognition accuracy between MCI and healthy subjects for the inconsistent images (object recognition: t(53) = 2.12, p = 0.04; scene recognition: t(53) = 2.52, p = 0.02), but not for the consistent images (ps > 0.05). In contrast, the interactions between item type and consistency and between item type and group, as well as the three-way interaction between item type, group, and consistency, did not reach statistical significance.
By-item analyses were performed separately for object and scene recognition, with contextual condition and group as independent variables. Consistent with the results of the by-subject analyses, there were main effects of consistency (object recognition: F(1,70) = 26.82, p < 0.001, ηp² = 0.28; scene recognition: F(1,70) = 24.69, p < 0.001, ηp² = 0.26). In addition, a significant interaction between consistency and group was also observed (object recognition: F(1,70) = 4.16, p = 0.05, ηp² = 0.06; scene recognition: F(1,70) = 4.97, p = 0.03, ηp² = 0.07). However, the main effect of group and the other interactions were nonsignificant (ps > 0.05).
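The distinction between by-subject and by-item analyses comes down to which unit the 0/1 trial scores are averaged over before the ANOVA is run. The sketch below illustrates that aggregation step only; the record layout, field names, and trial data are hypothetical, since the data format used in the study is not described.

```python
from collections import defaultdict


def mean_accuracy(trials, unit_key):
    """Percentage accuracy per (unit, group, consistency) cell.

    unit_key is "subject" for a by-subject analysis or "item" for a
    by-item analysis; each trial carries a 0/1 recognition score.
    """
    cells = defaultdict(list)
    for trial in trials:
        cell = (trial[unit_key], trial["group"], trial["consistency"])
        cells[cell].append(trial["score"])
    return {cell: 100 * sum(scores) / len(scores) for cell, scores in cells.items()}


# Hypothetical scene recognition trials (not the study's data).
trials = [
    {"subject": "s01", "group": "MCI", "item": "factory", "consistency": "consistent", "score": 1},
    {"subject": "s01", "group": "MCI", "item": "playground", "consistency": "inconsistent", "score": 0},
    {"subject": "s02", "group": "MCI", "item": "factory", "consistency": "inconsistent", "score": 1},
    {"subject": "s02", "group": "MCI", "item": "playground", "consistency": "consistent", "score": 1},
    {"subject": "s03", "group": "MCI", "item": "factory", "consistency": "consistent", "score": 0},
]

by_subject = mean_accuracy(trials, "subject")  # one value per subject and condition
by_item = mean_accuracy(trials, "item")        # one value per image and group
print(by_item[("factory", "MCI", "consistent")])  # prints 50.0
```

The same trial records feed both analyses; only the grouping key changes, which is why the two analyses can be compared for consistency as in the text.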
Discussion
To examine whether the observed effects of contextual consistency on scene recognition in Experiment 1 were merely a result of response sequence, Experiment 2 required subjects to first report the name of the background scene before responding to the foreground objects. Despite this manipulation, the contextual consistency effects on scene recognition remained significant, indicating that the observed results cannot be solely attributed to the response sequence. However, the alteration in response sequence did result in contextual consistency effects on object recognition, which were absent in Experiment 1. Thus, the response sequence may contribute to the contextual consistency effects on object recognition. The asymmetrical contextual consistency effects on object and scene recognition could be attributed to the arrangement of the foreground object and background scene, which will be discussed in detail in the general discussion.
In addition, similar to Experiment 1, the results of Experiment 2 showed an interaction between group and consistency: a group difference was observed only for the inconsistent images, but not for the consistent images. This finding suggests that MCI patients were more susceptible to the interference of inconsistent contextual information in scene identification, resulting in lower scene recognition accuracy compared to healthy controls. However, it is also possible that MCI patients benefit more from congruent contextual information in scene recognition, which reduces the difference in scene recognition accuracy between the MCI and control groups. To investigate the two possibilities, we conducted Experiment 3, where we tested recognition accuracy for objects and scenes presented alone (baseline condition).
EXPERIMENT 3
Methods
Participants
57 participants were recruited for this experiment, who were matched in age, education, and MoCA scores to subjects participating in Experiment 1 and Experiment 2. One participant was excluded from data analysis for very poor performance in recognition (accuracy below 60%). Therefore, there were in total 56 subjects who took part in Experiment 3. The demographics and neuropsychological information of the MCI and control groups were matched (see Table 5).
Table 5. Means, Standard Deviations, and Significance Levels of Demographic Information and Scores on Neuropsychological Tests in Experiment 3
Inclusion criteria
The inclusion criteria were the same as those used in Experiment 1 and Experiment 2.
Materials
The 36 foreground objects and 36 background scenes used in Experiment 1 and Experiment 2 were used as experimental stimuli. To remove the contextual influence of images in Experiment 1 and Experiment 2, each object and background scene was presented alone in this experiment. In addition, to rule out other possible confounding effects, the foreground objects were presented in red frames, and each background scene was shown with a red-framed box in the middle (see Fig. 2).

Fig. 2. Examples of an object image (A) and a scene image (B) presented alone.

Fig. 3. Percentages of accurate answers under each condition. The figure shows the mean percentages of correct responses for object and scene recognition among healthy subjects and MCI subjects as a function of contextual consistency. Panels A1 and A2 display the data from the baselines and Experiment 1 (95% of trials reported object first). Panels B2 and B3 present the data from the baselines and Experiment 2 (scenes were required to be responded to first). Error bars indicate standard errors.
An online questionnaire was constructed, and no filler images were used. In the first phase of the questionnaire, we examined participants’ familiarity, naming difficulty, and naming accuracy for the objects and scenes when no contextual information was given. Therefore, each object and scene was viewed in isolation, and each image was followed by three questions. First, object/scene naming: subjects were required to report the name of the object or scene in the image. Second, naming difficulty judgement: subjects needed to rate the difficulty in naming the image on a 7-point scale (1 indicates extremely easy, 7 indicates extremely difficult). Finally, familiarity judgement: participants selected a value from 1 (not familiar at all) to 7 (extremely familiar) to represent their familiarity with the object/scene presented in the image. In the second phase of the questionnaire, subjects rated the consistency of foreground objects and background scenes. They viewed the images used in Experiment 1 and then rated how often an object occurs in the scene on a 7-point scale (1 indicates very rare, 7 indicates very often).
Procedure
In this experiment, participants first completed the MoCA and MMSE tests and then had their eyesight corrected to normal. Afterwards, they were shown the online questionnaire on an iPad and were given brief instructions on the tasks to be completed. As most elderly participants have little experience with typing, the experimenter helped subjects fill in their answers on the questionnaire. In the first phase of the questionnaire, after viewing each image, subjects verbally reported the name of the foreground object or background scene, and then they rated their naming difficulty and familiarity with the object/scene. In the second phase of the questionnaire, subjects made judgements about the semantic consistency of the foreground object and background scene. All experimental images were presented in random order. The experiment lasted approximately 40 min. Subjects were allowed to rest after finishing the MoCA and MMSE tests.
Data treatment and statistical analysis
The scoring of recognition accuracy was identical to Experiment 1, and the inter-coder reliability reached 0.97, indicating a relatively high agreement between the two coders.
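The text does not state which statistic produced the reliability value of 0.97. As a minimal sketch only, two common agreement measures for two coders scoring answers as correct (1) or incorrect (0) are raw percent agreement and Cohen's kappa. The function names and the toy scores below are hypothetical and are not taken from the study data.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of items on which the two coders gave the same score."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must score the same non-empty item list.")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    labels = set(coder_a) | set(coder_b)
    # Expected agreement if both coders scored independently at their own base rates.
    p_expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy scores (1 = correct, 0 = incorrect), for illustration only.
toy_coder_a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
toy_coder_b = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]

toy_agreement = percent_agreement(toy_coder_a, toy_coder_b)
toy_kappa = cohens_kappa(toy_coder_a, toy_coder_b)
print(f"percent agreement = {toy_agreement:.2f}, kappa = {toy_kappa:.2f}")
```

Kappa is lower than raw agreement whenever one score dominates, which is likely with near-ceiling naming accuracy, so the two measures are worth distinguishing when reporting reliability.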
Results
Recognition accuracy, recognition difficulty, and familiarity
We conducted by-subject ANOVAs to analyze recognition accuracy, naming difficulty, and familiarity, with group (Control versus MCI) and item type (Object versus Scene) as independent factors. The results showed a significant main effect of item type on recognition accuracy (F(1,54) = 20.86, p < 0.001, ηp2 = 0.28), naming difficulty (F(1,54) = 23.73, p < 0.001, ηp2 = 0.31), and familiarity (F(1,54) = 16.44, p < 0.001, ηp2 = 0.23). The findings indicate that scene recognition was less accurate and more difficult than object recognition, and subjects were less familiar with scenes than objects (see Table 6). However, there were no significant differences between subjects with MCI and the healthy controls on these variables, and no significant interactions were observed either (ps > 0.05). Following Davenport & Potter’s [11] study, we performed by-item ANOVAs on recognition accuracy, familiarity, and difficulty with group (MCI versus healthy) as the independent factor. Separate analyses were conducted for objects and scenes, revealing that the effects of group remained non-significant for both objects and scenes (ps > 0.05).
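The reported effect sizes can be cross-checked against the F statistics, since partial eta squared follows from F and its degrees of freedom as ηp2 = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this standard identity to the three values reported above; the function name is our own and is not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) for the main effect of item type, as reported in the text.
reported = {
    "recognition accuracy": (20.86, 1, 54),
    "naming difficulty": (23.73, 1, 54),
    "familiarity": (16.44, 1, 54),
}

for measure, (f_value, df_effect, df_error) in reported.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{measure}: partial eta squared = {eta:.2f}")
```

Rounded to two decimals, the recomputed values are 0.28, 0.31, and 0.23, matching the effect sizes given in the text.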
Summary of Questionnaire: Means and Standard Deviations of Recognition Accuracy, Recognition Difficulty, Familiarity, and Consistency Rating
Contextual consistency judgement
We also conducted an ANOVA on consistency ratings, with group (MCI versus Control) and consistency condition (Consistent versus Inconsistent) as independent variables. There was a main effect of consistency condition (by-subject: F(1,54) = 4375.93, p < 0.001, ηp2 = 0.99; by-item: F(1,70) = 1452.07, p < 0.001, ηp2 = 0.95), but there were no significant group differences or interactions between group and consistency (ps > 0.05).
Experiment 1 versus Experiment 3 (Facilitation or Interference)
To compare recognition accuracy in the consistent and inconsistent conditions (collected in Experiment 1) with the no-context baselines (collected in Experiment 3) (see Fig. 3), we conducted one-way ANOVAs with semantic consistency (consistent versus inconsistent versus baseline) as a within-subject independent factor. The analyses were performed separately for object and scene recognition, and for MCI subjects and healthy controls. The results showed a main effect of consistency in both MCI patients and healthy controls, but only for scene recognition (MCI: F(2,70) = 18.75, p < 0.001, ηp2 = 0.35; healthy: F(2,70) = 5.03, p = 0.009, ηp2 = 0.13); it was absent for object recognition (ps > 0.05). To further examine the consistency effect in scene recognition, a post-hoc analysis was conducted, and the p-values were adjusted using Bonferroni correction to control for multiple comparisons. The post-hoc analysis revealed that among MCI patients, scene recognition was significantly less accurate when scenes were presented with an inconsistent foreground object than when they were presented in isolation (p < 0.001). Although a similar trend was observed among healthy subjects, it did not reach significance (p = 0.09). In addition, for both groups, a consistent foreground object did not significantly enhance scene recognition compared to when no foreground object was presented (ps > 0.05).
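For readers unfamiliar with the adjustment, Bonferroni correction multiplies each raw p-value by the number of comparisons and caps the result at 1. The sketch below assumes three pairwise comparisons among the consistent, inconsistent, and baseline conditions; the raw p-values are hypothetical placeholders, not values from this study, and the function name is our own.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1.0."""
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]


# Hypothetical raw p-values for three pairwise comparisons (illustration only):
# inconsistent vs. baseline, consistent vs. inconsistent, consistent vs. baseline.
raw_p_values = [0.001, 0.02, 0.30]
adjusted_p_values = bonferroni_adjust(raw_p_values)

for raw, adjusted in zip(raw_p_values, adjusted_p_values):
    verdict = "significant" if adjusted < 0.05 else "not significant"
    print(f"raw p = {raw:.3f} -> adjusted p = {adjusted:.3f} ({verdict})")
```

In this toy case the second comparison is significant before correction but not after it, which illustrates how a trend can fall short of significance once multiple comparisons are taken into account.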
Experiment 2 versus Experiment 3 (Facilitation or Interference)
Similar analyses were performed to compare the recognition accuracy obtained in Experiment 2 with the baselines. We observed main effects of consistency among both MCI patients and healthy controls for both object recognition (MCI: F(2,70) = 17.03, p < 0.001, ηp2 = 0.33; healthy: F(2,70) = 5.58, p = 0.006, ηp2 = 0.14) and scene recognition (MCI: F(2,70) = 20.92, p < 0.001, ηp2 = 0.37; healthy: F(2,70) = 4.77, p = 0.01, ηp2 = 0.12). Further post-hoc analyses revealed that object recognition accuracy was significantly lower in the inconsistent condition than at baseline (MCI: p < 0.001, healthy: p = 0.004). A similar pattern was observed for scene recognition, although it reached statistical significance only for MCI subjects (p < 0.001) and was marginally significant for healthy subjects (p = 0.08). There were no significant differences in object and scene recognition between the consistent condition and the baselines (ps > 0.05).
Discussion
The results of the questionnaire revealed no significant difference between MCI patients and the healthy controls in object and scene knowledge in terms of recognition accuracy, recognition difficulty, familiarity judgement, and semantic consistency ratings. The findings indicate that the MCI patients do not show significant impairments in object and scene knowledge. However, scene recognition tended to be more challenging than object recognition, as evidenced by lower accuracy, higher rated difficulty, and lower familiarity for scenes. The results can possibly be explained by the selection and design of the experimental stimuli, and we will discuss this perspective in the general discussion.
In addition, the comparisons of Experiments 1 and 2 with Experiment 3 show the impact of contextual inconsistency on scene recognition among MCI patients and healthy subjects. Specifically, an inconsistent foreground object impeded scene recognition and resulted in lower scene recognition accuracy. However, such effects reached significance only for scene recognition among MCI patients, while they were marginally significant in healthy participants, suggesting a potentially stronger interference of inconsistency in MCI patients. Furthermore, contextual inconsistency significantly decreased object recognition relative to the baselines for both MCI and healthy subjects when inconsistent scenes were responded to before foreground objects (Experiment 2), whereas such interference was absent for object recognition when objects were reported before scenes (Experiment 1). Further details and implications of these results will be discussed in the general discussion.
GENERAL DISCUSSION
Overall, the present research investigated whether MCI impairs the ability to process contextual information. In Experiment 1, we examined scene and object recognition under semantically consistent and inconsistent conditions, with no required response sequence and no time limit. Both MCI and healthy groups showed contextual consistency effects in scene recognition, with scenes being identified more accurately under the consistent condition than the inconsistent condition. In addition, we observed significantly lower recognition accuracy for MCI patients than for healthy seniors when recognizing scenes presented with an inconsistent foreground object. However, object recognition did not vary as a function of consistency or MCI status. As the majority of subjects reported foreground objects before scenes, Experiment 2 manipulated the response sequence by explicitly requiring subjects to report the names of backgrounds before reporting the foreground objects. With this manipulation, contextual consistency effects were observed for both object and scene recognition among both MCI and healthy subjects. Furthermore, MCI subjects performed worse than healthy controls on object and scene recognition under the inconsistent condition, but not under the consistent condition. In Experiment 3, we constructed baselines for object and scene identification when they were presented alone. Comparisons of Experiments 1 and 2 with the baselines indicated that the observed semantic consistency effects may have resulted from interference from inconsistent contextual information, which intruded on scene recognition (Experiments 1 and 2) and object recognition (Experiment 2) for both MCI patients and healthy subjects. In addition, scenes were generally rated as harder to identify and less familiar than objects.
Nevertheless, significant differences between subjects with MCI and healthy controls were not found in terms of general recognition accuracy, the judgement of object-scene consistency, identification difficulty, and familiarity with the objects and scenes. The findings have several implications as discussed below.
Processing object and scene interactively
In the current study, we observed contextual consistency effects on scene recognition (Experiments 1 and 2) and object recognition (Experiment 2) in both MCI patients and the healthy controls. We found that incongruent contextual information significantly decreased recognition accuracy.
There are two ways in which the observed contextual consistency effect can arise: serial naming and concurrent presentation. For example, when the background scene is named before the object, the activated semantic representations of the scene may interfere with the subsequent object recognition (serial naming). This explanation may account for the contextual consistency effects on the later-named target in sequential recognition, such as scene recognition in Experiment 1 (where the object was named before the scene on more than 95% of trials) and object recognition in Experiment 2. On the other hand, the concurrent presentation of an object or background that is semantically consistent or inconsistent with the target would activate object-scene associations and also affect target identification [11, 12, 22]. This account may explain the contextual consistency effect on scene recognition in Experiment 2, when scenes were named before the objects. However, this type of consistency effect is absent for object recognition, which may be attributed to our experimental design and will be discussed in the last section of the general discussion.
The observed contextual consistency effect on scene recognition (Experiment 2) provides additional support for the object selection model, suggesting that contextual consistency affects recognition by adjusting the criterion at the percept-to-representation matching stage [14, 18–20, 30]. Following Leory’s [14] views, we suggest that low-spatial-frequency percepts of objects and scenes arise simultaneously via parallel processing. This allows partial recognition and predictions about the probable objects and scenes. If the object and scene are semantically consistent, the intersection between these two sources of predictions would further narrow down candidate interpretations through contextual knowledge. Therefore, less perceptual evidence would be required to achieve an exact identification. Conversely, when the contextual relationship is violated, more perceptual information would be required for identification. Moreover, the contextual consistency effect on scene recognition can hardly be explained by the perceptual enhancement model [15–17]. This model predicts a null effect of foreground object on scene identification, as scene-gist perception and scene-based predictions of object candidates should be generated prior to initial object recognition.
However, the consistency effect on scene recognition cannot completely rule out the functional isolation account, which posits that object and scene recognition are independent from each other at the perceptual stage, and the observed contextual consistency effect is attributed to post-perceptual guesses [32, 33]. To further examine this possibility, we coded guessing errors in the inconsistent and consistent conditions. An incorrect answer was coded as a semantic guessing error if it was semantically consistent with the contextual information. For example, if the image presented the combination of ‘pencil box-kitchen,’ reporting the background scene as ‘library’ or ‘stationery shop’ would be considered a semantic guessing error. In general, pure semantic guessing errors were very rare in our data, accounting for less than 1% of all data in both object and scene recognition. In addition, there was no significant difference in semantic guessing errors between the MCI and control groups (p > 0.05). Therefore, the observed contextual consistency effect on scene recognition should not be the result of pure guessing based on object-scene association.
In general, the current findings generalize the contextual consistency effect to a more diverse sample, while previous research in this field mainly focused on young, healthy college students. Similar to young, healthy subjects, MCI patients and healthy seniors process objects and scenes interactively by utilizing stored contextual knowledge. In addition, the current findings contribute to our understanding of contextual consistency effects by providing further support for the object selection model.
Relatively intact ability in processing contextual information
We found no evidence of a compromised ability to process contextual information among MCI patients, which is in contrast to our predictions. On the one hand, the observed contextual consistency effects among MCI subjects reflect a relatively intact ability to implicitly process contextual information and make use of knowledge of object-scene relationships in object and scene recognition. This top-down semantic guidance is primarily based on automatic processes without effortful control. On the other hand, the scene consistency ratings did not differ between MCI patients and healthy controls, indicating that the ability to explicitly process contextual relationships is also preserved in MCI patients.
The current study, therefore, found no evidence in support of the proposition that there is a compromised ability to process contextual information among MCI patients, which is in line with Ramzaoui et al.’s [38] view. Moreover, Ramzaoui et al. [38] only reported preserved contextual consistency effects during the initial scene inspection, while the current study extends the findings further to the post-perceptual stage. With regards to the research by Lenoble et al. [3], impaired understanding of contextual consistency was observed among AD patients at the moderate stage. Thus, our findings seem to suggest that the ability to make use of contextual information remains intact in AD patients until the moderate stage. However, the failure to observe a deficit in context processing in MCI patients could also be attributed to the task used in the present study. Lenoble et al.’s [3] study investigated the consistency effect on the first saccade in a visual search task, capturing the initial activation of contextual information. In contrast, our experiment employed a recognition task with no time limitation, reflecting a relatively later processing stage or processing results. Moreover, according to Lenoble et al. [3], AD patients were able to correctly differentiate scene-consistent and scene-inconsistent images in manual responses when there was no time pressure, though they failed to distinguish scene consistency before initiating the first saccade. Therefore, we may also be able to detect an impaired context processing ability in MCI patients, if we examine recognition under time limitations.
Larger susceptibility to contextual inconsistency: inhibitory deficits or overdependence on semantic knowledge
In contrast to our prediction, patients with MCI were more likely to be affected by incongruent contextual information than healthy controls, resulting in decreased accuracy in both object recognition (Experiment 2) and scene recognition (Experiments 1 and 2). The increased susceptibility of MCI patients to incongruent contextual information may reflect underlying deficits in cognitive inhibition or an overdependence on semantic knowledge.
First, a great deal of research has demonstrated deficient inhibition-related processes in individuals with MCI, suggesting a compromised ability to suppress task-irrelevant information [71–74]. Therefore, MCI subjects participating in the present study exhibit reduced capacity to filter out goal-irrelevant information, such as distractor interference (e.g., an inconsistent foreground object presented simultaneously during scene naming) and proactive interference (e.g., naming an inconsistent background prior to object identification). As a consequence, MCI subjects may experience larger competition for attention resources and semantic representations between the target stimuli to be recognized and the inconsistent contextual information that should be ignored. This competition impedes recognition and leads to an enlarged consistency effect. In this way, the findings observed in our study may be attributed to deficits in the inhibitory processes among MCI patients.
In addition, it is possible that the greater vulnerability of MCI patients to inconsistent contextual information is driven by their overdependence on semantic knowledge during problem-solving. Previous research has found that reliance on heuristics (referring to pre-existing knowledge structures such as schemas, stereotypes, or scripts) can reduce processing costs and improve processing speed and efficiency when solving complex problems [75–77]. However, an over-reliance on heuristics would result in more inadequate responses when stored knowledge is inappropriate for dealing with novel or unfamiliar situations [76–78]. In the current study, the over-reliance on contextual knowledge may have led to more errors under the inconsistent condition. Furthermore, previous research has shown that compared to younger subjects, older adults are more reliant on schematic knowledge and are prone to making schema-based errors due to age-related cognitive decline [79, 80]. The over-reliance on schematic knowledge may be further amplified by dementia, as a compensatory mechanism for severe cognitive impairments [81, 82].
A substantial body of functional brain imaging (fMRI) research in early AD patients has demonstrated increased brain activity within the prefrontal cortex during cognitive tasks, compared to age-matched healthy controls [83, 84]. The increased prefrontal activity in AD patients may indicate a greater reliance on the prefrontal lobe, reflecting a compensatory reallocation of cognitive resources to maintain adequate cognitive performance [81, 82]. Moreover, it has been suggested that prefrontal activity plays a key role in processing schema-related information, such as generating top-down guidance during cognitive tasks [85]. Previous studies have shown that enhanced activity in the prefrontal lobe is associated with the schema-congruency effect [86]. Consequently, such prefrontal hyperactivity in MCI patients may result in stronger top-down semantic modulation, ultimately amplifying the effects of contextual consistency.
The enlarged contextual consistency effects in MCI subjects may result from both defective inhibitory processes and over-reliance on semantic knowledge. The two explanations do not have to be mutually exclusive, and the two factors may confound each other, making it hard to isolate their contributions to the enlarged contextual consistency effects in MCI patients. Further studies can be carried out to examine these two explanations in greater detail.
Additionally, the findings have possible practical implications for detecting MCI. Scene recognition accuracy under contextually inconsistent conditions distinguished MCI subjects from healthy controls, suggesting that such measures may be useful in diagnosing MCI. However, the existing screening tests for MCI consider only object naming, with relatively little attention given to scene naming, especially when contextual consistency is violated [63, 64, 87]. The current study, therefore, suggests that it is valuable to include contextually inconsistent images and scene recognition tasks in screening tests for MCI.
Contextual consistency facilitation versus contextual inconsistency interference
By comparing recognition accuracy under inconsistent and consistent conditions (Experiments 1 and 2) to the baselines (Experiment 3), we found that recognition accuracy for objects (Experiment 2) and scenes (Experiments 1 and 2) declined in both MCI patients and healthy controls when object-scene associations were semantically incongruent, relative to when objects and scenes were presented alone. However, there was no significant improvement in recognition accuracy when comparing object/scene recognition under the congruent contextual condition (Experiments 1 and 2) with the baselines (Experiment 3). The findings indicate that the observed contextual consistency effects may stem from contextual inconsistency interference rather than contextual consistency facilitation.
On one hand, the absence of consistency facilitation may be due to the ceiling effect, as the objects and scenes employed in the current experiments are generally highly familiar and of low difficulty. Therefore, recognition accuracy baselines reached 98% for object and 93% – 94% for scene, leaving relatively little room for further improvement. On the other hand, the findings can also be attributed to attentional distraction elicited by goal-irrelevant information. For example, during scene naming, the presentation of foreground objects can distract attention and interfere with response execution [11, 88]. In this way, relative to recognizing object and scene in isolation, identifying objects or scenes under both consistent and inconsistent conditions impedes recognition. Nevertheless, such interference can be offset by contextual consistency facilitation and becomes less noticeable under consistent conditions. Therefore, it is possible that contextual facilitation exists but was not observed in the current study.
Asymmetrical contextual consistency effect due to experiment design
Critically, the contextual consistency effect derived from concurrent presentation, as discussed earlier, was only observed in scene recognition and was absent in object recognition. Semantic consistency affected object identification only when background scenes were named first (Experiment 2), but not when objects were named first (Experiment 1, 95% of trials). Why, then, does the concurrent presentation of semantically consistent or inconsistent information affect scene recognition but not object recognition? There are three possible explanations.
First, in the experimental images, there was a gap between backgrounds and foreground objects, which made it easy to segment objects from their backgrounds during object recognition. Thus, object naming can be relatively independent of its background, as long as object recognition occurs before scene naming (Experiment 1). However, when recognizing a scene, its foreground object was located at the center of the image, and subjects’ visual scanning could not skip it, leading to interactive processing of objects and scenes.
Additionally, it is important to note that an occlusion was presented at the center of the background scenes, rendering the scene images less recognizable. It has been suggested that as recognition difficulty increases, individuals tend to rely more on extrinsic information [21, 30, 89]. Therefore, in scene recognition, subjects may depend more on contextual information due to the higher difficulty involved compared to object recognition. Consistent with this notion, the questionnaire results (Experiment 3) revealed that scenes were rated as less familiar and more difficult to recognize compared to objects.
The use of indoor scene images in this study may also have increased recognition difficulty and enhanced the likelihood of observing contextual consistency effects in scene recognition. Among the images used, there were 31 indoor scenes and 5 outdoor scenes. Previous research has shown that indoor scenes are more similar to one another in local and global visual texture patterns than outdoor scenes [90, 91], which may further increase the difficulty of scene recognition.
The explanations mentioned above do not have to be mutually exclusive, and it is likely that we would observe the concurrent-presentation-based contextual consistency effects in object recognition if the foreground objects were directly pasted onto the background scene or if the foreground objects were less recognizable or less familiar. However, our interpretation must be made with caution, and further experiments are needed to examine the mechanisms underlying the asymmetrical contextual consistency effects.
Conclusion
In summary, the present study provides a better understanding of whether MCI affects the ability to make use of contextual information in object and scene recognition. The results show that the abilities to process contextual information and to understand object-scene relationships appear to be preserved in MCI patients. In addition, similar to healthy subjects, MCI patients process objects and scenes interactively. However, possibly due to deficits in their inhibitory abilities or an over-dependence on semantic knowledge, the disruptive effect of inconsistent contextual information in recognition tasks was amplified, resulting in significant decreases in recognition accuracy. Taken together, the current study provides insights into object and scene recognition and improves our understanding of cognitive dysfunction in MCI patients. This may help to identify better cognitive tests for detecting MCI and provide a foundation for further studies.
ACKNOWLEDGMENTS
The authors would like to thank all the participants for taking part in this research.
FUNDING
This research was funded by “The Youth Project of the Shandong Social Science Planning Fund Program” (20DYYJ04).
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
