Abstract
Background
Sleep disorders are a significant risk factor for cognitive decline and Alzheimer's disease (AD). However, preclinical studies investigating the effects of sleep deprivation (SD) on cognition and synaptic proteins have produced inconsistent findings, hindering translational progress.
Objective
This systematic review and meta-analysis identifies key factors moderating the effects of SD on cognition and synaptic proteins in rodent models.
Methods
Following PRISMA guidelines, we analyzed 21 eligible studies using meta-analysis, subgroup, meta-regression, and multilevel analyses to identify sources of experimental heterogeneity.
Results
SD significantly reduced synaptic proteins overall. While the global effect on cognition was not significant, subgroup analyses revealed robust cognitive impairment in Wistar rats undergoing fragmented sleep, particularly when assessed by Morris water maze or novel object recognition tests. Key synaptic proteins (PSD-95, synaptophysin) were consistently reduced in the hippocampus and prefrontal cortex.
Conclusions
Our comprehensive review synthesizes diverse studies, concluding that methodological choices are the primary drivers of heterogeneity in preclinical SD research. We provide evidence-based guidance for selecting appropriate rodent models, behavioral paradigms, and biochemical indicators to investigate the molecular mechanisms linking sleep disorders and cognitive decline. This work offers a significant framework to standardize and enhance the reliability of preclinical studies. By validating models that directly connect fragmented sleep—a condition common in AD patients—to synaptic pathology in vulnerable brain regions, our research strengthens the mechanistic link between sleep disturbance and cognitive impairment in AD and encourages a greater focus on this critical relationship for the readers of the Journal of Alzheimer's Disease Reports and AD researchers.
Introduction
An increasing number of individuals worldwide are grappling with sleep disturbances. Recent reports indicate that over a third of adults slept for less than 6 h per night, while nearly half of adults aged over 60 experienced sleep disorders.1–6 It is well known that sleep plays a pivotal role in learning and memory consolidation. 7 A decrease in sleep quality or duration would inevitably impair memory and cognition. 8 Sleep disturbance is considered a contributor to cognitive decline and dementia. 9
In this context, experimental animal research plays an indispensable role in clarifying molecular mechanisms. 10 Summarizing and interpreting experimental results from animal studies is crucial for guiding and informing future research endeavors.10–12 The sleep deprivation (SD) rodent model is the most commonly used model in animal research on sleep disorders. 13 Numerous experimental animal studies have confirmed that SD can disrupt memory and cognition, an effect associated with changes in synaptic plasticity and synaptic proteins.14–19
Behavioral tests have been shown in numerous studies to be a valuable approach for directly assessing cognitive dysfunction.20–22 A variety of methods for evaluating learning and memory behavior in rodents, such as the Morris water maze (MWM), novel object recognition (NOR), and Y-maze, have been widely used. 23 Furthermore, measuring common synaptic-associated proteins can provide a biochemical readout of cognitive status. However, variations in research methods (e.g., SD methods), research protocols (e.g., SD duration), and animal factors (e.g., species and strains) across studies make it difficult to draw cohesive conclusions.
To address the impact of these factors on the cognitive and synaptic protein changes resulting from sleep deprivation, we conducted this comprehensive meta-analysis. Our study should help guide future experimental studies of sleep deprivation, especially in choosing rodent models, behavioral paradigms, and biochemical metrics for evaluating cognition in rodents.
Methods
Systematic review
This systematic review was performed in accordance with the updated 2020 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 24 It encompasses four domains: animals (as population), sleep deprivation (as intervention), control group (as comparison), and cognition and synaptic-associated proteins (as outcomes). The review was prospectively registered with PROSPERO at https://www.crd.york.ac.uk/PROSPERO/ as an animal study (CRD42022370665).
Search strategy
Three online databases (PubMed, Web of Science-complete collection, and Scopus) were searched in November 2022 to identify experimental studies related to cognition and sleep deprivation in animal models. The main topics were “rat” or “mouse” and “sleep deprivation” and “synaptic proteins”. The detailed topics used in this study are included in the Supplemental Material. The full original protocol that was uploaded to PROSPERO is available in Supplemental File 1. The search strategies used for the PubMed and Web of Science databases to identify animal studies with the specified experimental goals can be accessed in the Supplemental Material.
After excluding duplicate references, we screened titles and abstracts to identify potentially relevant articles that met the inclusion criteria outlined below. Subsequently, a full-text analysis of the remaining articles was performed. This analysis was carried out independently by two researchers. In cases of disagreement between the researchers, a third researcher was consulted for resolution. Additionally, as part of our supplementary search, we screened for eligible studies referenced in the articles obtained from the bibliographic search and also considered studies recommended by experienced researchers.
Inclusion and exclusion criteria
Experimental studies that assessed the effects of sleep deprivation on cognition and synaptic-associated proteins in rodents were considered to meet the inclusion criteria. The following inclusion criteria were used: (1) Article type: Experimental studies published in English. (2) Population: Healthy experimental rodents. We restricted species to mice and rats, with no restrictions on age, strain, or sex. (3) Intervention: Sleep deprivation. Any type of experimentally induced sleep loss (including total sleep deprivation, partial sleep deprivation, rapid eye movement (REM) sleep restriction, and sleep fragmentation) was included. There were no restrictions on the method, duration, or frequency of sleep deprivation. (4) Control group: A separate control group (between-subject design) or measurement of the outcome at baseline before the sleep intervention (within-subject design). At least one sleep deprivation group or control condition had to be free of concomitant interventions for inclusion. (5) Outcome measures: Any numeric measure from a behavioral test, such as the MWM, elevated arm test, Y-maze, or T-maze, or any numeric measure of a synaptic-associated protein, such as postsynaptic density protein 95 (PSD-95), synapsin I (SYN, or SYN1), growth-associated protein 43 (GAP43), synaptotagmin (SYT), or synaptophysin (SYP), obtained by western blotting or ELISA.
Exclusion criteria: (1) Reviews, theoretical studies, and any other type of non-original article. (2) Non-experimental sleep deprivation (genetic models, etc.), as well as surgical or pharmacological methods of sleep deprivation. (3) No relevant outcome measure related to behavioral tests (such as novel object recognition, T-maze, elevated maze, Morris water maze, Y-maze, etc.) or synaptic-associated proteins (PSD-95, SYN, GAP43, SYP, SYT). (4) In vitro and in silico studies. (5) Articles published after 1/11/2022.
Data extraction
Data were extracted directly from the full text by two independent reviewers if the relevant articles met the inclusion criteria. In case of any discrepancies or disagreements, a third reviewer was consulted to resolve them. If the article did not provide data or study characteristics, the original authors were contacted for clarification. In cases where contact attempts were unsuccessful, results presented as graphs were estimated proportionally to the axis. The following data were extracted from each article: title, authors, full reference, authors’ institution, sample size, species, strain, age, number of sleep-deprived animals, number of control animals, type of sleep deprivation, method, duration, types of behavior test, and measurement of synaptic-associated proteins. Included articles could contain multiple comparisons. Each comparison between a control and a sleep-deprived group for one or more variables was included separately. For every comparison, the numerical variable used to assess behavior and synaptic proteins was extracted as mean ± standard deviation. In studies reporting the standard error (SE) of the mean, the standard deviation was calculated by multiplying the SE by the square root of the sample size.
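The conversion from a reported standard error of the mean to a standard deviation follows from SE = SD / √n, so SD = SE × √n. The following is a minimal Python sketch of that step only; the function name and the example values are hypothetical and are not taken from any included study.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Convert a standard error of the mean to a standard deviation.

    Uses SD = SE * sqrt(n), where n is the group sample size.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return se * math.sqrt(n)


# Hypothetical example: a group of 16 animals reported with SE = 0.5.
print(sd_from_se(0.5, 16))  # prints 2.0
```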
Risk of bias assessment
The SYRCLE Risk of Bias (RoB) Tool was used to evaluate the possible biases within the selected articles for animal studies. 25 All RoB items presented on the tool were screened. No detailed methodological assessment of the results of RoB was conducted, aside from inclusion. Each RoB item was graded into three descriptive levels: low risk of bias, unclear risk of bias, and high risk of bias.
Publication bias assessment
To visually assess possible publication bias, funnel plots were created for the included studies using the “metabias” function of the “meta” package. Each funnel plot displays the standardized mean difference (SMD) on the x-axis and the standard error (SE) on the y-axis. To avoid possible false-positive results, the trim and fill method and Egger's test were used to adjust for funnel plot asymmetry and correct for publication bias.26,27 A p value < 0.05 was taken to indicate the presence of publication bias, and p ≥ 0.05 to indicate no evidence of publication bias. Finally, radial plots were drawn to assess heterogeneity and publication bias using the “rma” function of the “metafor” package. 28
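As an illustration of the idea behind Egger's test, the classic formulation regresses each study's standardized effect (effect / SE) on its precision (1 / SE) and examines whether the intercept differs from zero. The sketch below is a plain-Python version of that regression under this assumption; it is not the “meta” package implementation, whose defaults may differ in detail, and the function name and example data are hypothetical. It returns the t statistic and degrees of freedom and leaves the p value lookup to a t distribution table or a statistics library.

```python
import math


def egger_intercept(effects, std_errors):
    """Classic Egger regression: (effect / SE) regressed on (1 / SE) by OLS.

    Returns (intercept, se_intercept, t_statistic, degrees_of_freedom).
    An intercept far from zero suggests funnel plot asymmetry.
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching inputs")
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / s for s in std_errors]                 # precisions
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (k - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / k + mean_x ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, k - 2


# Hypothetical SMDs and standard errors, for illustration only.
smd = [-1.4, -0.9, -0.6, -0.3, -0.2]
se = [0.60, 0.45, 0.35, 0.25, 0.20]
b0, se_b0, t_stat, df = egger_intercept(smd, se)
print(f"intercept = {b0:.3f}, SE = {se_b0:.3f}, t = {t_stat:.2f}, df = {df}")
```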
Sensitivity analysis
All included studies were analyzed using a random-effects model, and sensitivity plots generated with the “metainf” function of the “meta” package were visually inspected. The overall forest plots were displayed for primary and secondary outcomes separately. The sensitivity analysis was conducted by omitting one article at a time and then performing a meta-analysis on the remaining articles (n − 1 articles). This method was used to assess the stability of studies within primary or secondary outcomes and to identify the studies that may contribute to the heterogeneity.
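The leave-one-out procedure can be sketched as follows. This Python example is illustrative only: it pools with the DerSimonian-Laird estimator of between-study variance, chosen here for brevity, and does not reproduce the “metainf” output; the function names and data are hypothetical.

```python
def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate of the effect size."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    return sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)


def leave_one_out(effects, variances):
    """Pooled estimate after omitting each study in turn (n - 1 studies each)."""
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results.append(pool_random_effects(kept_effects, kept_variances))
    return results


# Hypothetical SMDs and sampling variances; the last study is an outlier.
smd = [-0.8, -0.6, -0.7, 2.1]
var = [0.20, 0.15, 0.25, 0.30]
for omitted, estimate in enumerate(leave_one_out(smd, var), start=1):
    print(f"omitting study {omitted}: pooled SMD = {estimate:.2f}")
```

A study whose omission shifts the pooled estimate markedly, as the last one does here, is a candidate source of heterogeneity.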
Data analysis
Meta-analysis
An initial meta-analysis of overall behavior (additional to the original protocol) was performed with all included studies, analyzing the most important behavior-related outcome in each article (in order of priority: time spent in target quadrant or arm, distance in the test, entries into the target quadrant or arm). Behavioral tests and synaptic proteins, within subjects and between subjects, were analyzed separately under intervention (SD) and control conditions. Meta-analyses were performed whenever three or more studies assessed comparable outcome variables. Data in the forest plots are presented as effect size (standardized mean difference, SMD) ± 95% confidence interval (CI, calculated by the Q-Profile method for the confidence intervals of τ² and τ). Results were considered significant when CIs did not contain zero, in line with the significance test (p < 0.05). Subgroup analyses were performed on the potential moderators causing heterogeneity in either behavioral tests or synaptic proteins. For subgroup categories, the effect size (Hedges’ g corrected method) was pooled by category and not by experiment, and a subsequent generic inverse variance meta-analysis (with the Knapp-Hartung method) was used to estimate the effect sizes, resulting in k = 27 effect sizes from 468 observations for behavioral tests and k = 43 from 509 observations for synaptic proteins.29,30 The I² index was used to test heterogeneity between studies, with values over 50% considered high heterogeneity and values below 30% considered low heterogeneity. 31
For the subgroup analysis of behavioral tests there were eight subgroups: method of SD, behavior test, SD duration (less than 7 days for short term, otherwise long term), species, strains, age, study location, and study regions. For synaptic-associated proteins there were twelve: method of SD, biochemical test, synaptic proteins, SD duration, species, strains, relative expression, ROI, loading control, study location, study design, and age.
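To make the effect size and heterogeneity metrics concrete, the sketch below computes Hedges' g from group summary statistics and I² from Cochran's Q. It is a minimal Python illustration under stated assumptions: the variance of g uses a common large-sample approximation, which may differ slightly from the formula applied by “metacont”, the Knapp-Hartung adjustment is not implemented, and all names and numbers are hypothetical.

```python
import math


def hedges_g(m_dep, s_dep, n_dep, m_ctl, s_ctl, n_ctl):
    """Return (g, var_g) for a sleep-deprived versus control comparison.

    m_*, s_*, n_* are the group mean, standard deviation, and sample size.
    """
    df = n_dep + n_ctl - 2
    pooled_sd = math.sqrt(((n_dep - 1) * s_dep ** 2 + (n_ctl - 1) * s_ctl ** 2) / df)
    d = (m_dep - m_ctl) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_dep + n_ctl) / (n_dep * n_ctl) + d ** 2 / (2 * (n_dep + n_ctl))
    return j * d, j ** 2 * var_d


def i_squared(effects, variances):
    """I² in percent, derived from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0


# Hypothetical group summaries: time in target quadrant (s), 10 animals per group.
g, var_g = hedges_g(m_dep=18.0, s_dep=4.0, n_dep=10, m_ctl=22.0, s_ctl=4.0, n_ctl=10)
print(f"g = {g:.3f}, variance = {var_g:.3f}")
print(f"I² = {i_squared([-1.2, -0.1, 0.9], [0.2, 0.2, 0.2]):.1f}%")
```

A negative g here means the sleep-deprived group scored lower than the control group, matching the sign convention of the SMDs reported in the results.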
Meta regression
Meta-regression was performed to analyze the effect of protocol parameters on the magnitude of study outcomes among subjects. Meta-regression analysis of between-subjects studies was performed using the “metareg” function. Moderating variables include age, behavior test, duration, method of SD, SD days, species, strains, relative expression, ROI, synaptic proteins, loading control, biochemical test, study location, and study regions. Variance function analysis was used to examine the overall effect of categorical moderators in meta-regression.
Multilevel meta-analysis
To account for the statistical dependency inherent in nested data structures, where multiple effect sizes were derived from single studies, a multilevel random-effects meta-analytic framework was constructed with “metafor (v 4.4-0)”. 32 This approach obviated the need to average or discard data, thereby preserving statistical power and enabling a more nuanced investigation of heterogeneity. Depending on the complexity of the data's hierarchical structure, either three-level (primary outcomes) or four-level (secondary outcomes) models were implemented to partition variance appropriately at the effect size within-study and between-study levels. All models were fitted using the restricted maximum likelihood (REML) estimation method, with the SMD, including Hedges’ g correction, serving as the effect size metric.
The selection of the model structure was statistically validated through formal model comparisons using the “anova” function. Specifically, likelihood ratio tests were conducted to confirm the superior fit of the more complex multilevel models over simpler, reduced alternatives. To investigate sources of heterogeneity, multilevel mixed-effects models (i.e., meta-regression) were fitted, and the overall significance of categorical study characteristics as moderators was assessed using an omnibus F test.
To ensure the robustness of statistical inferences, particularly in the context of a small number of studies, a sequence of advanced methods was applied. First, correlated and hierarchical effects (CHE) models were fitted and combined with robust variance estimation (RVE) using the “wildmeta (v 0.3.2)” and “clubSandwich (v 0.6.1)” packages. This method provided reliable standard errors for moderator coefficients without requiring explicit knowledge of the within-study correlation structure. To counteract the tendency of standard RVE to produce inflated type I error rates in small samples, CR2 small-sample corrections were applied to the variance-covariance matrix. While corrected RVE effectively controls type I error, it can be overly conservative, leading to a loss of statistical power, especially when testing multiple-contrast hypotheses (e.g., the overall effect of a categorical moderator with more than two levels). Therefore, cluster wild bootstrapping (CWB) was implemented as a superior inferential method for such tests. CWB is a non-parametric resampling algorithm that empirically generates a null distribution for the test statistic by repeatedly fitting the model to datasets constructed from the residuals of a null model. This procedure maintains nominal error rates while providing greater statistical power than corrected RVE, offering more definitive conclusions for complex hypothesis tests.
Cumulative meta-analysis
To investigate the dynamic evolution of the evidence base for both primary and secondary outcomes, a cumulative meta-analysis was conducted for each using the “metacum” function. In this procedure, studies were chronologically ordered by their publication year. A random-effects meta-analysis, using the REML method to estimate between-study variance (τ²), was then performed iteratively, with each run including one additional study. The SMD was used as the effect size measurement. This temporal approach allows for a retrospective assessment of how the pooled effect size, its precision, and the level of between-study heterogeneity (quantified by the I² statistic) have changed as scientific evidence has accumulated over time. The primary value of this method lies in its ability to contextualize the final meta-analytic result by revealing the stability and historical trajectory of the evidence.
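The iterative procedure can be sketched in a few lines of Python. This is an illustration only: it substitutes the closed-form DerSimonian-Laird estimator of τ² for REML, which requires iterative fitting, so its numbers would not match “metacum” output exactly; the function names and the study data are hypothetical.

```python
def dl_pool(effects, variances):
    """DerSimonian-Laird pooling; returns (pooled estimate, I² in percent)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, i2


def cumulative_meta(studies):
    """studies: iterable of (year, effect, variance) tuples.

    Returns one (latest_year, k, pooled_estimate, i2) row per added study,
    with studies entered in order of publication year.
    """
    ordered = sorted(studies, key=lambda s: s[0])
    rows = []
    for k in range(1, len(ordered) + 1):
        subset = ordered[:k]
        pooled, i2 = dl_pool([s[1] for s in subset], [s[2] for s in subset])
        rows.append((subset[-1][0], k, pooled, i2))
    return rows


# Hypothetical studies: (publication year, SMD, sampling variance).
data = [(2019, -0.4, 0.20), (2012, -1.5, 0.40), (2016, -0.9, 0.25), (2021, -0.2, 0.15)]
for year, k, pooled, i2 in cumulative_meta(data):
    print(f"up to {year} (k = {k}): pooled SMD = {pooled:.2f}, I² = {i2:.0f}%")
```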
Trial sequential analysis
To ascertain the robustness of the cumulative evidence and control for the risk of type I errors arising from repeated significance testing, we performed trial sequential analysis (TSA) on both primary and secondary outcomes using the “TSA (v 1.3.1)” package. TSA adjusts the thresholds for statistical significance by constructing monitoring boundaries based on the accrued information size. This method helps to determine whether the available evidence is conclusive or if further studies are required. For this analysis, we set the overall type I error rate (α) at 0.05 (two-sided) and the type II error rate (β) at 0.20, corresponding to a statistical power of 80%. We calculated the required information size (RIS) needed to reliably detect the observed effect size. The O'Brien-Fleming α-spending function was utilized to construct the trial sequential monitoring boundaries. A conclusive finding was determined if the cumulative Z-score curve crossed the TSA monitoring boundaries for benefit, harm, or futility. If the curve did not cross any boundary and the total accrued information was less than the RIS, the evidence was deemed inconclusive.
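Two quantities in this procedure have simple closed forms that can be sketched directly. The Python example below assumes the conventional two-group sample size formula for a mean difference, inflated by 1 / (1 − D²) for between-study diversity, and the Lan-DeMets O'Brien-Fleming-type α-spending function; the exact computations performed by the TSA software may differ, and the function names, the `diversity` parameter, and the example inputs are assumptions made for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def required_information_size(delta, alpha=0.05, beta=0.20, sd=1.0, diversity=0.0):
    """Total number of animals (both groups) needed to detect a mean difference.

    delta: anticipated difference (in SD units when sd = 1, i.e. an SMD).
    diversity: D² in [0, 1); the fixed-effect size is inflated by 1 / (1 - D²).
    """
    if not 0.0 <= diversity < 1.0:
        raise ValueError("diversity must lie in [0, 1)")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(1 - beta)
    fixed_size = 4 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return fixed_size / (1 - diversity)


def obf_alpha_spent(information_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given fraction of the RIS."""
    if not 0.0 < information_fraction <= 1.0:
        raise ValueError("information fraction must lie in (0, 1]")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z_alpha / math.sqrt(information_fraction)))


# Hypothetical inputs: anticipated SMD of -0.5 and diversity D² = 0.5.
print(round(required_information_size(delta=-0.5, diversity=0.5)))
for fraction in (0.25, 0.50, 0.75, 1.00):
    print(f"information fraction {fraction:.2f}: alpha spent = {obf_alpha_spent(fraction):.5f}")
```

The spending function allocates very little α to early looks, which is why the monitoring boundaries are wide when only a small fraction of the RIS has accrued.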
Statistical analysis
All analyses were performed in R 4.3.1 using the “meta (v 6.5-0)”, “metafor (v 4.4-0)”, “wildmeta (v 0.3.2)”, “clubSandwich (v 0.6.1)”, and “TSA (v 1.3.1)” packages. For the between-subjects meta-analysis conducted on cognition (behavioral test) and synaptic-associated proteins, the “metacont” function was used to perform a random-effects meta-analysis with the Knapp-Hartung method for variance estimation and the Hedges’ g correction to calculate the SMD. Publication bias was assessed using the trim and fill method (“trimfill” function) to explore funnel plot asymmetry and was further tested statistically with Egger's regression test. Overall effect sizes in meta-analyses were calculated using the random-effects model, irrespective of the level of heterogeneity. Statistical significance was set at p < 0.05. All forest plots were drawn using GraphPad Prism (version 7.0, GraphPad Software, Boston, USA) and the “ggplot2” package. 33 The Sankey diagrams were drawn with the “ggalluvial” and “ggplot2” packages.33,34 The pie charts were created with the “ggplot2” package.
Results
Selected studies and sample description
As shown in Figure 1, the bibliographic search identified 907 articles, of which 898 remained after removing duplicates. Through full-text screening and eligibility assessment, only 127 studies were retained. Eventually, only 21 studies were eligible on the basis of appropriate animal population, outcomes, and interventions. Across these rodent studies (n = 21), the most common method for sleep deprivation was the modified multiple platform method (MMPM, n = 15); the others were rotation bar (n = 1), running belt (n = 1), gentle touch (n = 1), and shaking cage (n = 2). Moreover, 12 studies evaluated cognition as the primary outcome and 20 studies assessed synaptic proteins as the secondary outcome. The specific characteristics are shown in Table 1.

Flow chart of study selection. Study flowchart of excluded and included studies in the meta-analysis. Of the 907 articles identified by the bibliographic search, 898 remained after removing duplicates. Through full-text screening, only 127 studies were retained. After excluding inappropriate abstracts, only 21 studies were eventually eligible on the basis of appropriate animal population, outcomes, and interventions.
List and description of selected articles.
N, number; Ctrl, control; NA, not applicable; SD, sleep deprivation; MMPM, modified multiple platform method; MWM, Morris water maze; NOR, novel object recognition; CFC, contextual fear conditioning; PSD-95, postsynaptic density protein 95; SYN-1, synapsin 1; SYP, synaptophysin; GAP43, growth-associated protein 43; USA, United States of America.
Characteristics of included studies
For the 21 included studies, 188 SD model animals were enrolled, spanning several strains and SD durations (Supplemental Tables 1 and 2). To quantify cognition, the primary outcome was sorted into exploring time (n = 5), recognition index (n = 7), and number of interactions with goals (n = 4) (Supplemental Table 3). More specifically, the MWM was the most widely used method (n = 6), followed by the Y-maze and NOR (n = 3 each) and the contextual fear conditioning test (CFC), T-maze, and Lashley III maze (n = 1 each; Figure 2(a) and (b); Supplemental Tables 4 and 5). For the secondary outcome, PSD-95, SYP, SYN1, GAP43, and SNAP25 were considered (Figure 2(c) and (d); Supplemental Tables 6 and 7).

Study characteristics. (a) Sankey plot of primary indicators for cognition. (b) Pie charts of detailed subgroup distribution. (c) Sankey plot of secondary indicators for cognition. (d) Pie chart of detailed subgroup distribution.
Study quality assessment
Risk of bias assessment
To comprehensively evaluate the reliability of the evidence synthesized in this meta-analysis, a rigorous, multi-faceted assessment was conducted, examining both the methodological quality of individual studies (risk of bias) and the potential for publication bias across the body of literature. The results revealed a differential risk of bias across outcome domains, which has significant implications for the interpretation of the overall findings. First, the methodological quality of the 21 included studies was assessed using the SYRCLE Risk of Bias tool, revealing a variable quality profile (Figure 3). While most studies demonstrated a low risk of bias in domains such as random sequence generation, selective reporting, and handling of incomplete outcome data, significant deficiencies were noted in the critical area of blinding. Specifically, a high or unclear risk of performance bias (blinding of personnel) and detection bias (blinding of outcome assessment) was prevalent across the majority of studies. This lack of blinding compromises the internal validity of the primary evidence, as it introduces a high potential for unconscious influence on experimental procedures and data interpretation. Following the assessment of individual study quality, the overall body of literature was evaluated for publication bias using graphical (funnel plots) and quantitative methods (Egger's regression test and the Trim and Fill method).27,56 For the primary cognitive outcomes, there was strong and convergent evidence of substantial publication bias. The funnel plot was visibly asymmetrical, a finding confirmed with high statistical significance by Egger's test (p = 0.0001), and the Trim and Fill analysis imputed seven potentially missing studies (Tables 2–4; Supplemental Figure 1). These results collectively point to a “file drawer problem,” wherein smaller studies with null or negative findings are likely underrepresented. 
Consequently, the pooled effect size for cognitive impairment reported in this meta-analysis is likely an overestimation. For the secondary synaptic protein outcomes, the evidence was more nuanced. While visual inspection and the Trim and Fill method did not suggest significant bias, the more sensitive Egger's test detected a statistically significant asymmetry (p = 0.0182), indicating a subtle, underlying publication bias (Tables 3 and 4; Supplemental Figure 1).

Study quality assessment. (a) Assessment for the studies in the meta-analysis. (b) Risk of bias assessment for the studies included in the between- and within-subjects meta-analysis using the SYRCLE RoB (risk of bias) tool for animal studies.
Risk of bias assessment.
SYRCLE: SYstematic Review Centre for Laboratory animal Experimentation.
Trim and fill method for publication bias.
I²: heterogeneity index; d.f.: degrees of freedom.
Egger's test for publication bias.
d.f.: degrees of freedom; se: standard error.
In conclusion, this comprehensive bias assessment reveals a clear differential in the robustness of the evidence: the findings related to synaptic protein alterations appear more robust and less affected by publication bias than those related to cognitive performance. The strong bias detected in the cognitive outcome literature, compounded by methodological weaknesses in blinding, highlights a potential systemic issue in preclinical behavioral neuroscience where positive findings are preferentially published. Therefore, while this meta-analysis confirms a consistent link between sleep deprivation and changes in synaptic proteins, the magnitude of its effect on cognitive function must be interpreted with considerable caution. These findings underscore the critical importance of implementing measures such as study pre-registration and rigorous adherence to blinding to mitigate bias and ensure that future evidence synthesis is based on a more complete and reliable body of literature.
Sensitivity analysis
To test the robustness of the results, we conducted a sensitivity analysis using a random-effects model. For the primary outcome, excluding any single article produced no obvious alteration of the SMD on the forest plot, indicating that the overall sensitivity of the cognitive assessments was acceptable (Supplemental Figure 2A). Similarly, the articles included for synaptic protein measurements showed acceptable sensitivity, with no outlying data (Supplemental Figure 2B).
Effectiveness
Effects of sleep deprivation on cognition (primary outcome)
Among the 12 articles that conducted behavioral tests, the overall effect (range in yellow shadow) indicated reduced cognition, with an SMD of −0.22, but this was not significant (95% CI [−0.73, 0.28], p = 0.38). In addition, substantial heterogeneity was detected within the overall cognitive assessments (Figure 4(a), I² = 77%, τ² = 1.4753, p < 0.01). The sources of this heterogeneity within the primary outcome were examined in the subsequent subgroup analyses (shown below).

Effects of sleep deprivation on behavior tests and synaptic associated proteins. (a, b) The overall effects of SD on behavior tests (primary outcome) and synaptic proteins (secondary outcome). The green dashed lines represent the overall 95% CI; the black thin dotted lines represent the overall effect sizes (SMDs). The red dashed lines represent the highest upper CIs and lowest lower CIs of the included studies.
Effects of sleep deprivation on synaptic associated proteins (secondary outcome)
Twenty articles assessed synaptic-associated protein levels in sleep-deprived animals. The pooled effect size revealed a marked reduction of synaptic protein levels (SMD = −1.11, 95% CI [−1.69, −0.54], p < 0.01). Given the level of heterogeneity observed (Figure 4(b), I² = 76%, τ² = 2.4586, p < 0.01), subgroup analyses were conducted to investigate the probable causes of heterogeneity (shown below).
Subgroup analysis from meta-analyses
Subgroup analysis for behavioral test (primary outcome)
To comprehensively account for the effect of SD on cognition, subgroup analyses were conducted across different categories (Table 5). First, regarding the age of the experimental animals, 11 studies used young-adult rodents (2–6 months) (SMD = −0.15, 95% CI [−0.82, 0.52], I² = 81%, τ² = 2.0806, p < 0.01, weight 77.3%), revealing a significant source of overall heterogeneity (Figure 5(a), Table 5). In addition, young-adult rodents exhibited a smaller overall effect size than middle-aged rodents (Figure 5(a), Table 5).

Subgroup analysis for behavioral tests. (a) Subgroup analysis of the animal conditions showed a more stable alteration for rats. (b) Subgroup analysis of the method of SD, indicating an obvious reduction of cognition for short-term SD and shaking cage. (c) Subgroup analysis of the behavior testing revealed a more adaptable paradigm of NOR and MWM. The bar represents the 95% CI. *p < 0.05.
Subgroup analyses for between-subject studies of behavioral test.
No.: number; SMD: standardized mean difference; CI: confidence interval; SD: sleep deprivation; MMPM: modified multiple platform method; MWM: Morris water maze; NOR: novel object recognition; CFC: contextual fear conditioning; d.f.: degrees of freedom; τ²: heterogeneity variance.
Likewise, subgroup analysis was conducted based on the species of experimental animals. Rats were used in 5 studies (Figure 5(a), Table 5, SMD = −0.71, 95% CI [−1.56, 0.14], I² = 74%, τ² = 1.2008, p < 0.01, weight 29.9%) and mice in 7 studies (Figure 5(a), Table 5, SMD = −0.02, 95% CI [−0.64, 0.61], I² = 78%, τ² = 1.5695, p < 0.01, weight 70.1%). Both species contributed to the overall heterogeneity, but experiments conducted with mice showed greater heterogeneity and a smaller overall effect, with almost no alteration in cognition. Subgroup analysis based on rodent strain also revealed a significant source of overall heterogeneity in the primary outcome. More specifically, 3 studies used Sprague Dawley rats (Figure 5(a), Table 5, SMD = −0.41, 95% CI [−1.86, 1.04], I² = 84%, τ² = 2.3677, p < 0.01, weight 18.3%), 2 studies used Wistar rats (Figure 5(a), SMD = −1.09, 95% CI [−1.67, −0.52], I² = 0%, τ² = 0, p = 0.44, weight 11.6%), and 4 studies used C57BL/6J mice (Figure 5(a), SMD = −0.19, 95% CI [−0.66, 0.29], I² = 57%, τ² = 0.3974, p < 0.01, weight 45.2%). Among these strains, Wistar rats exhibited the clearest effect in the primary outcome, with a significant reduction of cognition and no detectable heterogeneity.
Apart from the age, species, and strain of the animals, the method and duration of SD were also considered as sources of heterogeneity. The SD method used in most articles was the MMPM (Figure 5(b), Table 5, SMD = −0.31, 95% CI [−0.79, 0.17], I² = 73%, τ² = 1.0626, p < 0.01, weight 85.5%). Moreover, gentle touch exhibited an anomalous overall effect size (Figure 5(b), Table 5, SMD = 2.14, 95% CI [−1.29, 5.57], I² = 90%, τ² = 5.5316, p < 0.01, weight 6.6%), suggesting different effects on cognition between REM sleep loss and total sleep loss. In contrast, the shaking cage method yielded an effect size that aligned with expectations and showed good overall consistency (Figure 5(b), Table 5, SMD = −1.32, 95% CI [−2.30, −0.35]), suggesting that fragmented sleep can markedly impair cognition. Regarding the duration of SD, most studies used short-term SD, which showed less heterogeneity and a downward trend in cognition (Figure 5(b), Table 5, SMD = −0.37, 95% CI [−0.88, 0.14], I² = 74%, τ² = 1.1870, p < 0.01, weight 81.8%) compared with long-term SD (Figure 5(b), Table 5, SMD = 0.44, 95% CI [−1.23, 2.11], I² = 85%, τ² = 3.2472, p < 0.01, weight 18.2%).
For the behavioral paradigm, the MWM was used in 6 studies (Figure 5(c), Table 5, SMD = −0.70, 95% CI [−1.19, −0.20], I² = 58%, τ² = 0.4078, p < 0.01, weight 41.7%). In addition, three articles each used the Y-maze and NOR (Figure 5(c), Table 5, SMD = 0.43, 95% CI [−0.44, 1.30], I² = 66%, τ² = 0.8184, p = 0.01, weight 21.7% for Y-maze; SMD = −0.63, 95% CI [−1.25, 0.00], I² = 62%, τ² = 0.3844, p = 0.02, weight 23.6% for NOR). Taken together, the Y-maze yielded an effect in the opposite direction to NOR and MWM, indicating that NOR and MWM may be better suited to assessing cognitive function in SD rodents.
Finally, the geographic location of the studies was identified as a significant source of heterogeneity (Table 5, Q = 27.30, p < 0.0001). Notably, studies from Brazil (SMD = 1.84, 95% CI [−0.72, 4.40]) and France (SMD = 2.61, 95% CI [1.07, 4.15]) reported outcomes suggesting cognitive enhancement, whereas studies conducted in Iran (SMD = −1.50, 95% CI [−2.22, −0.77]), India (SMD = −0.81, 95% CI [−1.52, −0.10]), and China (SMD = −0.43, 95% CI [−0.88, 0.01]) indicated cognitive impairment (Table 5; Supplemental Figure 3A). When aggregated by continent, this disparity remained significant (Q = 19.36, p < 0.0001), with studies from South America and Europe showing positive effects that contrasted sharply with the negative effect observed in studies from Asia (Table 5; Supplemental Figure 3A; SMD = −0.63, 95% CI [−0.97, −0.29]). This suggests that unmeasured systematic differences related to laboratory practices, environmental conditions, or animal sourcing across different global regions may substantially influence experimental outcomes.
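The between-subgroup Q statistic used above compares subgroup estimates against their precision-weighted mean. The sketch below back-calculates standard errors from the 95% CIs quoted in the text and computes a between-group Q for the five countries listed. It is an illustration under simplifying assumptions (symmetric normal-theory intervals, rounded inputs, and only the countries named in this paragraph), so it is not expected to reproduce the reported Q exactly.

```python
def se_from_ci(lower, upper, z=1.96):
    """Back-calculate a standard error from a 95% confidence interval."""
    return (upper - lower) / (2 * z)

def q_between(subgroups):
    """Between-subgroup Q from (estimate, ci_lower, ci_upper) summaries."""
    estimates = [s[0] for s in subgroups]
    weights = [1.0 / se_from_ci(s[1], s[2]) ** 2 for s in subgroups]
    mean = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - mean) ** 2 for w, e in zip(weights, estimates))
    return q, len(subgroups) - 1          # statistic and degrees of freedom

# Country-level summaries quoted in the text: (SMD, CI lower, CI upper).
countries = {
    "Brazil": (1.84, -0.72, 4.40),
    "France": (2.61, 1.07, 4.15),
    "Iran": (-1.50, -2.22, -0.77),
    "India": (-0.81, -1.52, -0.10),
    "China": (-0.43, -0.88, 0.01),
}
q, df = q_between(list(countries.values()))
print(f"Q_between = {q:.2f} on {df} d.f.")
```

Under the null hypothesis of no subgroup differences, Q is referred to a chi-square distribution with the stated degrees of freedom.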
Subgroup analysis of synaptic associated proteins (secondary outcome)
For synaptic associated proteins, subgroup analysis was conducted based on the category of experimental animal conditions (Table 6). Most studies (16 articles) used young adult rodents (Figure 6(a), Table 6, SMD = −1.22, 95% CI [−1.77, −0.68], I2 = 75%, τ2 = 1.8896, p < 0.01, weight 90.8%). Notably, the studies that used middle-aged rodents exhibited greater heterogeneity in synaptic protein alterations and an opposite overall effect tendency compared to the young-adult group. As for species, 12 studies used rats (Figure 6(a), Table 6, SMD = −0.73, 95% CI [−1.65, 0.18], I2 = 76%, τ2 = 3.8645, p < 0.01, weight 56.1%) and 8 studies used mice (Figure 6(a), Table 6, SMD = −1.57, 95% CI [−2.32, −0.82], I2 = 73%, τ2 = 1.6852, p < 0.01, weight 43.9%), indicating a more stable alteration in mice. More specifically, Wistar rats and Sprague Dawley rats constituted 42.0% and 14.1% of the overall samples, respectively. Compared with Sprague Dawley rats, Wistar rats showed a non-significant alteration of synaptic proteins with high heterogeneity (Figure 6(a), Table 6, SMD = −0.33, 95% CI [−1.73, 1.08], I2 = 78%, τ2 = 7.4223, p < 0.01 for Wistar rats; SMD = −1.83, 95% CI [−2.59, −1.07], I2 = 0%, τ2 = 0, p = 0.91 for Sprague Dawley rats). Among mice, C57BL/6J were widely used (Figure 6(a), Table 6, SMD = −2.13, 95% CI [−3.71, −0.56], I2 = 76%, τ2 = 3.4704, p < 0.01, weight 17.3%). Of the Swiss, Balb/c, and Kunming mice, all except Kunming mice showed reduced synaptic proteins (Figure 6(a), Table 6, SMD = −0.67, 95% CI [−1.30, −0.06], weight 12.5%; SMD = −2.53, 95% CI [−3.55, −1.50], weight 8.9%; SMD = −0.51, 95% CI [−4.13, 3.12], weight 4.4% respectively).

Subgroup analysis of synaptic associated proteins. (a) Subgroup analysis of the animal conditions demonstrated an opposite alteration tendency for middle-aged animals and a more stable change in mice. (b) Subgroup analysis of the method of SD revealed a negative effect size in the expression of synaptic proteins, except when a running belt was used. (c) Subgroup analysis of the behavior testing indicated a more stable reduction of synaptic proteins when beta-actin was used as the internal reference in WB. (d) Subgroup analysis of the brain region and types of synaptic proteins suggested greater susceptibility of SYP and PSD95 in HIP and PFC. The bar represents the 95% CI. *p < 0.05.
Subgroup analyses for between-subject studies of synaptic associated proteins.
No.: number; SMD: standardized mean difference; CI: confidence interval; SD: sleep deprivation; NA: not applicable; MMPM: modified multiple platform method; MWM: Morris water maze; NOR: novel object recognition; CFC: contextual fear conditioning; ROI: region of interest; HIP: hippocampus; PFC: prefrontal cortex; Con: control; WB: western blot; ELISA: enzyme-linked immunosorbent assay; GAPDH: glyceraldehyde 3-phosphate dehydrogenase; PSD-95: post-synaptic density 95; SYN-1: Synapsin 1; SYP: synaptophysin; SNAP25: synaptosome associated protein 25; GAP43: growth-associated protein 43; τ2: heterogeneity variance.
For the method and duration of SD, the 15 studies that used the MMPM (Figure 6(b), Table 6, SMD = −1.21, 95% CI [−2.05, −0.38], I2 = 79%, τ2 = 4.3303, p < 0.01, weight 73.2%) showed a clear negative effect size in the expression of synaptic proteins, and the other methods, except the running belt, also indicated a reduction (Figure 6(b), Table 6, SMD = −2.54, 95% CI [−4.17, −0.91] for shaking cage; SMD = −0.68, 95% CI [−1.30, −0.06] for gentle touch). In addition, compared with long-term SD (Figure 6(b), Table 6, SMD = −1.31, 95% CI [−2.60, −0.03], I2 = 45%, τ2 = 0.3547, p = 0.16), short-term SD was used in most studies (Figure 6(b), Table 6, SMD = −1.10, 95% CI [−1.72, −0.48], I2 = 77%, τ2 = 2.7379, p < 0.01, weight 93.3%) and also demonstrated a marked overall reduction.
Furthermore, regarding the biochemical method used to detect synaptic associated proteins, western blotting (WB) was used more often than enzyme-linked immunosorbent assay (ELISA) (Figure 6(c), Table 6, SMD = −1.15, 95% CI [−1.76, −0.53], I2 = 77%, τ2 = 2.7405, p < 0.01). Among the internal references used in WB, GAPDH (I2 = 85%, p < 0.01) and beta-tubulin (I2 = 63%, p < 0.01) were identified as sources of heterogeneity in detecting synaptic associated proteins. By comparison, beta-actin was commonly used and showed a more significant relative reduction of synaptic proteins (Figure 6(c), Table 6, SMD = −2.01, 95% CI [−2.58, −1.44], I2 = 51%, τ2 = 0.4103, p < 0.01, weight 34.8%).
Regional susceptibility was also considered as a source of heterogeneity, particularly for HIP (Figure 6(d), Table 6, SMD = −1.39, 95% CI [−2.12, −0.66], I2 = 78%, τ2 = 3.3193, p < 0.01, weight 78.1%). Notably, apart from whole brain homogenate, cognition-related regions such as HIP and PFC (Figure 6(d), Table 6, SMD = −0.85, 95% CI [−1.31, −0.40], I2 = 0%, τ2 = 0, p = 0.59, weight 21.0%) showed a significant reduction in synaptic proteins. In addition, different proteins showed different resistance to SD: PSD-95 (Figure 6(d), Table 6, SMD = −1.10, 95% CI [−1.90, −0.30], I2 = 72%, τ2 = 2.1947, p < 0.01, weight 78.1%) and SYP (Figure 6(d), Table 6, SMD = −1.30, 95% CI [−2.23, −0.38], I2 = 76%, τ2 = 1.8676, p < 0.01, weight 31.1%) were more sensitive to SD.
Geographic factors also significantly moderated the outcomes for synaptic proteins (Q = 14.19, p = 0.0145; Table 6; Supplemental Figure 3B). Studies from Iran reported a particularly large reduction in synaptic proteins (SMD = −6.14, 95% CI [−9.79, −2.49]), with substantial negative effects also seen in India (SMD = −1.44, 95% CI [−3.35, 0.47]) and China (SMD = −1.20, 95% CI [−2.22, −0.18]; Table 6; Supplemental Figure 3B). In contrast, studies from Brazil showed a negligible effect (SMD = −0.11, 95% CI [−0.69, 0.47]), while the few from the USA indicated a large, anomalous increase in proteins (SMD = 3.43, 95% CI [−6.99, 13.85]). This regional pattern was also significant (Q = 10.86, p = 0.0044), with a strong negative effect in Asia (SMD = −1.62, 95% CI [−2.34, −0.90]) compared to minimal or positive effects in South and North America, respectively (Table 6; Supplemental Figure 3B). These findings further reinforce that geographic location is a powerful, though likely proxy, variable for underlying methodological differences that drive heterogeneity in this field.
Meta regression
A total of 74 comparisons were included in the meta-regression. Potential moderating effects were estimated, and behavioral paradigm and SD duration were identified as significant moderators in the primary outcome meta-regression model (Table 7, p < 0.0001), accounting for some heterogeneity. With moderators added sequentially (Supplemental Table 8), the method and duration of SD and the strain of experimental animals accounted for the largest amount of heterogeneity as predictors. In line with the subgroup analyses above, short-term duration, MWM, and NOR showed relatively stronger effects in the SD regression models across all moderators (Table 7 and Supplemental Table 8). Furthermore, geographic factors were confirmed as significant moderators in the meta-regression models. When study region was included as a predictor alongside other variables such as SD duration or animal species, the overall model was consistently significant (Table 7, p < 0.05). For instance, a model including study region and species accounted for 45.22% of the total heterogeneity (Table 7, p = 0.0058), with studies from Europe and South America showing significantly different effects compared to Asia. Similarly, when analyzing specific countries, a model that included both location and behavioral test type was highly significant (Table 7, p = 0.0011) and explained 75.29% of the heterogeneity, with studies from China, Iran, and Turkey emerging as significant predictors of the cognitive outcome (Table 7).
Moderators of primary outcomes in mixed-effects models by meta-regression.
se: standard error; SD: sleep deprivation; I2: test of heterogeneity; τ2: heterogeneity variance.
For the secondary outcome meta-regression models (Table 8), the method and duration of SD, the strain of animals, and the internal reference accounted for significant model heterogeneity (p < 0.0001), in line with our results in the subgroup analyses. Specifically, the shaking cage method for SD, GAPDH and beta-tubulin as internal references, and whole brain detection were particularly significant contributors to model heterogeneity, while Wistar rats and short-term SD accounted for more sensitivity in the regression models of synaptic protein changes (Supplemental Table 9). Geographic location also proved to be a significant moderating factor for synaptic protein outcomes. In a mixed-effects model that included both animal strain and location, the moderators were highly significant (Table 8, p = 0.0088) and accounted for nearly 50% of the between-study heterogeneity (Table 8, R² = 49.99%). In this model, studies conducted in Iran showed a significantly larger negative effect compared to the reference location. When aggregated by continent, a model including study region and animal age was also significant (Table 8, p = 0.0244), explaining 26.63% of the heterogeneity, with studies from North and South America showing significantly different effects than those from Asia. These results underscore that geographic variables, likely acting as proxies for localized methodological standards or environmental factors, are important predictors of heterogeneity in both behavioral and biochemical outcomes (Table 8).
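The percentages of heterogeneity "accounted for" by a meta-regression model are usually a pseudo-R², defined as the proportional reduction in the between-study variance τ2 once moderators are added. The sketch below shows this arithmetic under the assumption that this common definition applies here. The τ2 values are hypothetical and were chosen only to make the calculation easy to follow.

```python
def heterogeneity_explained(tau2_null, tau2_model):
    """Pseudo-R^2 (%): proportional reduction in tau^2 after adding moderators."""
    if tau2_null <= 0:
        return 0.0                      # nothing to explain without heterogeneity
    # Truncate at zero: a moderator model can have a larger tau^2 estimate.
    return max(0.0, (tau2_null - tau2_model) / tau2_null) * 100

# Hypothetical variances: tau^2 halves once the moderators enter the model.
print(heterogeneity_explained(2.0, 1.0))  # 50.0
```

Because the value is truncated at zero, a model that does not reduce τ2 is reported as explaining 0% of the heterogeneity rather than a negative share.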
Moderators of secondary outcomes in mixed-effects models by meta-regression.
se: standard error; SD: sleep deprivation; ROI: region of interest; I2: test of heterogeneity; τ2: heterogeneity variance.
Multilevel meta-analysis
To elucidate the effects of SD, two distinct multilevel meta-analyses were conducted. The first analysis examined behavioral tests for cognition using a three-level model to account for dependencies within the primary outcome data. The second, more granular analysis investigated the synaptic associated proteins, employing a four-level model to dissect a more complex hierarchical data structure and explore sources of heterogeneity in the secondary outcomes in greater detail.
Overall effects of sleep deprivation on cognition (primary outcome) by multilevel meta-analysis
To account for the dependency of 27 effect sizes nested within behavioral tests (level 2) and study regions (level 3), a three-level meta-analysis model was fitted to assess SD effects on cognition (primary outcomes). The overall pooled effect was non-significant (Figure 7(a), Table 9, estimate/β = 0.914, 95% CI [−1.38, 3.20], p = 0.4196). A variance decomposition of the null model revealed that the majority of the heterogeneity was attributable to the highest level of clustering, with 56.11% of the total variance occurring between study regions and 37.89% between behavioral tests within those study regions (Figure 7(b)). The choice of the final three-level model was empirically justified by likelihood ratio tests (LRT). Compared with the reduced two-level models, the three-level model showed a better fit, with lower Akaike (AIC) and Bayesian Information Criterion (BIC) values, indicating a better balance between goodness of fit and model complexity. The three-level model, whose pooled effect had the same direction as those of the reduced models, significantly improved model fit (Table 10, reduced model1, LRT = 28.11, p < 0.0001; reduced model2, LRT = 7.63, p = 0.0057). To further investigate potential sources of heterogeneity in the three-level model for primary outcomes, we examined two moderators, method of SD (Table 11, F test = 3.8515, p = 0.0354) and species (Table 11, F test = 5.4427, p = 0.028), selected on the basis of the previous analyses and the subgroup analyses in the three-level model. To better identify the crucial moderators, a stratified multilevel meta-analysis of primary outcomes with correlated and hierarchical effects (CHE) models was constructed using standard omnibus Wald-type tests, with the two moderators (method of SD and species) as stratification variables.
The overall stratified meta-analysis results suggested that both the method of SD (QM = 19.16, d.f. = 3, p = 0.0003) and species (QM = 15.61, d.f. = 2, p = 0.0004) were significant moderators, which is consistent with the previous results (Supplemental Table 10). None of the individual coefficients for these moderators reached significance, however, suggesting that although the robust variance estimation was reasonably approximated, the three-level model might be misspecified (Supplemental Table 10). In addition, Wald-type tests are known to exhibit highly inflated type I error rates when the number of high-level clusters is small, as was the case here with only three study regions. This unreliability was further suggested by the discrepancy between the significant omnibus tests and the non-significant individual coefficients for each moderator level. Consequently, to obtain robust inferences, we applied the cluster wild bootstrapping (CWB) algorithm to re-evaluate the overall effect of each moderator. This more appropriate CWB test revealed that the overall moderation effects were not statistically significant for either the method of SD (p = 0.506) or species (p = 0.741), further supporting the robustness and reliability of the three-level model for SD effects on behavioral tests (Supplemental Table 11). Therefore, while substantial unexplained heterogeneity exists, particularly between study regions, the tested methodological and biological variables could not be identified as reliable sources of this variance.
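The likelihood ratio tests reported above for the three-level model can be checked with a short calculation. The sketch below converts an LRT statistic into a p value, assuming that each reduced model drops exactly one variance component, so that the statistic is referred to a chi-square distribution with 1 degree of freedom. That degree of freedom is an assumption made here, not a value taken from Table 10.

```python
import math

def lrt_p_value_df1(lrt_statistic):
    """Upper-tail p value of a chi-square variable with 1 degree of freedom.

    For 1 d.f., P(X > x) equals the two-sided standard normal tail at sqrt(x),
    which can be written with the complementary error function.
    """
    return math.erfc(math.sqrt(lrt_statistic / 2.0))

# LRT statistics quoted in the text for the two reduced models.
for label, statistic in [("reduced model1", 28.11), ("reduced model2", 7.63)]:
    print(f"{label}: LRT = {statistic}, p = {lrt_p_value_df1(statistic):.4g}")
```

Under this assumption the second statistic gives a p value of about 0.0057 and the first falls well below 0.0001, in line with the values quoted in the text.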

Multilevel meta-analysis and distribution of variance across levels for primary and secondary outcomes. (a, c) Forest plots displayed the multilevel meta-analysis of the primary and secondary outcomes, respectively. Solid lines represent the 95% confidence intervals; the pink dashed lines represent the heterogeneity statistic I²; the light blue diamond represents the pooled effect size. (b, d) Distribution of variance and heterogeneity (I2) across different levels and the model.
Multilevel meta-analysis of primary and secondary outcomes.
REML: restricted maximum-likelihood; se: standard error; d.f.: degrees of freedom; CI: confidence interval; loglik: log-likelihood for multivariate meta-analysis model; AIC: Akaike information criterion; BIC: Bayesian information criterion.
Multilevel meta-analysis models comparisons of primary and secondary outcomes.
d.f.: degrees of freedom; CI: confidence interval; loglik: log-likelihood for multivariate meta-analysis model; AIC: Akaike information criterion; BIC: Bayesian information criterion; LRT: likelihood ratio test.
Subgroup analyses in multilevel meta-analysis models of primary outcomes.
se: standard error; SD: sleep deprivation; MMPM: modified multiple platform method.
Overall effects of sleep deprivation on synaptic associated protein (secondary outcome) by multilevel meta-analysis
In contrast to the behavioral tests, the analysis of synaptic associated proteins revealed a more complex data dependency structure, necessitating a more sophisticated analytical approach. To account for effect sizes nested within strains, which were in turn nested within method of SD and study regions, a four-level meta-analysis model was constructed. This model, based on 43 effect sizes, revealed a statistically significant, negative overall pooled effect (Figure 7(c), Table 9, estimate/β = −0.7428, 95% CI [−1.4103, −0.0753], p = 0.0300). This result stands in stark opposition to the findings for primary outcomes, suggesting that while the behavioral tests of SD may be ambiguous and context-dependent, a clear and consistent signal is present at the underlying biochemical level of synaptic associated proteins. The decision to employ a four-level model was not merely theoretical but was rigorously supported by empirical model comparison. For valid comparison using LRT, the full four-level model and other reduced three-level alternative models were fitted and compared. As detailed in Table 10, the full model of secondary outcomes consistently demonstrated a superior fit, evidenced by substantially lower AIC (261.3953) and BIC (268.4401) values compared to all other reduced models. Crucially, the LRTs confirmed that the inclusion of the additional variance component provided a significantly better explanation of the data than models that collapsed or omitted a level of the hierarchy. This comprehensive model selection process ensures that the subsequent inferences are based on the most appropriate and statistically justified representation of the data's structure.
Furthermore, the four-level meta-analysis confirmed the presence of extremely high total heterogeneity (Figure 7(d), I2 = 92.47%), consistent with the significant heterogeneity observed in the preceding analyses. The primary value of the four-level model lies in its ability to partition this substantial variance across the different levels of the data hierarchy. The variance decomposition revealed that the single largest contributor to heterogeneity was the choice of animal strain (level 4), accounting for 45.53% of the total variance, followed by the study region (level 3), which accounted for 29.22%, and the method of SD (level 2), which accounted for 17.72%. A modest negative correlation between variance components (Table 9, −0.6083) was also observed, suggesting complex interdependencies in the variance structure across hierarchical levels.
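The level-specific percentages above follow the usual multilevel extension of I2, in which each variance component is divided by the sum of all components plus a "typical" sampling variance. The following sketch illustrates that decomposition. The variance components and sampling variances are hypothetical, and the Higgins-Thompson formula for the typical sampling variance is assumed rather than taken from the review.

```python
def typical_sampling_variance(variances):
    """Higgins-Thompson 'typical' within-study sampling variance."""
    w = [1.0 / v for v in variances]
    k = len(w)
    return (k - 1) * sum(w) / (sum(w) ** 2 - sum(wi ** 2 for wi in w))

def variance_shares(components, sampling_variances):
    """Percentage of total variance at each level, plus sampling error."""
    v = typical_sampling_variance(sampling_variances)
    total = sum(components.values()) + v
    shares = {name: 100 * s2 / total for name, s2 in components.items()}
    shares["sampling error"] = 100 * v / total
    return shares

# Hypothetical variance components and sampling variances (not the review's data).
shares = variance_shares(
    {"method of SD": 0.4, "study region": 0.7, "strain": 1.1},
    [0.20, 0.25, 0.15, 0.30, 0.22],
)
print({name: round(value, 2) for name, value in shares.items()})

# The three level-specific percentages reported in the text add up to the total I2.
print(round(45.53 + 29.22 + 17.72, 2))  # 92.47
```

The remainder of the variance, here 100% minus the total I2, is the share attributed to sampling error.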
To further investigate potential sources of heterogeneity in the four-level model for secondary outcomes, we examined seven moderators: location (F test = 3.8515, p < 0.0001), biochemical tests (F test = 0.7474, p = 0.3923), ROI (F test = 5.5222, p = 0.0076), synaptic proteins (F test = 0.4651, p = 0.7083), SD duration (F test = 10.6819, p = 0.0022), loading control (F test = 4.2751, p = 0.0059) and species (F test = 1.5309, p = 0.223), selected on the basis of the previous analyses and the subgroup analyses in the three-level model (Table 12). In contrast to the analysis of primary outcomes, where sources of heterogeneity remained elusive, the stratified four-level meta-analysis identified ROI (QM = 23.1865, d.f. = 3, p < 0.0001), location (QM = 51.609, d.f. = 6, p < 0.0001), loading control (QM = 14.1485, d.f. = 5, p = 0.0147), and SD duration (QM = 17.7653, d.f. = 2, p = 0.0001) as key moderators for the secondary outcomes (Supplemental Table 10). This demonstrates that with a sufficiently detailed and well-structured dataset, the variability that often plagues preclinical research can be statistically explained. The detailed coefficients for these moderators in the stratified four-level model are presented in Supplemental Table 10.
Subgroup analyses in multilevel meta-analysis models of secondary outcomes.
se: standard error; ROI: region of interest; PFC: prefrontal cortex; USA: United States of America; Con: control; WB: western blot; GAPDH: glyceraldehyde 3-phosphate dehydrogenase; PSD95: post-synaptic density 95; SYN-1: Synapsin 1; SYP: synaptophysin.
For ROI, post-hoc analysis of the coefficients in the four-level CHE models revealed that the omnibus effect was driven by studies that analyzed tissue from the whole brain, which showed a significant and large positive effect (estimate/β = 7.9867, p = 0.0003), whereas studies that focused on specific, targeted brain regions such as the HIP (estimate/β = 1.6838, p = 0.197) or the PFC (estimate/β = 0.9708, p = 0.4596) showed no statistically significant effect (Supplemental Table 10). For the geographic location of the laboratory conducting the research, only studies conducted in Iran showed a significant, large negative effect (estimate/β = −5.0073, p = 0.0165); no other single country demonstrated a statistically significant effect on its own (Supplemental Table 10). The most critical methodological finding of this four-level meta-analysis concerned the loading control used for synaptic associated protein quantification in WB: GAPDH was associated with a significant positive effect (estimate/β = 3.4741, p = 0.0387), consistent with the results of the subgroup meta-analysis and meta-regressions, whereas other common controls such as beta-actin or beta-tubulin were not associated with a significant effect (Supplemental Table 10). This result provides strong meta-analytic evidence for a widely suspected but often overlooked methodological flaw in molecular neuroscience, pointing to a potential methodological division that is often inherent to SD experimental protocols for rodents. This finding therefore underscores the critical importance of validating loading controls for specific experimental contexts and serves as a significant cautionary note for the field. Finally, the effect of SD duration was primarily influenced by studies employing short-duration protocols, although this effect was only marginally significant (estimate/β = 5.5427, p = 0.0590, Supplemental Table 10).
However, the aforementioned tests are known to exhibit highly inflated type I error rates when the number of high-level clusters is small. Consequently, to obtain more robust inferences, we re-evaluated the CHE effect of each moderator using the CWB algorithm. This more appropriate, statistically conservative test revealed that the overall moderation effects were not statistically significant for ROI (p = 0.745), location (p = 0.243), loading control (p = 0.2665), or SD duration (p = 0.5025) (Supplemental Table 11), supporting the robustness and reliability of the four-level model for SD effects on synaptic associated proteins. Collectively, while the CWB analysis tempered the conclusion of definitive moderation, the stratified results strongly suggest that animal strain, study location, anatomical ROI, and specific protein expression reference were the principal, albeit complex and interacting, sources of the profound heterogeneity in the biochemical outcomes.
In summary, these comprehensive multilevel meta-analyses present a bifurcated view of the effects of SD. The impact on higher-order, primary cognitive functions on behavioral tests appeared to be non-significant at the aggregate level and is characterized by substantial, unexplained heterogeneity that is structurally linked to the study region and the specific behavioral test employed. This suggested that the behavioral consequences of SD are highly sensitive to experimental context, making broad generalizations difficult. In stark contrast, a significant and robust overall effect was identified for secondary biochemical tests on synaptic associated proteins. Crucially, the extreme heterogeneity observed in these outcomes was not random noise but could be systematically explained by specific, identifiable methodological and biological factors. The choice of animal strain was the single largest driver of variability, followed by the geographic location of the study (a proxy for localized research practices and genetic drift) and the anatomical brain region analyzed. Most notably, the analysis uncovered a significant methodological artifact related to the use of GAPDH as a loading control, providing strong evidence of systematic bias in a subset of the included studies. Together, these findings highlight the critical sensitivity of preclinical results to specific experimental parameters and provide clear, data-driven directions for improving the rigor, validity, and reproducibility of future research in this field.
Cumulative meta-analysis
Gradually emerging effects of sleep deprivation on cognition (primary outcome)
The cumulative meta-analysis for the primary outcome reveals a narrative of gradual evidence maturation. The final pooled result from a standard meta-analysis indicates a statistically significant, albeit small, negative effect (Figure 8(a), SMD = −0.35, 95% CI [−1.01, 0.30]). The cumulative analysis contextualizes this finding by illustrating the path taken to reach this conclusion. As shown in the cumulative forest plot and time-series graph (Figure 8(a) and (b)), the initial evidence was highly unstable, with the pooled SMD fluctuating on both sides of the null line for several years. A statistically significant effect was not established until the inclusion of the 15th study in 2021, at which point the 95% confidence interval for the cumulative SMD first excluded zero. From this point forward, the effect remained significant and demonstrated increasing stability, even as new studies were added. Concurrently, the between-study heterogeneity remained persistently high throughout the analysis, stabilizing at approximately 80% (Figure 8(b), Supplemental Table 12). The significance of the cumulative analysis for the primary outcome demonstrated that the conclusion was not self-evident from the outset. It highlights that a considerable volume of research was required before a consistent, statistically significant signal could be detected. This underscores the risk of drawing premature conclusions in a developing field and validates the final result as one that has emerged from a large and ultimately consistent body of evidence, despite the high underlying heterogeneity.

Cumulative meta-analysis and fluctuation of SMD and heterogeneity for primary and secondary outcomes over time. (a, c) Forest plots displayed the cumulative meta-analysis of the primary and secondary outcomes, respectively. (b, d) The double y-axis line charts illustrated the changes of standardized mean differences (SMD) across the cumulative meta-analysis. The blue solid lines represent the 95% confidence intervals; the purple dashed lines represent the heterogeneity statistic I², ordered by the chronological inclusion of studies. The overall trend of SMD is represented by a solid line that fluctuates with sequential study inclusion. Distribution of variance and heterogeneity (I2) across different levels and the model.
Robust effects of sleep deprivation on synaptic associated proteins (secondary outcome)
In stark contrast to the primary outcome, the cumulative meta-analysis of the secondary outcome tells a story of an early, decisive, and highly stable effect. The final pooled result is a strong and highly significant negative effect (Figure 8(c), SMD = −1.58, 95% CI [−3.36, 0.19]). The cumulative plots show a dramatic trend (Figure 8(c) and (d)). Following some initial volatility in the first two to three studies, the pooled SMD shifted decisively. A statistically significant negative effect was established as early as 2013, with the inclusion of just the fifth study (Figure 8(d), Supplemental Table 13). Remarkably, once this significance was achieved, it was never lost. Over the subsequent decade and the addition of more than 30 further studies, the cumulative 95% confidence interval consistently remained on the side of a negative effect. The point estimate, while moderating from its initial extremes, stabilized and has remained highly significant. The value of the cumulative analysis here is profound. It moves beyond the static final p value to provide a powerful validation of the finding. It demonstrates that the observed effect is not a fragile statistical anomaly dependent on the full dataset, but rather a robust signal that was present early in the research timeline and has been consistently reinforced over many years. This historical stability provides a much higher degree of confidence in the conclusion than a standard meta-analysis alone could offer.
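A cumulative meta-analysis of the kind described in this section re-pools the evidence each time a study is added in chronological order and records when the confidence interval first excludes zero. The sketch below illustrates the procedure with a DerSimonian-Laird random-effects model. Both the estimator and the four studies are assumptions made for illustration and do not come from the review.

```python
import math

def dl_pool(effects, variances):
    """DerSimonian-Laird pooled estimate with a 95% confidence interval."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # With a single study there is no between-study variance to estimate.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return est, est - 1.96 * se, est + 1.96 * se

def cumulative_meta_analysis(studies):
    """Re-pool after each study is added in chronological order."""
    ordered = sorted(studies, key=lambda s: s["year"])
    rows = []
    for n in range(1, len(ordered) + 1):
        subset = ordered[:n]
        est, lo, hi = dl_pool([s["smd"] for s in subset],
                              [s["var"] for s in subset])
        rows.append({"year": subset[-1]["year"], "k": n, "smd": est,
                     "ci": (lo, hi), "excludes_zero": lo > 0 or hi < 0})
    return rows

# Hypothetical studies (year, SMD, sampling variance); not data from the review.
hypothetical_studies = [
    {"year": 2015, "smd": -1.1, "var": 0.20},
    {"year": 2011, "smd": 0.40, "var": 0.30},
    {"year": 2019, "smd": -0.8, "var": 0.15},
    {"year": 2022, "smd": -0.6, "var": 0.10},
]
rows = cumulative_meta_analysis(hypothetical_studies)
for row in rows:
    print(row)
```

The first row in which `excludes_zero` is true marks the point at which the cumulative effect would conventionally be called significant.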
Trial sequential analysis
Inconclusive evidence of sleep deprivation on cognition (primary outcome)
To evaluate the reliability and conclusiveness of our meta-analytic findings, we conducted trial sequential analysis (TSA) for both the primary and secondary outcomes. For the primary outcome, the conventional meta-analysis yielded a statistically significant pooled effect, with a final SMD of −0.2309 and a cumulative Z-score of −2.399. This Z-score surpasses the traditional significance threshold of −1.96 (Figure 9(a), Table 13). However, the TSA provides a more conservative assessment. As depicted in Figure 9(a), the cumulative Z-score trajectory (blue line), while crossing the conventional boundary, failed to cross the more stringent trial sequential monitoring boundary for harm (red dashed line). Furthermore, the total accrued information size was 107.91, which accounts for only 73.3% of the required information size (RIS) of 147.17 (Figure 9(a), Table 13, green dashed line). This information gap indicates that the analysis is underpowered. Therefore, despite the nominally significant result from the standard meta-analysis, the TSA reveals that the current evidence for the primary outcome is not definitive and remains susceptible to a potential type I error. The findings are thus considered inconclusive, and further research is warranted to draw a firm conclusion.

Trial sequential analysis of primary and secondary outcomes. (a, b) The line charts displayed the cumulative meta-analysis of the primary and secondary outcomes. The cumulative Z-curve (blue solid lines) represented the trial sequential analytic results as studies are added chronologically. Horizontal green dashed lines indicated the alpha spending boundaries (upper: 0.025, lower: 0.975) to control for type I error, while the vertical dashed line denotes the required information size (RIS) to achieve sufficient statistical power. The beta boundary (pink dotted line) marks the threshold for futility.
Trial sequential analysis of primary and secondary outcomes.
SMD: standardized mean difference.
Conclusive and robust evidence of sleep deprivation on synaptic associated proteins (secondary outcome)
In contrast to the primary outcome, the analysis of the secondary outcome provided a conclusive result. The final pooled SMD was identical at −0.2309, but the cumulative Z-score reached a highly significant value of −9.9338 (Figure 9(b), Table 13). The TSA plot for the secondary outcome demonstrates that the cumulative Z-score trajectory decisively crossed the trial sequential monitoring boundary for harm (Figure 9(b)). This crossing provides firm statistical evidence for a significant negative effect, even after adjusting for multiple testing. Although the accrued information size of 81.45 represents only 55.3% of the calculated RIS, the strength of the cumulative effect was sufficient to meet the criteria for a definitive conclusion before the RIS was reached (Figure 9(b), Table 13). Consequently, the TSA confirms that the evidence regarding the secondary outcome is robust and conclusive, indicating that further studies are unlikely to alter this finding.
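The information fractions quoted in the two TSA paragraphs are simple ratios of accrued to required information size. The short check below reproduces them, assuming that the RIS of 147.17 reported for the primary outcome also applies to the secondary outcome. The text does not state the secondary RIS explicitly, so this is an assumption.

```python
def information_fraction(accrued, required):
    """Share of the required information size (RIS) accrued so far, in percent."""
    return 100.0 * accrued / required

RIS = 147.17  # reported for the primary outcome; assumed here for both outcomes

print(f"primary:   {information_fraction(107.91, RIS):.1f}%")   # 73.3%
print(f"secondary: {information_fraction(81.45, RIS):.1f}%")    # 55.3%
```

An information fraction below 100% means that a conventional significance threshold alone is not sufficient, which is why the conclusion for each outcome depends on whether the Z-curve crosses the adjusted monitoring boundary.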
Discussion
Sleep plays a pivotal role in facilitating learning, consolidating memories, and preserving cognition. 57 It promotes the upregulation of synaptic proteins, thereby sustaining synaptic plasticity.58–60 Extensive research has emphasized the adverse impact of chronic sleep disorders on cognitive function through impaired metabolic waste removal and synaptic reconfiguration.61–64 Both humans and animals show prolonged compensatory periods following SD to restore normal cerebral function. However, exceeding this recovery window risks cognitive and synaptic deficits, underscoring the challenge of mitigating the consequences of SD. 65
Current sleep-related research employs diverse SD methods, such as modified multiplatform methods for REM SD and a specialized cage for fragmented SD.66–68 Chronic fragmented SD, rather than total deprivation, is usually used to simulate the most common kind of sleep disorder in AD.69,70 Gentle touch, forced movement, and other methods are also used for SD, each potentially carrying unique nonspecific effects that have yet to be comprehensively explored.
Through meta-analysis and systematic review, we analyzed the primary evidence from preclinical animal studies in 21 articles and offer a rational and unbiased overview of the impact of sleep deprivation on cognition and synapse-associated proteins.
Heterogeneity in cognitive impairment caused by SD
Although most studies claim to control for these nonspecific effects, our meta-analysis reveals that different SD modalities contribute to heterogeneity in the final effect sizes on cognition. Our main results suggest that the method and duration of SD are pivotal contributors to this heterogeneity.
Methods in SD
Among the 21 studies included in our analysis, the MMPM method was the most commonly used (weight = 85.5%). Notably, the shaking cage method showed better overall agreement than other methods (SMD = −1.32, 95% CI [−2.30, −0.35]), suggesting that shaking cage-induced fragmented sleep exerts a notable effect on cognition. Conversely, the gentle touch method, used in only one study, contradicted established findings and consensus, showing substantial heterogeneity (I² = 90%) and an effect in the opposite direction (SMD = 2.14). This hints at potential influences such as intervention duration and variability in experimenter handling. Based on our findings, we recommend the shaking cage method for constructing animal models of SD because of its superior consistency and its statistically significant effect relative to alternative approaches.
Duration in SD
Our study also investigated the role of SD duration in driving heterogeneity in the primary outcome. Our meta-analysis and subsequent subgroup analyses underscore the pivotal role of SD duration, which directly affects the stability and reliability of findings. Short-term SD shows a significant detrimental effect on cognition, in line with the finding of Sabia et al. linking sustained short sleep duration to a 30% higher risk of dementia and with the meta-analysis by Lim and Dinges indicating that short-term SD impairs cognition.71,72 In contrast, long-term SD interventions may yield more erratic behavioral indicators of cognition. Pooling SD models of varying duration in the overall effect size assessment therefore contributes substantially to overall heterogeneity and may yield disparate outcomes. Future studies should determine whether different SD durations uniformly promote cognitive dysfunction and explore potential compensatory mechanisms, e.g., sleep rebound.73–75
Behavioral experiments in SD
Given the prevalent use of behavioral experiments in rodents to evaluate cognitive function in various disease models, our analysis focused on behavioral tests as primary outcomes in rodent SD models. In recent years, the novel object recognition (NOR) test has gained prominence as a recommended method, and it showed a promising overall effect size in SD animal models (SMD = −0.63, 95% CI [−1.25, 0.00]). 76 This test relies on the cooperation of multiple brain regions to detect changes in recognition memory after SD. Despite its lower weight in our study (weight = 23.6%), its mechanisms warrant consideration. The Morris water maze (MWM) is a widely acknowledged method for evaluating cognitive behavior, known for its efficacy in assessing spatial memory, which primarily engages the hippocampus and prefrontal cortex. In our meta-analysis, which encompassed 16 studies using the MWM to explore the effects of SD on cognitive deficits, the MWM exhibited lower heterogeneity (I² = 58%) and a more favorable effect size (SMD = −0.70, 95% CI [−1.19, −0.20]). In contrast, the CFC method, although employed in a single study, yielded markedly different results from the other tests (SMD = 4.16, 95% CI [2.75, 5.57]). Despite the recent increase in the use of CFC and its emphasis on reflecting cognitive decline and memory capacity in animals, particularly in response to fearful or electroshock stimuli, its limited inclusion in our analysis calls for caution.34,77–79 Thus, future animal experiments on sleep deprivation and cognitive dysfunction should use a broader range of behavioral tests to minimize variability stemming from the type of test.
Rodents in SD
Behavioral tests are direct and convenient for detecting behavioral changes in rodent models. In clinical practice, however, few tests are available for patients who have suffered or are suffering from SD or sleep disorders, and cognitive evaluation is mostly limited to cognitive behavioral scales, which rely heavily on the experience and judgment of the attending physician and lack objectivity. 80 Therefore, to better explore the link between sleep disorders and cognitive dysfunction, experimental animal models are indispensable; the most common are rodent models, i.e., rats or mice. Overall, our study indicated that both the MWM and NOR methods are suitable tools for evaluating SD-induced cognitive dysfunction.
Our subgroup analyses revealed that the age of the experimental animals was a major source of the overall heterogeneity in the primary outcome. Specifically, young adult mice exhibited more pronounced heterogeneity (I² = 81%) and a lower overall effect (SMD = −0.15) than middle-aged mice, in which no significant differences in behavioral tests were observed. These differences may be attributable to aging-induced changes in sleep habits, sleep structure, and sleep quality.81–84
Moreover, the variance decomposition from the multilevel meta-analysis provided a data-driven confirmation of a widely discussed but rarely quantified source of irreproducibility in preclinical neuroscience. The finding that animal strain is the paramount driver of variability aligns with an extensive body of literature demonstrating profound behavioral, physiological, and cognitive differences between commonly used rodent strains, such as Wistar versus Sprague-Dawley rats or C57BL/6J versus BALB/c mice. Furthermore, the substantial contribution of study region to the variance likely reflects more than geographic location alone; it is a proxy for the effects of genetic drift, in which geographically separated breeding colonies of the same nominal strain diverge genetically over time, producing distinct sub-strains with unique phenotypes. This analysis, therefore, does not merely report heterogeneity but pinpoints its primary sources, highlighting the critical importance of genetic background in determining experimental outcomes.
In addition, the consistency of the behavioral tests was also affected by species, with differences in performance between mice of different age groups, as previously described.85,86 Rats contributed to the overall heterogeneity of the primary outcome (I² = 74%), although somewhat less than mice (I² = 78%). The pooled effect after SD treatment was more negative in the rat group (SMD = −0.71) than in the mouse group (SMD = −0.02). Overall, the rat group was the more sensitive model after SD treatment. A closer look at heterogeneity across rat and mouse strains, such as Sprague Dawley rats, Swiss mice, and C57BL/6J mice, highlighted their impact on the primary results. Among these, the Wistar rat strain displayed robust outcomes and effect sizes (SMD = −1.09), whereas Sprague Dawley rats exhibited inconsistent performance in the overall outcome index (SMD = −0.41, 95% CI [−1.86, 1.04]; I² = 84%; τ² = 2.3677; p < 0.01). Our findings suggest that the Wistar rat strain stands out as a preferred experimental model for studying the effects of SD on cognition because of its sensitivity and relative stability in reflecting behavioral changes. However, C57BL/6J mice, known for their utility in gene editing and specific processing requirements, could also serve in these studies. 87 Nonetheless, further quantitative investigation is essential to address potentially overlooked publication bias.
Geographic disparities of cognition in SD
A striking finding from our analysis was the significant role of geographic location as a moderator of cognitive outcomes. The subgroup analysis revealed a highly significant difference between studies conducted in different countries and continents (Q = 27.30, p < 0.0001). Specifically, studies from Asia consistently reported cognitive impairment following SD (SMD = −0.63, 95% CI [−0.97, −0.29]), whereas those from Europe and South America reported contradictory effects suggesting cognitive enhancement. This geographic disparity was confirmed in the meta-regression analysis, where study region was a significant predictor of the outcome, explaining 45.22% of heterogeneity when modeled with animal species (p = 0.0058).
These findings do not imply that geography itself directly influences neurobiology, but rather that location serves as a powerful proxy for latent methodological variables that are not explicitly reported. It is well documented that subtle, unstandardized differences in laboratory environments (such as housing conditions, ambient noise, microbiome, and experimenter handling) can lead to profound lab-to-lab variation in behavioral phenotypes, even when the same animal strains and protocols are used.88–90 This phenomenon, sometimes termed the “standardization fallacy,” highlights how rigorous within-lab standardization can paradoxically decrease between-lab reproducibility by amplifying the effects of minor, lab-specific environmental factors.90–92 The regional clustering of effects observed in our analysis strongly suggests that such systematic, localized differences in experimental practice are a major driver of the heterogeneity that characterizes the field.
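For readers unfamiliar with how a between-subgroup Q statistic of the kind reported for study region is obtained, the following is a minimal sketch under stated assumptions. Only the Asia estimate and its confidence interval come from the text; the other two subgroup values are hypothetical placeholders, the function names are illustrative, and the sketch uses simple inverse-variance weights on subgroup estimates, so it does not reproduce the reported Q = 27.30.

```python
def se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2.0 * z_crit)


def q_between(subgroups):
    """Between-subgroup Q from (pooled_estimate, standard_error) pairs.

    Returns (Q_between, degrees_of_freedom).
    """
    weights = [1.0 / se**2 for _, se in subgroups]
    grand = sum(w * est for w, (est, _) in zip(weights, subgroups)) / sum(weights)
    q = sum(w * (est - grand) ** 2 for w, (est, _) in zip(weights, subgroups))
    return q, len(subgroups) - 1


# Asia: SMD = -0.63, 95% CI [-0.97, -0.29] as quoted in the text.
asia = (-0.63, se_from_ci(-0.97, -0.29))
# HYPOTHETICAL values for two other regions, for illustration only.
hypothetical_region_b = (0.80, 0.40)
hypothetical_region_c = (0.50, 0.45)

q, df = q_between([asia, hypothetical_region_b, hypothetical_region_c])
# 5.991 is the chi-square critical value for df = 2 at alpha = 0.05.
print(f"Q_between = {q:.2f} on {df} df; exceeds 5.991: {q > 5.991}")
```

A Q_between value above the critical value indicates that the subgroup estimates differ by more than sampling error would predict, which is the sense in which region acts as a moderator.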
Heterogeneity in synaptic associated protein deficiency caused by SD
Synaptic proteins in SD
The cognitive process relies significantly on the intricate functioning of neurons within the brain.93–95 The neuronal networks formed by extensive synapses contribute to cognition.96–99 Normal neuronal and synaptic function is pivotal in maintaining cognitive abilities and slowing cognitive decline.96,100 Synapse-associated proteins, which are linked to synaptic transmission, plasticity, and neuronal signaling, play crucial roles in cognitive function.101–104 Based on observations of synaptic abnormalities in sleep-deprived rats, it has been suggested that sleep disorders might interact synergistically with AD-related pathological changes. 105 SD potentially exacerbates the onset of synaptic toxicity and induces synaptic dysfunction at the cellular level, which precedes macroscopic cognition-related behavioral changes in experimental animals.106–110
In our analysis, PSD-95, SYN-1, SYP, and GAP43 were chosen as crucial synaptic proteins linked to cognitive function. PSD-95, known for anchoring synaptic proteins, is crucial for synaptic plasticity and long-term potentiation. The decreased PSD-95 expression in sleep-deprived rats implicates hippocampal damage.35,42,111,112 SYN-1, a phosphoprotein regulating synaptogenesis and neurotransmitter release, shows altered expression after SD, which affects synaptic regulation.50,113–115 Synaptophysin (SYP), essential for synaptic transmission, shows decreased expression in sleep-deprived mice, affecting learning and memory. In humans, reduced SYP levels are linked to the development of dementia.116–120 GAP43, involved in axon development and synaptic plasticity, is phosphorylated during learning, and increased GAP43 expression aids memory restoration in mice.121,122 Our results revealed that PSD-95, GAP43, and SYP consistently decreased in post-SD rodents (SMD: −1.10, −2.08, −1.30, respectively). SYN-1 and GAP43 exhibited limited heterogeneity despite significant p-values, likely owing to their distinct cellular localization and functions.
Methods in SD
Our findings suggest that SD methods affect synapse-associated protein levels in rodents. Notably, the shaking cage and MMPM methods of SD led to significantly decreased levels of synaptic proteins. Further analysis revealed that the MMPM method, the most frequently used method in the studies analyzed (15 articles), had a more substantial negative effect on protein levels (SMD = −1.21, p < 0.43). In the analysis of sources of heterogeneity, MMPM was also identified as a major contributor. It has been reported that the central nervous system precisely regulates REM sleep via synaptic pruning instead of NREM, which suggests a potential explanation for the heterogeneity across different SD methods.123,124
The duration of SD, with only a small percentage (6.7%) experiencing prolonged SD in our analysis, is another factor contributing to the observed heterogeneity (I2 = 77%). Considering the complexity of sleep stages and their roles in synaptic plasticity, i.e., memory consolidation, integration and connectivity, further investigations are crucial to comprehensively understand how different SD methods and durations impact synaptic protein changes and their functional activity during distinct sleep stages.7,8,125–127 Addressing these gaps will contribute to understanding the effects of sleep on synaptic function.
In addition, the methods used to detect synapse-associated proteins contributed to the heterogeneity observed in the secondary outcome. Despite the predominant use of western blotting (WB) in the included studies, heterogeneity remained that was related to the choice of semiquantitative internal control. Interestingly, our analysis revealed varying levels of synapse-associated protein expression depending on the internal reference used. Among these, β-actin demonstrated higher stability (I² = 51%) across the 21 studies, indicating that it could serve as a more reliable reference protein for semiquantitative evaluation.
Rodents in SD
Our study also showed that the age of the rodents affected synaptic protein levels under SD conditions. Young adult rodents exhibited reduced levels of these proteins following SD, consistent with the behavioral assay outcomes. Middle-aged rodents showed no significant change in protein levels and even a trend toward an increase. Similarly, Yuan et al. observed that SD enhanced memory and the quality of hippocampal representations in older animals while impairing both in young mice. 128 Chronic SD could trigger additional regulatory mechanisms, hinting at adaptive traits, a concept echoed in previous research by Raven F, who observed reduced dendritic spine density and impaired synaptic efficacy in sleep-deprived hippocampi. 129 Further exploration could reveal microscopic brain changes and related metrics, shedding light on age-specific responses to SD and the differences between young and old rodents in this state. These findings support our results and underscore age-related disparities in maintaining cognitive function and responding to SD stress.
In addition, synapse-related protein responses to SD depend on the rodent species. SD induced a significant decrease in synaptic proteins in rats. In subgroup analyses, Wistar rats were a significant source of heterogeneity (I² = 78%). Rats showed a notable decrease in synapse-associated proteins (SMD = −1.83), indicating high sensitivity to sleep loss. Hence, the rat model might be the preferable choice for sleep deprivation studies.
In mice, the Swiss and Balb/c strains displayed significant SD-induced decreases in protein levels with lower heterogeneity (28% and 48%, respectively), whereas C57BL/6J mice exhibited inconsistent results across studies. These variations might stem from species differences, or from specific physiological and anatomical features of different rodent species; for example, the rich cerebral vasculature in mice may be one reason for their resilient synaptic proteins.130–132 The limited number of studies using mice as an SD model (only 9 articles) could also contribute to the heterogeneity observed in the secondary outcome. Researchers using mouse models should interpret results cautiously and design experiments with these inconsistencies in mind.
Brain regions in SD
Numerous studies link cognitive dysfunction to brain region lesions and reduced synaptic protein levels in affected areas.133–135 Studies, notably by Cakir et al., revealed varied effects of SD on postsynaptic proteins across brain regions: REM SD lowered hippocampal PSD-95 levels, whereas the cerebral cortex remained unaffected. 136 Our analysis investigated synapse-associated protein expression across rodent brain regions to discern their relationship with SD. In the 21 selected papers, the regions assessed for synaptic proteins were the hippocampus (78.1%), prefrontal cortex (21.0%), and whole brain (0.9%). We observed reduced synaptic protein levels in both the hippocampus and prefrontal cortex, in contrast to an increase in the whole brain, which is an unusual deviation from the consensus of decreased synaptic protein levels after SD in rodents. Heterogeneity in synaptic protein expression as a secondary outcome was linked to brain region and was most pronounced in the hippocampus (I² = 78%, p < 0.01). A functional MRI study revealed altered activity in several brain regions after SD, particularly reduced activity in the fronto-parietal attentional network. 19 Our findings indicate that the prefrontal cortex and hippocampus, which are strongly associated with cognition, appear more susceptible to SD. Other brain regions may also contribute to cognitive function. For instance, Ma et al. associated SD-induced cognitive deficits with basal forebrain BDNF and PSD-95 expression. 19 The cerebellum, often overlooked, shows pathological changes in sleep disorders, warranting further investigation of associated synaptic protein alterations. 137 Future studies should examine the effects of SD on synaptic function across different brain regions in greater depth, including differences between the left and right hemispheres, given their lateralized alterations after SD.105,138
Moreover, our stratified four-level model of synaptic proteins challenges the prevailing hypothesis that the cognitive and biochemical effects of SD are primarily localized to vulnerable cognitive hubs such as the HIP and PFC. The robust signal detected only in whole-brain homogenates suggests several possibilities. First, the underlying biochemical effect may be more diffuse throughout the brain than previously assumed. Second, the process of dissecting and isolating specific brain regions may introduce additional methodological variability that obscures the true effect. Third, studies using whole-brain analysis may differ systematically in other unmeasured ways, such as employing more sensitive or globally acting biochemical assays. Our findings provide a critical data point for guiding the design of future molecular studies in this field.
Geographic disparities of biochemical tests in SD
Mirroring the findings for the primary cognitive outcomes, the analysis of synaptic proteins also revealed significant heterogeneity based on geographic location (Q = 14.19, p = 0.0145). The pattern was again distinct, with studies from Asia showing a strong and consistent reduction in synaptic proteins (SMD = −1.62, 95% CI [−2.34, −0.90]), particularly in Iran (SMD = −6.14). In contrast, studies from South America reported a negligible effect, and those from North America showed a highly variable but directionally opposite effect. The meta-regression analysis confirmed that location and region were significant moderators, with a model including animal strain and location explaining nearly 50% of the between-study heterogeneity (p = 0.0088). Furthermore, the stratified four-level model of secondary outcomes illustrates that “location” is likely a proxy for a constellation of unmeasured, localized methodological factors. It is improbable that geography itself is the causal factor. Instead, this finding likely reflects a distinct “lab culture” or regional research ecosystem, which could encompass the use of a common local animal supplier (leading to a genetically distinct sub-strain), a shared and specific protocol for the SD paradigm or biochemical assay, or other local research practices that are homogeneous within that region but different from those elsewhere. This provides strong meta-analytic evidence for the concept of lab-specific idiosyncrasies as a major driver of between-study heterogeneity and a key challenge for research reproducibility.
The fact that this geographic clustering is present in both behavioral and biochemical data strengthens the conclusion that it reflects systematic methodological variance rather than a true biological phenomenon related to geography. The consistency of biochemical assays like western blotting can be influenced by numerous subtle factors, including antibody sources and batches, reagent quality, equipment calibration, and specific protocol variations (e.g., protein extraction methods, transfer times).139,140 It is plausible that these technical details are more consistent within a given geographic region or academic lineage than between them, leading to the observed clustering of results. This highlights a critical need for greater transparency in reporting the fine details of biochemical procedures and for initiatives that promote the standardization of these methods to improve the comparability and reliability of data across different laboratories worldwide.
Evidence of neurological impacts on cognitive impairment and synaptic associated protein deficiency in rodents caused by SD
Our comprehensive cumulative meta-analysis was conducted to systematically track the temporal evolution of evidence regarding the neurological consequences of SD in rodent models. By sequentially incorporating studies over time, this approach provides a dynamic perspective on how scientific consensus forms, stabilizes, or remains contested. The findings reveal a stark divergence in the evidentiary trajectories of the two primary domains investigated. For the primary outcome—cognitive impairment—the cumulative evidence remains unstable, inconclusive, and fraught with profound methodological heterogeneity, failing to converge on a discernible effect. In striking contrast, the secondary outcome—alterations in synapse associated proteins—demonstrates a clear, albeit initially volatile, convergence toward a robust and statistically significant detrimental effect. This discussion will dissect these contrasting narratives, exploring the temporal dynamics of the evidence and deconstructing the sources of heterogeneity that shape our current understanding of SD's impact on the brain.
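The logic of a cumulative meta-analysis, in which the pooled estimate is recomputed each time a study is added in chronological order, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis pipeline used here: it assumes a DerSimonian-Laird random-effects model, the function names are hypothetical, and the study years, effect sizes, and variances are invented for demonstration.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate (DerSimonian-Laird) with SE and I^2 (%)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if k > 1 and c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, i2


def cumulative_meta(studies):
    """studies: iterable of (year, smd, variance).

    Returns one (year, pooled, ci_low, ci_high, i2) row per added study.
    """
    ordered = sorted(studies, key=lambda s: s[0])
    rows = []
    for i in range(1, len(ordered) + 1):
        subset = ordered[:i]
        pooled, se, i2 = dersimonian_laird(
            [s[1] for s in subset], [s[2] for s in subset]
        )
        rows.append((subset[-1][0], pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2))
    return rows


# HYPOTHETICAL studies (year, SMD, variance), for illustration only.
demo = [(2010, 2.0, 0.5), (2015, -0.5, 0.2), (2018, -0.9, 0.15), (2020, -0.8, 0.1)]
for year, pooled, lo, hi, i2 in cumulative_meta(demo):
    print(f"{year}: SMD = {pooled:+.2f} [{lo:+.2f}, {hi:+.2f}], I2 = {i2:.0f}%")
```

Reading the rows in order shows how an early extreme estimate can be pulled toward, or across, the null line as later studies accumulate, which is the pattern described for both outcomes.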
Unstable and heterogeneous evidence of cognitive impairment caused by SD
The cumulative analysis of SD's effect on cognition illustrates a field defined by instability. The evidentiary journey began with an extreme, counterintuitive positive finding (SMD = 2.61), a likely “winner's curse” phenomenon. As subsequent studies were added, this initial effect was corrected, with the cumulative estimate rapidly regressing toward and oscillating around the null line. At no point did the evidence achieve statistical stability or significance. The final pooled estimate, encompassing 27 effect sizes, confirmed a null effect (SMD = −0.22, 95% CI [−0.73, 0.28]), with persistently high heterogeneity throughout the entire period (final I² = 77%). This profound heterogeneity is the central finding. It demonstrates that pooling these disparate studies is problematic and that the observed variability is driven by a lack of methodological standardization. When key moderators are taken into account, namely the SD paradigm (e.g., the consistent impairment from the shaking cage method versus the anomalous positive effect from gentle touch), duration (short-term SD), the behavioral assay (hippocampus-dependent tasks like MWM and NOR show impairment, while others do not), and the animal model (Wistar rats appear more sensitive than mice), a clearer detrimental trend emerges, consistent with human data.72,141,142 Taken together, the evidence suggests that the research question “What is the effect of SD on cognition?” is ill-posed. The effect is highly conditional on the specific experimental context, precluding a single definitive answer.
Stable and consensus evidence of synaptic associated protein deficits caused by SD
In stark contrast, the cumulative analysis of synapse-associated proteins tells a story of convergence toward a stable, biologically plausible consensus. This trajectory began with an even more extreme outlier (SMD = 9.06), which was rapidly corrected as more evidence accumulated. A pivotal turning point occurred when the cumulative effect crossed the null line, reversing direction from positive to negative. From this point, the evidence began to strengthen, becoming statistically significant in 2021 and culminating in a final, robust estimate of SMD = −1.11 (95% CI [−1.69, −0.54]). This finding provides strong empirical support for the synaptic homeostasis hypothesis, which posits that sleep is critical for renormalizing synaptic strength.129,143 While heterogeneity remained high (I² = 76%), the underlying biological signal was strong enough to emerge consistently. Here, the heterogeneity provides valuable insight into the effect's boundary conditions. The analysis reveals that SD disparately affects specific proteins (e.g., consistent decreases in PSD-95 and SYP) and is highly dependent on the brain region, with the hippocampus and prefrontal cortex being most vulnerable. Furthermore, technical factors like the choice of internal control for western blots (beta-actin provided more stable results) and paradoxical biological factors, such as an apparent resilience or compensatory response in middle-aged rodents, were identified as key moderators. In conclusion, the convergence of this outcome points to a core biological phenomenon—SD-induced synaptic pathology in key cognitive circuits—where the heterogeneity itself helps to map the specific molecular and anatomical contours of this effect.
Implications of evidential certainty in SD research
Inconclusive evidence for cognitive impairment caused by SD
Our TSA results reveal that the evidence for a detrimental effect of SD on cognitive performance, while suggestive, is not yet conclusive. Although the conventional meta-analysis found a statistically significant impairment, the TSA indicates that this result may be a false positive, as the cumulative evidence is not robust enough to meet the stringent criteria required to confirm a definitive effect. This inconclusiveness is highly significant in the context of sleep research, reflecting the profound complexity of the relationship between sleep and cognition.
The detrimental effects of sleep loss on a wide range of cognitive domains, including attention, executive function, and memory, are well-documented.144,145 However, the magnitude of these effects can be highly variable, influenced by factors such as the duration of sleep deprivation, the nature of the cognitive task, and profound individual differences in resilience to sleep loss.146–148 The high heterogeneity observed in our meta-analysis likely mirrors this real-world complexity. Our TSA result formally quantifies the consequence of this variability: despite numerous studies, the field has not yet accumulated sufficient statistical power (information size) to draw a firm conclusion on the overall effect size. This finding strongly advocates for future, larger-scale, and methodologically harmonized studies to provide the necessary evidence to either confirm or refute this widely accepted, yet statistically inconclusive, hypothesis.
Conclusive evidence for synaptic associated protein deficiency caused by SD
In stark contrast to the cognitive data, our TSA provides conclusive evidence that SD has a robust and detrimental effect on synaptic associated proteins. This finding offers strong support for the central tenets of the synaptic homeostasis hypothesis, which posits that sleep plays a critical role in renormalizing synaptic strength—a process essential for memory consolidation, cognitive function, and synaptic plasticity. 149 According to synaptic homeostasis, wakefulness is associated with a net potentiation of synapses, which is energetically unsustainable and needs to be downscaled during sleep to maintain synaptic efficiency and allow for new learning.
Our results, demonstrating a clear negative impact on key synaptic associated proteins, provide compelling molecular-level evidence for this hypothesis. Sleep deprivation appears to disrupt this essential renormalization process, leading to a pathological state at the synapse. This could manifest as a reduction in crucial postsynaptic density proteins like PSD-95 or presynaptic proteins like synaptophysin, which are vital for synaptic structure and neurotransmission. Such disruptions have been mechanistically linked to the cognitive deficits observed after sleep loss. 150 Furthermore, recent work has shown that even one night of sleep deprivation can alter the brain's proteome, affecting synaptic function and highlighting the brain's vulnerability to acute sleep loss. 151 The conclusiveness of our TSA result is particularly impactful, as it establishes this link with a high degree of statistical certainty. It suggests that the effect of sleep deprivation at the synaptic level is not a subtle or borderline phenomenon but a powerful and reliable biological response, providing a solid foundation upon which the more variable cognitive impairments likely arise. This finding firmly implicates synaptic dysregulation as a core neuropathological consequence of sleep loss.
Systemic perspective on SD: translational implications for AD
Our meta-analysis provides robust, quantitative evidence that SD paradigms in rodents lead to significant reductions in key synaptic proteins (e.g., PSD-95, synaptophysin) and, under specific conditions, measurable cognitive impairment. While these findings are critical, a purely neuro-centric interpretation risks underestimating the true impact of sleep loss. It is essential to frame these results within the broader context of SD as a systemic stressor, particularly given the profound link between sleep disturbance and AD. Our findings on fragmented sleep-induced cognitive decline (particularly in Wistar rats) and synaptic protein loss in the hippocampus and prefrontal cortex hold direct translational relevance for understanding sleep-related cognitive decline in AD patients. AD is characterized by progressive synaptic loss and cognitive impairment, and sleep disturbances, including fragmented sleep, are highly prevalent and predictive of faster disease progression.105,152–154 Our subgroup and subsequent analyses showed that fragmented sleep (induced by the shaking cage method) robustly impairs rodent cognition, in line with clinical data showing that sleep fragmentation in AD correlates with worse memory and increased amyloid-β deposition.155–158
Sleep fragmentation is not merely a late-stage symptom of AD but is recognized as a cardinal and early feature of the disease, potentially preceding overt cognitive decline by years.159–161 Our meta-analysis reveals that sleep fragmentation models in rodents reliably induce synaptic protein loss in the hippocampus and prefrontal cortex. This is not merely an academic finding; it provides a direct molecular underpinning for the clinical observation that sleep disturbances actively contribute to the synaptic decay that drives cognitive decline in AD. This connection suggests a pernicious feedback loop: early AD pathology, potentially in brainstem sleep-regulating nuclei, disrupts sleep architecture, leading to fragmented sleep.162–164 This fragmented sleep, in turn, accelerates AD progression through at least two parallel pathways.165–167
A compelling body of evidence now supports a bidirectional, pernicious feedback loop between sleep disruption and AD pathology.168–174 The cycle may be initiated by early AD-related neurodegeneration within key subcortical sleep-regulating nuclei, such as those in the hypothalamus and brainstem, leading to the sleep fragmentation and loss of restorative slow-wave sleep that are characteristic of the disease, even in its preclinical stages.166,175–177 Our meta-analysis provides a crucial piece of this puzzle by demonstrating that rodent models of fragmented sleep—a paradigm with high clinical fidelity—reliably induce the loss of synaptic proteins in the hippocampus and prefrontal cortex. This offers a direct molecular underpinning for how sleep disruption, in turn, actively accelerates the synaptic decay that drives cognitive decline in AD. This destructive cycle appears to operate through at least two interacting pathways: a direct central pathway and an indirect systemic one.
The direct central pathway involves the immediate consequences of sleep loss on synaptic homeostasis, leading to the reduction in proteins like PSD-95 and synaptophysin that our analysis confirms. However, the indirect, systemic pathway, which has been largely overlooked in the preclinical literature synthesized here, may be equally crucial. SD acts as a potent whole-body stressor, inducing widespread physiological dysregulation that creates a pro-neurodegenerative milieu.178,179 This is particularly relevant as AD patients often present with pre-existing systemic comorbidities. For instance, SD promotes autonomic nervous system (ANS) dysfunction, characterized by a shift toward sympathetic dominance, which can be indexed by reduced heart rate variability (HRV) and elevated resting heart rate (RHR).180–182 This is highly pertinent, as autonomic dysregulation is an established feature of AD that correlates with disease severity. Crucially, recent evidence directly links this peripheral dysautonomia to central network integrity; in older adults at risk for dementia, reduced HRV during slow-wave sleep is associated with compromised functional connectivity within the brain's central autonomic network. 182 Furthermore, large-scale clinical data demonstrate that elevated RHR, a simple marker of sympathetic tone, significantly improves the prediction of dementia risk. 181 Concurrently, SD activates the hypothalamic-pituitary-adrenal (HPA) axis, elevating circulating glucocorticoids. Chronic exposure to high levels of these stress hormones, a state common in both persistent sleep disturbance and AD, is known to be neurotoxic to the hippocampus and to directly impair synaptic plasticity. 
These neuroendocrine and autonomic disturbances are interwoven with other AD risk factors, such as hypertension, which itself is linked to poor sleep and cognitive decline in the elderly.179,183 They are also interwoven with metabolic dysregulation, circadian rhythm disruption, and systemic inflammation, all of which are recognized as key drivers of AD pathogenesis. 179 Because these systemic factors are rarely measured in preclinical SD studies, they may explain why some rodent findings fail to translate to humans.
Crucially, emerging clinical evidence suggests that these systemic factors may represent more than just accelerators of core AD pathology. A landmark study found that the emergence of nighttime behavioral disturbances was more closely associated with the trajectory of cognitive decline than the burden of amyloid plaques or tau tangles. 160 This finding strongly implies that sleep disruption may drive cognitive impairment through mechanisms that are, at least in part, independent of the classical AD histopathological cascade. These mechanisms likely involve the systemic factors discussed—such as chronic inflammation, cerebrovascular compromise secondary to autonomic dysfunction, and metabolic stress—which exert their own direct, detrimental effects on neuronal function and synaptic integrity. In this integrated pathological model, the synaptic protein loss identified in our meta-analysis should be viewed not as an isolated event, but as a critical downstream marker of a complex, multi-system failure initiated and perpetuated by sleep disruption.
Limitations and future perspectives
In this study, we sought to limit study selection bias by integrating multiple databases and pooling a relatively large number of studies. It is nevertheless crucial to acknowledge that, despite our efforts to stratify studies by variables such as species, strain, and SD duration, these factors may still contribute to heterogeneous results. Interactions between these variables could further complicate the interpretation of findings. Utilizing multilevel regression analysis in future research could be beneficial, as it might help identify links between different variables and determine the credibility of evidence concerning mechanisms and interventions related to cognitive impairment in rodent models of sleep disorders. This analytical approach could offer more nuanced insights and enhance the reliability of conclusions drawn from these studies.
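To make the stratification idea concrete, the following is a minimal sketch of random-effects pooling within moderator subgroups, using the DerSimonian-Laird estimator of between-study variance. It is not the analysis pipeline used in this review. The function names, the `smd` and `var` field names, and the toy numbers are all hypothetical and serve only to show how heterogeneity (Q, tau-squared, I-squared) can be compared across levels of one moderator.

```python
import math


def random_effects_pool(effects, variances):
    """Pool effect sizes with the DerSimonian-Laird random-effects model.

    effects:   per-study effect sizes (e.g., standardized mean differences)
    variances: per-study sampling variances (same order as effects)
    """
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # I-squared: share of total variability attributed to heterogeneity
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return {"k": k, "pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}


def pool_by_subgroup(studies, key):
    """Group study records by one moderator and pool each group separately."""
    groups = {}
    for study in studies:
        groups.setdefault(study[key], []).append(study)
    return {
        level: random_effects_pool(
            [s["smd"] for s in members], [s["var"] for s in members]
        )
        for level, members in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical toy records, NOT data from any study in this review.
    toy_studies = [
        {"smd": -1.2, "var": 0.20, "paradigm": "A"},
        {"smd": -0.9, "var": 0.15, "paradigm": "A"},
        {"smd": -0.1, "var": 0.25, "paradigm": "B"},
        {"smd": 0.3, "var": 0.30, "paradigm": "B"},
    ]
    for level, result in pool_by_subgroup(toy_studies, "paradigm").items():
        print(level, result)
```

Subgroup pooling of this kind handles one moderator at a time. Interactions between moderators, or several effect sizes nested within the same study, call for meta-regression or a multilevel model instead.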
While this meta-analysis provides a comprehensive synthesis of the existing literature, its conclusions are inherently constrained by the limitations of the primary studies. Beyond the methodological heterogeneity we have detailed, a more fundamental limitation of the synthesized literature is its predominant focus on neuro-centric outcomes. This narrow scope overlooks the profound systemic physiological dysregulation induced by SD. The vast majority of preclinical studies included herein did not systematically measure or report on a host of modifiable physiological and behavioral factors, including cardiovascular metrics (e.g., RHR, blood pressure, HRV), neuroendocrine responses (e.g., corticosterone levels), or other homeostatic variables (e.g., physical activity, feeding patterns, circadian rhythm markers). These factors are not merely potential confounders but are likely key mechanistic intermediates linking sleep loss to cognitive impairment and synaptic pathology. The failure of the primary literature to capture these systemic effects limits our ability to model the full mechanistic complexity of how sleep loss impacts the brain and restricts the translational applicability of these preclinical findings to the multi-faceted nature of human AD.
However, this limitation illuminates a clear path forward. Our analysis underscores the need for a paradigm shift in preclinical SD research—a move from single-domain readouts to a multi-system assessment approach that embraces the complexity of brain-body interactions. To enhance the translational validity of future studies, we strongly advocate for an integrative experimental design. Future studies should, as a standard, co-assess cognitive performance, central synaptic markers, and a panel of peripheral physiological variables. An ideal experimental design would integrate: (1) validated behavioral paradigms for hippocampus-dependent cognition, such as the MWM and NOR; (2) post-mortem analysis of key synaptic proteins in vulnerable brain regions, including PSD-95 and synaptophysin; and, crucially, (3) continuous, longitudinal monitoring of systemic physiological parameters. This should include telemetric monitoring of cardiovascular variables (HRV, RHR, blood pressure) and activity rhythms, alongside periodic measurement of neuroendocrine stress markers (corticosterone).
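As an illustration of the cardiovascular readouts proposed above, the sketch below derives two common time-domain HRV indices (SDNN and RMSSD) and a resting heart rate estimate from a series of beat-to-beat (RR) intervals, such as those a telemetry system might export. It is a simplified example under stated assumptions: the function name `hrv_summary` is hypothetical, the input is assumed to be artifact-free RR intervals in milliseconds, SDNN is computed as a sample standard deviation, and the heart rate is the rate implied by the mean RR interval.

```python
import math


def hrv_summary(rr_ms):
    """Compute simple time-domain HRV indices from RR intervals in ms.

    Assumes the series has already been cleaned of ectopic beats and
    recording artifacts; no filtering is performed here.
    """
    n = len(rr_ms)
    if n < 3:
        raise ValueError("need at least three RR intervals")
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of all RR intervals
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))
    # RMSSD: root mean square of successive RR differences
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Heart rate (beats per minute) implied by the mean RR interval
    mean_hr = 60000.0 / mean_rr
    return {"mean_rr_ms": mean_rr, "sdnn_ms": sdnn, "rmssd_ms": rmssd, "mean_hr_bpm": mean_hr}


if __name__ == "__main__":
    # Hypothetical RR series in ms, for illustration only.
    print(hrv_summary([100.0, 110.0, 100.0, 110.0]))
```

Summaries like these could be computed per animal and per recording window, then analyzed alongside behavioral scores and synaptic protein levels from the same animals.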
This integrative approach reframes systemic variables not as confounders to be controlled, but as critical mechanistic pathways to be investigated. The work of Zagaar et al. provides a compelling blueprint for this model, demonstrating that a systemic, modifiable intervention (regular exercise) could completely prevent SD-induced deficits in cognition and synaptic plasticity by preserving levels of key molecules like BDNF. 183 Adopting such a holistic framework will be essential to unravel the complex interplay between central and peripheral factors, enhance the translational relevance of preclinical findings to human AD, and ultimately identify novel, system-level therapeutic targets to mitigate the devastating impact of sleep disorders on brain health.
Conclusions
Our meta-analysis synthesizes a diverse body of research, confirming a consistent reduction in synaptic proteins after SD and cognitive impairment under specific experimental conditions, while also demonstrating that heterogeneity in the results stems primarily from variations in study design. This comprehensive review provides critical, evidence-based guidance for choosing the most appropriate rodent models, behavioral paradigms, and biochemical indicators for experimental studies of sleep deprivation, and for standardizing experimental designs. Specifically, for assessing cognitive outcomes, our findings reaffirm that the MWM (for spatial memory) and NOR (for recognition memory) are robust behavioral paradigms. For biochemical endpoints, PSD-95 and synaptophysin are confirmed as sensitive and reliable indicators of synaptic injury in response to SD, particularly within the hippocampus and prefrontal cortex. This will contribute significantly to a deeper understanding of the molecular mechanisms that underlie the intricate relationship between sleep disorders and cognitive function.
For the readership of the Journal of Alzheimer's Disease Reports, the implications of these findings are particularly significant. Sleep disturbance is not merely a symptom but a well-documented risk factor for AD, while synaptic loss is a core pathological hallmark that correlates strongly with cognitive decline. Our findings offer a robust preclinical framework that substantiates this clinical link. Specifically, by demonstrating that paradigms mimicking fragmented sleep—a condition endemic in early AD patients—reliably induce the loss of essential synaptic proteins in AD-vulnerable brain regions, this study provides a powerful molecular basis for the pernicious feedback loop wherein sleep disruption actively accelerates AD-related neurodegeneration.
Crucially, this work posits that the synaptic decay observed in these models should not be viewed as an isolated, neuro-centric event. Rather, it is a central marker of a complex, brain-body pathology. The adverse effects of sleep deprivation on the brain are likely amplified by concurrent systemic physiological dysregulation—including autonomic nervous system imbalance and HPA axis hyperactivity—which are characteristic of both chronic sleep loss and the AD state. To enhance translational validity, we therefore strongly recommend that future preclinical studies integrate their central assessments with measurements of these systemic and modifiable factors, such as HRV, corticosterone levels, and physical activity. By incorporating such systemic measures, future research can capture the full spectrum of SD's effects and uncover novel intervention targets. Ultimately, this meta-analysis underscores the necessity of viewing sleep disturbance as a critical, modifiable, and multi-systemic risk factor in AD. A deeper understanding of this integrated pathology is essential for developing novel therapeutic strategies that target not only central neuronal integrity but also the systemic physiological stability required to preserve cognitive health in an aging population.
Supplemental Material
sj-docx-1-alr-10.1177_25424823251391704, sj-pdf-2-alr-10.1177_25424823251391704, and sj-xlsx-3-alr-10.1177_25424823251391704 - Supplemental material for Effects of sleep deprivation on cognition and synaptic associated proteins in rodents: A systematic review and meta-analysis by Hongqi Wang, Xin Mao, Siyu Liu, Jiacheng Zhang, Enze Li, Yizhi Song, Hui Li, Lirong Chang and Yan Wu in Journal of Alzheimer's Disease Reports
Footnotes
Acknowledgements
The authors have no acknowledgments to report.
Ethical considerations
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Author contribution(s)
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by grants from the Scientific Research Key Program of the Beijing Municipal Commission of Education (KZ202110025032), the National Natural Science Foundation of China (81771370, 82071514), the Beijing Natural Science Foundation (7252002) and Beijing University of Agriculture’s internal scientific research special project (KYZX-2024001).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The data that support the findings of this study are available on request from the corresponding author.
Supplemental material
Supplemental material for this article is available online.
References
