Artificial Intelligence in Pressure Injury Diagnosis: A Critical Appraisal for Clinical Practice

Abstract

Significance:

Pressure injury is one of the most common health problems among hospitalized patients worldwide, and accurate and timely diagnosis is crucial for its treatment. Research on the application of artificial intelligence in the diagnosis of pressure injury is increasing, but there is currently no comprehensive meta-analysis to evaluate the accuracy of artificial intelligence in diagnosing different pressure injury stages.

Recent Advances:

This study synthesizes evidence on artificial intelligence diagnosis of pressure injury, focusing on evaluating diagnostic performance across different stages using core metrics including sensitivity, specificity, and the area under the summary receiver operating characteristic (SROC) curve.

Critical Issues:

Key findings from 21 included studies (12 contributing 47 eligible datasets) indicate high overall diagnostic accuracy of artificial intelligence for pressure injury, with sensitivity of 0.74 (95% confidence interval [CI]: 0.69–0.78), specificity of 0.93 (95% CI: 0.91–0.94), and area under the SROC curve of 0.92 (95% CI: 0.90–0.94). Moreover, the area under the SROC curve varies across different stages of pressure injury, with area under the curve values for stage 1, stage 2, stage 3, stage 4, unstageable, and deep tissue pressure injury of 0.95 (0.93–0.97), 0.85 (0.82–0.88), 0.88 (0.84–0.90), 0.94 (0.92–0.96), 0.96 (0.94–0.97), and 0.98 (0.96–0.99), respectively.

Future Directions:

Artificial intelligence models based on pressure injury image data show substantial potential for clinical application in pressure injury diagnosis. However, the need for high-quality studies with rigorous reporting and external validation remains critical to address current limitations and advance clinical translation.

Keywords

artificial intelligence; pressure injury; diagnosis; meta-analysis; systematic review

SCOPE AND SIGNIFICANCE

The high incidence and low cure rate of pressure injury (PI) impose a heavy disease burden on patients. More timely and accurate artificial intelligence (AI) diagnostic tools for PI can help to improve patient prognosis, reduce health care costs, and guide clinical practice. However, their performance and quality require careful attention. This review systematically examines and evaluates existing AI models for PI diagnosis, identifies critical issues and gaps, and provides constructive recommendations for future model development to enhance clinical applicability and reliability.

Lin Han, PhD

TRANSLATIONAL RELEVANCE

PI is one of the most common health problems among hospitalized patients worldwide, highlighting the critical need for accurate and early diagnosis. This systematic review and meta-analysis reveals flaws in current AI diagnostic models for PI, including small sample sizes, high risk of bias, lack of adequate external validation, and limited clinical applicability. Overcoming these hurdles may lay the foundation for moving AI-driven diagnostics from research labs to the bedside. Development of reliable, well-validated, and clinically integrated AI tools for PI diagnosis may facilitate early guidance for interventions, enhance the overall quality of care, and improve patients’ quality of life.

CLINICAL RELEVANCE

AI-based early diagnosis of PI can prompt medical staff to take timely preventive and therapeutic measures, thereby reducing the risk of PI deterioration. In addition, AI models can standardize the diagnostic process and provide consistent diagnostic results, which helps to reduce diagnostic variation among medical institutions and to give all patients access to high-quality care. Such improvements in diagnostic performance can not only optimize the treatment plan but also improve patients’ quality of life and prognosis.

INTRODUCTION

PI, also known as pressure ulcer, is defined as localized damage to the skin and/or underlying tissue resulting from pressure alone or in combination with shear, and it is one of the most common health problems among hospitalized patients worldwide.¹ Surveys have shown that the prevalence of PI in hospitalized patients ranges from 1.1% to 12.8%,² and the prevalence in the elderly population is even higher,³ ranging from 3.3% to 35.7%.⁴ Moreover, PI has a low healing rate, and the wound healing process is complex and lengthy,⁵ resulting in prolonged hospitalization and ongoing treatment,⁶ with physical, emotional, and psychological consequences that cause considerable suffering for patients. Studies have shown that PI is associated with an increased risk of death.⁷,⁸ In addition, PI poses a significant economic burden to patients and a huge challenge to the health care system.⁹ Given that early-stage PI is asymptomatic, rapidly progressive, and difficult to treat, early and accurate diagnosis can help to initiate timely interventions and lessen the burden of disease on patients.

PI is caused by sustained pressure on the skin, which leads to tissue damage. If PI is not detected and treated in time, microenvironmental changes such as ischemia and hypoxia in local tissues accelerate tissue necrosis and eventually lead to deep tissue injury.¹⁰ Therefore, early and accurate diagnosis of PI can effectively promote wound healing.¹¹ The diagnosis of PI requires a comprehensive assessment that takes into account a variety of complex factors.¹² Health care professionals usually diagnose PI by visual inspection and palpation, assessing skin color, wound site, PI stage, tissue type, and the presence of underlying infection,¹³ and may also use probes or other tools to measure wound depth.¹⁴ However, this diagnostic approach is highly subjective and may result in diagnostic errors due to inexperience, limited information, and observer bias.¹⁵,¹⁶ In addition, the patient’s skin color, age, and health status may interfere with visual judgment.¹⁷ Therefore, more timely and accurate diagnostic tools for PI are needed to improve patient prognosis and reduce health care costs.

With AI now widely applied in medicine, its rapid development offers a potential solution to the high subjectivity and low efficiency of current PI diagnostic methods, as well as to the difficulty of detecting PI at an early stage. Jiang et al. synthesized findings from nine studies and found that AI facilitates monitoring of the progression and healing trajectory of PI through wound images.¹⁸ However, Pelin K et al. assessed the performance of AI in staging PI using real patient images and compared it with manual staging by expert nurses; the expert nurses demonstrated superior accuracy and specificity across most PI stages.¹⁹ In addition, sample sizes vary widely across studies of AI diagnosis of PI, large-sample studies are scarce, and most image sources are limited to public databases or single institutions. Therefore, despite the considerable advantages of AI methods over traditional diagnostic methods and their room for development, there is not yet a clear consensus on whether they can be applied in routine clinical and health care work.

Recently, Qianwen Chao et al. published a systematic review and meta-analysis evaluating methods for staging PI.²⁰ Although they analyzed 8 studies involving 24 models, they did not report the accuracy of AI for diagnosing different PI stages. The objective of this study is to provide a comprehensive overview of the existing knowledge regarding the application of AI algorithms in the diagnosis of PI. Furthermore, the study aims to assess whether there are variations in accuracy among different PI stages.

MATERIALS AND METHODS

This systematic review was prospectively registered with PROSPERO (CRD420251029022). The study was prepared in accordance with the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines. An overview of the study is shown in Fig. 1.

Figure 1.

Graphical summary of this study.

Terminology definitions

Convolutional neural networks (CNNs) are the most prominent architectures for image classification tasks and perform well in medical image classification. You Only Look Once (YOLO) introduced a one-step object detection approach that simultaneously detects bounding boxes and object classes; compared with conventional two-step object detectors, YOLO models have shown excellent performance with short detection times. Area under the curve (AUC): the area under the summary receiver operating characteristic (SROC) curve. Sensitivity (SEN): the ratio of true positives (TP) to the total number of actual positive samples, that is, TP/(TP + FN), where FN denotes false negatives. Specificity (SPE): the ratio of true negatives (TN) to the total number of actual negative samples, that is, TN/(TN + FP), where FP denotes false positives.
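As a minimal sketch of the SEN and SPE definitions above, the following Python snippet computes both from the four cells of a 2×2 table. The function names are illustrative and not part of the study's workflow; the counts are the stage 1 values listed for the Mask-R-CNN dataset (Swerdlow M) in Table 3.

```python
def sensitivity(tp: int, fn: int) -> float:
    """SEN = TP / (TP + FN): share of actual positives that are detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """SPE = TN / (TN + FP): share of actual negatives correctly ruled out."""
    return tn / (tn + fp)


# Stage 1 counts for the Mask-R-CNN dataset (Swerdlow M) as listed in Table 3.
tp, fp, fn, tn = 8, 0, 3, 110

sen = sensitivity(tp, fn)
spe = specificity(tn, fp)
print(f"SEN = {sen:.3f}, SPE = {spe:.3f}")
```

These are per-dataset values; the pooled estimates reported in this review come from a meta-analytic model and are not simple averages of such ratios.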

AI functions as a second pair of eyes: trained on thousands of wound images, it can precisely identify specific patterns from a vast number of images and skillfully grasp various wound features. The work of AI in PI diagnosis is shown in Fig. 2. The application of AI in PI diagnosis is illustrated in a video provided in the Supplementary Video.

Figure 2.

The work of AI in PI diagnosis. AI, artificial intelligence; PI, pressure injury.

Search strategy

A comprehensive literature search was performed across eight electronic databases to identify studies that developed and/or validated an AI algorithm for PI diagnosis. PubMed, the Cochrane Library, Web of Science, Embase, China Knowledge Resource Integrated Database, Wanfang Database, China Biology Medicine, and Weipu Database were searched from the earliest available publications in each database through May 31, 2025. The search strategies combined MeSH terms and free words. The following MeSH terms and free words were used: “pressure ulcer,” “pressure injury,” “pressure sore,” “pressure damage,” “decubitus ulcer,” “bed ulcer,” “bed sore,” “bedsore,” “skin injury,” “artificial intelligence,” “AI,” “deep learning,” “machine learning,” “computer assisted,” “image analysis,” “image software,” “computer diagnosis,” “diagnostic algorithm.” The precise search strategies for each database are shown in the Supplementary Appendix. Additionally, the reference lists of the identified articles were manually searched for further relevant publications, and gray literature was also searched. Some authors were contacted via e-mail to obtain further details or to resolve uncertainties. The study did not require the approval of an Ethics Committee because it is based entirely on previously published studies.

Study selection

After the removal of duplicate studies, two investigators independently assessed the eligible publications by screening titles and abstracts, using the inclusion and exclusion criteria. Full-text articles were retrieved when at least one reviewer decided that an abstract was eligible for inclusion. Each publication was assessed independently by both investigators for final study inclusion. Disagreements were resolved by discussion.

The inclusion criteria were as follows: primary research studies that developed and/or validated an AI algorithm for PI diagnosis or classification in PI images, published in English or Chinese, and involving human subjects. Studies with small sample sizes and limited scope were included to ensure the completeness of evidence and the comprehensiveness of research coverage. The exclusion criteria were as follows: conference abstracts, letters to the editor, review articles, studies with poor reporting quality or insufficient validation, and studies containing incomplete data. Duplicates were removed using EndNote X9. No limits were placed on the target population, study setting, or comparator group.

Data extraction

Data were extracted from the included studies by two independent investigators. Titles and abstracts were screened before full-text screening. Data were extracted by using a predefined data extraction sheet. A list of excluded studies, including the reason for exclusion, was recorded in a Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram. Any further papers identified through reference lists underwent the same process of screening and data extraction in duplicate.

The following information was recorded: (1) general information about the study: the name of the first author, country, publication year, sample size, sample source, staging criteria for PI, model basis, accuracy, precision, recall, specificity, and F1 index; (2) information for the systematic review and meta-analysis: the numbers of TP, false positive (FP), TN, and false negative (FN) cases in each stage. If a study provided multiple sets of TP, FP, TN, and FN values for the same or different AI algorithms, we assumed that they were independent of each other. If a study did not provide these raw counts, they were derived from SEN and SPE. Taking stage 1 PI as an example, the four contingency-table counts (TP, TN, FP, FN) can be derived from four source variables: SEN, SPE, the reference-standard-positive count for stage 1 PI (TP + FN), and the total study sample size. Specifically, TP = SEN × (TP + FN), FN = (TP + FN) − TP, TN = SPE × (total sample size − (TP + FN)), and FP = (total sample size − (TP + FN)) − TN. Furthermore, the TP, FP, TN, and FN of each model in diagnosing PI were calculated by summing the corresponding stage-level values. For example, the TP of each model is equal to the sum of the TP values across the PI stages.
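The back-calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' extraction script: the function names are hypothetical, rounding to the nearest integer is an assumption (the text does not state a rounding rule), and the SEN, SPE, and sample-size inputs in the first example are made-up values. The aggregation example uses the four stage-level rows listed for Mask-R-CNN (Swerdlow M) in Table 3.

```python
def derive_counts(sen: float, spe: float, positives: int, total: int) -> dict:
    """Back-calculate TP, FP, FN, TN for one stage from SEN, SPE,
    the reference-standard-positive count and the total sample size."""
    negatives = total - positives
    # Assumption: round to the nearest integer (Python rounds exact halves to even).
    tp = round(sen * positives)
    fn = positives - tp
    tn = round(spe * negatives)
    fp = negatives - tn
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def sum_across_stages(stage_counts: list) -> dict:
    """Model-level counts as the sum of the stage-level counts."""
    totals = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for counts in stage_counts:
        for key in totals:
            totals[key] += counts[key]
    return totals


# Hypothetical inputs: SEN 0.80, SPE 0.95, 40 reference-positive images of 200.
example = derive_counts(sen=0.80, spe=0.95, positives=40, total=200)
print(example)

# Stage 1 to stage 4 counts for Mask-R-CNN (Swerdlow M) as listed in Table 3.
stages = [
    {"TP": 8, "FP": 0, "FN": 3, "TN": 110},
    {"TP": 36, "FP": 4, "FN": 0, "TN": 81},
    {"TP": 34, "FP": 3, "FN": 4, "TN": 80},
    {"TP": 34, "FP": 2, "FN": 2, "TN": 83},
]
model_total = sum_across_stages(stages)
print(model_total)
```

Because each stage is treated as its own one-versus-rest table, the model-level total counts every test image once per stage.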

Quality appraisal

The quality of the included studies was evaluated independently by two investigators using the Improved QUADAS-2 tool. In the systematic review of dental caries imaging diagnosis based on deep learning published by Mohammad Rahimi et al.,²¹ the researchers improved the QUADAS-2 tool in response to the characteristics of the deep learning model studies. The improved QUADAS-2 tool covers four areas. It addresses issues such as data imbalance and insufficient generalization caused by limited dataset diversity; emphasizes data exclusion bias; considers the independence of test set data in validity evaluation; and further accounts for the reproducibility of research results, the robustness of diagnostic models, and measurement errors that may occur during annotation.

If there were discrepancies between the two investigators’ evaluations, a third party adjudicated the results. The quality assessment and the corresponding graphics were completed using RevMan 5.3 software. As shown in Fig. 3, the greener the color of each criterion, the lower the risk of bias. To further enhance reporting transparency, we assessed the included studies against the TRIPOD+AI reporting standards. A detailed evaluation checklist is provided in Supplementary Table SA1a.

Figure 3.

Summary chart of the results of the risk of bias and clinical applicability evaluation of the included studies.

Data analysis

All statistical analyses were performed using Stata 16.0. First, the SROC curve was constructed to examine threshold effects, with a “shoulder-arm” configuration considered indicative of such an effect. Non-threshold heterogeneity was then assessed with the Cochran Q-test for the diagnostic odds ratio (DOR), using a threshold of p < 0.1. Pooled SEN, SPE, DOR, positive likelihood ratio (+LR), negative likelihood ratio (–LR), and their 95% confidence intervals (95% CI) were calculated, and the area under the SROC curve (AUC) was determined. Publication bias was evaluated with Deeks’ funnel plot.
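To make the listed accuracy measures concrete, the sketch below computes +LR, −LR, and DOR for a single 2×2 table in Python. It is an illustration only: the pooled values in this study were estimated in Stata, not by this code. The function name and the 0.5 continuity correction for zero cells are assumptions. The example counts are the stage 1 values listed for SE-ResNext101 (Kim J) in Table 3.

```python
import math


def diagnostic_ratios(tp, fp, fn, tn, correction=0.5):
    """Return SEN, SPE, +LR, -LR, DOR and ln(DOR) for one 2x2 table."""
    # Assumption: add 0.5 to every cell when any cell is zero, a common
    # continuity correction that avoids division by zero.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    plr = sen / (1 - spe)   # positive likelihood ratio
    nlr = (1 - sen) / spe   # negative likelihood ratio
    dor = plr / nlr         # algebraically equal to (TP * TN) / (FP * FN)
    return {"SEN": sen, "SPE": spe, "+LR": plr, "-LR": nlr,
            "DOR": dor, "lnDOR": math.log(dor)}


# Stage 1 counts for SE-ResNext101 (Kim J) as listed in Table 3.
ratios = diagnostic_ratios(tp=38, fp=25, fn=23, tn=398)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

A +LR well above 1 and a −LR well below 1 both push the DOR upward, which is why the DOR serves as a single summary of discriminative ability.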

The Bayesian multilevel random-effects model was constructed using R software version 4.5.1 (brms package). Sources of heterogeneity identified by meta-regression were incorporated into the model structure as grouping variables, and the between-study variance reflected in the high I² values was decomposed through random-effect terms. In addition, 95% CI and 95% prediction intervals (95% PI) were reported to quantify the range within which the results of future studies may be expected to fall.

RESULTS

Study process

The initial search retrieved 1,298 articles, of which 835 were duplicates. After screening titles and abstracts, 141 articles were selected on the basis of inclusion criteria. These publications were further evaluated in detail. Ultimately, a total of 21 studies (2 in Chinese and 19 in English) met the inclusion criteria and were utilized for the meta-analysis (Fig. 4).

Figure 4.

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram for the study selection process.

Characteristics of the included studies

The characteristics of the 21 studies are summarized in Table 1. The studies comprised 19 English-language and 2 Chinese-language publications, published between 2022 and 2025. Of the 21 studies, 14 constructed models based on CNN and 7 constructed models based on YOLO. The accuracy, precision, recall (sensitivity), specificity, and F1 index of the AI models in the 21 included studies are summarized in Table 2. The three most accurate models reported in the included studies are the pressure ulcer cluster vision transformer (PUC-ViT), SemiViT, and DenseNet121, with accuracies of 0.9776, 0.9399, and 0.9371, respectively. PUC-ViT and SemiViT preprocess image data based on previously acquired images, divide the original images into smaller patches, and embed them using a vision transformer architecture commonly used in computer vision. DenseNet121 is a common deep CNN architecture.

Table 1.

Characteristics of included studies

First Author	Year	Country	Sample Source	Total Number of Samples	Staging Criteria for PI	Model Basis
Jenny W²²	2024	China	Changshu First People’s Hospital	142 (test set)	NPUAP	YOLOv8 nano (YOLOv8n), YOLOv8 small (YOLOv8s), YOLOv8 medium (YOLOv8m), YOLOv8 large (YOLOv8l), YOLOv8 extra large (YOLOv8x)
Jian C²³	2024	China	Changshu First People’s Hospital	39 (test set)	NPUAP	CNN (DenseNet121, EfficientNet, ResNet101, ResNet50)
Ay B²⁴	2022	Turkey	Pressure Injury Images Dataset (PIID)	217 (test set)	NPUAP	CNN (DenseNet121, InceptionV3, MobileNetV1, MobileNetV2, ResNet50, ResNet152, VGG16)
Lau CH²⁵	2022	China	Medetec Wound Database	144 (validation set)	NPUAP	YOLOv4
Fergus P²⁶	2023	United Kingdom	Medetec Wound Database	216 (test set)	NPUAP	Faster Region-based Convolutional Neural Network (Faster R-CNN)
Kim J²⁷	2023	Korea	Severance Hospital and Gangnam Severance Hospital	484 (test set)	NPUAP	CNN (SE-ResNext101)
Seo S²⁸	2023	Korea	SMG-SNU Boramae Medical Center	281 (test set)	NPUAP	CNN (VGG16, ResNet50, ResNet152, DenseNet201, EfficientNet-B4)
Swerdlow M²⁹	2023	USA	eKare Inc.	121 (test set)	NPUAP	Mask-R-CNN
Chang Y³⁰	2024	Korea	Medetec pressure ulcer dataset, Roboflow, and Google	324 (test set)	NPUAP	YOLOv8 nano (YOLOv8n), YOLOv8 small (YOLOv8s), YOLOv8 medium (YOLOv8m), YOLOv8 large (YOLOv8l), YOLOv8 extra large (YOLOv8x)
Ikuta K³¹	2024	Japan	Tottori University Hospital	108 (test set)	DESIGN-R	Categorical classification model, binary classification model, combined classification model
Wang D³²	2024	China	Wuhan University Renmin Hospital, Union Hospital Tongji Medical College Huazhong University of Science and Technology, Zhongshan Hospital Xiamen University, PIID	303 (validation set)	NPUAP	CNN (ResNeXt50+wFPN, ResNeXt50, EfficientNetV2-s, DenseNet161, Swin-Transformer-tiny, ResNeXt-s)
Zalluhoğlu C³³	2024	Turkey	Ankara Pursaklar State Hospital	295 (test set)	NPUAP	CNN (AlexNet, DenseNet169, EfficientNet-B0, EfficientNet-B1, EfficientNet-B2, EfficientNet-B3, EfficientNet-B4, EfficientNet-B5, GoogleNet, MobileNetv2, ResNet50, Vgg16)
Liu X³⁴	2024	China	Northern Jiangsu People’s Hospital	200 (test set)	NPUAP	YOLOv8 nano (YOLOv8n), YOLOv8 small (YOLOv8s), YOLOv8 medium (YOLOv8m), YOLOv8 large (YOLOv8l), YOLOv8 extra large (YOLOv8x)
Liu H³⁵	2024	China	Jiulongpo District People’s Hospital	100 (test set)	NPUAP	Mask-R-CNN
Gui Z³⁶	2024	China	Ningbo No. 2 Hospital	277 (validation set)	NPUAP	Inception (Xception, Inception-v4, SE-Inception)
Chen CC³⁷	2024	China	Hospital	280 (test set)	NPUAP	YOLOv7
Aldughayfiq B¹³	2023	Saudi Arabia	Google image	Unclear	NPUAP	YOLOv5
Cho YB³⁸	2025	Korea	Three hospitals in City D, South Korea	1,685 (test set)	NPUAP	CNN, vision transformer (ViT, ViTMixup, SemiViT, PUC-ViT)
Tusar MH³⁹	2025	USA	From physicians, Google search, books, and databases such as Medetec and Kaggle	720 (total set)	NPUAP	YOLOv8 nano (YOLOv8n), YOLOv8 small (YOLOv8s), YOLOv8 medium (YOLOv8m), YOLOv8 large (YOLOv8l), YOLOv8 extra large (YOLOv8x)
Lei C⁴⁰	2025	China	A tertiary hospital in Chengdu	7,677 (total set)	NPUAP	CNN (AlexNet, VGGNet16, ResNet18, DenseNet121)
Huang Y-S⁴¹	2025	China	Changhua Christian Hospital	218 (validation set)	NPUAP	SE-Swin transformer

CNN, Convolutional Neural Network; NPUAP, National Pressure Ulcer Advisory Panel; PI, pressure injury.

Table 2.

Characteristics of AI models in the included studies

First Author	Model Basis	Model	Accuracy	Precision	Recall	Specificity	F1 Index
Jenny W²²	YOLO	YOLOv8l	0.9318	—	0.7652	0.9629	—
Jian C²³	CNN	ResNet101	0.605	0.482	0.446	—	0.429
		EfficientNet	0.816	0.793	0.790	—	0.780
		ResNet50	0.816	0.757	0.809	—	0.763
		DenseNet121	0.895	0.929	0.889	—	0.904
Ay B²⁴	CNN	DenseNet121	0.6728	—	—	—	—
		InceptionV3	0.6313	—	—	—	—
		MobileNetV1	—	—	—	—	—
		MobileNetV2	0.5484	—	—	—	—
		ResNet152	0.7742	—	—	—	—
		ResNet50	0.7645	—	—	—	—
		VGG16	0.7189	—	—	—	—
Lau CH²⁵	YOLO	YOLOv4	0.632	—	—	—	—
Fergus P²⁶	Faster R-CNN	IOU@.50CS@.30(UUI)	—	0.4516	0.3900	—	0.3889
		IOU@.50CS@.50(UUI)	—	0.5341	0.3639	—	0.4170
		IOU@.50CS@.75(UUI)	—	0.5801	0.3099	—	0.3947
		IOU@.50CS@.90(UUI)	—	0.7091	0.2824	—	0.4000
		IOU@.50CS@.30(UCI)	—	0.4203	0.6675	—	0.4866
		IOU@.50CS@.50(UCI)	—	0.5098	0.6771	—	0.5584
		IOU@.50CS@.75(UCI)	—	0.6796	0.6997	—	0.6786
		IOU@.50CS@.90(UCI)	—	0.7762	0.6410	—	0.6956
Kim J²⁷	CNN	SE-ResNext101	—	—	0.717	0.943	0.715
Seo S²⁸	CNN	VGG16	0.8374	0.7838	0.7422	—	0.7552
		ResNet50	0.8293	0.7969	0.7936	—	0.7871
		ResNet152	0.8415	0.7856	0.8196	—	0.7951
		DenseNet201	0.8211	0.7721	0.7889	—	0.7791
		EfficientNet-B4	0.9146	0.9123	0.8819	—	0.8941
Swerdlow M²⁹	CNN	Mask-R-CNN	0.926	—	—	—	—
Chang Y³⁰	YOLO	YOLOv8n	0.810	0.910	0.870	—	0.889
		YOLOv8s	0.814	0.922	0.845	—	0.882
		YOLOv8m	0.846	0.897	0.891	—	0.894
		YOLOv8l	0.789	0.895	0.861	—	0.878
		YOLOv8x	0.796	0.905	0.859	—	0.881
Ikuta K³¹	CNN	CCM	0.750	0.821	0.750	—	0.758
		BCM	0.839	0.847	0.839	—	0.838
		ComCM	0.868	0.875	0.868	—	0.868
Wang D³²	CNN	ResNeXt50+wFPN	0.815	0.808	0.816	—	0.811
		DenseNet161	0.795	0.784	0.790	—	0.782
		EfficientNetV2-s	0.788	0.781	0.775	—	0.776
		ResNeXt50	0.781	0.771	0.777	—	0.773
		ResNeXt-s	0.762	0.751	0.763	—	0.751
		Swin-Transformer-tiny	0.755	0.761	0.747	—	0.753
Zalluhoğlu C³³	CNN	AlexNet	0.76	0.76	0.74	—	0.74
		DenseNet169	0.79	0.8	0.74	—	0.76
		EfficientNet-B0	0.78	0.76	0.75	—	0.75
		EfficientNet-B1	0.74	0.71	0.73	—	0.72
		EfficientNet-B2	0.74	0.71	0.73	—	0.72
		EfficientNet-B3	0.75	0.73	0.73	—	0.73
		EfficientNet-B4	0.66	0.61	0.61	—	0.60
		EfficientNet-B5	0.70	0.66	0.71	—	0.68
		GoogleNet	0.73	0.72	0.68	—	0.69
		MobileNetv2	0.73	0.72	0.68	—	0.69
		ResNet50	0.74	0.73	0.69	—	0.71
		Vgg16	0.79	0.78	0.73	—	0.77
Liu X³⁴	YOLO	YOLOv8n	—	0.252	0.230	—	—
		YOLOv8s	—	0.932	0.652	—	—
		YOLOv8m	—	0.876	0.654	—	—
		YOLOv8l	—	0.958	0.630	—	—
		YOLOv8x	—	0.958	0.612	—	—
Liu H³⁵	CNN	Mask-R-CNN	—	0.6029	—	—	—
Gui Z³⁶	CNN	Xception	0.90	—	0.88	—	0.89
		Inception-v4	0.92	—	0.91	—	0.91
		SE-Inception	0.93	—	0.92	—	0.93
Chen CC³⁷	YOLO	YOLOv7	—	—	0.9238	—	—
Aldughayfiq B¹³	YOLO	YOLOv5	—	0.781	0.685	—	—
Cho YB³⁸	CNN	CNN	0.9134	0.9043	0.7798	—	0.8733
		ViT (Vision Transformer)	0.9199	0.9019	0.8624	—	0.8740
		ViTMixup	0.9258	0.9400	0.8624	—	0.9203
		SemiViT	0.9399	0.9035	0.9450	—	0.9371
		PUC-ViT	0.9776	0.9602	0.9083	—	0.9546
Tusar MH³⁹	YOLO	YOLOv8n	0.76	—	—	—	—
		YOLOv8s	0.80	—	—	—	—
		YOLOv8m	0.79	—	—	—	—
		YOLOv8l	0.79	—	—	—	—
		YOLOv8x	0.79	—	—	—	—
Lei C⁴⁰	CNN	AlexNet	0.8744	0.9748	0.8562	—	0.9471
		VGGNet16	0.8242	0.9640	0.9590	—	0.9677
		ResNet18	0.9242	0.9843	0.9164	—	0.9826
		DenseNet121	0.9371	0.9872	0.9153	—	0.9709
Huang Y-S⁴¹	CNN	SE-Swin transformer	0.8710	0.866	0.8570	—	0.8590

BCM, binary classification model; CCM, categorical classification model; ComCM, combined classification model; UCI, using cropped images; UUI, using uncropped images; PUC-ViT, pressure ulcer cluster vision transformer; YOLO, You Only Look Once.
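For readers less familiar with the F1 index in Table 2, the short Python sketch below shows how it follows from precision and recall as their harmonic mean. The function name is illustrative. The inputs are the precision and recall listed for YOLOv8m (Chang Y) in Table 2. Several included studies appear to report class-averaged metrics, so this identity need not reproduce every row exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 index: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall listed for YOLOv8m (Chang Y) in Table 2.
f1 = f1_score(precision=0.897, recall=0.891)
print(f"F1 = {f1:.3f}")  # Table 2 lists 0.894 for this model
```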

This review systematically evaluated 21 included studies. TP, FP, TN, and FN data could be extracted from 12 of these studies, all of which met the criteria for meta-analysis. The meta-analysis was therefore performed on these 12 studies, which contributed 47 datasets. The datasets included in the meta-analysis are summarized in Table 3.

Table 3.

The datasets included in the meta-analysis

First Author	Model	Stage 1				Stage 2				Stage 3				Stage 4				DTPI				Unstageable
First Author	Model	TP	FP	FN	TN	TP	FP	FN	TN	TP	FP	FN	TN	TP	FP	FN	TN	TP	FP	FN	TN	TP	FP	FN	TN
Jenny W^b	YOLOv8l	25	4	2	111	23	3	10	106	26	5	8	103	16	6	6	114	10	5	2	125	9	5	5	123
Jian C^a	DenseNet121	4	2	0	28	7	0	1	26									6	0	1	27
Ay B^b	DenseNet121	35	15	11	156	38	15	24	140	26	11	29	151	43	18	11	145
Ay B^b	InceptionV3	32	5	14	166	32	16	30	139	29	40	26	122	32	17	22	146
Ay B^b	MobileNetV1	29	11	17	160	34	21	28	134	26	27	29	135	23	13	31	150
Ay B^b	MobileNetV2	23	11	23	160	32	24	30	131	24	26	31	136	33	18	21	145
Ay B^b	ResNet152	41	6	5	165	51	11	11	144	33	16	22	146	41	14	13	149
Ay B^b	ResNet50	38	6	8	165	49	16	13	139	28	11	27	151	45	17	9	146
Ay B^b	VGG16	33	8	13	163	48	23	14	132	29	9	26	153	44	17	10	146
Lau CH^b	YOLOv4	22	6	8	108	10	5	17	112	23	17	7	97	21	8	9	106					15	0	12	117
Lau CH^a	YOLOv4									1	0	2	7	5	0	0	5					2	1	0	7
Fergus P^b	IOU@.50CS@.30 (UUI)	2	8	3	203	34	61	59	62	4	13	7	192					10	3	20	183	33	15	44	124
Fergus P^b	IOU@.50CS@.50 (UUI)	2	4	3	207	27	39	66	84	3	6	8	199					11	2	19	184	31	12	46	127
Fergus P^b	IOU@.50CS@.75 (UUI)	2	2	3	209	22	25	71	98	3	6	8	199					9	1	21	185	28	9	49	130
Fergus P^b	IOU@.50CS@.90 (UUI)	2	1	3	210	14	17	79	106	3	3	8	202					8	1	22	185	26	4	51	135
Fergus P^b	IOU@.50CS@.30 (UCI)	4	20	1	191	66	105	27	18	5	21	6	184					20	8	10	178	57	24	20	115
Fergus P^b	IOU@.50CS@.50 (UCI)	4	9	1	202	62	62	31	61	6	17	5	188					19	7	11	179	57	19	20	120
Fergus P^b	IOU@.50CS@.75 (UCI)	3	5	2	206	62	32	31	91	8	6	3	199					21	1	9	185	62	12	15	127
Fergus P^b	IOU@.50CS@.90 (UCI)	3	3	2	208	59	17	34	106	6	3	5	202					19	0	11	186	61	4	16	135
Kim J^b	SE-ResNext101	38	25	23	398	132	21	59	272	3	14	15	452	32	14	1	437	25	17	21	421	42	31	18	393
Seo S^a	VGG16					25	12	10	234	39	10	9	223	49	3	6	223	5	3	9	264	88	12	6	175
Seo S^a	ResNet50					27	18	8	228	27	5	21	228	49	5	6	221	11	4	3	263	90	10	4	177
Seo S^a	ResNet152					23	10	12	236	31	9	17	224	50	5	5	221	13	8	1	259	90	7	4	180
Seo S^a	DenseNet201					25	9	10	237	37	13	11	220	48	5	7	221	10	7	4	260	82	10	12	177
Seo S^a	EfficientNet-B4					25	5	10	241	43	13	5	220	53	1	2	225	12	0	2	267	92	2	2	185
Swerdlow M^a	Mask-R-CNN	8	0	3	110	36	4	0	81	34	3	4	80	34	2	2	83
Chang Y^a	YOLOv8m	46	2	8	268	43	9	8	264	43	7	8	266	44	3	2	275	49	6	5	264	50	7	6	261
Ikuta K^a	CCM	57	1	51	431	62	36	46	396	87	1	21	431	101	0	7	432
Ikuta K^a	ComCM	107	0	1	432	108	8	0	424	100	2	8	430	106	1	2	431
Wang D^a	ResNeXt50+wFPN	67	26	10	200	82	39	22	160	48	52	15	188	49	51	9	194
Wang D^a	DenseNet161	65	22	12	204	77	44	27	155	35	52	28	188	53	100	5	145
Wang D^a	EfficientNetV2-s	68	22	9	204	87	75	17	124	36	49	27	191	47	54	11	191
Wang D^a	ResNeXt50	71	28	6	198	82	38	22	161	38	45	25	195	35	83	23	162
Wang D^a	ResNeXt-s	65	24	12	202	82	44	22	155	43	63	20	177	46	74	12	171
Wang D^a	Swin-Transformer-tiny	58	34	19	192	76	59	28	140	46	71	17	169	41	34	17	211
Zalluhoğlu C^b	AlexNet	39	8	6	242	33	10	27	225	126	38	24	107	27	14	13	241
Zalluhoğlu C^b	DenseNet169	37	6	8	244	30	9	30	226	135	38	15	107	30	9	10	246
Zalluhoğlu C^b	EfficientNet-B0	36	6	9	244	34	19	26	216	128	28	23	117	32	14	8	241
Zalluhoğlu C^b	EfficientNet-B1	35	13	10	237	33	21	27	214	119	30	32	115	31	14	9	241
Zalluhoğlu C^b	EfficientNet-B2	36	12	9	238	34	24	26	211	119	30	32	115	30	13	10	242
Zalluhoğlu C^b	EfficientNet-B3	36	14	9	236	31	19	29	216	123	31	27	114	31	10	9	245
Zalluhoğlu C^b	EfficientNet-B4	30	18	15	232	15	14	45	221	123	48	27	97	28	20	12	235
Zalluhoğlu C^b	EfficientNet-B5	38	17	7	233	28	23	32	212	107	27	44	118	33	23	7	232
Zalluhoğlu C^b	GoogleNet	33	10	12	240	24	12	36	223	129	45	21	100	29	13	11	242
Zalluhoğlu C^b	MobileNetv2	33	12	12	238	29	15	31	220	128	43	23	103	26	10	14	245
Zalluhoğlu C^b	ResNet50	31	9	14	241	36	24	24	211	125	35	26	110	26	8	14	247
Zalluhoğlu C^b	Vgg16	36	3	9	247	31	16	29	219	132	37	18	108	29	10	11	245

^a Directly reported datasets.

^b Derived datasets.

DTPI, deep tissue pressure injury; FN, false negative; FP, false positive; TN, true negative; TP, true positive.

Heterogeneity test

To assess data quality, each included study was evaluated individually using the Improved QUADAS-2 tool. The results showed that the overall study data quality was high. Analysis of threshold effects across all 47 datasets showed that the SROC curve constructed from SEN and SPE did not have a “shoulder-arm” distribution, suggesting that there was no threshold effect among the included studies (Fig. 5).

Figure 5.

SROC curves of AI in PI diagnosis. SROC, summary receiver operating characteristic.

The data included in the study were analyzed for non-threshold effect heterogeneity; the Cochran Q-test for DOR yielded p < 0.1, suggesting the presence of non-threshold effect heterogeneity.
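As a rough illustration of how a Cochran Q statistic and the I² index can be obtained for the DOR, the Python sketch below applies inverse-variance weights to ln(DOR) values. It is a simplified, fixed-effect sketch and not the Stata routine used in this study. The function names and the 0.5 continuity correction are assumptions. The four example tables are stage 1 rows listed in Table 3.

```python
import math


def log_dor(tp, fp, fn, tn, correction=0.5):
    """Return ln(DOR) and its approximate variance for one 2x2 table."""
    # Assumption: 0.5 continuity correction when any cell is zero.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    effect = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return effect, variance


def cochran_q(effects, variances):
    """Cochran Q, degrees of freedom and I² (%) with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Stage 1 counts (TP, FP, FN, TN) for four datasets as listed in Table 3.
tables = [(25, 4, 2, 111), (38, 25, 23, 398), (8, 0, 3, 110), (46, 2, 8, 268)]
effects, variances = zip(*(log_dor(*t) for t in tables))
q, df, i2 = cochran_q(effects, variances)
print(f"Q = {q:.2f} on {df} df, I² = {i2:.1f}%")
```

Q is compared with a chi-square distribution on the stated degrees of freedom to obtain the p-value, and I² expresses the share of total variation attributable to between-study differences rather than chance.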

Meta-analysis results

As shown in Fig. 6, the pooled SEN was 0.74 (95% CI: 0.69–0.78). The Q-test (p < 0.05) indicated statistically significant between-study variation in SEN, and I² = 96.51% (95% CI: 95.96–97.06), well above 50%, suggested substantial heterogeneity. The pooled SPE was 0.93 (95% CI: 0.91–0.94); the Q-test again gave p < 0.05, and I² = 98.03% (95% CI: 97.77–98.29) suggested even greater heterogeneity in SPE. The pooled DOR was 35.72 (95% CI: 24.12–52.88), and the pooled diagnostic score was 3.58 (95% CI: 3.18–3.97); the pooled +LR was 10.14 (95% CI: 7.91–13.00), and the pooled −LR was 0.28 (95% CI: 0.24–0.34). The AUC of the SROC curve (Fig. 5) was 0.92 (95% CI: 0.90–0.94), suggesting that the accuracy of AI in identifying the staging of PI was high. Figure 7 presents the likelihood ratio plot of AI for PI diagnosis.

Figure 6.

Forest plot of sensitivity and specificity of AI in PI diagnosis.

Figure 7.

Likelihood ratio plotting of AI in PI diagnosis.

Results of meta-analysis for AI diagnosis of different stages of PI

In this study, a separate meta-analysis was conducted for each PI stage diagnosed by AI, and the accuracy for each stage is summarized in Table 4. The data were analyzed using the revised National Pressure Ulcer Advisory Panel (NPUAP) definitions and staging of PI.⁴² The highest AUC was observed for deep tissue pressure injury (DTPI), at 0.98 (95% CI: 0.96–0.99). Figure 8 shows the receiver operating characteristic (ROC) curve for each stage of AI diagnosis.

Figure 8.

ROC curves for each stage of AI in PI diagnosis: (A) stage 1, (B) stage 2, (C) stage 3, (D) stage 4, (E) unstageable, and (F) DTPI. DTPI, deep tissue pressure injury; ROC, receiver operating characteristic.

Table 4.

Diagnostic accuracy of AI for each PI stage

PI Stages	Data Sets	SEN (95% CI)	SPE (95% CI)	+LR (95% CI)	−LR (95% CI)	DOR (95% CI)	AUC (95% CI)
Stage 1	41	0.78 (0.73–0.82)	0.96 (0.95–0.97)	20.84 (15.64–27.77)	0.23 (0.19–0.28)	91.05 (62.66–132.30)	0.95 (0.93–0.97)
Stage 2	46	0.64 (0.57–0.70)	0.89 (0.86–0.92)	6.04 (4.38–8.34)	0.40 (0.33–0.49)	15.04 (9.45–23.94)	0.85 (0.82–0.88)
Stage 3	46	0.70 (0.64–0.75)	0.91 (0.87–0.93)	7.56 (5.54–10.31)	0.34 (0.28–0.40)	22.55 (15.22–33.39)	0.88 (0.84–0.90)
Stage 4	38	0.82 (0.77–0.86)	0.95 (0.92–0.97)	15.78 (10.29–24.21)	0.19 (0.15–0.25)	81.38 (43.89–150.91)	0.94 (0.92–0.96)
Unstageable	18	0.78 (0.66–0.87)	0.95 (0.92–0.96)	14.69 (9.80–22.02)	0.23 (0.14–0.38)	64.02 (28.44–144.10)	0.96 (0.94–0.97)
DTPI	17	0.64 (0.51–0.75)	0.98 (0.98–0.99)	41.05 (26.25–64.17)	0.37 (0.26–0.51)	112.12 (61.01–206.05)	0.98 (0.96–0.99)

+LR, positive likelihood ratio; −LR, negative likelihood ratio; AUC, area under the curve; CI, confidence interval; DOR, diagnostic odds ratio; PI, pressure injury; SEN, sensitivity; SPE, specificity.
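To illustrate how the likelihood ratios in Table 4 translate into probabilities at the bedside, the sketch below applies Bayes' theorem in odds form (post-test odds = pre-test odds × LR). The pre-test probability of 0.10 is a hypothetical value chosen for illustration only, not an estimate from this study, and the helper name `post_test_probability` is likewise illustrative.

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# (+LR, -LR) point estimates copied from Table 4.
TABLE4_LR = {
    "Stage 1": (20.84, 0.23),
    "Stage 2": (6.04, 0.40),
    "Stage 3": (7.56, 0.34),
    "Stage 4": (15.78, 0.19),
    "Unstageable": (14.69, 0.23),
    "DTPI": (41.05, 0.37),
}

PRE_TEST = 0.10  # hypothetical pre-test probability, for illustration only

for stage, (plr, nlr) in TABLE4_LR.items():
    after_pos = post_test_probability(PRE_TEST, plr)
    after_neg = post_test_probability(PRE_TEST, nlr)
    print(f"{stage}: positive -> {after_pos:.2f}, negative -> {after_neg:.2f}")
```

Under this assumed pre-test probability, a positive AI result for DTPI raises the probability the most, whereas a negative result for stage 2 lowers it the least, which mirrors the lower SEN for that stage.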

Results of meta-regression and Bayesian multilevel random-effects model

To explore the potential sources of heterogeneity, we conducted meta-regression using Stata 16.0. We included the sources of images, sample size, diagnostic staging criteria, algorithm choice, dataset source, device type, and patient setting in the meta-regression. The meta-regression analysis indicated that the sources of images, sample size, diagnostic staging criteria, algorithm choice, dataset source, and patient setting might be the sources of heterogeneity (Fig. 9).

Figure 9.

Meta-regression results of AI in PI diagnosis.

In the Bayesian multilevel random-effects model analyses, stratification was conducted across six dimensions based on the meta-regression findings: (1) whether the PI images were sourced from public databases, (2) whether the sample size exceeded 200, (3) whether the PI staging criteria were derived from the NPUAP, (4) whether the CNN algorithm was used, (5) whether the dataset source was multicenter, and (6) whether the patient setting was in-hospital. The results, including SEN and SPE with their respective 95% CI and 95% PI, as well as corresponding heterogeneity indices (I²), are presented in Table 5.

Table 5.

Results of Bayesian multilevel random-effects model analyses for artificial intelligence in pressure injury diagnosis

Subgroup	SEN pooled	SEN 95% CI	SEN 95% PI	SEN I²	SPE pooled	SPE 95% CI	SPE 95% PI	SPE I²
Images from public databases
Yes	0.550	0.232–0.805	0.026–0.982	0.080	0.668	0.221–0.930	0.036–0.991	0.047
No	0.657	0.320–0.865	0.042–0.988		0.702	0.250–0.940	0.043–0.993
Sample size
>200	0.587	0.256–0.833	0.031–0.984	0.024	0.656	0.189–0.932	0.034–0.991	0.083
≤200	0.603	0.270–0.845	0.034–0.985		0.738	0.261–0.955	0.046–0.994
CNN-based model
Yes	0.545	0.223–0.805	0.025–0.981	0.083	0.644	0.143–0.935	0.026–0.991	0.313
No	0.660	0.315–0.868	0.041–0.988		0.893	0.438–0.986	0.106–0.998
NPUAP diagnostic criteria
Yes	0.591	0.258–0.838	0.030–0.985	0.026	0.860	0.388–0.982	0.092–0.997	0.253
No	0.598	0.261–0.839	0.034–0.985		0.643	0.158–0.939	0.025–0.991
Dataset source
Single-center	0.599	0.271–0.840	0.032–0.985	0.019	0.651	0.174–0.940	0.029–0.991	0.198
Multicenter	0.589	0.261–0.834	0.032–0.984		0.816	0.340–0.974	0.069–0.997
Patient setting
Inpatient	0.748	0.456–0.902	0.062–0.992	0.199	0.653	0.184–0.933	0.031–0.991	0.151
Home-based	0.483	0.211–0.740	0.024–0.974		0.782	0.298–0.963	0.061–0.995

95% CI, 95% confidence interval; 95% PI, 95% prediction interval; NPUAP, National Pressure Ulcer Advisory Panel; SEN, sensitivity; SPE, specificity.

Sensitivity analyses

Sensitivity analyses of the sources of heterogeneity in the included studies, performed with Stata 16.0 (Fig. 10), identified five studies as potential contributors to heterogeneity. Additionally, we conducted sensitivity analyses by excluding studies with small samples, low quality, or indirectly calculated datasets, and then recalculated the pooled effect size. The direction of the core indicators remained consistent with the primary analysis, supporting the robustness of the findings (Supplementary Figs. SA1 and SA2).

Figure 10.

Sensitivity analyses of AI in PI diagnosis. (a) Goodness of Fit, (b) Bivariate Normality, (c) Influence Analysis, (d) Outlier Detection.

Publication bias

The included studies were evaluated for publication bias using Deeks’ funnel plot asymmetry test (Fig. 11). The p value was 0.42 (p > 0.05), indicating no statistically significant evidence of publication bias among the included studies.
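For readers unfamiliar with the test, Deeks' approach regresses the log diagnostic odds ratio on the inverse square root of the effective sample size (ESS), weighting each dataset by its ESS; a slope that differs significantly from zero suggests funnel plot asymmetry. The sketch below is a minimal standard-library illustration of that regression, not the Stata routine used in this study. The function name `deeks_regression`, the 0.5 continuity correction, and the 2×2 counts are assumptions made for illustration. It returns the slope, its standard error, and the t statistic, which would be compared with a t distribution on n − 2 degrees of freedom to obtain a p value.

```python
import math


def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), with weights = ESS.

    tables: list of (tp, fp, fn, tn) counts, one tuple per dataset (n >= 3).
    Returns (slope, standard_error, t_statistic) for the slope term.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):
            # Assumed continuity correction so that ln(DOR) stays finite.
            tp, fp, fn, tn = [c + 0.5 for c in (tp, fp, fn, tn)]
        n_dis, n_non = tp + fn, fp + tn
        ess = 4.0 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))   # ln(DOR)
        ws.append(ess)

    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Weighted residual variance with n - 2 degrees of freedom.
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / (len(xs) - 2) / sxx)
    t_stat = slope / se if se > 0 else float("nan")
    return slope, se, t_stat


# Hypothetical 2x2 counts (tp, fp, fn, tn); not data from the included studies.
tables = [(45, 5, 15, 95), (30, 8, 10, 72), (80, 6, 20, 194),
          (20, 3, 12, 45), (60, 10, 25, 155)]
slope, se, t_stat = deeks_regression(tables)
print(f"slope={slope:.3f}, SE={se:.3f}, t={t_stat:.3f}")
```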

Figure 11.

Publication bias of AI in PI diagnosis.

DISCUSSION

An increasing number of studies have been investigating the potential of AI in PI diagnosis. We conducted a systematic review of the methods, results, and quality of studies on AI for PI diagnosis. We not only performed a meta-analysis of the diagnostic results but also analyzed and pooled the diagnostic results for different PI stages. Our review highlighted three principal findings.

First, AI has demonstrated high reported diagnostic accuracy. Studies included in our review had an accuracy range of 60.5–97.8%, and meta-analysis showed a pooled SEN of 74% and SPE of 93%. +LR, −LR, DOR, and AUC were 10.14, 0.28, 35.72, and 0.92, respectively. In addition, a narrative review similarly reported that AI algorithms have significant advantages in PI diagnosis.⁴³ Therefore, AI has shown potential for the early detection and accurate diagnosis of PI, laying the groundwork for future investigations into its contribution to treatment-strategy formulation and outcome monitoring.

Second, the results of the meta-analysis showed that AI demonstrated significant potential in identifying the different stages of PI. Despite some differences between stages, AI demonstrated good diagnostic performance overall across all stages of PI. Stage 1 presents with intact skin but with redness, pain, and congestion at the site of pressure.⁴² When the local pressure is relieved, stage 1 can usually heal on its own without further progression to stage 2.⁴⁴ However, in clinical practice, stage 1 often deteriorates into more serious stages because it is difficult to detect and is not identified in a timely manner.⁴⁵,⁴⁶ In this study, the accuracy of AI in identifying stage 1 was high, with an AUC of 0.95. This suggests that AI can compensate for the shortcomings of manual observation, capture the early signs of PI in a timely manner, and provide a reliable diagnostic reference for clinical personnel. In addition, in settings such as smaller health care facilities, elderly care facilities, and home-based care, debridement is usually not available, so the two specific types of PI, unstageable and DTPI, cannot be staged after debridement.²²,⁴⁷ In this study, a meta-analysis of AI recognition of unstageable PI and DTPI was also performed. The use of AI may enable early detection of such PI in settings where strong PI expertise and debridement are not available, drawing attention to these patients and prompting their transfer to large hospitals for further treatment when necessary.

Third, there was high heterogeneity across studies, with I² = 96.51% for SEN and I² = 98.03% for SPE, both far greater than 50%. The high pooled accuracy must therefore be interpreted cautiously. On the one hand, AI models are often regarded as “black boxes,” and the transformations applied within them vary widely.⁴⁸ Interpretability remains a central challenge for clinical AI models. To secure clinician acceptance, models must supply evidence that is both understandable and trustworthy. We therefore recommend integrating explainable AI techniques such as heatmaps, attention maps, and saliency analysis in future work, enabling clinicians to rapidly verify the diagnostic rationale. Future studies should present explainability results alongside performance metrics. By improving model transparency, explainable models can increase clinician confidence in AI systems, facilitating broader acceptance and integration in clinical workflows. On the other hand, this study pooled 47 datasets used for AI model construction, and the acquisition parameters of the PI images varied from one study to another, contributing to the large heterogeneity. Heterogeneity was also tested for threshold effects: the SROC plot of sensitivity against specificity did not show a “shoulder-arm” distribution, suggesting that the heterogeneity arose mainly from non-threshold effects rather than from a threshold effect.
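As a reminder of what the I² values express, the sketch below uses the conventional Higgins definition, I² = max(0, (Q − df)/Q) × 100%, where Q is Cochran's heterogeneity statistic and df is the number of datasets minus one. This is an illustration under stated assumptions: the software used in this study may compute I² for SEN and SPE somewhat differently, Q itself is not reported in the text, and df = 46 assumes that all 47 datasets entered the pooled analysis. Both function names are illustrative.

```python
def i_squared(q: float, df: int) -> float:
    """Higgins I-squared (%) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def q_implied_by(i2_percent: float, df: int) -> float:
    """Invert the formula: the Q that would yield a stated I-squared."""
    return df / (1.0 - i2_percent / 100.0)


DF = 46  # assumption: 47 datasets in the pooled analysis

for label, i2 in (("SEN", 96.51), ("SPE", 98.03)):
    q = q_implied_by(i2, DF)
    print(f"{label}: I2={i2}% corresponds to Q of about {q:.0f} on {DF} df")
```

An I² above 96% therefore means that Q exceeds its degrees of freedom by a factor of more than 25, that is, almost all of the observed variation reflects between-study differences rather than sampling error.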

In this study, a Bayesian multilevel random-effects model was employed for subgroup analysis. Its core advantage lies in its ability to handle complex heterogeneity structures flexibly. The heterogeneity sources identified by meta-regression (the sources of images, sample size, diagnostic staging criteria, algorithm choice, dataset source, and patient setting) were incorporated into the model as grouping variables, and random-effects terms were used to decompose the variation reflected in the high I² values. This provides an analytical framework better suited to the distribution of the data for explaining the performance differences of AI in PI diagnosis.

Our results revealed that subgroups with images derived from public databases versus nonpublic databases exhibited certain differences in SEN and SPE. The quantity and quality of PI image data directly influence research outcomes. PI images, categorized as optical RGB (red–green–blue) images, are captured using various devices, including mobile phones, digital cameras, and tablet computers. Parameter variations across different devices make it challenging to ensure the homogeneity of PI images. Furthermore, the imaging process is affected by environmental factors, patient positioning, and individual habits of photographers, which further compromise image quality. Currently, most PI images used by researchers are sourced from historical PI images stored in departments and results from web searches, with substantial discrepancies among these images, directly impairing the learning efficacy of AI.¹⁸ It is recommended that future studies develop standardized imaging protocols to ensure the consistency of datasets in terms of illumination, resolution, and device type. Additionally, dataset-sharing initiatives are proposed to improve reproducibility and fairness.

Moreover, subgroup analysis results showed that the CNN model group had a SEN of 0.680 and a SPE of 0.868. However, it should be noted that among the algorithm types included in this study, CNN accounted for an extremely high proportion, while other types had a very low proportion. Such imbalance in subgroup sample sizes may affect the robustness of the results, so it is currently not possible to determine the performance differences between algorithms with confidence. We explicitly acknowledge this imbalance and therefore refrain from claiming that CNNs are intrinsically superior to other architectures. In addition, multiple studies have supported the application potential of other algorithms in PI diagnosis. For example, Aldughayfiq et al. used YOLOv5 to identify four stages of PI lesions and non-PI lesions.¹³ Liu et al. developed an intelligent machine vision system based on YOLO8, which can not only quickly identify PI stages but may also be extended to the diagnosis of other diseases closely related to color and texture features.³⁴ Beyond the YOLO series, traditional machine learning models such as Random Forest and Decision Tree have also shown good performance in identifying high-risk patients and classifying wounds based on clinical data. Moreover, emerging methods such as transfer learning and explainable AI have demonstrated high performance in wound detection and classification tasks.⁴⁹ These diverse classification methods provide strong support for the early diagnosis of PI, the formulation of optimized treatment plans, and the identification of complications. Future studies should prospectively assemble balanced datasets that equally represent CNN, YOLO, transformer, and ensemble models, and conduct strictly matched head-to-head comparisons to systematically delineate the applicable scenarios and performance differences of each AI approach in PI diagnosis.

Several key factors should be considered when evaluating the application of AI tools in the medical field.

First, external validation is an essential step to determine the generalizability of models. This requires researchers to evaluate the model on external datasets different from the training dataset, thereby verifying the stability and reliability of its performance. However, current studies mostly focus on the development and internal validation of AI models, with limited external validation across diverse clinical environments and populations. Consequently, we underscore that without robust multicenter, prospective validation, the clinical translation of these AI tools remains premature. Multicenter external validation is recommended: first assess performance across diverse skin tones and age groups, then test usability in multiple health care settings, followed by a prospective study to confirm clinical value, and finally scale up with iterative refinement to achieve reliable clinical translation.

Second, the consistency of AI tools’ performance across different skin tones and age groups is of great significance, because differences in skin tone and age may reduce recognition accuracy for certain groups, thereby affecting the fairness and effectiveness of diagnostic results. However, the performance of AI in diagnosing PI across populations with different skin tones and ages is still at an exploratory stage. It is recommended that all future AI wound research be required to report demographic information and conduct analyses stratified by skin tone, so as to provide a basis for assessing the generalizability and fairness of AI diagnostic tools.
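A stratified analysis of this kind can be sketched in a few lines. The example below is a minimal illustration, assuming each evaluated image carries a stratum label (for example, a skin tone category), a reference-standard label, and a model prediction for one target PI stage. The function name `stratified_sen_spe`, the stratum labels, and the records are hypothetical and are not data from the included studies.

```python
from collections import defaultdict


def stratified_sen_spe(records):
    """Per-stratum sensitivity and specificity for a binary target.

    records: iterable of (stratum, truth, prediction), where truth and
    prediction are booleans (True = target PI stage present).
    Returns {stratum: (sensitivity, specificity)}; a metric is None when
    its denominator is zero.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for stratum, truth, pred in records:
        # "t"/"f" = prediction correct or not; "p"/"n" = predicted class.
        key = ("t" if truth == pred else "f") + ("p" if pred else "n")
        counts[stratum][key] += 1

    result = {}
    for stratum, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        sen = c["tp"] / positives if positives else None
        spe = c["tn"] / negatives if negatives else None
        result[stratum] = (sen, spe)
    return result


# Hypothetical evaluation records: (stratum, truth, prediction).
records = [
    ("tone_A", True, True), ("tone_A", True, True), ("tone_A", True, False),
    ("tone_A", False, False), ("tone_A", False, False),
    ("tone_B", True, True), ("tone_B", True, False), ("tone_B", True, False),
    ("tone_B", False, False), ("tone_B", False, True),
]

for stratum, (sen, spe) in stratified_sen_spe(records).items():
    print(f"{stratum}: SEN={sen:.2f}, SPE={spe:.2f}")
```

Reporting the per-stratum estimates side by side, ideally with confidence intervals, makes performance gaps between groups visible instead of averaging them away in a single pooled figure.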

Finally, with the rise of mobile health, the adaptability of AI tools on smartphones or tablets has become a key factor affecting the accessibility and convenience of their clinical application. The use of these tools on mobile devices enables medical professionals to access and utilize the technology anytime and anywhere, thereby improving the efficiency and scope of health care services. However, in resource-constrained environments, AI diagnostic technology faces multiple challenges, such as difficulty in deployment due to reliance on high-performance devices and stable networks, low operational proficiency of primary medical staff, and lack of technical maintenance support. It is recommended that future efforts focus on developing lightweight, mobile-compatible models that support offline operation, constructing locally adapted datasets to improve diagnostic accuracy, simplifying operational procedures, and strengthening training and remote technical support for medical staff.

At the level of clinical utility, the value of AI extends beyond diagnostic accuracy, with particular prominence in optimizing early detection, reducing misdiagnosis rates, and enhancing the timeliness of interventions. Studies have demonstrated that in outpatient settings, AI enables remote wound monitoring, which effectively improves patient compliance, enhances access to care, and optimizes clinical workflows.⁵⁰ Furthermore, compared with clinicians, AI can objectively identify and measure wound tissues, thereby facilitating early detection and reducing misdiagnosis.⁵¹ However, existing research largely fails to evaluate the core dimensions of clinical utility. Most studies included in this analysis focus solely on the consistency between algorithms and PI staging, lacking systematic assessments of integration with clinical workflows, operational convenience, and feedback from health care providers. This critical gap hinders the translation of AI from technical accuracy to practical clinical value. To address it, future research should adopt mixed-methods designs (such as surveys and implementation studies) to evaluate the model’s usability, training needs, and real-world impact.

A fundamental strength of the current analysis is its robust methodology. The comprehensive literature search covered eight electronic databases and included publications in both English and Chinese. This extensive effort, undertaken by two reviewers, made it possible to catalog the available information on AI for diagnosing PI and to analyze the accuracy of AI in identifying PI stages 1–4, unstageable PI, and DTPI separately. To the best of our knowledge, the current study is the first to estimate the accuracy of AI in diagnosing different stages of PI. It is expected that the results of this study will be an important reference for clinicians, nurses, medical device developers, health care administrators, and researchers in the field. Clinicians and nurses play a key role in wound assessment, treatment, and ongoing monitoring of patients whose PI is diagnosed with the aid of AI. This application of AI has the potential to promote wound healing and improve patients’ quality of life. Medical device developers can leverage the findings of this study to improve and optimize AI diagnostic tools to increase diagnostic accuracy and efficiency. Health care administrators can use the findings to develop more effective health care policies and rationalize the allocation of resources. Researchers can focus on the pathogenesis, etiology, pathophysiology, and more effective interventions for PI, laying a solid foundation for future clinical practice and care.

LIMITATIONS

Our study had limitations. First, despite the subgroup and meta-regression analyses conducted in this study, the high heterogeneity (I² > 90%) remains a significant concern. Second, the TP/FP/TN/FN data were not directly provided in some of the included studies and had to be calculated indirectly from the available data. As a result, some calculation error may have been introduced, which may have affected the results. Third, although CNN-based models dominated the included studies, the imbalance between algorithm types and the lack of standardization limited model comparison, making it difficult to draw robust conclusions about performance differences between models.

CONCLUSIONS

In conclusion, the findings of this study indicate that AI models constructed based on PI image data exhibit promising performance in PI diagnosis. Although the included studies extracted PI image data from diverse centers and used various algorithms, our meta-analysis results suggest that these AI models show high diagnostic accuracy. However, the evidence should be interpreted with caution due to the lack of multicenter validation for the algorithms. Nonetheless, the existing studies offer valuable directions for future research on AI models in PI diagnosis. Future research should focus on external validation in prospective clinical settings and comparative analyses with clinicians to evaluate applicability and limitations.

TAKE-HOME MESSAGES

AI diagnosis of PI can identify early PI, which is helpful for timely intervention before the wound worsens.

AI can assist clinicians in making more informed decisions. In the absence of wound experts, AI can provide reliable diagnostic support, reducing diagnostic and treatment errors in clinical practice.

Clinicians should look for validated models, image quality, user-friendly interfaces, real-time analysis, dynamic monitoring, etc., in AI tools to enhance the accuracy and efficiency of diagnosis and support clinical decision-making.

AUTHORS’ CONTRIBUTIONS

Y.W.: Writing—original draft, methodology, literature screening, quality appraisal data curation, data analysis. X.L.: Writing—methodology, literature screening, quality appraisal data curation. J.P.: Writing—review and editing, methodology. H.Z.: Writing—review and editing, methodology. L.H.: Resources, methodology, writing—review and editing, funding acquisition.

Footnotes

ACKNOWLEDGMENTS AND FUNDING SOURCES

This study was supported by the following three projects: the National Natural Science Foundation of China (grant number 8246140071), the Major Project of the Gansu Province Joint Scientific Research Fund (grant number 23JRRA1538), and the Natural Science Foundation of Gansu Province (grant number 25JRRA320).

AUTHOR DISCLOSURE AND GHOSTWRITING

The authors of this article have no financial conflicts of interest to disclose. No ghostwriters were involved in the writing of this article.

ABOUT THE AUTHORS

Yuting Wei, MNS, is a second-year PhD student at the First Clinical Medical College, Lanzhou University, with a master’s degree in nursing. Xiaodan Liu, BN, is a second-year master’s student at the School of Nursing, Lanzhou University. Juhong Pei, MNS, is a fourth-year PhD student at the First Clinical Medical College, Lanzhou University, with a master’s degree in nursing. Hongyan Zhang, MNS, is a nurse at Gansu Provincial Hospital, with a master’s degree in nursing. Lin Han, PhD, is the Vice President of Gansu Provincial Hospital, Dean of the School of Nursing at Lanzhou University, Doctoral Supervisor, and Master’s Supervisor.

References

1. National Pressure Injury Advisory Panel, European Pressure Ulcer Advisory Panel, and Pan Pacific Pressure Injury Alliance. Pressure Ulcers/Injuries: Definition and Etiology. In: Prevention and Treatment of Pressure Ulcers/Injuries: Clinical Practice Guideline. The International Guideline: Fourth Edition. Emily Haesler (Ed); 2025.

2. […], Lin, Thalib, et al. Global prevalence and incidence of pressure injuries in hospitalised adult patients: A systematic review and meta-analysis. Int J Nurs Stud, 2020; 105:103546; doi: 10.1016/j.ijnurstu.2020.103546

3. […], Zhao, Wu, et al. Global epidemiology, burden, and future projections of decubitus ulcers: A comprehensive analysis from 1990 to 2050. Wound Repair Regen, 2025; 33(3):e70048; doi: 10.1111/wrr.70048

4. Sugathapala, Latimer, Balasuriya, et al. Prevalence and incidence of pressure injuries among older people living in nursing homes: A systematic review and meta-analysis. Int J Nurs Stud, 2023; 148:104605; doi: 10.1016/j.ijnurstu.2023.104605

5. Peña, Martin. Cellular and molecular mechanisms of skin wound healing. Nat Rev Mol Cell Biol, 2024; 25(8):599–616; doi: 10.1038/s41580-024-00715-1

6. Zajac, Schubauer, Simman. The unavoidable pressure injury/ulcer: A review of skin failure in critically ill patients. J Wound Care, 2024; 33(Sup9):S18–S22; doi: 10.12968/jowc.2024.0079

7. Song, Shen, Cai, et al. The relationship between pressure injury complication and mortality risk of older patients in follow-up: A systematic review and meta-analysis. Int Wound J, 2019; 16(6):1533–1544; doi: 10.1111/iwj.13243

8. Huang, Cheng, Yang, et al. The influence of pressure injury risk on the association between left ventricular ejection fraction and all-cause mortality in patients with acute myocardial infarction 80 years or older. World J Emerg Med, 2023; 14(2):112–121; doi: 10.5847/wjem.j.1920-8642.2023.026

9. Zarei, Madarshahian, Nikkhah, et al. Incidence of pressure ulcers in intensive care units and direct costs of treatment: Evidence from Iran. J Tissue Viability, 2019; 28(2):70–74; doi: 10.1016/j.jtv.2019.02.001

10. Van Damme, Van Hecke, Remue, et al. Physiological processes of inflammation and edema initiated by sustained mechanical loading in subcutaneous tissues: A scoping review. Wound Repair Regen, 2020; 28(2):242–265; doi: 10.1111/wrr.12777

11. Hill, Edney, Hamer, et al. Interventions for the treatment and prevention of pressure ulcers. Br J Community Nurs, 2022; 27(Sup6):S28–S36; doi: 10.12968/bjcn.2022.27.Sup6.S28

12. Horup, Soegaard, Kjølhede, et al. Static overlays for pressure ulcer prevention: A hospital-based health technology assessment. Br J Nurs, 2020; 29(12):S24–S28; doi: 10.12968/bjon.2020.29.12.S24

13. Aldughayfiq, Ashfaq, Jhanjhi, et al. YOLO-Based deep learning model for pressure ulcer detection and classification. Healthcare (Basel), 2023; 11(9):1222; doi: 10.3390/healthcare11091222

14. Alves, Eberhardt, Soares RSDA, et al. Differential diagnosis in pressure ulcers and medical devices. Česká a Slovenská Neurologie a Neurochirurgie, 2017; 2017(Suppl 1):S29; doi: 10.14735/amcsnn2017S29

15. LeBlanc, Woo, Bassett, et al. Professionals’ knowledge, attitudes, and practices related to pressure injuries in Canada. Adv Skin Wound Care, 2019; 32(5):228–233; doi: 10.1097/01.ASW.0000554444.52120.f6

16. Bates-Jensen, McCreath, Harputlu, et al. Reliability of the Bates-Jensen Wound Assessment Tool for pressure injury assessment: The pressure ulcer detection study. Wound Repair Regen, 2019; 27(4):386–395; doi: 10.1111/wrr.12714

17. Boyko, Longaker, Yang. Review of the current management of pressure ulcers. Adv Wound Care (New Rochelle), 2018; 7(2):57–67; doi: 10.1089/wound.2016.0697

18. Jiang, Ma, Guo, et al. Using machine learning technologies in pressure injury management: Systematic review. JMIR Med Inform, 2021; 9(3):e25704; doi: 10.2196/25704

19. Karaçay, Goktas, Yaşar, et al. Investigation of pressure injuries with visual ChatGPT integration: A descriptive cross-sectional study. J Adv Nurs, 2025; doi: 10.1111/jan.16905

20. Chao, Pei, Wei, et al. Evaluation methods of pressure injury stages: A systematic review and meta-analysis. J Tissue Viability, 2025; 34(3):100894; doi: 10.1016/j.jtv.2025.100894

21. Mohammad-Rahimi, Motamedian, Rohban, et al. Deep learning for caries detection: A systematic review. J Dent, 2022; 122:104115; doi: 10.1016/j.jdent.2022.104115

22. Jenny, Yueping, Kaijian, et al. Construction of an artificial intelligence-assisted system for automatic detection of pressure injury based on the YOLO neural network. Chinese General Pract, 2024; 27(36):4582–4590.

23. Jian, Yueping, Xiaodan, et al. Research on pressure injury risk staging using deep learning methods based on convolutional neural networks. J Nurses Training, 2024; 39(17):1800–1806.

24. […], Tasar, Utlu, et al. Deep transfer learning-based visual classification of pressure injuries stages. Neural Comput Applications, 2022; 34(18):16157–16168; doi: 10.1007/s00521-022-07274-6

25. Lau, Yu, Yip, et al. An artificial intelligence-enabled smartphone app for real-time pressure injury assessment. Front Med Technol, 2022; 4:905074; doi: 10.3389/fmedt.2022.905074

26. Fergus, Chalmers, Henderson, et al. Pressure ulcer categorization and reporting in domiciliary settings using deep learning and mobile devices: A clinical trial to evaluate end-to-end performance. IEEE Access, 2023; 11(000):65138–65152; doi: 10.1109/ACCESS.2023.3289839

27. Kim, Lee, Choi, et al. Augmented decision-making in wound care: Evaluating the clinical utility of a deep-learning model for pressure injury staging. Int J Med Inform, 2023; 180:105266; doi: 10.1016/j.ijmedinf.2023.105266

28. Seo, Kang, Eom, et al. Visual classification of pressure injury stages for nurses: A deep learning model applying modern convolutional neural networks. J Adv Nurs, 2023; 79(8):3047–3056; doi: 10.1111/jan.15584

29. Swerdlow, Guler, Yaakov, et al. Simultaneous segmentation and classification of pressure injury image data using Mask-R-CNN. Comput Math Methods Med, 2023; 2023:3858997; doi: 10.1155/2023/3858997

30. Chang, Kim, Shin, et al. Diagnosis of pressure ulcer stage using on-device AI. Applied Sciences-Basel, 2024; 14(16):7124; doi: 10.3390/app14167124

31. Ikuta, Fukuoka, Kimura, et al. An ingenious deep learning approach for pressure injury depth evaluation with limited data. J Tissue Viability, 2024; 33(3):387–392; doi: 10.1016/j.jtv.2024.05.009

32. Wang, Guo, Zhong, et al. A novel deep-learning based weighted feature fusion architecture for precise classification of pressure injury. Front Physiol, 2024; 15:1304829; doi: 10.3389/fphys.2024.1304829

33. Zalluhoğlu, Akdoğan, Karakaya, et al. Region-Based Semi-Two-Stream convolutional neural networks for pressure ulcer recognition. J Imaging Inform Med, 2024; 37(2):801–813; doi: 10.1007/s10278-023-00960-4

34. Liu, Dou, Guo, et al. A novel technique for rapid determination of pressure injury stages using intelligent machine vision. Geriatr Nurs, 2025; 61:98–105; doi: 10.1016/j.gerinurse.2024.10.046

35. Liu, Hu, Zhou, et al. Application of deep learning to pressure injury staging. J Wound Care, 2024; 33(5):368–378; doi: 10.12968/jowc.2024.33.5.368

36. Gui, Wang, Fan, et al. Enhancing diagnostic accuracy with SE-Inception model integration in pressure ulcer detection. Ann Ital Chir, 2024; 95(4):609–620; doi: 10.62713/aic.3502

37. Chen, Wei, Tseng, et al. Applying object detection and large language model to establish a smart telemedicine diagnosis system with chatbot: A case study of pressure injuries diagnosis system. Telemed J E Health, 2024; 30(6):e1705–e1712; doi: 10.1089/tmj.2023.0715

38. Cho, Yoo. Development of a pressure ulcer stage determination system for community healthcare providers using a vision transformer deep learning model. Medicine (Baltimore), 2025; 104(7):e41530; doi: 10.1097/md.0000000000041530

39. Tusar, Fayyazbakhsh, Zendehdel, et al. AI-Powered image-based assessment of pressure injuries using You Only Look Once (YOLO) version 8 models. Adv Wound Care (New Rochelle), 2025; doi: 10.1089/wound.2024.0245

40. Lei, Jiang, Xu, et al. Convolutional neural network models for visual classification of pressure ulcer stages: Cross-Sectional study. JMIR Med Inform, 2025; 13:e62774; doi: 10.2196/62774

41. Huang Y-S, Chen C-M, Liu Y-S, et al. Detection and diagnosis for pressure injury by using SE-Swin Cascade R-CNN. IEEE MultiMedia, 2025; 32(1):4–15; doi: 10.1109/MMUL.2024.3524427

42. Edsberg, Black, Goldberg, et al. Revised national pressure ulcer advisory panel pressure injury staging system: Revised pressure injury staging system. J Wound Ostomy Continence Nurs, 2016; 43(6):585–597; doi: 10.1097/won.0000000000000281

43. Rippon, Fleming, Chen, et al. Artificial intelligence in wound care: Diagnosis, assessment and treatment of hard-to-heal wounds: A narrative review. J Wound Care, 2024; 33(4):229–242; doi: 10.12968/jowc.2024.33.4.229

44. […], Zhou, Luo, et al. Application of the care bundle in perioperative nursing care of the type A aortic dissection. Int J Gen Med, 2021; 14:5949–5958; doi: 10.2147/ijgm.S322755

45. Alderden, Zhao, Zhang, et al. Outcomes associated with stage 1 pressure injuries: A retrospective cohort study. Am J Crit Care, 2018; 27(6):471–476; doi: 10.4037/ajcc2018293

46. Mengmeng. Construction and Testing of a Risk Prediction Model for Stage I Pressure Injury in Critically Ill Patients. Gansu University of Chinese Medicine; 2024.

47. Aloweni, Gunasegaran, Lim, et al. Socio-economic and environmental factors associated with community-acquired pressure injuries: A mixed method study. J Tissue Viability, 2024; 33(1):27–42; doi: 10.1016/j.jtv.2023.11.007

48. Marcus, Teuwen. Artificial intelligence and explanation: How, why, and when to explain black boxes. Eur J Radiol, 2024; 173:111393; doi: 10.1016/j.ejrad.2024.111393

49. Reifs Jiménez, Casanova-Lozano, Grau-Carrión, et al. Artificial intelligence methods for diagnostic and decision-making assistance in chronic wounds: A systematic review. J Med Syst, 2025; 49(1):29; doi: 10.1007/s10916-025-02153-8

50. Raizman, Ramírez-GarciaLuna, Newaz, et al. Empowering patients and caregivers to use artificial intelligence and computer vision for wound monitoring: Nonrandomized, single-arm feasibility study. J Particip Med, 2025; 17:e69470; doi: 10.2196/69470

51. Ramachandram, Ramirez-GarciaLuna, Fraser RDJ, et al. Fully automated wound tissue segmentation using deep learning on mobile devices: Cohort study. JMIR Mhealth Uhealth, 2022; 10(4):e36977; doi: 10.2196/36977

