Evaluation of the reliability, informational content, and quality of YouTube videos on cervical disc herniation

Abstract

Background

YouTube has become a widely used resource for patients and healthcare professionals seeking information on cervical disc herniation. However, the reliability and quality of online content show substantial variability.

Objective

To evaluate the reliability, informational content and quality of YouTube videos related to cervical disc herniation.

Methods

This cross-sectional study included YouTube videos related to cervical disc herniation, identified through searches using the terms “cervical disc herniation,” “disc herniation of cervical spine,” and “herniated cervical disc.” Video characteristics such as duration, views, likes, comments, and content creator profiles were recorded. Reliability and quality were assessed using the DISCERN instrument, the Journal of the American Medical Association (JAMA) benchmark criteria, and the Global Quality Scale (GQS).

Results

A total of 300 videos (100 for each search term) were initially retrieved. After excluding duplicates, non-English, irrelevant, or very short videos, 104 unique videos were included in the study. Of the included videos, 41.3% were uploaded by physicians, 30.7% by non-physician healthcare professionals, 14.4% by medical channels, 8.7% by healthcare institutions, and 4.8% by patients. The mean DISCERN score was 35.3 ± 7.6, the mean JAMA score was 2.4 ± 0.6, and the mean GQS score was 3.5 ± 0.8, indicating generally poor to fair quality. The median (range) scores were 35.0 (17.0–58.0) for DISCERN, 2.0 (1.0–4.0) for JAMA, and 4.0 (1.0–5.0) for GQS. According to JAMA, 55.8% of the videos scored 2 points, reflecting low reliability, while only 2.9% achieved the maximum score of 4. In GQS, 52.9% were rated as high quality, 38.5% as moderate, and 8.6% as low. Correlation analyses showed that longer video duration was positively associated with higher DISCERN (r = 0.41–0.57, p < 0.001), JAMA (r = 0.36, p < 0.001), and GQS scores (r = 0.35, p < 0.001).

Conclusion

Most of the included YouTube videos on cervical disc herniation demonstrated low to moderate reliability. These findings emphasize the need for healthcare professionals and patients to critically appraise such content, and they highlight the importance of developing higher-quality, evidence-based online educational resources.

Keywords

cervical disc herniation, DISCERN instrument, Global Quality Scale, JAMA benchmark criteria, reliability, video quality, YouTube videos

Introduction

Cervical disc herniation occurs as a result of degeneration or trauma of the intervertebral discs, leading to rupture of the annulus fibrosus and extrusion of the nucleus pulposus, which may compress the spinal cord and/or nerve roots.¹ It is a common condition particularly in the active population, often beginning with an acute painful phase and persisting with chronic symptoms. While it may remain asymptomatic in some individuals, it can manifest with a wide spectrum of clinical features, including neck pain, radiculopathy, and myelopathy.^2,3

In recent years, the internet has become one of the most frequently used sources of health-related information. Both patients and healthcare professionals increasingly consult online platforms for educational and informational purposes.⁴ The rapid expansion of smartphone and computer use has facilitated easy access to medical knowledge, with social media platforms playing an especially important role in this trend. YouTube, in particular, is among the most popular video-sharing platforms worldwide, with more than two billion monthly users across over 100 countries and 80 languages. Each minute, over 500 h of video content are uploaded, and users collectively spend more than one billion hours per day watching videos.^5,6

Patients often seek health-related content on social media based on their individual interests and concerns. However, a growing body of literature highlights a major challenge: the widespread dissemination of misinformation and low-quality health content online.^7,8 This issue has been identified as a significant public health concern, as misleading information can negatively influence patient decision-making and communication with healthcare providers.⁹ The impact is especially profound for vulnerable patient groups, such as those experiencing unexplained chronic pain, who may be more likely to rely on unverified or anecdotal information found online.

Accordingly, evaluating the quality, informational content, and reliability of online resources, particularly on widely used platforms such as YouTube, has become an increasingly important area of research. Previous studies have investigated the quality of online information regarding various medical conditions^10,11; however, to date, there is limited evidence focusing specifically on YouTube videos related to cervical disc herniation. Given the high prevalence of this condition and the widespread use of social media platforms for patient education, assessing the quality of such online content is essential for guiding patients and supporting evidence-based health communication.

Methods

Data collection

The study was designed as a cross-sectional analysis. On September 30, 2024 (Turkey local time), a search was conducted on YouTube using the English terms “cervical disc herniation,” “disc herniation of cervical spine,” and “herniated cervical disc” to simulate patient-oriented access to information. These keywords were selected based on their direct clinical relevance to the disease and their frequent use by patients seeking information on this topic. Broader or less specific terms were intentionally excluded because preliminary testing showed that they produced a high proportion of irrelevant videos and substantially reduced the number of eligible results, thereby limiting the feasibility of statistical analysis. Separate video lists were generated for each search term. To reduce potential bias, the simulation was performed with different keywords, and “Incognito mode” was enabled to prevent the influence of prior search history on new results. For each keyword, new user accounts were created, and only the “relevance” filter was applied in video searches. The search results were ranked according to YouTube's algorithm, which incorporates factors such as the number of views, viewer ratings, and upload date. From these ranked results, the first 100 videos for each keyword were retrieved in YouTube's default relevance-based order and screened for eligibility. If a duplicate video appeared within the top 100 results of a given search term, it was excluded and replaced by the next unique video in the ranking to ensure that each keyword contributed 100 distinct videos.
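The retrieval rule described above (take the top-ranked results for a search term and replace any within-list duplicate with the next unique video) can be sketched as follows. This is only an illustration under stated assumptions, not the tooling used in the study: the function name `first_unique` and the video identifiers are hypothetical.

```python
from typing import Iterable, List


def first_unique(ranked_ids: Iterable[str], limit: int = 100) -> List[str]:
    """Return the first `limit` distinct IDs, preserving rank order."""
    seen = set()
    selected = []
    for video_id in ranked_ids:
        if video_id in seen:
            continue  # duplicate within this search term: skip to the next result
        seen.add(video_id)
        selected.append(video_id)
        if len(selected) == limit:
            break
    return selected


# Hypothetical ranked results for one search term (IDs are made up).
ranked = ["vid_a", "vid_b", "vid_a", "vid_c", "vid_b", "vid_d"]
top3 = first_unique(ranked, limit=3)
print(top3)
```

Duplicates that appear across the three search terms are a separate matter; in the study they are handled by the exclusion criteria rather than by this per-term step.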

The inclusion criteria were defined as follows: (1) videos related to cervical disc herniation uploaded by physicians, non-physician healthcare professionals, healthcare institutions, medical channels, or patients; (2) acceptable audio and visual quality; (3) uploaded within the last 10 years; (4) video language in English. The exclusion criteria were defined as follows: (1) videos unrelated to the study topic; (2) videos in languages other than English; (3) duration shorter than 30 s;^12,13 (4) inadequate audio or visual quality; (5) duplicate videos; (6) videos in the format of academic lectures or conference presentations.
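The eligibility rules listed above can be expressed as a single predicate. The sketch below is a hypothetical encoding: the `Video` record and all of its field names are assumptions introduced for illustration, and duplicate removal and uploader category are assumed to be checked elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Video:
    # All field names are hypothetical; they mirror the criteria in the text.
    on_topic: bool
    language: str
    duration_s: int
    years_since_upload: float
    adequate_av_quality: bool
    is_lecture_or_conference: bool


def is_eligible(v: Video) -> bool:
    """Apply the stated inclusion and exclusion criteria to one video."""
    return (
        v.on_topic
        and v.language == "en"
        and v.duration_s >= 30           # videos shorter than 30 s are excluded
        and v.years_since_upload <= 10   # uploaded within the last 10 years
        and v.adequate_av_quality
        and not v.is_lecture_or_conference
    )


# Made-up examples: one eligible video and one that is too short.
ok_video = Video(True, "en", 240, 3.0, True, False)
short_video = Video(True, "en", 25, 1.0, True, False)
print(is_eligible(ok_video), is_eligible(short_video))
```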

Descriptive characteristics of the videos

The following data were recorded for the included videos: video title, URL, days since upload, view rate, video duration (seconds), content creator profile, content type, and the number of views, likes, and comments. The profiles of the content creators were categorized into five groups: (1) physicians, (2) non-physician healthcare professionals, (3) medical channels, (4) healthcare institutions, and (5) patients. Physicians were further classified according to their specialties: physical medicine and rehabilitation, neurosurgery, orthopedics, radiology, and sports medicine. Non-physician healthcare professionals included physiotherapists and chiropractic practitioners. Based on their content, videos were categorized into four groups: (1) diagnostic information, (2) conservative treatment, (3) surgical treatment, and (4) patient experiences. Videos classified as diagnostic information encompassed disease definition and symptoms, while those categorized under conservative treatment included exercise programs, injection therapies, physical therapy modalities, and chiropractic practices.

Evaluation parameters and video analysis

The video contents were evaluated by two independent physical medicine and rehabilitation specialists (BA, TS), each with at least 10 years of experience, using the DISCERN instrument, the Journal of the American Medical Association (JAMA) benchmark criteria, and the Global Quality Scale (GQS).

To minimize interpretation variability, both physicians participated in a calibration session before the evaluation to ensure consistent application of the DISCERN, JAMA, and GQS scoring criteria. Intra- and inter-observer reliability of the content validity index (CVI) calculations was assessed using intraclass correlation coefficients (ICC) with 95% confidence intervals (CI), based on a two-way mixed effects model for absolute agreement (average measures). The ICC analysis was performed on a randomly selected sample of 50 videos. The ICC values indicated excellent agreement between the two experienced physicians: 0.91 for DISCERN Part 1, 0.94 for DISCERN Part 2, 0.92 for DISCERN Part 3, 0.91 for the DISCERN total score, 0.91 for the JAMA score, and 0.93 for the GQS.¹⁴
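For readers unfamiliar with the agreement statistic, the sketch below computes an ICC for absolute agreement with average measures from a videos-by-raters table, using the standard two-way ANOVA mean squares (the form usually written ICC(A,k)). It is assumed, not confirmed by the text, that this matches the SPSS option the authors selected; the function name and the example scores are hypothetical.

```python
from typing import Sequence


def icc_a_k(ratings: Sequence[Sequence[float]]) -> float:
    """ICC for absolute agreement, average of k raters (two-way model).

    `ratings` has one row per video and one column per rater.
    """
    n = len(ratings)      # number of rated videos
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Hypothetical DISCERN totals from two raters for four videos.
example_scores = [[30, 31], [42, 40], [25, 27], [38, 38]]
print(round(icc_a_k(example_scores), 3))
```

Because the absolute-agreement form penalizes a systematic offset between raters, two raters who differ by a constant amount on every video score below 1 even though their rankings are identical.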

The DISCERN instrument is a widely recognized tool used by academics, healthcare professionals, and individuals to evaluate the conformity of health information sources to evidence-based medicine standards. Its structure, comprising three sections and an overall assessment question, enables a comprehensive analysis of both the content and the presentation of information. This is of considerable importance in assessing the ethical and scientific standards of patient-centered health information. The first part of the DISCERN instrument consists of eight questions that assess the reliability of health information, while the second part contains seven questions that evaluate the quality of information and the thoroughness with which treatment options are presented. The sixteenth question (“Based on the answers to all of these questions, rate the overall quality of the publication as a source of information about treatment choices”) represents the third part of the DISCERN instrument and provides an overall quality rating. The total DISCERN score corresponds to the sum of all 16 questions. Each item is rated on a scale from 1 to 5, with 1 representing the lowest and 5 the highest value. In the scoring system, the following ranges are used: scores between 16–26 indicate very poor quality, 27–38 indicate poor quality, 39–50 reflect moderate quality, 51–62 represent good quality, and 63–75 correspond to excellent quality of information.¹⁵

The JAMA benchmark criteria constitute an established tool used to evaluate the quality of online medical information. These criteria consist of four key domains: authorship, attribution, disclosure, and currency. Each domain is scored as 0 or 1, yielding a maximum total score of 4.¹⁶

The Global Quality Scale (GQS) is an instrument used to evaluate online health information and was developed for the purpose of providing an overall assessment of quality. This scale is particularly useful for assessing not only the overall quality of health-related content published on the internet but also the flow of information and ease of use. The GQS employs a 5-point scale to evaluate the quality of information. Scores are classified as follows: ≤2 points indicate low-quality content, 3 points indicate moderate quality, and ≥4 points indicate high-quality content.¹⁷
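The three scoring schemes described above reduce to simple lookups. The following sketch encodes the DISCERN bands, the JAMA domain sum, and the GQS cut-offs exactly as stated in the text; the function names are hypothetical, and because the text defines no DISCERN band outside 16–75, the sketch declines to classify such totals rather than guess.

```python
def discern_category(total: int) -> str:
    """Map a DISCERN total score to the quality bands listed in the text."""
    bands = [
        (16, 26, "very poor"),
        (27, 38, "poor"),
        (39, 50, "moderate"),
        (51, 62, "good"),
        (63, 75, "excellent"),
    ]
    for low, high, label in bands:
        if low <= total <= high:
            return label
    # The text defines no band outside 16-75, so none is assumed here.
    raise ValueError(f"no band defined for DISCERN total {total}")


def jama_score(authorship: bool, attribution: bool,
               disclosure: bool, currency: bool) -> int:
    """Sum the four JAMA benchmark domains, each scored 0 or 1."""
    return sum([authorship, attribution, disclosure, currency])


def gqs_category(score: int) -> str:
    """Classify a 5-point GQS rating as low, moderate, or high quality."""
    if not 1 <= score <= 5:
        raise ValueError(f"GQS must be between 1 and 5, got {score}")
    if score <= 2:
        return "low"
    if score == 3:
        return "moderate"
    return "high"


print(discern_category(35), jama_score(True, True, False, False), gqs_category(4))
```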

Statistical analysis

All analyses were performed using the IBM SPSS Statistics software, version 24.0 (IBM Corp., Armonk, NY, USA). Descriptive findings were presented as frequencies and percentages for qualitative variables, and as mean, standard deviation, median, minimum, and maximum values for quantitative variables. The normality of continuous variables within independent groups was assessed using the Shapiro–Wilk test. Since most variables did not follow a normal distribution (p < 0.05), non-parametric tests were predominantly used for comparisons. For continuous variables that followed a normal distribution, comparisons of means were performed using one-way analysis of variance (ANOVA). When ANOVA revealed significant differences, post-hoc pairwise comparisons were performed with Bonferroni correction. For continuous variables not conforming to a normal distribution, the Kruskal–Wallis H test was applied. When the Kruskal–Wallis test indicated statistical significance, subsequent pairwise comparisons were performed using the Mann–Whitney U test with Bonferroni correction. Correlations between continuous variables were assessed using Pearson's correlation coefficient for normally distributed data and Spearman's rank correlation test for non-normally distributed data. A p-value of <0.05 was considered statistically significant.
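As an illustration of the two rank-based methods named above, the sketch below computes Spearman's rank correlation and the Kruskal–Wallis H statistic in plain Python. It is a simplified teaching example, not a reproduction of the SPSS procedures: p-values are not computed, the H statistic omits the tie correction, and all function names and sample values are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic (no tie correction applied here)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Made-up data: durations (s) vs. quality scores, and two small groups.
rho = spearman_rho([60, 120, 300, 900], [20, 28, 35, 50])
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6]])
print(rho, h)
```

In practice, a statistics package would also return the p-value by referring H to a chi-square distribution with one fewer degree of freedom than the number of groups.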

Results

A total of 300 videos (100 for each search term) were initially retrieved. Duplicate videos, non-English content, irrelevant or non-informative videos, videos shorter than 30 s, and conference or lecture recordings were excluded. After applying these exclusion criteria, 104 unique videos meeting the inclusion criteria were included in the final analysis. The sequential exclusion steps are also illustrated in Figure 1. Surgical treatment–related content was predominantly produced by physicians, followed by medical channels and healthcare institutions. In videos addressing conservative treatment, non-physician healthcare professionals and physicians were the primary contributors, whereas medical channels and healthcare institutions accounted for a lower proportion. Videos providing diagnostic information were mainly produced by healthcare institutions and physicians, followed by medical channels. The descriptive characteristics of the videos are presented in Table 1.

Figure 1.

Flow diagram of the included and excluded videos in the study.

Table 1.

Descriptive characteristics of included YouTube videos (n = 104).

Source of upload	n	%
Patient	9	8.7
Medical Channel	11	10.6
Healthcare Institution	12	11.5
Non-physician healthcare personnel
Physiotherapist	14	13.5
Chiropractic Practitioner	15	14.4
Physician
Physiatrist	9	8.7
Neurosurgeon	12	11.5
Orthopedic Surgeon	17	16.3
Radiologist	3	2.9
Sports Medicine Specialist	2	1.9

Video content	n	%
Diagnostic Information	44	42.3
Patient Experiences	9	8.7
Conservative Treatment	38	36.5
Surgical Treatment	13	12.5

Video parameters	Mean ± SD	Median (Min–Max)
Video duration (sec)	329.7 ± 303.2	232.5 (41.0–1749.0)
Elapsed time (years)	4.5 ± 3.3	3.0 (0.0–14.0)
Elapsed time (days)	1658.7 ± 1214.6	1095.0 (14.0–5110.0)
View rate (%)	10225.5 ± 26645.1	847.7 (4.9–185741.5)
Number of comments	162.1 ± 555.3	7.5 (0.0–5143.0)
Number of views	135374.6 ± 313359.6	14580.5 (40.0–1925897.0)
Number of likes	2096.4 ± 5716.0	117.0 (0.0–39000.0)
Number of subscribers	789165.1 ± 1940016.0	26750.0 (12.0–9190000.0)

Evaluation scales	Mean ± SD	Median (Min–Max)
DISCERN (Part 1)	24.2 ± 5.3	24.5 (9.0–35.0)
DISCERN (Part 2)	8.7 ± 3.0	7.5 (7.0–22.0)
DISCERN (Part 3)	2.4 ± 1.0	2.0 (1.0–5.0)
DISCERN (Total)	35.3 ± 7.6	35.0 (17.0–58.0)
JAMA score	2.4 ± 0.6	2.0 (1.0–4.0)
GQS	3.5 ± 0.8	4.0 (1.0–5.0)

The comparison of video parameters by content type is shown in Table 2. When the content producer groups were evaluated, video duration differed significantly among the groups (p < 0.001). In pairwise comparisons, videos uploaded by patients, non-physician healthcare professionals, and physicians were significantly longer than those uploaded by medical channels and healthcare institutions (all p < 0.05). No significant differences were observed among the groups in time elapsed since upload, view rate, number of comments, number of views, or number of likes (p = 0.28, 0.21, 0.30, 0.36, and 0.64, respectively). The number of subscribers differed significantly among the groups (p = 0.004); specifically, the medical channel group had significantly more subscribers than the patient group (p < 0.05).

Table 2.

Comparison of video parameters by content type.

Video Content	Mean ± SD	Median (Min–Max)	p-value
Video duration (sec)			0.001^a
Surgical treatment	252.8 ± 121.1	230.0 (104.0–591.0)
Patient experiences	517.9 ± 280.5	479.0 (179.0–899.0)
Conservative treatment	449.2 ± 385.2	285.0 (51.0–1749.0)
Diagnostic information	210.6 ± 193.7	127.0 (28.0–844.0)
Elapsed time (days)			0.63^a
Surgical treatment	1993.5 ± 1179.1	1290.0 (365.0–3650.0)
Patient experiences	1541.1 ± 599.1	1460.0 (730.0–2190.0)
Conservative treatment	1561.9 ± 1188.7	1095.0 (140.0–3650.0)
Diagnostic information	1667.4 ± 1348.3	1095.0 (365.0–1310.0)
View rate (%)			0.05^a
Surgical treatment	6083.4 ± 8296.2	2411.2 (81.3–20962.0)
Patient experiences	761.9 ± 1098.7	286.2 (49.3–1106.6)
Conservative treatment	17210.9 ± 37039.1	2089.4 (15.6–185741.5)
Diagnostic information	7352.2 ± 20620.6	765.1 (5.5–131910.8)
Number of comments			0.67^a
Surgical treatment	52.8 ± 94.7	10.0 (0.0–329.0)
Patient experiences	52.0 ± 57.9	16.0 (0.0–132.0)
Conservative treatment	205.2 ± 376.7	7.0 (0.0–1800.0)
Diagnostic information	179.7 ± 778.0	6.5 (0.0–143.0)
Number of views			0.07^a
Surgical treatment	116718.0 ± 181986.3	68726.0 (890.0–667046.0)
Patient experiences	10684.6 ± 14642.2	6268.0 (90.0–45356.0)
Conservative treatment	179602.9 ± 335907.8	18419.0 (171.0–1380255.0)
Diagnostic information	128194.4 ± 351705.0	8484.0 (40.0–1925897.0)
Number of likes			0.21^a
Surgical treatment	1072.8 ± 1539.6	506.0 (16.0–4400.0)
Patient experiences	141.8 ± 157.7	112.0 (0.0–489.0)
Conservative treatment	3164.4 ± 6563.3	225.0 (0.0–30000.0)
Diagnostic information	1876.3 ± 6193.3	115.0 (0.0–39000.0)
Number of subscribers			0.02^a
Surgical treatment	1714734.6 ± 2825943.4	74800.0 (2850.0–6580000.0)
Patient experiences	51540.7 ± 143529.1	2000.0 (29.0–434000.0)
Conservative treatment	608777.3 ± 1278702.7	20150.0 (500.0–5150000.0)
Diagnostic information	822368.5 ± 2230049.9	42100.0 (12.0–19000000.0)

^a: Kruskal–Wallis H.

Bold values indicate statistical significance.

The evaluation of DISCERN scores revealed that 14 (13.5%) of the videos were classified as very poor (score range 16–26), 55 (52.9%) as poor (score range 27–38), 30 (28.8%) as fair (score range 39–50), and 5 (4.8%) as good (score range 51–62). Notably, no video was categorized as excellent (score range 63–75). According to the JAMA score, 5 (4.8%) videos received a score of 1, 58 (55.8%) received a score of 2, 38 (36.5%) received a score of 3, and 3 (2.9%) received a score of 4. In GQS, 9 (8.6%) videos were categorized as low quality, 40 (38.5%) as moderate quality, and 55 (52.9%) as high quality. A more detailed analysis showed that 2 (1.9%) videos scored 1 point, 7 (6.7%) scored 2 points, 40 (38.5%) scored 3 points, 49 (47.1%) scored 4 points, and 6 (5.8%) videos scored 5 points.

The comparison of DISCERN, JAMA, and GQS scores across content producers is presented in Table 3. In the pairwise comparisons, a significant difference was observed in DISCERN Part 1 between patients and healthcare institutions (p = 0.04), and in DISCERN Part 3 between patients and medical channels (p = 0.002) as well as between medical channels and physicians (p < 0.001). Additionally, in the GQS, a significant difference was observed between patients and physicians (p = 0.03).

Table 3.

Comparison of DISCERN, JAMA, and GQS across content producers.

Scores	Video source	Mean ± SD	Median (Min–Max)	p-value
DISCERN (Part 1)	Patient	20.3 ± 6.0	23.0 (11.0–27.0)	0.02^a
	Medical channel	22.3 ± 4.1	23.0 (15.0–27.0)
	Healthcare institution	26.8 ± 3.9	27.0 (20.0–33.0)
	Non-physician	23.6 ± 5.6	23.0 (9.0–33.0)
	Physician	25.0 ± 5.0	25.0 (14.0–35.0)
DISCERN (Part 2)	Patient	8.0 ± 0.9	8.0 (7.0–9.0)	0.37^b
	Medical channel	7.7 ± 0.9	7.5 (7.0–10.0)
	Healthcare institution	8.0 ± 2.2	7.0 (7.0–15.0)
	Non-physician	8.7 ± 3.8	7.0 (7.0–22.0)
	Physician	9.3 ± 3.2	8.0 (7.0–17.0)
DISCERN (Part 3)	Patient	2.6 ± 0.5	3.0 (2.0–3.0)	0.001^b
	Medical channel	1.5 ± 0.5	1.5 (1.0–2.0)
	Healthcare institution	2.0 ± 1.0	2.0 (1.0–4.0)
	Non-physician	2.2 ± 1.1	2.0 (1.0–4.0)
	Physician	2.8 ± 1.0	3.0 (1.0–5.0)
DISCERN (Total)	Patient	30.9 ± 7.0	34.0 (20.0–38.0)	0.05^a
	Medical channel	31.5 ± 5.0	32.0 (23.0–38.0)
	Healthcare institution	36.8 ± 6.1	35.0 (29.0–52.0)
	Non-physician	34.5 ± 8.7	35.0 (17.0–58.0)
	Physician	37.1 ± 7.2	39.0 (22.0–55.0)
JAMA	Patient	2.2 ± 0.7	2.0 (1.0–3.0)	0.89^b
	Medical channel	2.3 ± 0.7	2.0 (1.0–3.0)
	Healthcare institution	2.4 ± 0.7	2.0 (2.0–4.0)
	Non-physician	2.5 ± 0.7	2.0 (1.0–4.0)
	Physician	2.3 ± 0.6	2.0 (1.0–3.0)
GQS	Patient	2.8 ± 0.7	3.0 (2.0–4.0)	0.03^b
	Medical channel	3.4 ± 1.0	4.0 (1.0–4.0)
	Healthcare institution	3.6 ± 0.8	4.0 (2.0–5.0)
	Non-physician	3.4 ± 0.8	3.0 (1.0–5.0)
	Physician	3.7 ± 0.7	4.0 (2.0–5.0)

^a: One-way ANOVA.

^b: Kruskal–Wallis H.

Bold values indicate statistical significance.

The comparison of DISCERN, JAMA, and GQS scores across video content groups is shown in Table 4. In the pairwise comparisons, conservative treatment videos had higher DISCERN Part 1 scores than patient experience videos (p = 0.03). For DISCERN Part 2, both surgical and conservative treatment videos had higher scores than diagnostic information videos (p < 0.001 and p = 0.002, respectively). In DISCERN Part 3, surgical and conservative treatment videos also had higher scores than diagnostic information videos (all p < 0.001). Additionally, in the GQS, surgical treatment videos showed higher quality scores than patient experience and diagnostic information videos (p < 0.001 and p = 0.007, respectively), while conservative treatment videos scored higher than patient experience videos (p = 0.002). The correlations of video parameters with DISCERN, JAMA, and GQS scores are shown in Table 5.

Table 4.

Comparison of DISCERN, JAMA, and GQS across video content groups.

Scores	Video content	Mean ± SD	Median (Min–Max)	p-value
DISCERN (Part 1)	Surgical treatment	25.2 ± 3.4	25.0 (20.0–30.0)	0.02 ^a
	Patient experiences	20.3 ± 6.0	23.0 (11.0–27.0)
	Conservative treatment	25.6 ± 4.4	25.5 (18.0–34.0)
	Diagnostic information	23.4 ± 5.9	24.0 (9.0–35.0)
DISCERN (Part 2)	Surgical treatment	10.3 ± 3.3	9.0 (7.0–16.0)	<0.001 ^b
	Patient experiences	8.0 ± 0.9	8.0 (7.0–9.0)
	Conservative treatment	9.4 ± 3.9	8.0 (7.0–22.0)
	Diagnostic information	7.7 ± 1.8	7.0 (7.0–15.0)
DISCERN (Part 3)	Surgical treatment	3.0 ± 0.8	3.0 (2.0–4.0)	<0.001 ^b
	Patient experiences	2.6 ± 0.5	3.0 (2.0–3.0)
	Conservative treatment	2.9 ± 1.1	3.0 (1.0–5.0)
	Diagnostic information	1.8 ± 0.8	2.0 (1.0–3.0)
DISCERN (Total)	Surgical treatment	38.5 ± 4.7	40.0 (31.0–46.0)	0.002 ^a
	Patient experiences	30.9 ± 7.0	34.0 (20.0–38.0)
	Conservative treatment	37.9 ± 8.1	37.0 (26.0–58.0)
	Diagnostic information	33.0 ± 6.8	33.5 (17.0–45.0)
JAMA score	Surgical treatment	2.4 ± 0.5	2.0 (2.0–3.0)	0.61 ^b
	Patient experiences	2.2 ± 0.7	2.0 (1.0–3.0)
	Conservative treatment	2.5 ± 0.7	2.0 (1.0–4.0)
	Diagnostic information	2.3 ± 0.6	2.0 (1.0–3.0)
GQS	Surgical treatment	4.0 ± 0.4	4.0 (3.0–5.0)	0.001 ^b
	Patient experiences	2.8 ± 0.7	3.0 (2.0–4.0)
	Conservative treatment	3.6 ± 0.6	4.0 (3.0–5.0)
	Diagnostic information	3.3 ± 0.9	3.0 (1.0–5.0)

^a: One-way ANOVA.

^b: Kruskal–Wallis H.

Bold values indicate statistical significance.

Table 5.

Correlation of video parameters with DISCERN, JAMA, and GQS scores.

	DISCERN (Part 1)	DISCERN (Part 2)	DISCERN (Part 3)	DISCERN (Total)	JAMA score	GQS
Video duration (sec)	r = 0.41 p < 0.001	r = 0.46 p < 0.001	r = 0.57 p < 0.001	r = 0.44 p < 0.001	r = 0.36 p < 0.001	r = 0.35 p < 0.001
Elapsed time (years)	r = -0.04 p = 0.65	r = -0.08 p = 0.37	r = -0.04 p = 0.64	r = -0.10 p = 0.31	r = -0.25 p = 0.008	r = 0.04 p = 0.68
View rate (%)	r = 0.30 p = 0.002	r = 0.054 p = 0.583	r = 0.158 p = 0.109	r = 0.23 p = 0.01	r = 0.23 p = 0.01	r = 0.36 p < 0.001
Number of comments	r = 0.31 p = 0.001	r = 0.17 p = 0.07	r = 0.17 p = 0.07	r = 0.26 p = 0.007	r = 0.30 p = 0.001	r = 0.31 p = 0.001
Number of views	r = 0.29 p = 0.002	r = 0.04 p = 0.64	r = 0.13 p = 0.18	r = 0.25 p = 0.009	r = 0.14 p = 0.15	r = 0.352 p < 0.001
Number of likes	r = 0.325 p = 0.001	r = 0.11 p = 0.26	r = 0.19 p = 0.04	r = 0.33 p = 0.001	r = 0.20 p = 0.03	r = 0.37 p < 0.001
Number of subscribers	r = 0.257 p = 0.009	r = 0.05 p = 0.60	r = 0.04 p = 0.67	r = 0.005 p = 0.95	r = 0.25 p = 0.008	r = 0.33 p = 0.001

r = Spearman correlation coefficient.

Bold values indicate statistical significance.

Discussion

YouTube is currently one of the most widely used video-sharing platforms worldwide for accessing information. Consequently, many individuals visit this platform to obtain knowledge about existing health problems and potential solutions. However, a feedback mechanism for evaluating the reliability and credibility of the information provided in these videos, which are produced by content creators from different groups, is lacking.

The literature in physical medicine and rehabilitation includes studies assessing the informational content and reliability of YouTube videos on topics such as physiotherapy exercises, chronic pain, carpal tunnel syndrome, multiple sclerosis, and osteoarthritis.^18,19 On YouTube, numerous videos are available across various fields of medicine. Sun et al. reported that while online information on breast cancer treatment was reliable, it was limited by a lack of originality and insufficient referencing.²⁰

In various studies analyzing videos related to psoriatic arthritis, anterior cruciate ligament injuries, fibromyalgia, and cerebral palsy, the majority of the content was found to be shared by healthcare professionals.^21–24 In the study conducted by Juyeon Oh et al., 47% of the videos were reported to have been uploaded by physicians. Similarly, in our study, 41.3% of the analyzed videos were published by physicians. This finding suggests that YouTube is more frequently used by physicians for patient education.²⁴ In the study by Ayo-Ajibola et al. on Graves’ disease, 8% of the videos were uploaded by patients. In our study, similar to these findings, patients constituted the source with the lowest contribution in terms of uploaded videos.²⁵ Patients may be less likely to share videos due to privacy concerns, the sensitive nature of their medical conditions, and associated emotional challenges. The number of views of the videos included in our study was found to be higher than those reported in studies analyzing videos on peripheral nerve stimulation, cervical spondylosis, femoroacetabular impingement, lumbar spine manipulation techniques, and cervical disc replacement. This finding indicates the high level of interest in videos related to cervical disc herniation and suggests that such videos are frequently used by patients as a source of information.^26–28

In the study by Gokcen et al., the authors used the more general English term “disc herniation” as a keyword in their search, and the evaluated content had a mean DISCERN score of 30.7 and a mean JAMA score of 1.8.¹¹ In addition, Bayram et al. evaluated YouTube videos related to transforaminal lumbar interbody fusion (TLIF) and reported mean GQS and JAMA scores of 1.82 ± 0.87 and 1.08 ± 0.80, respectively, indicating poor educational quality and limited reliability.²⁹ Similarly, Yaradılmış et al. showed that videos on spondylolisthesis achieved mean DISCERN, JAMA, and GQS scores of 35 ± 11.1, 2.7 ± 0.6, and 2.84 ± 1.05, respectively, again reflecting suboptimal informational value.³⁰ Consistent with these findings, our study also demonstrated that YouTube videos on cervical disc herniation exhibit low-to-moderate reliability and quality, suggesting that insufficiently referenced and non-standardized educational content remains a pervasive issue across spine-related topics. Rudisill et al. showed that YouTube videos on pediatric scoliosis, when analyzed using the JAMA score and GQS, had low reliability and educational quality. Similarly, Lama et al., in a comparable evaluation on cubital tunnel syndrome, reported that the videos contained poor and insufficient information and might provide incomplete knowledge to YouTube users.^31,32 In our study, the mean DISCERN score was 35.3, indicating that the content was generally inadequate but close to a moderate level of quality. Analysis with the JAMA score revealed considerable variability in terms of reliability. Notably, 55.8% of the videos received a score of 2, placing them in the low-reliability category, which demonstrates that the majority of videos on cervical disc herniation available on this platform suffer from substantial deficiencies in reliability. 
Furthermore, only 2.9% of the videos achieved the maximum score of 4, thereby meeting high-quality standards, suggesting that most health-related content on YouTube fails to reach rigorous standards and that only a limited proportion provides users with reliable information. Additionally, 36.5% of the videos were categorized as offering moderate reliability, which indicates that while the platform holds some potential in terms of informational reliability, this potential remains limited. Overall, these findings demonstrate that, while a few videos meet moderate to high-quality standards, most YouTube videos on cervical disc herniation exhibit significant shortcomings in terms of reliability of information. Although in our study more than half of the videos were rated as high quality according to the GQS, this scale primarily reflects presentation flow and perceived usefulness rather than scientific reliability. Therefore, videos with clear narration and visually organized content could receive relatively high GQS scores despite containing incomplete or unreliable medical information. In contrast, DISCERN and JAMA are objective, checklist-based instruments designed to assess the presence of evidence-based components, including source citation, authorship identification, risk disclosure, and discussion of treatment alternatives. Consequently, they tend to yield lower scores unless the content meets these explicit scientific criteria. This discrepancy illustrates the phenomenon of “polish over proof,” where visually appealing and well-structured videos achieve high subjective ratings despite lacking scientific transparency or evidence-based context. 
For clinical conditions such as cervical disc herniation, this divergence between perceived and actual reliability underscores that subjective quality scores such as the GQS may be insufficient on their own for evaluating medical content and should therefore be interpreted in conjunction with objective reliability measures such as DISCERN and JAMA. In our study, the mean GQS score (3.5) was somewhat lower than the proportion of videos rated as high quality (52.9%) might suggest, which may be explained by the variability in informational depth among the videos. Those providing limited medical detail or focusing mainly on general descriptions rather than clinical aspects tended to receive lower scores, thereby reducing the overall mean despite the predominance of well-presented videos.

Evaluating the reliability and informational quality of different types of content is important for understanding which categories provide more useful information to viewers. In our study, surgical and conservative treatment videos generally scored higher on DISCERN and GQS than diagnostic information and patient experience videos. This pattern may reflect that treatment-related videos are more frequently produced by healthcare professionals or established medical sources, which generally provide more structured, accurate, and evidence-based explanations. In contrast, patient experience videos often emphasize subjective narratives rather than content reliability, leading to lower reliability and quality scores, while diagnostic information videos generally present only brief overviews without sufficient depth or reference support.

Beyond these content-type differences, our findings revealed that most videos provided insufficient coverage of non-surgical management strategies, rehabilitation-based treatment options, and prognosis. Only a minority included content consistent with evidence-based physical therapy or clinical practice guidelines. Moreover, detailed analysis of the JAMA benchmark indicated that the lowest-scoring domains were authorship and referencing. Most videos failed to disclose the creator's professional identity or institutional affiliation, and only a few provided verifiable references or citations to scientific sources. These deficiencies reduce the transparency, traceability, and educational reliability of the content.

Enhancing professional engagement in online education would further improve digital health literacy, allowing patients seeking information about cervical disc herniation or neck-related symptoms to more accurately distinguish reliable, evidence-based resources from misleading or anecdotal content.

Heisinger et al. demonstrated that video length is a critical determinant of reliability and quality.³³ Consistent with our findings, other studies have shown that high-quality videos tend to be longer than low-quality ones. Krakowiak et al. reported a significant association between video length and GQS, JAMA, and DISCERN scores,³⁵ and Toprak et al. identified a positive correlation between video length and both GQS and DISCERN scores.³⁴ This suggests that video length may provide an indication of reliability and quality.

In contrast to previous studies evaluating video-based educational content on spinal disorders, our study revealed multiple significant positive correlations between video metrics and quality indicators. Specifically, longer videos and those with higher numbers of views, likes, and comments were associated with higher DISCERN, JAMA, and GQS scores. This contrasts with Martyn et al., who found no correlation between video metrics and quality measures in cervical disc replacement content,³⁶ and with Yaradılmış et al., who reported only limited associations for spondylolisthesis-related videos.³⁰ This difference may be explained by the narrower clinical scope of our study, which focused solely on cervical disc herniation. The homogeneity of the topic likely enhanced consistency across video characteristics, allowing significant correlations between engagement metrics and quality indicators to emerge. However, these associations remain indirect and should not be interpreted as evidence of scientific reliability.
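As an illustration of how an association between a video metric and a quality score can be quantified, the following minimal sketch computes a rank correlation in plain Python. It assumes a Spearman coefficient, which is a common choice for ordinal scores such as the GQS; this section does not state which coefficient was used. The function names and the duration and score values are hypothetical and are not taken from the study data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example values (NOT study data): duration in minutes, GQS 1-5.
durations_min = [2.5, 4.0, 6.5, 9.0, 12.0, 15.5]
gqs_scores = [2, 3, 3, 4, 4, 5]

rho = spearman_rho(durations_min, gqs_scores)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient of this kind describes only how consistently two measures rise together; it carries no information about whether the content of a longer or more popular video is scientifically accurate.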

Our study has certain limitations. First, only videos in English were evaluated, which may have limited the assessment of cultural and geographical differences. In addition, content on other social media platforms related to cervical disc herniation was not analyzed. YouTube is a dynamic platform with a large volume of videos uploaded daily and continuously updated content, making it difficult to track comprehensively; for this reason, our analysis was limited to videos published within the last decade. Moreover, as YouTube's search algorithm prioritizes engagement metrics such as views, likes, and comments, analyzing only the top 100 videos for each search term may have led to an overrepresentation of popular or algorithmically promoted content rather than a fully representative sample of all available videos. Furthermore, there is currently no universally accepted national or international standardized tool for evaluating medical information presented in video-based formats. Therefore, in our study, we employed subjective assessment tools such as the JAMA, DISCERN, and GQS scoring systems, which have been widely used in previous research. Further studies involving the evaluation of a larger number of videos and the inclusion of multiple platforms are warranted.

In conclusion, the combined assessment of DISCERN, JAMA, and GQS scores indicated that most of the included videos were of low to moderate overall quality and reliability. This highlights the need for users to critically appraise the information presented in these videos and to approach them with caution as a source of medical knowledge. To improve the educational value of future content, professionals in physical medicine and rehabilitation should develop standardized, evidence-based videos that combine the strengths of existing evaluation frameworks: addressing DISCERN's focus on balanced treatment information, meeting JAMA benchmarks, and maintaining the clarity reflected by the GQS. Producing such high-quality content would promote accurate public understanding and support patients in making informed health decisions.

Footnotes

ORCID iD

Menekse Gok Simsek

Informed consent

This study did not involve human participants or patient data. Only publicly available YouTube videos were analyzed. Therefore, informed consent was not required.

Ethics approval

This study was approved by the Ethics Committee of Bakırkoy Dr Sadi Konuk Training and Research Hospital on December 18, 2023 (Decision No: 2023-24-10).

Author contributions

Significant contribution to conception and design: Menekse Gok Simsek

Data Acquisition: Menekse Gok Simsek

Data Analysis and Interpretation: Menekse Gok Simsek, Banu Aydeniz, Turker Suleymanoglu

Manuscript Drafting: Menekse Gok Simsek

Significant intellectual content revision of the manuscript: Emine Isil Ustun, Sibel Caglar

Have given final approval of the submitted manuscript (mandatory participation for all authors): Menekse Gok Simsek, Emine Isil Ustun, Banu Aydeniz, Turker Suleymanoglu, Sibel Caglar

Statistical analysis: Emine Isil Ustun, Menekse Gok Simsek

Obtaining funding: None

Supervision of administrative, technical, or material support: Sibel Caglar

Research group leadership: Sibel Caglar

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The datasets generated during the current study are not publicly available due to ethics restrictions but are available from the corresponding author on reasonable request.

References

1. Margetis, Al Khalili. Cervical Disc Herniation. 2025.

2. Kim Y-K, Kang, Lee, et al. Differences in the incidence of symptomatic cervical and lumbar disc herniation according to age, sex and national health insurance eligibility: a pilot study on the disease’s association with work. Int J Environ Res Public Health 2018; 15: 2094. Epub ahead of print 25 September 2018.

3. Caridi, Pumberger, Hughes. Cervical radiculopathy: a review. HSS J 2011; 7: 265–272.

4. Chou W-YS, Klein WMP. Addressing health-related misinformation on social media. JAMA 2018; 320: 2417.

5. Almobarak. A content analysis of YouTube videos on palliative care: understanding the quality and availability of online resources. Palliat Care Soc Pract 2024; 18: 26323524241231820.

6. Sui, Rhodes. What to watch: practical considerations and strategies for using YouTube for research. Digit Health 2022; 8: 20552076221123708.

7. Bode, Vraga. In related news, that was wrong: the correction of misinformation through related stories functionality in social media. J Commun 2015; 65: 619–638.

8. Bode, Vraga. See something, say something: correction of global health misinformation on social media. Health Commun 2018; 33: 1131–1140.

9. Suarez-Lledo, Alvarez-Galvez. Prevalence of health misinformation on social media: systematic review. J Med Internet Res 2021; 23: e17187.

10. Mohile, Jenkins, Markowitz, et al. YouTube as an information source for lumbar disc herniations: a systematic review. World Neurosurg 2023; 172: e250–e255.

11. Gokcen, Gumussuyu. A quality analysis of disc herniation videos on YouTube. World Neurosurg 2019; 124: e799–e804.

12. Kutluturk, Aykut, Durmus. The use of online videos for vitreoretinal surgery training: a comprehensive analysis. Beyoglu Eye Journal 2022; 7: 9–17.

13. Sahin, Seyyar. Assessing the scientific quality and reliability of YouTube videos about chemotherapy. Medicine (Baltimore) 2023; 102: e35916.

14. Koo. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016; 15: 155–163.

15. Charnock, Shepperd, Needham, et al. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. J Epidemiol Community Health (1978) 1999; 53: 105–111.

16. Silberg, Lundberg, Musacchio. Assessing, controlling, and assuring the quality of medical information on the Internet: caveant lector et viewor–Let the reader and viewer beware. JAMA 1997; 277: 1244–1245.

17. Bernard, Langille, Hughes, et al. A systematic review of patient inflammatory bowel disease information resources on the World Wide Web. Am J Gastroenterol 2007; 102: 2070–2077.

18. Jang, Bang, Park, et al. Impact of changes in clinical practice guidelines for intra-articular injection treatments for knee osteoarthritis on public interest and social media. Osteoarthritis Cartilage 2023; 31: 793–801.

19. Gupta, Beletsky, Shen, et al. YouTube as a source of medical information about peripheral nerve stimulation. Neuromodulation 2025; 28: 1327–1331. Epub ahead of print 6 November 2024.

20. Sun, Luo, Bian, et al. Assessing the quality of online health information about breast cancer from Chinese language websites: quality assessment survey. JMIR Cancer 2021; 7: e25783.

21. Kim, et al. Analysis of YouTube-based therapeutic content for children with cerebral palsy. Children (Basel) 2024; 11: 814. Epub ahead of print 2 July 2024.

22. Onder, Zengin. Quality of healthcare information on YouTube: psoriatic arthritis. Z Rheumatol 2023; 82: 30–37.

23. Cole, Bach, Theismann, et al. Physician-led YouTube videos related to anterior cruciate ligament injuries provide higher-quality educational content compared to other sources. J ISAKOS 2025; 10: 100367.

24. You. Febrile seizure: what information can caregivers access through YouTube? Seizure 2021; 91: 91–96.

25. Ayo-Ajibola, Davis, Theriault, et al. Evaluation of YouTube as a source for Graves’ disease information: is high-quality guideline-based information available? OTO Open 2024; 8: e118.

26. Gupta, Beletsky, Shen, et al. YouTube as a source of medical information about peripheral nerve stimulation. Neuromodulation 2025; 28: 1327–1331. Epub ahead of print 6 November 2024.

27. Crutchfield, Frank, Anderson, et al. A systematic assessment of YouTube content on femoroacetabular impingement: an updated review. Orthop J Sports Med 2021; 9: 23259671211016340.

28. Wang, Yan, et al. YouTube online videos as a source for patient education of cervical spondylosis-a reliability and quality analysis. BMC Public Health 2023; 23: 1831.

29. Bayram, Pınar. Assessment of the quality and reliability of YouTube as an information source for transforaminal interbody fusion. Cureus 2023; 15: e50210.

30. Yaradılmış, Evren, Okkaoğlu, et al. Evaluation of quality and reliability of YouTube videos on spondylolisthesis. Interdisciplinary Neurosurgery 2020; 22: 100827.

31. Rudisill, Saleh, Hornung, et al. YouTube as a source of information on pediatric scoliosis: a reliability and educational quality analysis. Spine Deform 2023; 11: –9.

32. Lama, Hartnett, Donnelly, et al. YouTube as a source of patient information for cubital tunnel syndrome: an analysis of video reliability, quality, and content. Hand (N Y) 2024; 19: 986–994.

33. Heisinger, Huber, Matzner, et al. Online videos as a source of physiotherapy exercise tutorials for patients with lumbar disc herniation-A quality assessment. Int J Environ Res Public Health 2021; 18: 5815. Epub ahead of print 28 May 2021.

34. Toprak, Tokat. A quality analysis of nocturnal enuresis videos on YouTube. J Pediatr Urol 2021; 17: 449.e1–449.e6.

35. Krakowiak, Rak, Krakowiak, et al. YouTube as a source of information on carbon monoxide poisoning: a content-quality analysis. Int J Occup Med Environ Health 2022; 35: 285–295.

36. Martyn TLB, Baker. Assessment of the quality of information of YouTube videos regarding cervical disc replacement. Int J Spine Surg 2022; 16: 272–277.