Abstract
Background
YouTube has become a widely used resource for patients and healthcare professionals seeking information on cervical disc herniation. However, the reliability and quality of online content show substantial variability.
Objective
To evaluate the reliability, informational content and quality of YouTube videos related to cervical disc herniation.
Methods
This cross-sectional study included YouTube videos related to cervical disc herniation, identified through searches using the terms “cervical disc herniation,” “disc herniation of cervical spine,” and “herniated cervical disc.” Video characteristics such as duration, views, likes, comments, and content creator profiles were recorded. Reliability and quality were assessed using the DISCERN instrument, the Journal of the American Medical Association (JAMA) benchmark criteria, and the Global Quality Scale (GQS).
Results
A total of 300 videos (100 for each search term) were initially retrieved. After excluding duplicates, non-English, irrelevant, or very short videos, 104 unique videos were included in the study. Of the included videos, 41.3% were uploaded by physicians, 30.7% by non-physician healthcare professionals, 14.4% by medical channels, 8.7% by healthcare institutions, and 4.8% by patients. The mean DISCERN score was 35.3 ± 7.6, the mean JAMA score was 2.4 ± 0.6, and the mean GQS score was 3.5 ± 0.8, indicating generally poor to fair quality. The median (range) scores were 35.0 (17.0–58.0) for DISCERN, 2.0 (1.0–4.0) for JAMA, and 4.0 (1.0–5.0) for GQS. According to JAMA, 55.8% of the videos scored 2 points, reflecting low reliability, while only 2.9% achieved the maximum score of 4. In GQS, 52.9% were rated as high quality, 38.5% as moderate, and 8.6% as low. Correlation analyses showed that longer video duration was positively associated with higher DISCERN (r = 0.41–0.57, p < 0.001), JAMA (r = 0.36, p < 0.001), and GQS scores (r = 0.35, p < 0.001).
Conclusion
Most of the included YouTube videos on cervical disc herniation demonstrated low to moderate reliability. These findings emphasize the need for healthcare professionals and patients to critically appraise such content, and they highlight the importance of developing higher-quality, evidence-based online educational resources.
Keywords
Introduction
Cervical disc herniation occurs as a result of degeneration or trauma of the intervertebral discs, leading to rupture of the annulus fibrosus and extrusion of the nucleus pulposus, which may compress the spinal cord and/or nerve roots. 1 It is a common condition particularly in the active population, often beginning with an acute painful phase and persisting with chronic symptoms. While it may remain asymptomatic in some individuals, it can manifest with a wide spectrum of clinical features, including neck pain, radiculopathy, and myelopathy.2,3
In recent years, the internet has become one of the most frequently used sources of health-related information. Both patients and healthcare professionals increasingly consult online platforms for educational and informational purposes. 4 The rapid expansion of smartphone and computer use has facilitated easy access to medical knowledge, with social media platforms playing an especially important role in this trend. YouTube, in particular, is among the most popular video-sharing platforms worldwide, with more than two billion monthly users across over 100 countries and 80 languages. Each minute, over 500 h of video content are uploaded, and users collectively spend more than one billion hours per day watching videos.5,6
Patients often seek health-related content on social media based on their individual interests and concerns. However, a growing body of literature highlights a major challenge: the widespread dissemination of misinformation and low-quality health content online.7,8 This issue has been identified as a significant public health concern, as misleading information can negatively influence patient decision-making and communication with healthcare providers. 9 The impact is especially profound for vulnerable patient groups, such as those experiencing unexplained chronic pain, who may be more likely to rely on unverified or anecdotal information found online.
Accordingly, evaluating the quality, informational content, and reliability of online resources, particularly on widely used platforms such as YouTube, has become an increasingly important area of research. Previous studies have investigated the quality of online information regarding various medical conditions10,11; however, to date, there is limited evidence focusing specifically on YouTube videos related to cervical disc herniation. Given the high prevalence of this condition and the widespread use of social media platforms for patient education, assessing the quality of such online content is essential for guiding patients and supporting evidence-based health communication.
Methods
Data collection
The study was designed as a cross-sectional analysis. On September 30, 2024, a search was conducted on the online video platform YouTube, according to Turkey local time, to simulate patient-oriented access to information, using the English terms “cervical disc herniation,” “disc herniation of cervical spine,” and “herniated cervical disc.” These keywords were selected based on their direct clinical relevance to the disease and their frequent use by patients seeking information on this topic. Broader or less specific terms were intentionally excluded because preliminary testing showed that they produced a high proportion of irrelevant videos and substantially reduced the number of eligible results, thereby limiting the feasibility of statistical analysis. Separate video lists were generated for each search term. To reduce potential bias, the simulation was performed with different keywords, and “Incognito mode” was enabled to prevent the influence of prior search history on new results. For each keyword, new user accounts were created, and only the “relevance” filter was applied in video searches. The search results were ranked according to YouTube's algorithm, which incorporates factors such as the number of views, viewer ratings, and upload date. From these ranked results, the first 100 videos for each keyword were retrieved and screened for eligibility to ensure consistency with YouTube's default relevance-based search order. If a duplicate video appeared within the top 100 results of a given search term, it was excluded and replaced by the next unique video in the ranking to ensure that each keyword contributed 100 distinct videos.
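The retrieval rule described above (skip a duplicate within a keyword's ranking and take the next unique video, then pool the keyword lists) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the function names `first_unique` and `merge_unique` and the representation of videos as ID strings are hypothetical.

```python
from typing import Iterable, List


def first_unique(ranked_ids: Iterable[str], limit: int = 100) -> List[str]:
    """Return the first `limit` distinct IDs, preserving rank order.

    A repeated ID is skipped, so the next unique result takes its place.
    """
    seen = set()
    unique: List[str] = []
    for video_id in ranked_ids:
        if video_id in seen:
            continue  # duplicate within this keyword's ranking
        seen.add(video_id)
        unique.append(video_id)
        if len(unique) == limit:
            break
    return unique


def merge_unique(*keyword_lists: List[str]) -> List[str]:
    """Pool the per-keyword lists, keeping each video once (first occurrence)."""
    pooled = [video_id for lst in keyword_lists for video_id in lst]
    return list(dict.fromkeys(pooled))


# Toy data only; real input would be the ranked search results per keyword.
top3 = first_unique(["a", "b", "a", "c", "d"], limit=3)
pooled = merge_unique(["a", "b", "c"], ["b", "c", "e"])
```

Keeping the within-keyword and across-keyword steps separate mirrors the text: each keyword first contributes its own list, and overlap between keywords is removed only when the lists are pooled for screening.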
The inclusion criteria were defined as follows: (1) videos related to cervical disc herniation uploaded by physicians, non-physician healthcare professionals, healthcare institutions, medical channels, or patients; (2) acceptable audio and visual quality; (3) uploaded within the last 10 years; (4) video language in English. The exclusion criteria were defined as follows: (1) videos unrelated to the study topic; (2) videos in languages other than English; (3) duration shorter than 30 s;12,13 (4) inadequate audio or visual quality; (5) duplicate videos; (6) videos in the format of academic lectures or conference presentations.
Descriptive characteristics of the videos
The following data were recorded for the included videos: video title, URL, days since upload, view rate, video duration (seconds), content creator profile, content type, and the number of views, likes, and comments. The profiles of the content creators were categorized into five groups: (1) physicians, (2) non-physician healthcare professionals, (3) medical channels, (4) healthcare institutions, and (5) patients. Physicians were further classified according to their specialties: physical medicine and rehabilitation, neurosurgery, orthopedics, radiology, and sports medicine. Non-physician healthcare professionals included physiotherapists and chiropractic practitioners. Based on their content, videos were categorized into four groups: (1) diagnostic information, (2) conservative treatment, (3) surgical treatment, and (4) patient experiences. Videos classified as diagnostic information encompassed disease definition and symptoms, while those categorized under conservative treatment included exercise programs, injection therapies, physical therapy modalities, and chiropractic practices.
Evaluation parameters and video analysis
The video contents were evaluated by two independent physical medicine and rehabilitation specialists (BA, TS), each with at least 10 years of experience, using the DISCERN instrument, the Journal of the American Medical Association (JAMA) benchmark criteria, and the Global Quality Scale (GQS).
To minimize interpretation variability, both physicians participated in a calibration session before the evaluation to ensure consistent application of the DISCERN, JAMA, and GQS scoring criteria. Intra- and inter-observer reliability of the content validity index (CVI) calculations was assessed using intraclass correlation coefficients (ICC) with 95% confidence intervals (CI), based on a two-way mixed effects model for absolute agreement (average measures). The ICC analysis was performed on a randomly selected sample of 50 videos. The ICC values indicated excellent agreement between the two experienced physicians: 0.91 for DISCERN Part 1, 0.94 for DISCERN Part 2, 0.92 for DISCERN Part 3, 0.91 for the DISCERN total score, 0.91 for the JAMA score, and 0.93 for the GQS. 14
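For readers unfamiliar with the coefficient, the sketch below computes an ICC for absolute agreement with average measures from a videos-by-raters table, using the standard ANOVA mean squares. It is an illustration only: the function name `icc_absolute_average` and the toy ratings are hypothetical, confidence intervals are omitted, and the authors' SPSS output may have been produced with settings not described in the text.

```python
from typing import Sequence


def icc_absolute_average(ratings: Sequence[Sequence[float]]) -> float:
    """ICC for absolute agreement, averaged over k raters (two-way model).

    `ratings` holds one row per video and one column per rater.
    """
    n = len(ratings)      # number of videos
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for videos (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Hypothetical DISCERN totals from two raters for three videos.
identical = icc_absolute_average([[30, 30], [40, 40], [50, 50]])
offset = icc_absolute_average([[30, 31], [40, 41], [50, 51]])
```

Because the coefficient measures absolute agreement, a constant one-point offset between raters lowers it slightly below 1 even though the two raters rank the videos identically.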
The DISCERN instrument is a widely recognized tool used by academics, healthcare professionals, and individuals to evaluate the conformity of health information sources to evidence-based medicine standards. Its structure, comprising three sections and an overall assessment question, enables a comprehensive analysis of both the content and the presentation of information. This is of considerable importance in assessing the ethical and scientific standards of patient-centered health information. The first part of the DISCERN instrument consists of eight questions that assess the reliability of health information, while the second part contains seven questions that evaluate the quality of information and the thoroughness with which treatment options are presented. The sixteenth question, “Based on the answers to all of these questions, rate the overall quality of the publication as a source of information about treatment choices,” represents the third part of the DISCERN instrument and provides an overall quality rating. The total DISCERN score corresponds to the sum of all 16 questions. Each item is rated on a scale from 1 to 5, with 1 representing the lowest and 5 the highest value. In the scoring system, the following ranges are used: scores between 16–26 indicate very poor quality, 27–38 indicate poor quality, 39–50 reflect moderate quality, 51–62 represent good quality, and 63–75 correspond to excellent quality of information. 15
The JAMA benchmark criteria constitute an established tool used to evaluate the quality of online medical information. These criteria consist of four key domains: authorship, attribution, disclosure, and currency. Each domain is scored as 0 or 1, yielding a maximum total score of 4. 16
The Global Quality Scale (GQS) is an instrument used to evaluate online health information and was developed for the purpose of providing an overall assessment of quality. This scale is particularly useful for assessing not only the overall quality of health-related content published on the internet but also the flow of information and ease of use. The GQS employs a 5-point scale to evaluate the quality of information. Scores are classified as follows: ≤2 points indicate low-quality content, 3 points indicate moderate quality, and ≥4 points indicate high-quality content. 17
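The three scoring schemes described above can be expressed as simple lookups. The sketch below is illustrative only: the function names are hypothetical, the cut-offs are copied from the text, and totals outside the listed DISCERN bands (16–75) are rejected rather than guessed at.

```python
def discern_category(total: int) -> str:
    """Map a DISCERN total to the quality band listed in the text."""
    bands = [
        (16, 26, "very poor"),
        (27, 38, "poor"),
        (39, 50, "moderate"),
        (51, 62, "good"),
        (63, 75, "excellent"),
    ]
    for low, high, label in bands:
        if low <= total <= high:
            return label
    raise ValueError(f"DISCERN total {total} is outside the listed bands")


def jama_score(authorship: bool, attribution: bool,
               disclosure: bool, currency: bool) -> int:
    """Sum the four JAMA benchmark domains, each scored 0 or 1."""
    return sum(map(int, (authorship, attribution, disclosure, currency)))


def gqs_category(score: int) -> str:
    """Classify a 5-point GQS score as low, moderate, or high quality."""
    if not 1 <= score <= 5:
        raise ValueError(f"GQS score {score} is outside 1-5")
    if score <= 2:
        return "low"
    if score == 3:
        return "moderate"
    return "high"


# Example: a video with a DISCERN total of 35, two JAMA domains met, GQS of 4.
example = (discern_category(35), jama_score(True, False, False, True), gqs_category(4))
```

The example illustrates how a single video can fall in a low DISCERN band while still receiving a high GQS rating, since the instruments measure different properties.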
Statistical analysis
All analyses were performed using the IBM SPSS Statistics software, version 24.0 (IBM Corp., Armonk, NY, USA). Descriptive findings were presented as frequencies and percentages for qualitative variables, and as mean, standard deviation, median, minimum, and maximum values for quantitative variables. The normality of continuous variables within independent groups was assessed using the Shapiro–Wilk test. Since most variables did not follow a normal distribution (p < 0.05), non-parametric tests were predominantly used for comparisons. For continuous variables that followed a normal distribution, comparisons of means were performed using one-way analysis of variance (ANOVA). When ANOVA revealed significant differences, post-hoc pairwise comparisons were performed with Bonferroni correction. For continuous variables not conforming to a normal distribution, the Kruskal–Wallis H test was applied. When the Kruskal–Wallis test indicated statistical significance, subsequent pairwise comparisons were performed using the Mann–Whitney U test with Bonferroni correction. Correlations between continuous variables were assessed using Pearson's correlation coefficient for normally distributed data and Spearman's rank correlation test for non-normally distributed data. A p-value of <0.05 was considered statistically significant.
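To make the Bonferroni step concrete, the sketch below enumerates the pairwise comparisons among the five content creator groups and divides the significance level by the number of pairs. It rests on assumptions: the text does not state how many comparisons entered the correction or whether SPSS adjusted the threshold or the p-values, and the function name `bonferroni_pairs` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def bonferroni_pairs(groups: Sequence[str],
                     alpha: float = 0.05) -> Tuple[List[Tuple[str, str]], float]:
    """List all unordered group pairs and the Bonferroni-adjusted alpha."""
    pairs = list(combinations(groups, 2))
    return pairs, alpha / len(pairs)


creator_groups = [
    "physicians",
    "non-physician healthcare professionals",
    "medical channels",
    "healthcare institutions",
    "patients",
]
pairs, adjusted_alpha = bonferroni_pairs(creator_groups)
# Five groups give 10 pairs, so each pairwise test is judged against 0.05 / 10.
```

Equivalently, each raw pairwise p-value can be multiplied by the number of comparisons and judged against the original 0.05 threshold.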
Results
A total of 300 videos (100 for each search term) were initially retrieved. Duplicate videos, non-English content, irrelevant or non-informative videos, videos shorter than 30 s, and conference or lecture recordings were excluded. After applying these exclusion criteria, 104 unique videos meeting the inclusion criteria were included in the final analysis. The sequential exclusion steps are also illustrated in Figure 1. Surgical treatment–related content was predominantly produced by physicians, followed by medical channels and healthcare institutions. In videos addressing conservative treatment, non-physician healthcare professionals and physicians were the primary contributors, whereas medical channels and healthcare institutions accounted for a lower proportion. Videos providing diagnostic information were mainly produced by healthcare institutions and physicians, followed by medical channels. The descriptive characteristics of the videos are presented in Table 1.

Figure 1. Flow diagram of the included and excluded videos in the study.
Table 1. Descriptive characteristics of included YouTube videos (n = 104).
The comparison of video parameters by content type is shown in Table 2. When the content producer groups were evaluated, video duration showed a statistically significant difference among the groups (p < 0.001). In pairwise comparisons, videos uploaded by patients, non-physician healthcare professionals, and physicians were found to have significantly longer durations compared to those uploaded by medical channels and healthcare institutions (all p < 0.05). No significant differences were observed among the groups in terms of time elapsed since video upload, viewing rate, number of comments, number of views, and number of likes (p = 0.28; p = 0.21; p = 0.30; p = 0.36; p = 0.64, respectively). A significant difference was found in the number of subscribers among the groups (p = 0.004). Specifically, the number of subscribers in the medical channel group was significantly higher than that of the patient group (p < 0.05).
Table 2. Comparison of video parameters by content type.
Statistical test: Kruskal–Wallis H.
Bold values indicate statistical significance.
The evaluation of DISCERN scores revealed that 14 (13.5%) of the videos were classified as very poor (score range 16–26), 55 (52.9%) as poor (score range 27–38), 30 (28.8%) as fair (score range 39–50), and 5 (4.8%) as good (score range 51–62). Notably, no video was categorized as excellent (score range 63–75). According to the JAMA score, 5 (4.8%) videos received a score of 1, 58 (55.8%) received a score of 2, 38 (36.5%) received a score of 3, and 3 (2.9%) received a score of 4. In GQS, 9 (8.6%) videos were categorized as low quality, 40 (38.5%) as moderate quality, and 55 (52.9%) as high quality. A more detailed analysis showed that 2 (1.9%) videos scored 1 point, 7 (6.7%) scored 2 points, 40 (38.5%) scored 3 points, 49 (47.1%) scored 4 points, and 6 (5.8%) videos scored 5 points.
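The percentages above follow directly from the reported counts out of 104 videos. The sketch below recomputes them as a consistency check; the helper name `percentages` is hypothetical, and the counts are copied from the paragraph above.

```python
from typing import Dict, Hashable


def percentages(counts: Dict[Hashable, int], ndigits: int = 1) -> Dict[Hashable, float]:
    """Convert category counts to percentages of their total."""
    total = sum(counts.values())
    return {label: round(100 * n / total, ndigits) for label, n in counts.items()}


discern_counts = {"very poor": 14, "poor": 55, "fair": 30, "good": 5, "excellent": 0}
jama_counts = {1: 5, 2: 58, 3: 38, 4: 3}
gqs_counts = {1: 2, 2: 7, 3: 40, 4: 49, 5: 6}

discern_pct = percentages(discern_counts)
jama_pct = percentages(jama_counts)
gqs_pct = percentages(gqs_counts)
```

Each set of counts sums to 104. Small last-digit differences can arise for pooled categories depending on whether percentages are rounded before or after the categories are combined.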
Comparisons of DISCERN, JAMA, and GQS scores across content producers are presented in Table 3. In the pairwise comparisons, a significant difference was observed in DISCERN Part 1 between patients and healthcare institutions (p = 0.04), and in DISCERN Part 3 between patients and medical channels (p = 0.002) as well as between medical channels and physicians (p < 0.001). Additionally, in the GQS, a significant difference was observed between patients and physicians (p = 0.03).
Table 3. Comparison of DISCERN, JAMA, and GQS across content producers.
Statistical tests: one-way ANOVA; Kruskal–Wallis H.
Bold values indicate statistical significance.
Comparisons of DISCERN, JAMA, and GQS scores across video content groups are shown in Table 4. In the pairwise comparisons, conservative treatment videos had higher DISCERN Part 1 scores than patient experience videos (p = 0.03). For DISCERN Part 2, both surgical and conservative treatment videos had higher scores than diagnostic information videos (p < 0.001 and p = 0.002, respectively). In DISCERN Part 3, surgical and conservative treatment videos also had higher scores than diagnostic information videos (all p < 0.001). Additionally, in the GQS, surgical treatment videos showed higher quality scores than patient experience and diagnostic information videos (p < 0.001 and p = 0.007, respectively), while conservative treatment videos scored higher than patient experience videos (p = 0.002). The correlations of video parameters with DISCERN, JAMA, and GQS scores are shown in Table 5.
Table 4. Comparison of DISCERN, JAMA, and GQS across video content groups.
Statistical tests: one-way ANOVA; Kruskal–Wallis H.
Bold values indicate statistical significance.
Table 5. Correlation of video parameters with DISCERN, JAMA, and GQS scores.
r = Spearman correlation coefficient.
Bold values indicate statistical significance.
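As background for the Spearman coefficients reported in Table 5, the sketch below shows how the statistic is obtained: both variables are converted to ranks (ties receive the average rank) and Pearson's correlation is computed on the ranks. It is a minimal standard-library illustration with hypothetical function names and toy data, not a reproduction of the SPSS analysis, and it does not compute p-values.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: durations in seconds and DISCERN totals for four hypothetical videos.
rho = spearman([60, 120, 300, 600], [20, 30, 35, 50])
```

Because only the ordering of values matters, the coefficient is insensitive to the heavy skew typical of view and like counts, which is consistent with the choice of a rank-based test for non-normally distributed data.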
Discussion
YouTube is currently one of the most widely used video-sharing platforms worldwide for accessing information. Consequently, many individuals visit this platform to obtain knowledge about existing health problems and potential solutions. However, a feedback mechanism for evaluating the reliability and credibility of the information provided in these videos, which are produced by content creators from different groups, is lacking.
The literature in physical medicine and rehabilitation includes studies assessing the informational content and reliability of YouTube videos on topics such as physiotherapy exercises, chronic pain, carpal tunnel syndrome, multiple sclerosis, and osteoarthritis.18,19 On YouTube, numerous videos are available across various fields of medicine. Sun et al. reported that while online information on breast cancer treatment was reliable, it was limited by a lack of originality and insufficient referencing. 20
In various studies analyzing videos related to psoriatic arthritis, anterior cruciate ligament injuries, fibromyalgia, and cerebral palsy, the majority of the content was found to be shared by healthcare professionals.21–24 In the study conducted by Juyeon Oh et al., 47% of the videos were reported to have been uploaded by physicians. Similarly, in our study, 41.3% of the analyzed videos were published by physicians. This finding suggests that YouTube is more frequently used by physicians for patient education. 24 In the study by Ayo-Ajibola et al. on Graves’ disease, 8% of the videos were uploaded by patients. In our study, similar to these findings, patients constituted the source with the lowest contribution in terms of uploaded videos. 25 Patients may be less likely to share videos due to privacy concerns, the sensitive nature of their medical conditions, and associated emotional challenges. The number of views of the videos included in our study was found to be higher than those reported in studies analyzing videos on peripheral nerve stimulation, cervical spondylosis, femoroacetabular impingement, lumbar spine manipulation techniques, and cervical disc replacement. This finding indicates the high level of interest in videos related to cervical disc herniation and suggests that such videos are frequently used by patients as a source of information.26–28
In the study by Gokcen et al., the authors used the more general English term “disc herniation” as a keyword in their search, and the evaluated content had a mean DISCERN score of 30.7 and a mean JAMA score of 1.8. 11 In addition, Bayram et al. evaluated YouTube videos related to transforaminal lumbar interbody fusion (TLIF) and reported mean GQS and JAMA scores of 1.82 ± 0.87 and 1.08 ± 0.80, respectively, indicating poor educational quality and limited reliability. 29 Similarly, Yaradılmış et al. showed that videos on spondylolisthesis achieved mean DISCERN, JAMA, and GQS scores of 35 ± 11.1, 2.7 ± 0.6, and 2.84 ± 1.05, respectively, again reflecting suboptimal informational value. 30 Consistent with these findings, our study also demonstrated that YouTube videos on cervical disc herniation exhibit low-to-moderate reliability and quality, suggesting that insufficiently referenced and non-standardized educational content remains a pervasive issue across spine-related topics. Rudisill et al. showed that YouTube videos on pediatric scoliosis, when analyzed using the JAMA score and GQS, had low reliability and educational quality. Similarly, Lama et al., in a comparable evaluation on cubital tunnel syndrome, reported that the videos contained poor and insufficient information and might provide incomplete knowledge to YouTube users.31,32 In our study, the mean DISCERN score was 35.3, indicating that the content was generally inadequate but close to a moderate level of quality. Analysis with the JAMA score revealed considerable variability in terms of reliability. Notably, 55.8% of the videos received a score of 2, placing them in the low-reliability category, which demonstrates that the majority of videos on cervical disc herniation available on this platform suffer from substantial deficiencies in reliability. 
Furthermore, only 2.9% of the videos achieved the maximum score of 4, thereby meeting high-quality standards, suggesting that most health-related content on YouTube fails to reach rigorous standards and that only a limited proportion provides users with reliable information. Additionally, 36.5% of the videos were categorized as offering moderate reliability, which indicates that while the platform holds some potential in terms of informational reliability, this potential remains limited. Overall, these findings demonstrate that, while a few videos meet moderate to high-quality standards, most YouTube videos on cervical disc herniation exhibit significant shortcomings in terms of reliability of information. Although in our study more than half of the videos were rated as high quality according to the GQS, this scale primarily reflects presentation flow and perceived usefulness rather than scientific reliability. Therefore, videos with clear narration and visually organized content could receive relatively high GQS scores despite containing incomplete or unreliable medical information. In contrast, DISCERN and JAMA are objective, checklist-based instruments designed to assess the presence of evidence-based components, including source citation, authorship identification, risk disclosure, and discussion of treatment alternatives. Consequently, they tend to yield lower scores unless the content meets these explicit scientific criteria. This discrepancy illustrates the phenomenon of “polish over proof,” where visually appealing and well-structured videos achieve high subjective ratings despite lacking scientific transparency or evidence-based context. 
For clinical conditions such as cervical disc herniation, this divergence between perceived and actual reliability underscores that subjective quality scores like GQS alone may be insufficient for evaluating medical content and should therefore be interpreted in conjunction with objective reliability measures such as DISCERN and JAMA. In our study, the mean GQS score (3.5) was lower than might be expected from the proportion of videos rated as high quality (52.9%), which may be explained by the variability in informational depth among the videos. Those providing limited medical detail or focusing mainly on general descriptions rather than clinical aspects tended to receive lower scores, thereby reducing the overall mean despite the predominance of well-presented videos.
Evaluating the reliability and informational quality of different types of content is important for understanding which categories provide more useful information to viewers. In our study, surgical and conservative treatment videos generally scored higher than diagnostic information and patient experience videos on the DISCERN parts and the GQS. This pattern may reflect that treatment-related videos are more frequently produced by healthcare professionals or established medical sources, which generally provide more structured, accurate, and evidence-based explanations. In contrast, patient experience videos often emphasize subjective narratives rather than content reliability, leading to lower reliability and quality scores, while diagnostic information videos generally present only brief overviews without sufficient depth or reference support.
Beyond these content-type differences, our findings revealed that most videos provided insufficient coverage of non-surgical management strategies, rehabilitation-based treatment options, and prognosis. Only a minority included content consistent with evidence-based physical therapy or clinical practice guidelines. Moreover, detailed analysis of the JAMA benchmark indicated that the lowest-scoring domains were authorship and referencing. Most videos failed to disclose the creator's professional identity or institutional affiliation, and only a few provided verifiable references or citations to scientific sources. These deficiencies reduce the transparency, traceability, and educational reliability of the content.
Enhancing professional engagement in online education would further improve digital health literacy, allowing patients seeking information about cervical disc herniation or neck-related symptoms to more accurately distinguish reliable, evidence-based resources from misleading or anecdotal content.
Heisinger et al. demonstrated that video length is a critical factor in reliability and quality. 33 Consistent with our findings, studies have shown that high-quality videos tend to be longer in duration than low-quality ones. Krakowiak et al. reported a significant association between video length and GQS, JAMA, and DISCERN scores. Toprak et al. also identified a positive correlation between video length and both GQS and DISCERN scores.34,35 This suggests that video length may provide an indication of reliability and quality.
In contrast to previous studies evaluating video-based educational content on spinal disorders, our study revealed multiple significant positive correlations between video metrics and quality indicators. Specifically, longer videos and those with higher numbers of views, likes, and comments were associated with higher DISCERN, JAMA, and GQS scores. This contrasts with Martyn et al., who found no correlation between video metrics and quality measures in cervical disc replacement content, 36 and with Yaradılmış et al., who reported only limited associations for spondylolisthesis-related videos. 30 This difference may be explained by the narrower clinical scope of our study, which focused solely on cervical disc herniation. The homogeneity of the topic likely enhanced consistency across video characteristics, allowing significant correlations between engagement metrics and quality indicators to emerge. However, these associations remain indirect and should not be interpreted as evidence of scientific reliability.
Our study has certain limitations. First, only videos in English were evaluated, which may have limited the assessment of cultural and geographical differences. In addition, content on other social media platforms related to cervical disc herniation was not analyzed. YouTube is a dynamic platform with a large volume of videos uploaded daily and continuously updated content, making it difficult to track comprehensively. For this reason, our analysis was limited to videos published within the last decade. Moreover, as YouTube's search algorithm prioritizes engagement metrics such as views, likes, and comments, analyzing only the top 100 videos for each search term may have led to an overrepresentation of popular or algorithmically promoted content rather than a fully representative sample of all available videos. Furthermore, there is currently no universally accepted national or international standardized tool for evaluating medical information presented in video-based formats. Therefore, in our study, we employed subjective assessment tools such as the JAMA, DISCERN, and GQS scoring systems, which have been widely used in previous research. Further studies involving the evaluation of a larger number of videos and the inclusion of multiple platforms are warranted.
In conclusion, the combined assessment of DISCERN, JAMA, and GQS scores in our study indicated that most of the included videos were of low to moderate overall quality and reliability. This highlights the need for users to critically appraise the information presented in these videos and to approach them with caution as a source of medical knowledge. To improve the educational value of future content, professionals in physical medicine and rehabilitation should develop standardized, evidence-based videos that combine the strengths of existing evaluation frameworks: addressing DISCERN's focus on balanced treatment information, meeting JAMA benchmarks, and maintaining the clarity reflected by the GQS. Producing such high-quality content would promote accurate public understanding and support patients in making informed health decisions.
Footnotes
Informed consent
This study did not involve human participants or patient data. Only publicly available YouTube videos were analyzed. Therefore, informed consent was not required.
Ethics approval
This study was approved by the Ethics Committee of Bakırkoy Dr Sadi Konuk Training and Research Hospital on December 18, 2023 (Decision No: 2023-24-10).
Author contributions
Significant contribution to conception and design: Menekse Gok Simsek
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The datasets generated during the current study are not publicly available due to ethics restrictions but are available from the corresponding author on reasonable request.
