The pivotal role of video duration in health science popularization: A mixed-methods analysis integrating machine learning and fuzzy-set qualitative comparative analysis

Abstract

Background

Short video platforms have become important channels for the public to obtain health information, but the quality of health science popularization content varies greatly. Existing studies lack a comprehensive exploration of the determinants of video quality and their interaction mechanisms.

Objective

This study aimed to identify the key features influencing the quality of cerebrovascular disease health science popularization short videos and clarify their configurational effects.

Methods

Python web-crawling technology was used to collect health science popularization short videos on TikTok related to cerebrovascular diseases over the past year, and the video quality was evaluated by two medical professionals using the Global Quality Score (GQS) tool. Eight machine learning models were constructed to identify key quality-related features. The joint effects of six features were analyzed for necessity and sufficiency using fuzzy-set qualitative comparative analysis (fsQCA). Finally, the Kruskal-Wallis H test was employed to evaluate differences in quality among videos of varying duration.

Results

A total of 541 valid videos were collected. Most videos were posted by medical staff (77.27%), and high-quality videos (GQS > 3) accounted for 14.42% of all videos. Video duration was the most important feature affecting video quality, with an importance of 30.5%. The fsQCA results indicated that short duration was one of the conditions for high-quality videos, and the segmented comparison of durations indicated an optimal duration of 3 to 5 minutes.

Conclusions

Video duration was the main determinant of the quality of cerebrovascular disease health science popularization short videos. Improving the short-video communication skills of medical professionals and optimizing video duration are effective ways to enhance the quality of health science popularization content.

Keywords

health science popularization; video quality; video duration; machine learning algorithms; fuzzy-set qualitative comparative analysis

1. Introduction

The rapid growth of the internet has made social media an essential channel for the public to obtain information, especially in the field of health.¹ A growing number of individuals now rely on these platforms to seek health-related information.^2,3 At the same time, licensed and certified healthcare professionals are using platforms like Weibo, short-form video services, and WeChat official accounts to share knowledge about disease prevention, progression, and prognosis.^4–6 This development has significantly improved the public’s ability to access health information and has also boosted public health literacy and treatment adherence.^7,8 However, the quality of health content created by different users varies widely, even as the public continues to demand higher-quality information.

Short-form video has become an indispensable part of daily life for a significant segment of the population, with approximately 1.068 billion users in China.⁹ In particular, TikTok, as a widely popular short-form video platform with global reach, has emerged as a key channel for the public to access health information.^10,11 With comprehensible language and an entertaining style, short-form video platforms can rapidly disseminate complex medical knowledge, making them highly popular among viewers.¹² However, as open and interactive platforms, short-video websites host a vast number of health science popularization videos. While audiences have access to high-quality content, they are also exposed to low-quality and even pseudo-scientific health information.^13,14 The widespread dissemination of such misleading content not only distorts people’s perceptions of certain diseases but also undermines their capacity to make sound health decisions and adopt rational health behaviors.¹⁵ In contrast, high-quality health science popularization short videos can drive positive changes in public health behaviors and help enhance public health literacy.^16,17 Many studies have examined the quality of health science popularization short videos, and their results show that this quality is generally low on short video platforms.^18–20 Moreover, because of the algorithmic recommendation mechanisms of these platforms,²¹ social interactions such as likes and shares of low-quality health science popularization short videos have further expanded their reach, thereby overshadowing the dissemination of high-quality counterparts.

Existing studies have confirmed that the quality of health science popularization short videos depends not only on the accuracy and comprehensiveness of health information content, but also on inherent video characteristics such as duration.²² Meanwhile, numerous studies have revealed that video quality is significantly positively correlated with communication performance indicators, including the number of shares, likes, and comments.^23,24 These findings have clarified the internal relationships among video characteristics, quality, and communication effects, providing an important basis for understanding the communication rules of short health education videos and offering significant reference value for optimizing video communication and improving health education effectiveness. However, the communication indicators concerned in the above studies are mostly formed after video release as gradually accumulated outcomes over time, and are easily interfered with by multiple factors such as video style and user preferences, thus exerting relatively limited guiding effects for creators during the video planning stage. Although existing research has preliminarily explored the correlations between some video characteristics and quality,^25,26 systematic investigations on the relationships between inherent characteristics definable before release, such as duration and emotional tendency, and video quality remain insufficient. In addition, current studies mostly adopt correlation analysis, which can only determine the direction of linear correlations between characteristics and quality,²⁷ can hardly quantify the relative importance of each characteristic in affecting video quality, and fail to provide support for the optimization of video creation. 
Based on the achievements and limitations of previous research, this study aims to deeply explore the significance of the inherent features of videos that can be defined before release for the quality of short health education videos, and to quantify the importance of key features. The relevant results not only provide a practical basis for video creators to optimize content quality, but also help more effectively identify and select high-quality health science popularization short videos.

Machine learning methods have shown outstanding performance in exploring key factors from health-related data and can serve as reliable analytical tools. Khan et al.²⁸ developed a random forest model based on linguistic and emotional feature extraction to identify COVID-19-related health misinformation and achieved an accuracy of 88.5%. Chen YF et al.²⁹ further applied text feature fusion combined with machine learning algorithms to detect derivative health rumors, reaching an accuracy rate of 97.01%. These studies have verified that fine-grained feature extraction can effectively improve the accuracy of health information identification. More importantly, machine learning approaches are also capable of quantifying the relative importance of influencing factors, which provides a feasible and effective way to further explore the intrinsic relationship between pre-defined video features and the quality of health science popularization short videos.

This study systematically combined eight machine learning models with fuzzy set qualitative comparative analysis (fsQCA) to identify the key determinants of the quality of health science popularization short videos on the TikTok platform. Unlike previous studies that only relied on correlation analysis or single model prediction, our comprehensive approach not only determined the most influential individual predictors but also revealed the configuration effects and sufficient conditions for generating high-quality videos. This methodological integration provides a more comprehensive understanding of how multiple features jointly influence video quality. Our research findings will provide practical references for audiences to distinguish and select high-quality health science popularization short videos, offer guidance for creators to optimize video design and content production, provide evidence-based strategies for short video platforms to improve health content review mechanisms and algorithm recommendation logic, and lay the foundation for the government to formulate standardized management policies for online health science popularization information.

Therefore, this study selected health science popularization short videos related to cerebrovascular diseases on the TikTok platform as the research object, systematically extracting the relevant features of these videos, including content features and publisher features. Subsequently, eight machine learning models were constructed to identify the key features that have a significant impact on the quality of health science popularization short videos. Additionally, the fsQCA method was integrated to explore the configuration effects among the most important features.

2. Methods

2.1. Data collection

Using Python web scraping technology, we collected short videos related to the following seven Chinese keywords from the Chinese version of the TikTok platform: “cerebrovascular disease”, “cerebral apoplexy”, “stroke”, “cerebral infarction”, “transient ischemic attack”, “cerebral hemorrhage”, and “subarachnoid hemorrhage”. We included all the videos related to these keywords that were posted between December 10, 2023, and December 10, 2024. Using these seven keywords, 2,205 videos were retrieved from TikTok. After removing duplicates, 1,467 videos remained. After manual screening to exclude videos unrelated to health science popularization, videos with duplicate content, image-only videos, and videos that no longer existed, a total of 541 videos were included in this study. We collected video information, including video titles, account types, number of account followers, number of likes received by accounts, number of accounts followed, release area, release times, video duration, number of video likes, number of video favorites, number of video comments, and number of video shares.

2.2. Text preprocessing

Text extraction was conducted on the videos to facilitate subsequent processing. Specifically, SPSSAU software was employed to extract the text and dialogue content from the videos, converting the video materials into text format. Based on word cloud technology, the most frequently occurring single words and word combinations were extracted, thereby achieving the simplification and relevant visualization of the text data set derived from videos. Subsequently, textual emotion analysis was performed, resulting in the classification of the videos into five categories: negative, slightly negative, text-absent, slightly positive, and positive.

2.3. Manual evaluation

Two medical professionals independently scored the videos, using the GQS tool to assess their overall quality. The 5-point GQS enables systematic evaluation of video quality, with total scores ranging from 1 to 5, where lower scores indicate poor quality and higher scores indicate superior quality.³⁰ The specific scoring details for GQS were shown in Table 1. Before the assessment, the two experts were provided with a detailed explanation of the GQS scoring criteria to reduce errors caused by cognitive bias. Ultimately, the video quality score was taken as the average of the two experts’ scores.

Table 1.

Global quality scale.

Score	Description
1	Poor quality, poor flow, most information missing, not helpful for patients
2	Generally poor, some information given but of limited use to patients
3	Moderate quality, some important information is adequately discussed
4	Good quality and good flow, most relevant information is covered, useful for patients
5	Excellent quality and excellent flow, very useful for patients

The continuous GQS score was further dichotomized with a threshold of GQS ≥ 3 to categorize videos into high-quality and low-quality groups. This cutoff corresponds to the midpoint of the 1–5 rating scale and follows existing published standards for grading the quality of health science popularization short videos. A stricter threshold of GQS ≥ 4 would markedly reduce the sample size of high-quality videos and aggravate class imbalance, whereas a lenient threshold of GQS ≥ 2 would incorporate low-quality and irrelevant content and weaken the discriminative performance of the prediction model. Setting GQS ≥ 3 balances the rationality of the sample distribution, content representativeness, and practical applicability for quality screening.

2.4. Pre-release feature selection

We selected six features that can be defined before a video is released: publisher features (account type, release area, account influence) and content features (video duration, release time, emotional score). The account influence was calculated using the following formula³¹: $U_{\mathrm{infl}} = \frac{n_{\mathrm{fans}} + n_{\mathrm{liken}}}{n_{\mathrm{fans}} + n_{\mathrm{liken}} + n_{\mathrm{followers}} + 1}$. In this formula, $n_{\mathrm{fans}}$ denotes the number of account followers, $n_{\mathrm{liken}}$ denotes the number of likes received by the account, and $n_{\mathrm{followers}}$ denotes the number of accounts followed by the publisher.
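As an illustration of the account influence formula, the following minimal Python sketch evaluates it for a single account. The function name `account_influence` and the example counts are hypothetical and are not taken from the study's code or data.

```python
def account_influence(n_fans: int, n_liken: int, n_followers: int) -> float:
    """U_infl = (n_fans + n_liken) / (n_fans + n_liken + n_followers + 1).

    n_fans: number of account followers
    n_liken: number of likes received by the account
    n_followers: number of accounts followed by the publisher
    """
    if min(n_fans, n_liken, n_followers) < 0:
        raise ValueError("counts must be non-negative")
    audience = n_fans + n_liken
    # The +1 in the denominator keeps the ratio defined for an empty account.
    return audience / (audience + n_followers + 1)


# Hypothetical account: many fans and likes, few accounts followed.
example = account_influence(n_fans=120_000, n_liken=950_000, n_followers=35)
```

For accounts with large audiences the value approaches 1, which matches the very high median reported for this feature in Table 2.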

2.5. Validation of machine learning algorithms

Eight machine learning models were constructed using R software, including Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbors (KNN), Adaptive Boosting (AdaBoost), and Gradient Boosting Decision Tree (GBDT). Accuracy, precision, recall, and F1 score were employed as the evaluation metrics for model performance. The area under the receiver operating characteristic curve (ROC AUC) was used to assess how well the predicted probabilities discriminate between the two quality classes and was the most important overall measure for evaluating the machine learning models: the higher the ROC AUC, the better a model ranks high-quality videos above low-quality ones. All models used fixed and uniform hyperparameters, which were not optimized via grid search, random search, or Bayesian optimization. This experimental setup guarantees full reproducibility under identical experimental conditions and enables fair comparison of model performance. A global random seed was set throughout the entire modeling process, and the number of computing threads was limited to eliminate random fluctuations and computational bias. Detailed hyperparameter configurations are as follows: the random forest model adopts 500 decision trees, enables variable importance calculation, and operates in single-threaded mode. XGBoost is configured with a maximum tree depth of 3, a learning rate of 0.1, 50 boosting iterations, and a binary logistic regression objective function. The support vector machine uses a linear kernel (vanilladot) with a penalty coefficient of 1, and outputs class probabilities for model evaluation. KNN is set with k = 5 nearest neighbors. AdaBoost runs 30 iterations with a learning rate of 0.1. GBDT adopts 100 decision trees with a tree depth of 3, a shrinkage coefficient of 0.1, and uses the Bernoulli distribution as the loss function. Logistic regression and decision trees are implemented with default algorithm parameters.
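To make the evaluation metrics concrete, the sketch below computes accuracy, precision, recall, F1 score, and ROC AUC for a binary task in plain Python. It is an illustrative re-implementation, not the R code used in the study; the labels and scores are invented, and the 0.5 decision threshold is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = high quality)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and predicted probabilities for five videos.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
# A monotone rescaling of the scores leaves the AUC unchanged.
auc_rescaled = roc_auc(y_true, [s ** 3 for s in scores])
```

Because the AUC depends only on how the scores are ordered, it summarizes discrimination between the two classes; it does not by itself show whether the predicted probabilities are well calibrated.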

2.6. Statistical analysis

Data analysis was mainly conducted using R software (version 4.5.2) and SPSS 27.0, with machine learning analysis performed by R software. This study developed eight machine learning algorithms for video quality prediction. To prevent data leakage, the data set was randomly split into an 80% training set and a 20% independent test set before any resampling operation. To ensure the stability of the results, the random split was independently repeated 10 times. After the data split, the SMOTE oversampling technique was strictly applied to the training set only, while the independent test set retained its original class distribution without any resampling or data modification. Model performance was evaluated by accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUC), with all metrics presented as the mean ± standard deviation of 10 repeated tests. The overall performance differences among models were assessed using the Friedman test, followed by paired Wilcoxon tests with Holm–Bonferroni correction for post hoc pairwise comparisons, with significant differences indicated by letter annotations. Normalized feature importance was calculated for each model, and an AUC-weighted integrated feature importance was generated to identify key predictors. All modeling processes were executed with a fixed random seed to ensure complete reproducibility of the results.
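The order of operations described above (split first, then oversample the training portion only) can be sketched as follows. This is a simplified standard-library illustration rather than the study's R pipeline: `smote_like` is a hypothetical, reduced version of SMOTE-style interpolation, and the toy data, seed, and class sizes are invented.

```python
import random


def split_indices(n, test_fraction, rng):
    """Shuffle row indices and cut them into (train, test) index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def smote_like(X, y, minority, rng, k=5):
    """Balance a binary training set by interpolating between minority rows."""
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_rows)
        others = [row for row in minority_rows if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])  # one of the k nearest minority rows
        gap = rng.random()
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


def repeated_splits(X, y, n_repeats=10, test_fraction=0.2, seed=42):
    """Yield (balanced training set, untouched test set) for each repetition."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for _ in range(n_repeats):
        train_idx, test_idx = split_indices(len(y), test_fraction, rng)
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        minority = min(set(y_train), key=y_train.count)
        # Oversampling happens after the split and touches training rows only.
        X_bal, y_bal = smote_like(X_train, y_train, minority, rng)
        X_test = [X[i] for i in test_idx]
        y_test = [y[i] for i in test_idx]
        yield (X_bal, y_bal), (X_test, y_test)


# Invented toy data: 40 videos, 2 features, 30 high-quality and 10 low-quality.
data_rng = random.Random(0)
X = [[data_rng.random(), data_rng.random()] for _ in range(40)]
y = [1] * 30 + [0] * 10
splits = list(repeated_splits(X, y))
```

Each test set keeps its original class distribution, so the reported metrics reflect performance on unmodified data, while the synthetic rows influence model fitting only.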

The multi-feature configurations driving high-quality health science popularization short videos were analyzed using the QCA package in R software. The outcome variable was defined by the GQS (high quality, GQS ≥ 3), and the condition variables were video duration, account influence, emotional score, account type, release region, and release time period. For continuous variables, the 95th, 50th, and 5th percentiles were used as the anchor points for full membership, the crossover point, and full non-membership, respectively. For categorical variables, membership was assigned based on the proportion of high-quality videos in each category. A necessity analysis was conducted for all conditions, with a consistency threshold of 0.9 for identifying necessary conditions. A truth table was constructed with a case frequency threshold of 2, a raw consistency threshold of 0.8, and a PRI consistency threshold of 0.7. All rows meeting these thresholds and with an outcome of 1 were extracted from the truth table, and the raw coverage and unique coverage of each combination were calculated. Non-redundant combinations with a unique coverage greater than zero were retained as the final results, and their consistency and coverage were recorded. To assess the robustness of the results, the analysis was repeated under each of the following changes: adjusting the calibration percentiles of continuous variables (to the 90th, 50th, and 10th percentiles, and to the 75th, 50th, and 25th percentiles), raising the raw consistency threshold to 0.85, and increasing the case frequency threshold to 3.
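The percentile-based calibration of continuous conditions can be illustrated with a short sketch. The paper does not state which calibration function was applied, so the code below assumes the commonly used direct method, in which the three anchors are mapped to log-odds of +3, 0, and -3 and then passed through a logistic function. The duration values are hypothetical.

```python
import math
import statistics


def percentile_anchors(values, full=95, cross=50, non=5):
    """Return (non-membership, crossover, full-membership) anchors from percentiles."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[non - 1], cuts[cross - 1], cuts[full - 1]


def calibrate(x, non, cross, full):
    """Direct-method fuzzy membership of x; assumes non < cross < full."""
    if x >= cross:
        log_odds = 3.0 * (x - cross) / (full - cross)
    else:
        log_odds = 3.0 * (x - cross) / (cross - non)
    log_odds = max(-50.0, min(50.0, log_odds))  # guard against overflow for outliers
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical video durations in seconds.
durations = [20, 35, 58, 60, 75, 89, 95, 120, 139, 180, 240]
non, cross, full = percentile_anchors(durations)
membership = [calibrate(d, non, cross, full) for d in durations]
```

Under this assumption a video at the crossover point receives a membership of 0.5, and videos at the outer anchors receive memberships of roughly 0.95 and 0.05 in the set of long-duration videos.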

The Kruskal-Wallis H test was employed to assess the quality disparity among health science short videos of varying duration.
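For readers unfamiliar with the test, the following sketch computes the Kruskal-Wallis H statistic with the usual tie correction in plain Python. It is an illustration only; the study's analysis was run in R and SPSS, and the GQS scores grouped by duration band below are invented.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected H statistic; compare with chi-square on len(groups) - 1 df."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction


# Invented GQS scores for two duration bands (averaged ratings produce ties).
under_1_min = [2.5, 3.0, 3.0]
three_to_five_min = [3.0, 3.5, 4.0]
h_stat = kruskal_wallis_h([under_1_min, three_to_five_min])
```

A rank-based test is a natural choice here because averaged GQS ratings are ordinal and heavily tied, so no normality assumption is needed.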

3. Results

3.1. Basic information about the videos

A total of 541 health science popularization short videos related to cerebrovascular diseases were collected from the TikTok platform over the past year, and the basic information of these videos was summarized. Regarding account types, videos published by medical staff accounted for the largest proportion (77.27%), followed by individual accounts (7.95%). Hospital accounts and media accounts each accounted for 6.65%, while government accounts published the fewest videos, representing a mere 1.48%. In terms of release area, North China had the highest number of published videos, accounting for 40.48%. East China and Central China followed with 19.96% and 14.79%, respectively. Northeast China, Southwest China, and South China contributed 6.65%, 6.65%, and 5.92% of the videos, respectively, while Northwest China had the lowest proportion at 5.55%. For video release time, the period from 12:00 to 18:00 had the largest number of videos, accounting for 43.81%, while the period from 00:00 to 06:00 had the fewest videos, accounting for 0.93%. Other relevant characteristics of the videos were presented in Table 2.

Table 2.

Video feature statistics.

Characteristics	Features	Median (IQR)/N (%)
Dissemination characteristics	Likes	520(118,10652.5)
	Favorites	131(22,3742.5)
	Comments	34(8,344)
	Shares	127(16.5,3536.5)
Content characteristics	Video duration (s)	89(58,139.5)
	Emotional score	-0.929628350(-0.998725403,0.0017354643)
	Release time
	0:00-6:00	5(0.93%)
	6:00-12:00	135(24.95%)
	12:00-18:00	237(43.81%)
	18:00-24:00	164(30.31%)
Publisher characteristics	Account type
	Individual	43(7.95%)
	Medical staff	418(77.27%)
	Hospital	36(6.65%)
	Media	36(6.65%)
	Government	8(1.48%)
	Release area
	Northeast	36(6.65%)
	East China	108(19.96%)
	Central China	80(14.79%)
	North China	219(40.48%)
	South China	32(5.92%)
	Northwest	30(5.55%)
	Southwest	36(6.65%)
	Account influence	0.9998517259(0.9986956461,0.9999965244)

Two experts conducted quality assessments on the 541 videos in accordance with the GQS scoring criteria. The consistency between the two raters was good, with an ICC of 0.831, and the 95% confidence interval was 0.800 - 0.858. The results indicated that the overall quality of the videos was relatively low: 126 videos (23.29%) scored below 3 points, corresponding to low quality, 337 videos (62.29%) were of moderate quality, and only 78 videos (14.42%) scored above 3 points, representing high quality.

Text processing was performed on the video transcripts. The results revealed that conceptual terms such as “blood vessel”, “stroke”, and “aneurysm” appeared frequently. Terms related to disease manifestations and management, including “hemorrhage”, “prevention”, and “treatment”, were also commonly observed, as were disease-related factors such as “hypertension”, “diet”, and “exercise”. The word cloud is presented in Figure 1, and the detailed word frequency table can be found in the supplementary materials. In terms of emotional scores, 64.7% of the videos scored ≤ 3 points and exhibited negative emotions, while only 5.18% of the videos scored > 3 points and demonstrated positive emotions.

Figure 1.

Word cloud of text content from health science popularization short videos on cerebrovascular diseases.

3.2. Comparison of machine learning models’ performance

The performance metrics (Accuracy, Precision, Recall, F1 Score, and AUC) for the eight machine learning models (Random Forest, XGBoost, SVM, Logistic Regression, Decision Tree, KNN, AdaBoost, and Gradient Boosting Decision Tree) were presented in Table 3. The ROC curves of the different models were shown in Figure 2, and Figure 3 presented a heatmap of the model performance metrics.

Table 3.

The performance of machine learning algorithms.

Model	Accuracy	Precision	Recall	F1	AUC
AdaBoost	0.806±0.024	0.853±0.033	0.899±0.032	0.875±0.019	0.802±0.058
Decision Tree	0.796±0.036	0.842±0.046	0.903±0.038	0.870±0.024	0.780±0.049
GBDT	0.809±0.017	0.854±0.035	0.906±0.028	0.878±0.012	0.811±0.048
KNN	0.788±0.037	0.830±0.040	0.908±0.025	0.867±0.024	0.771±0.056
Logistic Regression	0.819±0.033	0.874±0.031	0.892±0.028	0.882±0.022	0.799±0.049
Random Forest	0.812±0.024	0.856±0.030	0.905±0.028	0.880±0.017	0.803±0.053
SVM	0.826±0.032	0.889±0.030	0.881±0.020	0.885±0.020	0.814±0.039
XGBoost	0.806±0.014	0.854±0.030	0.898±0.022	0.875±0.010	0.813±0.055

Figure 2.

ROC curves (10-run average on test set).

Figure 3.

Model performance metrics (10-run average).

There was no statistically significant difference in the performance of these models (P > 0.05). The detailed performance results and comparative rankings of the eight machine learning models were visually presented in Figure 4.

Figure 4.

Model performance: AUC with 95% confidence interval.

3.3. AUC-weighted aggregated feature importance across all models

The AUC-weighted ensemble feature importance across all eight models was shown in Figure 5. Video duration had the highest importance, at 30.5%. Release region ranked second, at 16.6%. Account type had an importance of 15.5%, emotional score 14.9%, and account influence 14.6%. In contrast, the importance of release time was relatively low.
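The aggregation idea can be sketched as follows. The exact weighting scheme is not spelled out in the text, so this example assumes that each model's importances are first normalized to sum to 1 and then averaged with weights proportional to the model's AUC. The AUC values are taken from Table 3, but the per-model importances and the two-feature setup are invented for illustration.

```python
def normalize(importance):
    """Scale one model's raw importances so that they sum to 1."""
    total = sum(importance.values())
    return {feature: value / total for feature, value in importance.items()}


def auc_weighted_importance(per_model, auc):
    """Average normalized importances with weights proportional to model AUC."""
    total_auc = sum(auc[model] for model in per_model)
    features = list(next(iter(per_model.values())))
    combined = {feature: 0.0 for feature in features}
    for model, importance in per_model.items():
        weight = auc[model] / total_auc
        for feature, value in normalize(importance).items():
            combined[feature] += weight * value
    return combined


# AUCs from Table 3; raw importances below are hypothetical.
auc = {"SVM": 0.814, "KNN": 0.771}
per_model = {
    "SVM": {"video_duration": 3.0, "release_region": 1.0},
    "KNN": {"video_duration": 1.0, "release_region": 1.0},
}
combined = auc_weighted_importance(per_model, auc)
```

Because the eight AUCs in Table 3 are close to one another, weights of this kind would be nearly uniform, so the aggregate behaves much like a simple average of the normalized importances.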

Figure 5.

Feature importance analysis. (a) Weighted ensemble importance. (b) Individual model importance.

3.4. Analysis of the combined effects of the key features

The results of the necessity test for video duration, account influence, emotional score, release time, release region, and account type as conditions for high-quality health science popularization short videos were presented in Table 4. No condition had a consistency exceeding the threshold of 0.9, indicating that no single feature constitutes a necessary condition for high-quality health science popularization short videos. This suggested that high-quality health science popularization short videos result from the interaction of multiple features rather than being directly determined by a single factor.

Table 4.

Necessity analysis of a single feature.

Condition	Consistency	Coverage
∼Video duration	0.890	0.513
Video duration	0.668	0.487
∼Release time	0.765	0.234
Release time	0.766	0.766
∼Account type	0.747	0.228
Account type	0.772	0.772
∼Account influence	0.764	0.641
Account influence	0.768	0.359
∼Emotional score	0.737	0.412
Emotional score	0.787	0.588
∼Release region	0.742	0.227
Release region	0.773	0.773

“∼” represents negation.

Sufficiency analysis identified two equifinal configurations. Both require short duration, low account influence, advanced account type, high-activity regions, and high-activity time periods. The only difference between them lay in the emotional score (low for Configuration 1 and high for Configuration 2). The overall solution coverage was 0.397, and the solution consistency was 0.880. The specific details were shown in Table 5. Robustness tests confirmed the stability of the core findings, with the overall coverage ranging from 0.381 to 0.414 and the consistency ranging from 0.880 to 0.898. Specific details were presented in Table 6.

Table 5.

Analysis of the configurations for videos with GQS ≥ 3.

Condition	Configuration 1	Configuration 2
Video duration	⊗	⊗
Account influence	⊗	⊗
Emotional score	⊗	●
Account type	●	●
Release region	●	●
Release time	●	●
Raw coverage	0.220	0.284
Unique coverage	0.113	0.284
Consistency	0.868	0.865
Solution coverage	0.397
Solution consistency	0.880

Note. “●” indicates the presence of a condition, meaning its calibrated fuzzy-set membership score is equal to or greater than 0.5. “⊗” indicates the absence of a condition, meaning its membership score is below 0.5.
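To show how a configuration such as those in the table is scored, the sketch below combines conditions with the fuzzy AND (minimum) and negates absent conditions, then computes sufficiency consistency and raw coverage. The condition keys and all membership values are hypothetical.

```python
def configuration_membership(case, present, absent):
    """Fuzzy AND over a configuration: min of present and negated absent conditions."""
    terms = [case[name] for name in present] + [1.0 - case[name] for name in absent]
    return min(terms)


def sufficiency(config_scores, outcome):
    """Return (consistency, raw coverage) of a configuration as sufficient.

    consistency  = sum(min(c, y)) / sum(c)
    raw coverage = sum(min(c, y)) / sum(y)
    """
    overlap = sum(min(c, y) for c, y in zip(config_scores, outcome))
    return overlap / sum(config_scores), overlap / sum(outcome)


# Configuration 1: three conditions absent, three present.
absent = ["video_duration", "account_influence", "emotional_score"]
present = ["account_type", "release_region", "release_time"]

# One invented case with calibrated memberships for the six conditions.
case = {"video_duration": 0.2, "account_influence": 0.3, "emotional_score": 0.1,
        "account_type": 0.8, "release_region": 0.7, "release_time": 0.9}
case_score = configuration_membership(case, present, absent)

# Invented configuration and outcome memberships for three cases.
cons, raw_cov = sufficiency([0.7, 0.2, 0.6], [0.9, 0.1, 1.0])
```

Unique coverage, also reported in the table, is the part of the outcome covered by one configuration and by no other, so it additionally requires the membership scores of the remaining configurations.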

Table 6.

Results of robustness test.

Configuration	Changed thresholds from 95th, 50th, 5th to 90th, 50th, 10th	Changed thresholds from 95th, 50th, 5th to 75th, 50th, 25th	Changed case frequency thresholds from 2 to 3 cases	Changed the consistency score from 0.8 to 0.85
∼Video duration*∼Account influence*∼Emotional score*Account type*Release region*Release time	√		√	√
∼Video duration*∼Account influence*Emotional score*Account type*Release region*Release time	√		√	√
Overall solution coverage	0.381	0.414	0.397	0.397
Overall solution consistency	0.882	0.898	0.880	0.880

Note. √ indicates the existence of a solution; blank indicates that no solution exists. “*” represents “AND”. It connects different conditions and indicates that these conditions must be combined together.

3.5. The optimal duration for high-quality health science popularization short videos

To further explore how quality varies with duration and to identify the most suitable duration, we conducted a segmented comparison of health science popularization short videos of various durations. The specific comparison results were presented in Figure 6. Videos with durations of 1 to 3 minutes, 3 to 5 minutes, and over 5 minutes were all of higher quality than those with a duration of less than 1 minute (P < 0.001). Videos with a duration of 3 to 5 minutes were of higher quality than those with a duration of 1 to 3 minutes (P < 0.001).

Figure 6.

Segmented comparison of video quality across different durations.

4. Discussion

This study systematically explored the pre-release features that affect the quality of health science popularization short videos on cerebrovascular diseases, and combined machine learning modeling with fuzzy-set qualitative comparative analysis (fsQCA) to clarify the key features influencing video quality and their interaction mechanisms. Video duration and release region were the main features associated with high-quality health science popularization short videos.

4.1. Discrepancy between content producers and video quality

In this study, health science short videos with a Global Quality Score (GQS) below 3 were classified as low-quality. The results indicated that although medical staff accounts published the largest proportion of videos (77.27%), only 14.42% of all videos were rated as high-quality. This finding revealed a mismatch between the professional skills of medical professionals and the dissemination effect of health science popularization short videos. While medical professionals possess specialized knowledge, they may lack the skills to adapt complex scientific information to the short video format.³² Cerebrovascular disease knowledge is inherently complex and requires sufficient duration to be explained clearly and comprehensibly. In contrast, the platform’s algorithm inherently favors concise and emotionally engaging content,³³ and low-quality or even false health science popularization videos typically employ sensational language to capture attention.^34,35

Meanwhile, the platform’s algorithmic recommendation exacerbated this issue.³⁶ Videos released during peak user hours (12:00–18:00), which account for 43.81% of all video releases, gained more exposure. However, due to heavy clinical workloads, medical staff may be unable to align their video release times with periods of high user activity. Furthermore, regional disparities reflected the uneven distribution of medical communication resources. While developed regions produced more content, they failed to create videos of higher quality.

4.2. Textual and emotional patterns in science communication

Word cloud analysis showed frequent use of terms like “blood vessel,” “stroke,” and “aneurysm,” indicating that videos prioritize disease-specific concepts. In contrast, the word cloud in the study by Hongyu Wu et al.¹¹ of non-alcoholic fatty liver disease (NAFLD) health science videos on TikTok leaned more toward terms related to lifestyle interventions, such as “diet” and “reversible.” In terms of quality, the overall excellence rate of videos in our study was lower than that in the NAFLD research, reflecting the greater challenges in producing high-quality science videos that balance rigor and accessibility in the more specialized field of cerebrovascular diseases.

The distribution of emotional scores for videos of different qualities indicated that videos of poorer quality tended to have lower emotional scores. Such fear-inducing communication can stir up negative emotions among the public and may cause excessive psychological pressure and fear for those who are fighting the disease or have just been diagnosed.³⁷ However, moderate fear appeals can also raise self-protection awareness,³⁸ as exemplified by the classic case of graphic warning images on cigarette packages, which trigger negative emotional responses to promote self-protective motivation and action. Emotional scores exhibited a weak correlation with user engagement. This weak relationship indicated that users’ interaction with health science popularization videos was not driven solely by emotion, as practical value, such as actionable prevention advice, also played a crucial role. Although high-quality videos provide more accurate information and have a more neutral or rational emotional tone, they struggle to compete with fear-inducing false content in capturing immediate attention.

4.3. Machine learning model performance and feature importance

This study employed eight classic machine learning models to predict the quality of health science popularization short videos on cerebrovascular diseases: Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, K-Nearest Neighbor, AdaBoost, GBDT, and XGBoost. All models performed well on ten independent, randomly divided test sets, with the area under the receiver operating characteristic curve (AUC) exceeding 0.75 for each. There was no statistically significant difference in AUC among the models. Therefore, model selection should be based on interpretability and computational efficiency rather than on minor improvements in AUC. This approach aligns with the conclusion of Ding et al.,³⁹ who found that traditional machine learning models can provide higher interpretability while maintaining predictive performance at a lower training cost. The machine learning models used in this study can also be trained quickly in a regular computing environment and directly output feature importance. The basic hyperparameters of each model, such as the number of trees, maximum depth, learning rate, and kernel function, have been publicly reported to ensure reproducibility. As Cao et al.⁴⁰ emphasized, strictly adhering to methodological norms is crucial for enhancing the credibility of machine learning research. Although this study did not conduct extensive hyperparameter searches for each model, it still provides practical and interpretable machine learning evidence for predicting the quality of health science popularization videos with a moderate sample size and clear features.
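
As a minimal sketch of the evaluation described above, the AUC for one model on one test split can be computed from its rank-based definition (the probability that a randomly chosen high-quality video receives a higher score than a randomly chosen lower-quality one) and then averaged over repeated splits. The function names, labels, and scores below are illustrative assumptions and are not the study's code or data.

```python
from statistics import mean

def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc(splits):
    """Average AUC over several (labels, scores) test splits."""
    return mean(auc_score(y, s) for y, s in splits)

# Hypothetical toy split: 1 = high-quality video, 0 = otherwise.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_score(toy_labels, toy_scores)
```

In practice, `mean_auc` would receive one `(labels, scores)` pair per test split for each model, and the resulting per-model averages could then be compared statistically.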

Feature importance analysis derived from the weighted AUC ensemble of all models indicates that video duration is the most critical predictor. This finding was consistent with our correlation analysis and the results reported by Rongguang Ge et al.,²⁴ which documented a positive correlation between video length and quality. Furthermore, release region, account type, emotional score, and account influence were also assigned relatively high importance scores. From the perspective of account type distribution, videos published by medical professionals account for the largest proportion. This distribution suggested that professional medical personnel were the primary producers of high-quality video content, and their professional knowledge backgrounds might directly improve the accuracy and credibility of the content, which explained the high importance of account type in the prediction models. In terms of releasing region, North China had the largest number of published videos, while Northwest China had the fewest. This geographical distribution characteristic was highly consistent with the regional disparities of medical resources and economic development in China. North China, especially Beijing, hosts the country’s top-tier medical institutions and medical education resources. East China and Central China boast developed economies and dense populations, where the dissemination of online medical information is more active. In contrast, Northwest China has relatively insufficient medical resources and, accordingly, has lower production capacity for high-quality health science popularization videos. The uneven regional distribution further confirms the importance of the publishing region feature. This characteristic not only reflects the geographical agglomeration effect of medical resources but may also affect the applicability of video content to audiences in different regions. 
These results are consistent with social cognitive theory,⁴¹ which posits that users generally evaluate content based on the authority or popularity of the content source. In the domain of medical videos, videos published by medical professionals from regions with developed medical resources inherently carry higher credibility signals and thus are more likely to be regarded as high-quality content by users. In comparison, releasing time showed relatively low predictive importance, which may be attributed to the fact that releasing time has no inherent logical association with the intrinsic quality of video content, and only reflects users’ uploading habits or peak traffic periods of the platform. Therefore, compared with attributes directly related to content quality and source credibility, such as video duration and account origin, contextual factors, such as releasing time, have significantly weaker explanatory power.
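
One plausible reading of the "weighted AUC ensemble" of feature importance is sketched below: each model's normalized importance vector is weighted in proportion to that model's AUC and the results are summed. The exact weighting scheme is an assumption here, and the model names, feature names, and numbers are hypothetical.

```python
def auc_weighted_importance(importances, aucs):
    """Combine per-model feature importances, weighting each model by its AUC."""
    total_auc = sum(aucs[m] for m in importances)
    features = next(iter(importances.values())).keys()
    combined = {}
    for f in features:
        # Weighted average of this feature's importance across models.
        combined[f] = sum(aucs[m] * importances[m][f] for m in importances) / total_auc
    return combined

# Hypothetical values for two models and three features (each model sums to 1).
importances = {
    "random_forest": {"duration": 0.5, "account_type": 0.3, "release_time": 0.2},
    "xgboost": {"duration": 0.4, "account_type": 0.4, "release_time": 0.2},
}
aucs = {"random_forest": 0.80, "xgboost": 0.75}

combined = auc_weighted_importance(importances, aucs)
top_feature = max(combined, key=combined.get)
```

Because each model's importances sum to one and the weights are normalized, the combined scores also sum to one and can be read as shares, in the same way the study reports a 30.5% share for video duration.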

4.4. Identification and equivalence analysis of configuration paths

This study found that the main configuration of high-quality health science popularization short videos comprises five key conditions: short duration, low account influence, high account-type membership (such as medical professional accounts), high-activity release regions, and high-activity release time periods. Among them, the core driving role of short duration is particularly prominent. Existing empirical studies have shown that the completion rate of 15–30 second short videos is significantly higher than that of longer videos, and the platform algorithm accordingly gives them stronger recommendation weighting.⁴² Meanwhile, short videos (those under 60 seconds) have a much higher information density than long videos: the camera switches more frequently, and the core information is presented upfront. This evidence supports the results of this study. Controlling the duration not only helps creators obtain algorithmic recommendations but also enhances users’ efficiency in absorbing health information. Additionally, the emergence of low account influence as a core condition is a counterintuitive but positive finding. It indicates that even if the creator’s fan base is limited, high-quality videos can still be produced as long as the content is concise and powerful and is released by a highly credible account type (such as a hospital or official science popularization account) in active regions or time periods. For short video users, this result means they will have the opportunity to access more high-quality health information from medical professional accounts. Short video creators do not need to overly pursue the accumulation of followers but should focus on optimizing duration, account verification, and release timing to improve content quality.

This study identified two equivalent configurations that differed only in the level of emotional intensity. One required a low emotional score, meaning a calm and objective presentation. The other required a high emotional score, indicating an enthusiastic and engaging expression. This interchangeability suggests that emotional intensity is not a decisive factor for high-quality health science popularization videos. Previous research on general social media content⁴³ has indicated that audiences often prioritize emotional stimulation and resonance over the authenticity of information or depth of thought, with rational cognition giving way to emotional preferences. In the specific field of health science popularization, however, the equivalence of the high and low emotional-score paths suggests that the clarity of fact transmission and scientific accuracy may be more important than the intensity of emotions, which complements the traditional view that emotional content spreads more easily. At the same time, account influence was at a low level in all configurations, once again indicating that the quality of the content itself can make up for a creator’s limited traffic base.
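
The sufficiency logic behind these configurations can be illustrated with the standard fsQCA measures. A case's membership in a configuration is the minimum of its memberships in the component conditions (with 1 − x for a negated condition), consistency is Σmin(x, y)/Σx, and coverage is Σmin(x, y)/Σy. The sketch below uses hypothetical calibrated memberships for three videos. The condition names and values are assumptions for illustration and are not taken from the study's truth table.

```python
def config_membership(case, config):
    """Fuzzy AND over conditions; a False flag negates the condition (1 - x)."""
    return min(case[c] if present else 1.0 - case[c] for c, present in config.items())

def consistency(x, y):
    """Sufficiency consistency: sum(min(x, y)) / sum(x)."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)

def coverage(x, y):
    """Coverage: sum(min(x, y)) / sum(y)."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(y)

# Hypothetical calibrated memberships for three videos.
cases = [
    {"long_duration": 0.2, "account_influence": 0.1, "emotion": 0.9},
    {"long_duration": 0.3, "account_influence": 0.4, "emotion": 0.1},
    {"long_duration": 0.9, "account_influence": 0.8, "emotion": 0.5},
]
outcome = [0.9, 0.7, 0.2]  # membership in the "high quality" set

# Configuration: short duration AND low account influence (emotion omitted).
config = {"long_duration": False, "account_influence": False}
x = [config_membership(c, config) for c in cases]

cons = consistency(x, outcome)
cov = coverage(x, outcome)
```

In this toy data the first two videos have opposite emotional scores yet both belong strongly to the configuration and to the outcome, which mirrors the idea that two paths differing only in emotional intensity can be equivalent.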

4.5. The importance of controlling the duration of health science popularization short videos

For high-quality health science popularization short videos, choosing the appropriate duration is of great importance. Although the above configuration analysis indicated that short durations had advantages in completion rate and recommendation weighting, overly short durations often lead to fragmented knowledge presentation, making it difficult to systematically explain the prevention or rehabilitation knowledge of diseases. Such fragmented content not only weakens the audience’s ability for in-depth thinking and rational judgment but also impairs memory effects.⁴⁴ More seriously, the continuous information flow causing frequent context switching significantly damages individual prospective memory, that is, the ability to remember to execute a certain intention in the future.⁴⁵ Our research results further suggested that the optimal duration for truly high-quality health science popularization videos should be between 3 and 5 minutes. This is mainly because a duration that is too short cannot fully explain the causes, symptoms, and countermeasures of diseases, affecting the completeness and scientific nature of the content. On the other hand, a duration that is too long may exceed the audience’s attention span, reducing their willingness to like, collect, or share. Therefore, we call on creators of health science popularization short videos, especially medical professionals, to consciously control the video duration within 3 to 5 minutes to balance information density and knowledge systematicity. At the same time, short video platforms should also optimize their recommendation mechanisms, giving appropriate preferences to short videos with moderate duration and complete content, and avoiding excessive rewards for fragmented content. 
By reasonably controlling video duration, creators can not only improve video quality but also help the public more efficiently and systematically acquire knowledge on disease prevention and health management, thereby promoting a substantial improvement in health literacy.
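
The comparison of quality across duration groups relies on the Kruskal-Wallis H test. A minimal sketch of the H statistic with average ranks and the standard tie correction is given below. The group boundaries and scores are hypothetical, and a full analysis would also derive a p-value from the chi-square distribution with k − 1 degrees of freedom, which is omitted here.

```python
from itertools import chain

def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])
        h += r_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))

# Hypothetical GQS scores (1-5) for three duration groups.
short, medium, long_ = [2, 2, 3, 1], [3, 4, 4, 5], [2, 3, 3, 2]
h_stat = kruskal_wallis_h([short, medium, long_])
```

A larger `h_stat` indicates that the rank distributions of the groups differ more than would be expected by chance. Note that the function is undefined when every pooled value is identical, because the tie correction then divides by zero.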

4.6. Limitations

This study had several limitations. First, the data were exclusively derived from videos on the Chinese TikTok platform collected over the past year, and it remains unclear whether the findings can be generalized to other platforms or extended to longer time periods. Second, the GQS scores rely on subjective ratings from two experts. Although the inter-rater reliability was relatively high, individual bias cannot be ruled out. Third, the model only utilizes ex-ante features available before content release and excludes ex-post user behavior data. While this design avoids reverse causality, it also limits the model’s capability to leverage user feedback signals. Finally, this study only includes Chinese-language health science popularization short videos, and its applicability to contexts with other languages, cultures, or healthcare systems remains to be investigated.

5. Conclusion

This study systematically identified the main determinants of the quality of cerebrovascular disease-related health science popularization short videos on TikTok by integrating machine learning algorithms and fuzzy set qualitative comparative analysis (fsQCA). The results showed that all eight machine learning models achieved robust predictive performance, and no statistically significant differences were observed among the models. Feature importance and configuration effect analysis indicated that video duration was a key feature affecting video quality, and short duration was one of the core conditions for high-quality health education short videos. The optimal duration range for such high-quality videos was determined to be 3 to 5 minutes. Overall, this study confirmed that optimizing video duration is a core strategy for improving the quality of health education short videos on cerebrovascular diseases.

Based on these findings, creators of health science popularization short videos should keep the video duration within 3 to 5 minutes, as this range is considered optimal for information dissemination. Creators should also enhance the credibility of their accounts by prominently displaying their professional qualifications. Users can choose to watch videos from creators affiliated with more authoritative hospitals or from regions with more developed economies, and should give priority to videos from certified medical professionals, hospitals, or government health accounts. Since the time of posting is relatively less important, users do not need to focus on when the creator posts the video.

Supplemental material

Supplemental material for The pivotal role of video duration in health science popularization: A mixed-methods analysis integrating machine learning and fuzzy-set qualitative comparative analysis by Xueping Jiao, Xingyu Liu, Mengting Liu, Yueting Wang, Shuhan Yang, Xueqin Yang, Yuhuan Xie, Yufang Guo, Fanghong Yan and Yanan Zhang in Digital Health.

Footnotes

Acknowledgements

The authors thank the Gansu Provincial People’s Hospital for supporting this study.

ORCID iD

Xueping Jiao

Ethical considerations

This study has been approved by the Medical Ethics Committee of the School of Nursing, Lanzhou University, with the approval number: LZUHLXY20250052. All research data were derived from publicly accessible health science popularization short videos on TikTok, and did not involve personal privacy information.

Author contributions

Yanan Zhang conceived and designed the study. Xueping Jiao and Xingyu Liu collected the videos. Mengting Liu and Xueqing Yang collected the characteristics of the videos and authors. Yueting Wang, Shuhan Yang, Yuhuan Xie, and Yufang Guo were responsible for reviewing, classifying, and scoring the videos. Xueping Jiao and Xingyu Liu analyzed and visualized the data. Fanghong Yan managed the project, and Yanan Zhang provided financial support. Xueping Jiao and Xingyu Liu wrote the original draft. Yanan Zhang, Xueping Jiao, and Xingyu Liu reviewed and edited the manuscript. All the authors contributed to manuscript writing and editing, and approved the final draft for submission.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the Project of Gansu University Teachers Innovation Fund [No.2025B-016], the 2025 Research Project of the Chinese Nursing Association [No. ZHKYQ202516], and the General Project of the Gansu Provincial Department of Science and Technology [No. 26JRRA195].

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The datasets generated or analyzed during this study are available from the corresponding author on reasonable request.

Supplemental material

Supplemental material for this article is available online.

References

1. Zhang, Wen, Liang, et al. How the public uses social media wechat to obtain health information in china: a survey study. BMC Med Inform Decis Mak 2017; 17(Suppl 2): 66. https://doi.org/10.1186/s12911-017-0470-0

2. Bujnowska-Fedak, Węgierek. The Impact of Online Health Information on Patient Health Behaviours and Making Decisions Concerning Health. Int J Environ Res Public Health 2020; 17(3): 880. https://doi.org/10.3390/ijerph17030880

3. Song, Zhao, Yao, et al. Serious information in hedonic social applications: affordances, self-determination and health information adoption in TikTok. Journal of Documentation 2021; 78(4): 890–911. https://doi.org/10.1108/jd-08-2021-0158

4. Hazzam, Lahrech. Health Care Professionals' Social Media Behavior and the Underlying Factors of Social Media Adoption and Use: Quantitative Study. J Med Internet Res 2018; 20(11): e12035. https://doi.org/10.2196/12035

5. Wang, Zhang, Cao, et al. Quality and content evaluation of thyroid eye disease treatment information on TikTok and Bilibili. Sci Rep 2025; 15(1): 25134. https://doi.org/10.1038/s41598-025-11147-y

6. Surani, Hirani, Elias, et al. Social media usage among health care providers. BMC Res Notes 2017; 10(1): 654. https://doi.org/10.1186/s13104-017-2993-y

7. Chirumamilla, Gulati. Patient Education and Engagement through Social Media. Curr Cardiol Rev 2021; 17(2): 137–143.

8. Jayasinghe, Kanmodi, Jayasinghe, et al. Assessment of patterns and related factors in using social media platforms to access health and oral health information among Sri Lankan adults, with special emphasis on promoting oral health awareness. BMC Public Health 2024; 24(1): 1472. https://doi.org/10.1186/s12889-024-19008-5

9. The 56th Statistical Report on China's Internet Development. China Internet Network Information Center. Available from: https://cnnic.cn/n4/2025/0721/c326-11327.html

10. Douyin Hotspot and Jìliàng Data. 2023 Douyin Annual Observatory Report. Beijing, China: ByteDance. 2023. Available from: https://file.digitaling.com/eImg/uimages/20240202/1706853402934638.pdf

11. Peng, et al. Comparative analysis of NAFLD-related health videos on TikTok: a cross-language study in the USA and China. BMC Public Health 2024; 24(1): 3375. https://doi.org/10.1186/s12889-024-20851-9

12. Xiao, Min, et al. Public's preferences for health science popularization short videos in China: a discrete choice experiment. Front Public Health 2023; 11: 1160629. https://doi.org/10.3389/fpubh.2023.1160629

13. Cheng. Trust of Information during the Dissemination of Popular Science Web Videos in the New Media Era. Comput Intell Neurosci 2022; 2022: 1746472–1746478. https://doi.org/10.1155/2022/1746472

14. Skafle, Nordahl-Hansen, Quintana, et al. Misinformation About COVID-19 Vaccines on Social Media: Rapid Review. J Med Internet Res 2022; 24(8): e37367. https://doi.org/10.2196/37367

15. Chowdhury, Khalid, Turin. Understanding misinformation infodemic during public health emergencies due to large-scale disease outbreaks: a rapid review. Z Gesundh Wiss 2023; 31(4): 553–573. https://doi.org/10.1007/s10389-021-01565-3

16. Rodriguez, Winnett, Wong, et al. Feasibility and Acceptability of an Adolescent-Friendly Rap Video to Improve Health Literacy Among HIV-Positive Youth in Urban Peru. AIDS Behav 2021; 25(4): 1290–1298.

17. Ganjekar, et al. Study on awareness and management based health action using video intervention (SAMBHAV) for postpartum depression among mothers attending immunisation clinic in a tertiary medical college hospital: Study protocol. PLoS One 2024; 19(4): e0301357. https://doi.org/10.1371/journal.pone.0301357

18. Zhang, Yuan, Zhang, et al. Short video platforms as sources of health information about cervical cancer: A content and quality analysis. PLoS One 2024; 19(3): e0300180. https://doi.org/10.1371/journal.pone.0300180

19. Sun, Liu, Zhang, et al. Health information analysis of cryptorchidism-related short videos: Analyzing quality and reliability. Digit Health 2025; 11: 20552076251317578. https://doi.org/10.1177/20552076251317578

20. Liao, Huang, Lai, et al. The status quo of short videos as a source of health information regarding bowel preparation before colonoscopy. Front Public Health 2024; 12: 1309632. https://doi.org/10.3389/fpubh.2024.1309632

21. Metzler, Garcia. Social Drivers and Algorithmic Mechanisms on Digital Media. Perspect Psychol Sci 2024; 19(5): 735–748. https://doi.org/10.1177/17456916231185057

22. Wang, et al. Video in Chinese short video sharing platforms as a source of information on sleep disorders: A cross-sectional content analysis study. Digit Health 2026; 12: 20552076261415944. https://doi.org/10.1177/20552076261415944

23. Liu, Yang, Tan, et al. Web-Based Video Platforms as Sources of Information on Body Image Dissatisfaction in Adolescents: Content and Quality Analysis of a Cross-Sectional Study. JMIR Form Res 2025; 9: e71652. https://doi.org/10.2196/71652

24. Dai, Gong, et al. The Quality and Reliability of Online Videos as an Information Source of Public Health Education for Stroke Prevention in Mainland China: Electronic Media-Based Cross-Sectional Study. JMIR Infodemiology 2025; 5: e64891. https://doi.org/10.2196/64891

25. Ren, et al. The quality and reliability of short videos about premature ovarian failure on Bilibili and TikTok: Cross-sectional study. Digit Health 2025; 11: 20552076251351077. https://doi.org/10.1177/20552076251351077

26. Guo, Ding, Zhang, et al. Quality Assessment of Radiotherapy Health Information on Short-Form Video Platforms of TikTok and Bilibili: Cross-Sectional Study. JMIR Cancer 2025; 11: e73455. https://doi.org/10.2196/73455

27. Nie, Ning, Ding, et al. Quality Analysis of Stroke-Related Videos on Video Platforms: Cross-Sectional Study. JMIR Form Res 2025; 9: e80458. https://doi.org/10.2196/80458

28. Khan, Hakak, Deepa, et al. Detecting COVID-19-Related Fake News Using Feature Extraction. Front Public Health 2021; 9: 788074. https://doi.org/10.3389/fpubh.2021.788074

29. Chen. Research on Derivative Online Health Rumors Identification Modal Based on Text Feature Fusion. Library and Information Service 2023; 67(14): 73–84.

30. Zheng, Tong, Wan, et al. Quality and Reliability of Liver Cancer-Related Short Chinese Videos on TikTok and Bilibili: Cross-Sectional Content Analysis Study. J Med Internet Res 2023; 25: e47210. https://doi.org/10.2196/47210

31. Zhao, Pang, Shi. Construction of Health Information Portrait and the Identification of False Health Information by Integrating Social Sensing Data with Publisher's Prior Knowledge. Documentation, Information & Knowledge 2024; 41(06): 141–54+65.

32. Zhu, Liu, Zhang. Examining the Persuasive Effects of Health Communication in Short Videos: Systematic Review. J Med Internet Res 2023; 25: e48508. https://doi.org/10.2196/48508

33. Chen, Shi. Analysis of Algorithm Recommendation Mechanism of TikTok. International Journal of Education and Humanities 2022; 4: 12–14. https://doi.org/10.54097/ijeh.v4i1.1152

34. Luo, Ling. Seeing Isn't Believing: Narrative Characteristics and Social Psychology of Short Video Rumors about Health Care: An Empirical Study Based on Toutiao Rumor Database. Journal of University of Chinese Academy of Social Sciences 2021; 6: 93–103.

35. Sun, Deng, et al. A Study on the Motivation of Short Video Users' False Health Information Adoption from the Perspective of Coping Theory. Information and Documentation Services 2023; 44(06): 100–110.

36. Jiang, Zhou, Wang. Mechanisms of Recommendation Algorithms Driving the Value Creation of Content Platforms: Relevance or Causality? Foreign Economics & Management 2025; 47(02): 3–19.

37. Stolow, Moses, Lederer, et al. How Fear Appeal Approaches in COVID-19 Health Communication May Be Harming the Global Community. Health Educ Behav 2020; 47(4): 531–535. https://doi.org/10.1177/1090198120935073

38. Kok, Peters, Kessels LTE, et al. Ignoring theory and misinterpreting evidence: the false belief in fear appeals. Health Psychol Rev 2018; 12(2): 111–125. https://doi.org/10.1080/17437199.2017.1415767

39. Ding, Wang, Zhang, et al. Trade-offs between machine learning and deep learning for mental illness detection on social media. Sci Rep 2025; 15(1): 14497. https://doi.org/10.1038/s41598-025-99167-6

40. Cao, Dai, Wang, et al. Machine Learning Approaches for Depression Detection on Social Media: A Systematic Review of Biases and Methodological Challenges. Journal of Behavioral Data Science 2025; 5(1): 67–102. https://doi.org/10.35566/jbds/caoyc

41. Bandura. Social cognitive theory: an agentic perspective. Annu Rev Psychol 2001; 52: 1–26. https://doi.org/10.1146/annurev.psych.52.1.1

42. Zannettou, Nemes-Nemeth, Ayalon, et al. Analyzing User Engagement with TikTok's Short Format Video Recommendations using Data Donations. In: Mueller FF, Kyburz P, Williamson JR, et al., eds. CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems (ACM) 2024; Article 731: 1–16. https://doi.org/10.1145/3613904.364243

43. Research on the Reshaping of Information Reception Behavior of Media Audiences by Short Video Platforms. Journal of New Media and Economics 2025; 2(5): 2025–2070. https://doi.org/10.62517/jnme.202510512

44. Wei, Liu, Wang, et al. Fragmented learning from short videos modulates neural activity and connectivity during memory retrieval. npj Science of Learning 2026; 11(1): 15. https://doi.org/10.1038/s41539-025-00399-y

45. Chiossi, Haliburton, et al. Short-Form Videos Degrade Our Capacity to Retain Intentions: Effect of Context Switching On Prospective Memory. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 2023. Association for Computing Machinery, pp. 1–15. https://doi.org/10.1145/3544548.3580778
