Abstract
Objective:
To examine the empirical correspondence between data-driven pain phenotypes and classical Traditional Medicine (TM) syndromes in a cohort with specific low back pain (LBP). We aimed to identify distinct phenotypes associated with musculoskeletal and spinal disorders using Latent Tree Models (LTM) and evaluate their alignment with traditional diagnostic constructs.
Methods:
This cross-sectional study was conducted at a tertiary specialty hospital in Ho Chi Minh City, Vietnam, recruiting 260 patients with specific LBP resulting from identifiable musculoskeletal and spinal disorders. Latent tree analysis was used to identify pain phenotypes from clinical symptom data. Risk factors for each phenotype were analyzed by multivariable Firth’s penalized logistic regression.
Results:
The LTM identified three primary phenotypes. A “Stiffness/Cold/Heavy Pain” phenotype was strongly associated with radiating pain and a diagnosis of herniated disc. A “Dull, Localized Pain” phenotype was associated with bilateral pain. A “Sharp, Stabbing Pain” phenotype was strongly associated with a history of smoking. These data-driven clusters showed a clear alignment with the classical TM syndromes of Cold-Dampness, Kidney Deficiency, and Blood Stasis, respectively.
Conclusion:
LTM appears to be a useful tool for identifying data-verifiable symptom clusters that correspond to TM syndromes in patients with specific LBP. Linking these phenotypes to clinical risk factors and pathoanatomical diagnoses provides an evidence-based framework for stratifying heterogeneous LBP related to spinal disorders. This approach may support a move toward mechanism-based therapeutic strategies, bridging traditional observation with modern data science.
Introduction
Low back pain (LBP) is the leading cause of disability worldwide, affecting hundreds of millions of people and creating a significant socioeconomic burden through health care costs and lost productivity.1–3 In Vietnam, LBP is the second most common type of pain among the population.4 Despite its high prevalence, the diagnosis and management of LBP remain challenging. Western clinical practice guidelines, such as those from the National Institute for Health and Care Excellence (NICE) and the American College of Physicians (ACP), classify the majority of cases (approximately 90%) as “non-specific low back pain” (NSLBP), meaning no clear pathoanatomical cause can be identified.5 However, many patients presenting with LBP in specialty clinical settings possess identifiable musculoskeletal and spinal disorders, yet their qualitative pain experiences remain highly heterogeneous and difficult to stratify effectively.
To address this diagnostic heterogeneity, Traditional Medicine (TM) offers an alternative framework. For centuries, TM has classified LBP not by anatomical cause but by distinct symptom-based patterns (or syndromes) that describe the qualitative nature of the patient’s experience.6 Common syndromes include “Cold-Dampness” (stiff, cold, and heavy pain), “Kidney Deficiency” (dull and chronic ache), and “Blood Stasis” (sharp, fixed, and stabbing pain).6–8 This syndrome-based approach has the potential to stratify clinically heterogeneous populations.
However, the primary barrier to integrating this approach into modern medicine is its inherent subjectivity: syndrome differentiation relies heavily on practitioner experience and lacks standardized, data-driven criteria. It therefore remains to be explored whether these historically described patterns correspond to statistically verifiable patient clusters in contemporary clinical populations.7,8 This study seeks to bridge this gap by employing an advanced statistical method, the Latent Tree Model (LTM), to investigate the empirical correspondence between these theoretical symptom patterns and data-verifiable pain phenotypes in patients with specific spinal disorders.9,10 We hypothesize that this data-driven approach will provide an objective basis for these symptom clusters and reveal associated clinical risk factors, moving beyond subjective assessment.
Therefore, the objectives of this study were to: (1) apply LTM analysis to identify data-driven pain phenotypes from clinical symptoms in patients with specific LBP related to musculoskeletal and spinal disorders; (2) examine the empirical alignment between classical TM syndrome classifications and these data-driven phenotypes; and (3) investigate the demographic and clinical risk factors associated with each identified phenotype to understand their potential underlying mechanisms and clinical utility.
By analyzing a pathoanatomically defined cohort, we aim to provide an evidence-based framework for classifying heterogeneous LBP presentations. This approach offers a potential tool to stratify symptom patterns, moving toward mechanism-based therapeutic strategies for both specific spinal disorders and the broader NSLBP population in future applications.
Materials and Methods
Study design and setting
This was a cross-sectional study conducted at the Inpatient and Outpatient departments of a tertiary specialty hospital in Ho Chi Minh City, Vietnam. Data were collected over a 5-month period, from January 1 to May 31, 2023. The study was reviewed and approved by the Ethics Council in Biomedical Research at the University of Medicine and Pharmacy at Ho Chi Minh City (Decision No. 06/HĐĐĐ-ĐHYD, dated January 5, 2023), and all participants provided written informed consent after being fully informed of the study’s purpose and procedures.
Participants
We recruited a convenience sample of 260 patients diagnosed with LBP. The sample size was determined by the number of eligible and consenting patients available during the recruitment period; as this was an exploratory study designed to generate hypotheses, no a priori sample size calculation was performed. The sample size of 260 was considered adequate for the initial exploratory phase of this novel application of LTM in this context, with the understanding that findings would require validation in larger cohorts. Inclusion criteria were (1) patients aged 18 years or older, (2) an underlying cause of LBP from musculoskeletal and spinal disorders (e.g., herniated disc, spinal degeneration, spondylolisthesis, and spinal stenosis), (3) ability to respond to questions, and (4) provision of written informed consent.
Data collection and variables
A 29-item questionnaire was developed based on a systematic review of the classical and modern TM literature, university textbooks, and influential monographs (Appendices 1–3). To ensure the rigor of data collection and the structural assessment of the identified symptom clusters, the following process was implemented:
Content validity
A panel of three TM experts (with >10 years of experience) performed content validation. The average Scale-level Content Validity Index (S-CVI/Ave) was 0.97 for relevance and 0.92 for clarity.
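The S-CVI/Ave is conventionally computed as the mean of the item-level CVIs (I-CVIs), where each I-CVI is the proportion of experts who rate an item as relevant. The sketch below assumes a 4-point rating scale on which ratings of 3 or 4 count as relevant; the source does not report the scale, and the ratings shown are hypothetical, not the study's data.

```python
def item_cvi(ratings, relevant_threshold=3):
    """I-CVI: proportion of experts rating one item at or above the threshold."""
    return sum(r >= relevant_threshold for r in ratings) / len(ratings)


def scale_cvi_average(rating_matrix, relevant_threshold=3):
    """S-CVI/Ave: mean of the I-CVIs across all items.

    rating_matrix: one list of expert ratings per item (assumed 1-4 scale).
    """
    i_cvis = [item_cvi(item, relevant_threshold) for item in rating_matrix]
    return sum(i_cvis) / len(i_cvis)


# Hypothetical ratings from three experts for four items (not study data).
ratings = [
    [4, 4, 3],  # all three experts rate the item relevant -> I-CVI = 1.0
    [4, 3, 4],
    [2, 4, 4],  # one expert rates the item not relevant -> I-CVI = 2/3
    [3, 4, 4],
]
print(round(scale_cvi_average(ratings), 4))  # 0.9167
```

With a panel of three experts, each I-CVI can only take the values 0, 1/3, 2/3, or 1, so the scale-level average is driven by how many items received a dissenting rating.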
Structural assessment
The latent structure of the clinical symptoms was explored through LTM analysis. This approach identified statistical dependencies within the dataset, providing a data-driven basis for symptom grouping rather than a formal psychometric validation of the instrument.
Data consistency
To ensure standardized data collection, all interviews were conducted face-to-face by trained TM doctors. The training protocol focused on standardizing the interview process and minimizing interviewer bias; however, this procedural measure was intended for quality control and does not replace formal inter-rater reliability assessment.
Statistical analysis
Latent tree model
LTM was selected as the primary analytical framework over traditional Latent Class Analysis (LCA) because it relaxes the strict assumption of local independence. While LCA posits that observed variables are conditionally independent within a latent class, LTM accounts for the fact that clinical symptoms in specific LBP are often pathologically or biologically interdependent. By modeling these interdependencies as a hierarchical tree structure, LTM captures complex phenotypes more effectively than traditional flat clustering methods.9–11
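To make the local-independence point concrete, the toy sketch below builds a two-level tree in which a root latent variable Y governs an intermediate latent variable Z, which in turn governs two binary symptoms X1 and X2. All probabilities are hypothetical. Holding Y fixed, the two symptoms remain correlated through Z, which is the residual dependence that a model with Y as the only latent class (the LCA local-independence assumption) would set to zero.

```python
import itertools

# Hypothetical toy latent tree: root Y -> intermediate latent Z -> symptoms X1, X2.
# All probabilities are illustrative assumptions, not estimates from this study.
P_Z_GIVEN_Y = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # P(Z=z | Y=y)
P_X_GIVEN_Z = {0: 0.1, 1: 0.9}  # P(X=1 | Z=z), shared by X1 and X2 for simplicity


def p_symptom(x, z):
    """P(X=x | Z=z) for a binary symptom."""
    return P_X_GIVEN_Z[z] if x == 1 else 1.0 - P_X_GIVEN_Z[z]


def joint_given_root(x1, x2, y):
    """P(X1=x1, X2=x2 | Y=y), marginalising over the intermediate latent Z."""
    return sum(P_Z_GIVEN_Y[y][z] * p_symptom(x1, z) * p_symptom(x2, z) for z in (0, 1))


def marginal_given_root(x, y):
    """P(X=x | Y=y) for a single symptom."""
    return sum(P_Z_GIVEN_Y[y][z] * p_symptom(x, z) for z in (0, 1))


def conditional_dependence(y):
    """P(X1=1, X2=1 | Y=y) - P(X1=1 | Y=y) * P(X2=1 | Y=y).

    Local independence given Y alone would force this to 0;
    the extra latent layer Z lets it be non-zero.
    """
    return joint_given_root(1, 1, y) - marginal_given_root(1, y) ** 2


total = sum(joint_given_root(a, b, 0) for a, b in itertools.product((0, 1), repeat=2))
print(round(total, 10), round(conditional_dependence(0), 4))  # 1.0 0.1024
```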
An LTM is a probabilistic graphical model that allows for the modeling of hierarchical relationships between variables. In our model, the collected clinical symptoms are the “observed variables,” while the underlying pain phenotypes are the “latent variables” that the model seeks to infer.9–11 As this was an exploratory study focused on hypothesis generation, the statistical stability of the identified clusters was assessed using the total mutual information (TMI) metric rather than formal cross-validation methods. TMI serves as an indicator of the strength of statistical association between latent variables and observed symptoms within the model.
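As a reference point for interpreting TMI, the sketch below computes the standard mutual information between one binary latent variable and one binary symptom from a joint probability table. It is an illustration under stated assumptions: the tables are hypothetical, log base 2 (bits) is an arbitrary choice, and the exact quantity and normalization that the Kongming Lantern software reports as TMI may differ.

```python
import math


def mutual_information(joint):
    """Mutual information (in bits) between two discrete variables.

    joint: 2-D list where joint[i][j] = P(Y=i, X=j); entries must sum to 1.
    """
    p_y = [sum(row) for row in joint]        # marginal of the latent variable
    p_x = [sum(col) for col in zip(*joint)]  # marginal of the symptom
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p_yx in enumerate(row):
            if p_yx > 0:  # terms with zero probability contribute 0
                mi += p_yx * math.log2(p_yx / (p_y[i] * p_x[j]))
    return mi


# Hypothetical joint distributions of a binary latent state Y and a binary symptom X.
independent = [[0.25, 0.25], [0.25, 0.25]]  # X carries no information about Y
deterministic = [[0.5, 0.0], [0.0, 0.5]]    # X reveals Y exactly
print(mutual_information(independent), mutual_information(deterministic))  # 0.0 1.0
```

Values near 0 indicate that the symptoms carry almost no information about the latent state, which is the sense in which a near-zero score marks a negligible structure.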
The analysis process involved the following steps:
Data preparation: Symptom data from 260 patients were encoded in a binary format (1 = present, 0 = absent).
Model training: The Kongming Lantern software and the Expansion, Adjustment, and Structural Simplification algorithm were used to construct the LTM. Model strength was assessed using TMI; values >0.5 were interpreted as suggestive of a stable statistical structure, whereas values approaching 0 indicated a negligible structure.
Model interpretation and mapping: The initial LTM identified eight latent variables (Y0–Y7). These were aligned with classical TM syndromes by identifying “anchor symptoms” established in the foundational TM literature (e.g., “Sharp, Stabbing Pain” for “Blood Stasis”). This approach examined how secondary symptoms statistically clustered around the predefined anchor symptoms, providing a data-driven framework to assess the relevance of historical TM descriptors in a modern pathoanatomical sample.
Model visualization: Five LTM diagrams were generated to illustrate the hierarchical relationships and cluster distributions.
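The data-preparation step can be sketched as follows, assuming each patient's reported symptoms are stored as a set of labels. The symptom labels, patient IDs, and function names are hypothetical. The second function drops symptom columns that no patient reported, mirroring how a predefined symptom absent from the sample would not enter the model.

```python
def encode_symptoms(patients, symptom_list):
    """Binary-encode reported symptoms: 1 = present, 0 = absent.

    patients: mapping of patient ID -> set of reported symptom labels.
    symptom_list: ordered list of symptom variables (columns).
    Returns a dict of patient ID -> list of 0/1 values in symptom_list order.
    """
    return {
        pid: [1 if s in reported else 0 for s in symptom_list]
        for pid, reported in patients.items()
    }


def drop_absent_columns(matrix, symptom_list):
    """Remove symptom columns that no patient reported (all zeros)."""
    keep = [j for j in range(len(symptom_list)) if any(row[j] for row in matrix.values())]
    reduced = {pid: [row[j] for j in keep] for pid, row in matrix.items()}
    return reduced, [symptom_list[j] for j in keep]


# Hypothetical labels and patients; not the study's questionnaire items.
symptoms = ["dull pain", "localized pain", "sharp pain", "onset after injury", "unreported symptom"]
patients = {
    "P001": {"dull pain", "localized pain"},
    "P002": {"sharp pain", "onset after injury"},
}
matrix = encode_symptoms(patients, symptoms)
reduced, kept = drop_absent_columns(matrix, symptoms)
print(reduced["P001"], reduced["P002"])  # [1, 1, 0, 0] [0, 0, 1, 1]
```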
Risk factor analysis
To identify potential demographic and clinical predictors associated with each preliminary pain phenotype in patients with specific LBP, we performed multivariable Firth’s penalized logistic regression using Stata 20.0. This method was specifically selected to ensure the stability of parameter estimates and to handle potential issues of quasi-complete separation or sparse data within our modest sample size.
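For readers unfamiliar with the method, the sketch below shows the core of Firth's correction in NumPy: the ordinary logistic score is modified by a hat-matrix term, which is equivalent to penalizing the likelihood by half the log-determinant of the Fisher information. It is an illustrative re-implementation, not the Stata routine used in the study; the function name, the step cap, and the toy data are assumptions, and confidence intervals (often obtained from the profile penalized likelihood in Firth analyses) are omitted.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Firth's bias-reduced logistic regression (illustrative sketch).

    Solves the Firth-modified score equations
        U*(beta) = X' (y - pi + h * (0.5 - pi)) = 0
    with Newton-type updates, where h is the diagonal of the hat matrix
    H = W^{1/2} X (X'WX)^{-1} X' W^{1/2} and W = diag(pi * (1 - pi)).

    X: (n, p) design matrix that already includes an intercept column.
    y: (n,) array of 0/1 outcomes.
    Returns the coefficient vector on the log-odds scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = pi * (1.0 - pi)
        info_inv = np.linalg.inv(X.T @ (w[:, None] * X))  # inverse Fisher information
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)   # hat-matrix diagonal
        step = info_inv @ (X.T @ (y - pi + h * (0.5 - pi)))
        largest = np.max(np.abs(step))
        if largest > max_step:  # cap very large steps for numerical safety
            step *= max_step / largest
        beta += step
        if largest < tol:
            break
    return beta


# Hypothetical data with complete separation: ordinary maximum likelihood
# would push the slope towards infinity, while Firth's estimate stays finite.
x = np.array([0, 0, 0, 1, 1, 1])
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])
beta = firth_logistic(X, y)
print(np.round(np.exp(beta[1]), 3))  # 49.0
```

Exponentiating a coefficient gives the corresponding odds ratio; in a multivariable model this is the adjusted odds ratio (AOR). For a single binary predictor, as here, the Firth estimate matches adding 0.5 to every cell of the 2x2 table, which is why the odds ratio is (3.5/0.5)/(0.5/3.5) = 49.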
Recognizing the hypothesis-generating nature of this exploratory, single-center study, no formal adjustments for multiple comparisons (e.g., Bonferroni correction) were applied. While this approach prioritizes the identification of potentially relevant clinical associations in this novel context, it acknowledges an increased risk of Type I error. Consequently, the observed associations should be interpreted as suggestive of potential correlations rather than definitive evidence of clinical entities.
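The size of this risk can be illustrated with the standard family-wise error rate (FWER) for m independent tests, each conducted at level α. The value m = 10 below is hypothetical and chosen only for illustration; predictors in a real model are rarely independent, so this is an approximation.

```latex
\mathrm{FWER} = 1 - (1 - \alpha)^{m}, \qquad
\alpha = 0.05,\; m = 10 \;\Rightarrow\; \mathrm{FWER} = 1 - 0.95^{10} \approx 0.40
```

A Bonferroni correction would instead test each hypothesis at α/m = 0.005, which keeps the FWER at or below 0.05.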
The results are presented as adjusted odds ratios (AORs) with 95% confidence intervals (CIs). A p-value of <0.05 was considered indicative of statistical significance. To avoid bias from insufficient sample size, the single patient presenting with Damp-Heat pattern symptoms was excluded from the multivariable analysis. Missing data for predictor variables were addressed using multiple imputation.
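Estimates from multiply imputed datasets are usually combined with Rubin's rules. The sketch below assumes that this standard pooling was applied; the source does not report the imputation model or the number of imputations, and the numbers shown are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: per-imputation point estimates (e.g., log-odds coefficients).
    variances: per-imputation squared standard errors.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between    # total variance of the pooled estimate
    return q_bar, total


# Hypothetical log-odds estimates and variances from m = 3 imputations.
q_bar, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, round(total, 4))  # 2.0 1.8333
```

The between-imputation term inflates the variance to reflect uncertainty about the missing values, so pooled confidence intervals are wider than those from any single completed dataset.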
Patient and public involvement
Patients and the public were not directly involved in the design, conduct, reporting, or dissemination plans of this research. However, the research is fundamentally centered on patient-reported symptoms. To ensure the clinical relevance and appropriateness of the data collection tool, the questionnaire was developed based on a systematic review of medical literature and validated by a panel of senior TM physicians with extensive experience in treating these patient populations.
Results
Participant flow and characteristics
A total of 260 patients meeting the eligibility criteria were recruited for the study. All eligible patients who were approached consented to participate and completed the questionnaire, resulting in a final analysis sample of 260. Of the 29 predefined symptoms, 28 were reported by at least one patient; in addition, patients reported four symptoms that were not on the original list. The final LTM analysis was therefore conducted on a total of 32 symptom variables.
The demographic and clinical characteristics of the study participants are detailed in Table 1. The cohort was predominantly female (82.7%) and older, with 55.4% of participants aged 60 years or older. A majority reported engagement in occupations involving heavy manual labor (57.3%) and were classified as overweight based on BMI (57.3%). The most common primary diagnoses were lumbar disc herniation (41.2%) and lumbar spinal degeneration (40.0%).
Table 1. Demographic and Clinical Characteristics of Study Participants (N = 260)
“Other” primary diagnoses include spondylolisthesis, scoliosis, and spinal stenosis. Percentages are rounded to one decimal place.
LBP, low back pain.
The clinical features of the pain are summarized in Table 2. Bilateral pain was the most common presentation (83.1%), and most patients experienced continuous pain (76.5%). The most frequently reported pain quality was “stiffness or a cold and heavy sensation” (44.2%), followed by dull pain (29.2%) and sharp/stabbing pain (26.2%). Notably, “hot-burning pain,” a characteristic sometimes associated with inflammatory patterns, was reported by only one patient (0.4%).
Table 2. Clinical Characteristics of Low Back Pain Reported by Participants (N = 260)
LTM analysis and pain phenotype identification
The LTM analysis of 32 symptom variables suggested the existence of eight statistically significant latent variables (Y0–Y7), representing distinct symptom clusters (Fig. 1). These latent variables were aligned with classical TM syndromes based on patterns of symptom co-occurrence and mutual exclusion, as detailed in Table 3. This hierarchical structure identified a multi-layered organization of symptom clusters, providing a more granular classification than traditional flat clustering.
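The anchor-based labelling can be illustrated with a simplified rule: give each latent variable the syndrome whose anchor symptom it predicts most strongly, provided that conditional probability clears a threshold. This is a hypothetical simplification; the study's mapping also weighed co-present and excluded symptoms (Table 3). The anchor labels below are taken from the core symptoms named in this paper, while the latent-variable names, probabilities, and the 0.5 threshold are assumptions.

```python
# Assumed anchor symptom per TM syndrome, based on the core symptoms named in the text.
ANCHORS = {
    "Cold-Dampness": "stiffness or a cold and heavy sensation",
    "Kidney Deficiency": "dull pain",
    "Blood Stasis": "sharp pain",
    "Damp-Heat": "hot-burning pain",
}


def map_latent_to_syndrome(cond_probs, anchors=ANCHORS, min_prob=0.5):
    """Label each latent variable with the syndrome whose anchor it most strongly predicts.

    cond_probs: {latent name: {symptom: P(symptom present | latent in its 'high' state)}}.
    A latent variable stays unlabelled (None) if no anchor reaches min_prob.
    """
    labels = {}
    for latent, probs in cond_probs.items():
        best_syndrome, best_p = None, min_prob
        for syndrome, anchor in anchors.items():
            p = probs.get(anchor, 0.0)
            if p >= best_p:
                best_syndrome, best_p = syndrome, p
        labels[latent] = best_syndrome
    return labels


# Hypothetical conditional probabilities for three latent variables (not study output).
cond = {
    "LV_a": {"dull pain": 0.82, "localized pain": 0.75, "sharp pain": 0.05},
    "LV_b": {"sharp pain": 0.71, "onset after injury": 0.64},
    "LV_c": {"bilateral pain": 0.90},
}
print(map_latent_to_syndrome(cond))
```

A latent variable such as LV_c, which groups symptoms without any anchor, is left unlabelled rather than forced into a syndrome.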

Fig. 1. General latent tree model of symptom clusters in low back pain. This diagram illustrates the overall hierarchical relationships between the eight latent variables (Y0–Y7) and the 32 observed clinical symptoms as determined by the LTM analysis across the entire patient cohort. LTM, latent tree model.

Fig. 2. Latent tree model for the “Dull, Localized Pain” phenotype. This model visualizes the symptom cluster for the phenotype characterized by dull and localized pain. The thickness of the lines represents the strength of the association. This phenotype corresponds to the classical “Kidney Deficiency” syndrome.

Fig. 3. Latent tree model for the “Stiffness/Cold/Heavy Pain” phenotype. This model shows the specific symptom cluster defining the phenotype characterized by stiffness, cold, and heavy sensations. The thickness of the lines represents the strength of the association between the latent phenotype and the observed symptoms. This phenotype corresponds to the classical “Cold-Dampness” syndrome.

Fig. 4. Latent tree model for the “Sharp, Stabbing Pain” phenotype. This model displays the symptom cluster for the phenotype characterized by sharp, stabbing pain and its association with injury-related onset. The thickness of the lines represents the strength of the association. This phenotype corresponds to the classical “Blood Stasis” syndrome.

Fig. 5. Latent tree model for the “Hot/Burning Pain” pattern. This diagram illustrates the weak model for the pattern characterized by hot or burning pain, which was rare in the study population. This pattern is conceptually similar to the classical “Damp-Heat” syndrome.
Table 3. Analysis of Co-Present and Excluded Variables Corresponding to Clinical Syndromes in a Traditional Medicine Context
CM syndromes 1–4 refer to Cold-Dampness, Damp-Heat, Kidney Deficiency, and Blood Stasis, respectively. Dots represent clinical features commonly present in each CM syndrome. P(s0), P(s1), and P(s2) represent the conditional probabilities of the latent variable states.
Manifest variables (symptoms) reached 95% CMI.
ᵃThe latent variables are presented starting from the bottom of the latent tree model.
Bold solid dots (•) indicate the statistical presence or high conditional probability association of the respective clinical features (co-present or excluded variables) within each specific CM syndrome (1–4) derived from the latent tree analysis.
Separate LTMs were constructed for each cluster to assess the statistical dependency using the TMI metric (Table 4). Three potential phenotypes were identified as showing stable statistical structures (TMI >0.50):
Table 4. Latent Tree Models for Traditional Medicine Syndromes
TMI, total mutual information; PMI, Pairwise Mutual Information; CMI, Conditional Mutual Information.
The “Dull, Localized Pain” Phenotype: The LTM analysis identified a cluster (TMI = 0.56622) primarily defined by “dull pain” and “localized pain”, consistent with the classical “Kidney Deficiency” syndrome (Fig. 2).
The “Stiffness/Cold/Heavy Pain” Phenotype: This model demonstrated high statistical dependency (TMI = 0.5383) and was defined by the core symptom “stiffness or a cold and heavy sensation”. This preliminary phenotype appears to align with the “Cold-Dampness” syndrome in TM (Fig. 3).
The “Sharp, Stabbing Pain” Phenotype: This cluster showed moderate statistical dependency (TMI = 0.506) and was characterized by “sharp pain” and “onset after injury”. It aligns with the “Blood Stasis” syndrome (Fig. 4).
The “Hot/Burning Pain” pattern was statistically negligible (TMI = 0.0813) because only one patient reported it, and it was therefore excluded from further predictive analysis (Fig. 5).
Factors associated with pain phenotypes
To ensure the stability of our estimates and address potential issues of data separation identified in preliminary analyses, we performed multivariable Firth’s penalized logistic regression to identify predictors for each of the three primary pain phenotypes. This method provides more reliable estimates in the presence of sparse data or separation. The results presented in Tables 5, 6, and 7 are from this robust analysis.
Table 5. Factors Associated with Dull Pain Phenotype
Values in bold indicate statistically significant associations (p < 0.05).
An AOR of >1 indicates a greater likelihood of dull pain. “Ref” indicates the reference group for comparison.
Table 6. Factors Associated with Sharp/Stabbing Pain Phenotype
Values in bold indicate statistically significant associations (p < 0.05).
An AOR of >1 indicates a greater likelihood of sharp/stabbing pain. “Ref” indicates the reference group for comparison.
Table 7. Factors Associated with Stiffness/Cold/Heavy Pain Phenotype
Values in bold indicate statistically significant associations (p < 0.05).
An AOR of >1 indicates a greater likelihood of stiff/cold/heavy pain. “Ref” indicates the reference group for comparison.
For the “Dull pain” phenotype: The odds of experiencing dull pain were significantly higher in patients with bilateral pain (AOR: 9.01; 95% CI: 1.23 to 65.99). Conversely, the odds were significantly lower in patients with radiating pain (AOR: 0.016; 95% CI: 0.002 to 0.10) or a diagnosis of lumbar disc herniation (AOR: 0.0073; 95% CI: 0.0011 to 0.0565) when compared to spinal degeneration.
For the “Sharp pain” phenotype: A history of smoking was a strong predictor, significantly increasing the odds of sharp pain (AOR: 3.56; 95% CI: 1.51 to 8.36). Patients with other spinal conditions (e.g., spondylolisthesis) also had higher odds compared to those with spinal degeneration (AOR: 3.12; 95% CI: 1.37 to 7.09).
For the “Stiffness/cold/heavy pain” phenotype: The odds were significantly higher in patients with radiating pain (AOR: 6.07; 95% CI: 3.33 to 14.14) and those with a diagnosis of lumbar disc herniation (AOR: 3.73; 95% CI: 1.74 to 7.98) compared to spinal degeneration. A history of smoking or infection was associated with significantly lower odds of this pain type (AOR: 0.30; 95% CI: 0.11 to 0.80 for both variables).
Discussion
This study applied a data-driven method (LTM) to stratify a heterogeneous LBP cohort into three distinct phenotypes. Its primary contribution is the demonstration of empirical correspondence between these derived clusters and classical TM syndromes (Cold-Dampness, Kidney Deficiency, and Blood Stasis). By linking these phenotypes to specific clinical risk factors, our findings suggest that these traditional symptom patterns have a statistical basis in clinical data. This moves the classification of LBP symptoms beyond purely subjective assessment and offers hypotheses that may inform clinical management.
The specific context of our study, a tertiary specialty hospital, likely influenced the identified phenotypes. Unlike primary care settings where LBP is often acute and self-limiting,1 our cohort represents a more severe and chronic spectrum of the disease, evidenced by the high prevalence of lumbar disc herniation and spinal degeneration. While this pathoanatomically defined setting provided a robust environment to link symptom clusters with biological mechanisms, we acknowledge that these patterns reflect specific LBP rather than strictly NSLBP.3 However, mapping these mechanisms in patients with identifiable disorders is a crucial step before extrapolating to populations without clear structural causes, providing a necessary baseline for future research.
Clinical implications and pathophysiological insights
Our analysis provides a data-driven link between subjective symptoms and potential pathophysiology. Rather than clustering symptoms by frequency alone, LTM models their hierarchical dependencies, revealing how symptoms are organized into nested clusters. By identifying how secondary symptoms statistically gravitate toward core “anchor” symptoms, the model offers a structured stratification of the specific LBP population and an evidence-based bridge to qualitative TM syndromes.
The “Sharp, Stabbing Pain” phenotype (Blood Stasis) and Microvascular Dysfunction: A key finding with potential clinical implications is the strong association between smoking and the “sharp pain” phenotype (AOR: 3.56). Smoking is a well-established cause of microcirculatory impairment, endothelial damage, and tissue ischemia, which often manifests as sharp, stabbing pain.12–15 The TM concept of “Blood Stasis” may therefore serve as a clinical descriptor for a state of microvascular dysfunction, a hypothesis that requires direct testing. For a rheumatologist, this suggests that a patient with LBP presenting with this specific pain quality, especially one who smokes, may warrant investigation or management focused on vascular health; if the association is replicated in NSLBP populations, this could help clinicians move beyond a simple “non-specific” label.
The “Stiffness/Cold/Heavy Pain” phenotype (Cold-Dampness) and Radiculopathy: The strong association of this phenotype with both radiating pain (AOR: 6.07) and herniated disc (AOR: 3.73) is equally notable. Nerve root compression is known to affect A-delta nerve fibers, which transmit cold sensation.16,17 This finding suggests that the “cold” sensation may be not only metaphorical but also a neurophysiological descriptor of A-delta fiber pathology, which could help clinicians understand the neuropathic quality of the pain.
The “Dull, Localized Pain” phenotype (Kidney Deficiency) and Degeneration: This phenotype, which was associated with bilateral, non-radiating pain and with spinal degeneration rather than disc herniation, aligns closely with the clinical picture of chronic, non-specific LBP related to age-associated degenerative processes.5,18 This is consistent with the TM view of Kidney Deficiency as a systemic, debilitating process often related to aging rather than an acute structural problem.7,8
Phenotypic stratification of LBP and implications for future research
A further contribution of this study is its systematic, data-driven approach to stratifying clinically diverse LBP populations.19 Although our framework was developed in a cohort with musculoskeletal and spinal disorders (specific LBP), these symptom-mechanism linkages offer a template for future research aiming to stratify more complex heterogeneous populations, including those often classified as NSLBP. Clinicians may be able to use these data-driven pain qualities (dull, sharp, and heavy/cold) and their associated risk factors (smoking and disc herniation) to form more nuanced diagnostic hypotheses.
Furthermore, this phenotypic stratification may help resolve existing controversies in the therapeutic literature. Guidelines from bodies such as the ACP and the UK’s NICE have issued differing recommendations for therapies such as acupuncture.2,3,5,20 One possible explanation, consistent with our findings, is that trials frequently treat the heterogeneous LBP population as a single entity without considering underlying sensory phenotypes. A therapy could plausibly be effective for one phenotype (e.g., “Blood Stasis”) but not another. Future clinical trials should test interventions targeted at these specific, data-driven phenotypes, potentially leading to more personalized treatment strategies.
Strengths and limitations
This study has several strengths, including what is, to our knowledge, the first application of LTM to LBP in a Vietnamese population and the identification of potential pathophysiological links between phenotypes and risk factors. Methodologically, data quality was supported by face-to-face interviews conducted by trained TM doctors, and the questionnaire demonstrated excellent content validity.
However, several limitations must be acknowledged. First, the convenience sample from a single tertiary hospital introduces selection bias and limits generalizability to primary care settings. Second, the cross-sectional design prevents establishing causal relationships. Third, as an exploratory study, we did not perform formal internal (e.g., bootstrapping) or external validation; thus, the potential for overfitting or cluster instability cannot be entirely excluded despite robust TMI values.
Fourth, testing multiple predictors without formal adjustment for multiple comparisons increases the risk of Type I error; Firth’s penalized regression improves the stability of estimates under sparse data but does not address this multiplicity. Fifth, other psychometric properties, such as test-retest reliability and internal consistency, remain to be evaluated. Finally, our study mapped latent variables to existing theoretical constructs; therefore, the results demonstrate an empirical correspondence rather than an independent, definitive validation of TM syndromes. Future research should integrate psychosocial variables and validate these phenotypes in larger, multi-center cohorts.
Conclusion
In conclusion, this exploratory study suggests that LTM is a valuable tool for stratifying patients with specific LBP into distinct clinical phenotypes and examining their empirical alignment with classical TM syndromes. By linking these phenotypes to specific risk factors and potential pathophysiological mechanisms (e.g., smoking-induced microvascular dysfunction for the “Sharp Pain” phenotype), this work provides a hypothesis-generating framework for addressing the clinical heterogeneity of specific LBP associated with musculoskeletal and spinal disorders. For the rheumatologist, this approach offers a promising step towards more personalized diagnostic hypotheses and targeted therapeutic strategies, bridging traditional clinical observation with advanced computational data analysis.
Authors’ Contributions
M.Q.H.L.: Conceptualization, methodology, supervision, project administration, and writing—review and editing. D.T.D.: Conceptualization, methodology, investigation, data curation, and writing—original draft. N.N.T.N. and T.H.N.: Investigation and validation. B.N.L.: Software and formal analysis. T.A.H.: Resources and investigation. M.P.T.N.: Visualization and formal analysis.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Footnotes
Disclosure Statement
The authors declare that they have no competing interests.
Funding Information
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Supplemental Material
References
