Abstract
Objectives:
In order to treat depressive patients using Traditional Chinese Medicine (TCM), it is necessary to classify them into subtypes from the TCM perspective. Those subtypes are called Zheng types. This article aims at providing evidence for the classification task by discovering symptom co-occurrence patterns from clinic data.
Methods:
Six hundred four (604) cases of depressive patient data were collected. The subjects were selected using the Chinese classification of mental disorder clinic guideline CCMD-3. The symptoms were selected based on the TCM literature on depression. The data were analyzed using latent tree models (LTMs).
Results:
An LTM with 29 latent variables was obtained. Each latent variable represents a partition of the subjects into 2 or more clusters. Some of the clusters capture probabilistic symptom co-occurrence patterns, while others capture symptom mutual-exclusion patterns. Most of the co-occurrence patterns have clear TCM Zheng connotations.
Conclusions:
From clinic data about depression, probabilistic symptom co-occurrence patterns have been discovered that can be used as evidence for the task of classifying depressive patients into Zheng types.
Introduction
There are currently no widely accepted guidelines on the classification of depressive patients into Zheng types. In clinic research, different researchers adopt different strategies. For example, Gao and Fang 1 divide depressive patients into three types: Stagnation of Liver Qi, Spirit Injured by Worry, and Heart–Spleen Dual Vacuity. You et al. 2 divide them into three types: Liver Depression and Spleen Vacuity, Heart–Spleen Dual Vacuity, and Deficiency of Liver-Yin and Kidney-Yin. Guo et al. 3 divide them into four types: Liver Depression and Spleen Vacuity, Liver Blood Stasis and Stagnation, Heart–Spleen Dual Vacuity, and Spleen and Kidney Dual Vacuity.
The objective of this work is, through the analysis of clinic symptom data, to provide evidence that can be used to answer the following questions: What Zheng types are present in the population of depressive patients? What are the characteristics of each type? How can one differentiate between the different types? In TCM, patient classification (also known as syndrome differentiation) is based on symptom co-occurrence patterns. Therefore, this study aimed to identify such patterns from clinic symptom data.
Six hundred and four (604) cases of depressive patient data were collected. The subjects were selected using the Chinese classification of mental disorders clinic guideline CCMD-3. 4 The symptoms were selected based on the TCM literature on depression. In other words, symptoms of interest related to TCM that were reported to have occurred in depressive patients were used. The data were analyzed using latent tree models (LTMs). 5,6 The analysis reveals a host of probabilistic symptom co-occurrence patterns with clear TCM Zheng connotations, as well as symptom mutual-exclusion patterns.
Methods
Data collection
The data were collected in 2005–2006. The subjects were inpatients or outpatients aged between 19 and 69 years from 9 hospitals from several regions of China. They were selected using the Chinese classification of mental disorders clinic guideline CCMD-3. 4 CCMD-3 is similar in structure and categorization to the International Classification of Diseases (ICD) and the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV), though it includes some variations on their main diagnoses and around 40 culturally related diagnoses. For example, CCMD-3 places more emphasis than ICD and DSM do on neurasthenia, which denotes a mental disorder marked by chronic weakness and easy fatigability. Koro and qigong deviation are examples of culture-specific symptoms included in CCMD-3.
Excluded from the study were subjects who took antidepression drugs within 2 weeks prior to the survey, women in the gestational and nursing periods, patients suffering from other mental disorders such as mania, and those suffering from other severe diseases or having had operations recently.
The symptoms (and signs) were extracted from the TCM literature on depression between 1994 and 2004. The search was done with the terms “
” and “
” (
An epidemiologic survey was conducted on the 143 symptoms. Six hundred and four (604) patient cases were collected. Each patient case contains information about which symptoms occurred in the patient and which ones did not. Various measures were taken to ensure data quality. Examples include staff training, site visits by principal investigators, and dual data entry.
In the 604 patient cases, 57 symptoms occurred fewer than 10 times. They were removed from the data set, and the remaining 86 symptoms were included in further analysis.
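The frequency filter described above can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used in the study; the function name, the data layout (one dictionary per patient case mapping symptom names to 0 or 1), and the toy cases are assumptions made for the example.

```python
from typing import Dict, List


def frequent_symptoms(cases: List[Dict[str, int]], min_count: int = 10) -> List[str]:
    """Return the symptoms that occur at least `min_count` times across all cases."""
    counts: Dict[str, int] = {}
    for case in cases:
        for symptom, present in case.items():
            counts[symptom] = counts.get(symptom, 0) + int(present)
    return sorted(name for name, n in counts.items() if n >= min_count)


# Hypothetical toy data: "symptom_a" occurs 11 times, "symptom_b" only once.
toy_cases = [{"symptom_a": 1, "symptom_b": 0} for _ in range(11)]
toy_cases.append({"symptom_a": 0, "symptom_b": 1})

kept = frequent_symptoms(toy_cases)
print(kept)  # ['symptom_a']
```

With the threshold of 10 used in the study, 57 of the 143 surveyed symptoms fall below the cutoff, which leaves the 86 symptoms analyzed.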
Data analysis
In data-driven medical research, latent class analysis (LCA) 7 is often used to identify subtypes in a group of patients. LCA is based on the latent class model (LCM), which consists of 1 latent variable and multiple symptom variables that are observed. The symptom variables are assumed to be mutually independent, given the latent variable. To perform LCA, one needs to determine the number of states for the latent variable (i.e., the number of clusters) and the probabilistic parameters. Several researchers have used LCA to study major depressive disorder from the Western medicine perspective. According to the recent systematic review by van Loo et al., 8 those studies “mainly grouped patients on overall severity, but not in classes with qualitatively different symptom profiles.”
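For reference, the conditional independence assumption of the LCM can be written out explicitly. The notation here is introduced only for illustration: Y denotes the latent variable and X_1, ..., X_n the observed symptom variables.

```latex
% Joint distribution of the symptoms under a latent class model
P(X_1, \dots, X_n) = \sum_{y} P(Y = y) \prod_{i=1}^{n} P(X_i \mid Y = y)

% Posterior probability that a patient with symptoms x_1, ..., x_n belongs to cluster y
P(Y = y \mid x_1, \dots, x_n)
  = \frac{P(Y = y) \prod_{i=1}^{n} P(x_i \mid Y = y)}
         {\sum_{y'} P(Y = y') \prod_{i=1}^{n} P(x_i \mid Y = y')}
```

The probabilistic parameters mentioned above are the cluster proportions P(Y = y) and the conditional probabilities P(X_i | Y = y).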
To analyze the TCM depression data, a generalization of the LCM called the latent tree model (LTM) was used. 5 An LTM can be viewed as a collection of LCMs where each LCM involves 1 latent variable and a distinct subset of the symptom variables, and the latent variables are connected to form a tree structure. Each LCM partitions the patients into two or more clusters. Because different LCMs involve different symptom variables, the clusters given by different LCMs have qualitatively different symptom profiles and can capture symptom co-occurrence patterns.
Currently, the state-of-the-art algorithm for latent tree analysis (LTA) is the EAST algorithm, where EAST stands for Extension Adjustment Simplification until Termination. 6 A Java implementation* of EAST was used to analyze the TCM depression data.
Results
Model structure
The result of the analysis is an LTM. The structure of the model is shown in Figure 1. The nodes labeled with English phrases represent symptom variables. Each of them has two possible values, indicating the presence or absence of the symptoms. The symptom variables come from the data set. The nodes labeled with the capital letter “Y” followed by an integer are the latent variables. They are not from the data set. Rather they were introduced during data analysis to explain patterns in the data. There is an integer next to each latent variable. It is the number of possible states of that latent variable. For example, the latent variable Y1 has 3 possible states, while Y29 has 2.

Figure 1. The structure of the model obtained by latent tree analysis on the depression data set.
Each latent variable and the symptom variables directly connected to it form an LCM. For example, Y29 forms an LCM with “fear of cold,” “cold limbs,” and “surging pulse.” It will be referred to as the LCM headed by Y29, or simply the Y29 LCM. The numerical information of the model includes the conditional probability distribution of each symptom variable given the latent variable. The strengths of the dependencies as measured by mutual information are visually depicted by the widths of the edges. For example, Y29 is strongly correlated with “cold limbs,” moderately correlated with “fear of cold,” and weakly correlated with “surging pulse.”
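To make the edge-width measure concrete, the mutual information between a latent variable and a binary symptom variable can be computed from the cluster proportions and the conditional probabilities of the symptom. The following Python sketch is illustrative only: the function name is ours, and the numbers passed to it are hypothetical rather than taken from the fitted model.

```python
import math
from typing import Sequence


def mutual_information(prior: Sequence[float], p_present: Sequence[float]) -> float:
    """Mutual information (in bits) between a latent variable Y and a binary symptom X.

    prior[k]     = P(Y = k)
    p_present[k] = P(X = present | Y = k)
    """
    # Marginal probability that the symptom is present.
    p_x1 = sum(py * px for py, px in zip(prior, p_present))
    mi = 0.0
    for py, px in zip(prior, p_present):
        # Sum over the two symptom values: present and absent.
        for p_x_given_y, p_x in ((px, p_x1), (1.0 - px, 1.0 - p_x1)):
            joint = py * p_x_given_y
            if joint > 0.0:
                mi += joint * math.log2(p_x_given_y / p_x)
    return mi


# Hypothetical two-cluster examples: a strongly and a weakly dependent symptom.
strong = mutual_information([0.5, 0.5], [0.05, 0.90])
weak = mutual_information([0.5, 0.5], [0.10, 0.12])
print(strong > weak)  # True
```

A symptom whose probability barely changes across the clusters, as in the second call, carries almost no information about the latent variable and would be drawn with a thin edge.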
The latent variables are connected to form a tree structure. The correlation between two neighboring latent variables is characterized by a probability distribution. The strength of the correlation is depicted by the width of the edge between the two variables. For example, Y15 and Y16 are strongly correlated, while Y28 and Y29 are only marginally correlated.
As will be seen later, most of the connections among the variables are consistent with the TCM postulates on how the variables are related to each other. However, several symptom variables in the model seem to be out of place. They include “somnolence” under Y11, “tinnitus” and “pain in limbs” under Y12, “dry eyes” under Y15, “yellow urine” under Y16, “sloppy stool” under Y24, and “surging pulse” under Y29. The reason is that those symptoms occur rarely in the data and hence there is not sufficient information to determine their appropriate locations in the model. It is for this reason that those symptom variables are only weakly related to the latent variables to which they are connected. We will ignore those variables in subsequent discussions.
Symptom co-occurrence patterns
Each latent variable in the model represents a partition of the patients surveyed, and each state of the latent variable denotes a cluster. For example, the latent variable Y29 has two states, which are denoted as Y29=s0 and Y29=s1, respectively. It represents a partition of the patients into two clusters.
In Figure 1, Y29 forms an LCM with three symptom variables. To appreciate the meaningfulness of the partition represented by Y29, the probability distributions of the two symptom variables “fear of cold” and “cold limbs” in the two clusters Y29=s0 and Y29=s1 are examined. They are given in Table 1(a). It shows that the first cluster (Y29=s0) consists of 54% of the patients while the second cluster (Y29=s1) consists of 46% of the patients. The two symptoms “fear of cold” and “cold limbs” do not occur often in the first cluster, while they both tend to occur with high probabilities (0.8 and 0.85) in the second cluster.
The probability distribution indicates that the two symptoms “fear of cold” and “cold limbs” tend to co-occur in the cluster Y29=s1. The co-occurrence is not a certain event. Rather it is probabilistic in nature. Thus, it is called a probabilistic symptom co-occurrence pattern. It turns out that the pattern is meaningful from the TCM perspective. As a matter of fact, TCM asserts that Yang Deficiency can lead to, among other symptoms, “fear of cold” and “cold limbs.” 9, p. 192 So, the presence of the probabilistic pattern Y29=s1 suggests the Zheng-type Yang Deficiency.
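The probabilistic nature of the pattern can be illustrated with a small calculation. The Python sketch below uses the cluster proportions (54% and 46%) and the probabilities 0.8 and 0.85 reported for Y29=s1, assuming they are listed in the order “fear of cold,” “cold limbs.” The probabilities for Y29=s0 are not stated in the text, so the value 0.10 used for both symptoms is a hypothetical placeholder, and the function name is ours.

```python
from typing import Dict, List

# Cluster proportions reported in the text: P(Y29=s0), P(Y29=s1).
PRIOR = [0.54, 0.46]
# Reported probabilities in cluster s1 (order of the two values is assumed).
P_S1 = {"fear of cold": 0.80, "cold limbs": 0.85}
# HYPOTHETICAL probabilities in cluster s0; the text only says they are low.
P_S0 = {"fear of cold": 0.10, "cold limbs": 0.10}


def posterior(observed: Dict[str, bool]) -> List[float]:
    """Return [P(s0 | observed), P(s1 | observed)] for the two symptoms."""
    weights = []
    for prior, cond in zip(PRIOR, (P_S0, P_S1)):
        w = prior
        for symptom, present in observed.items():
            w *= cond[symptom] if present else 1.0 - cond[symptom]
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


# Given the cluster, the symptoms are independent, so in s1 both occur with:
both_in_s1 = P_S1["fear of cold"] * P_S1["cold limbs"]
print(f"{both_in_s1:.2f}")  # 0.68

# Posterior cluster membership of a patient showing both symptoms.
post = posterior({"fear of cold": True, "cold limbs": True})
print(post[1] > post[0])  # True
```

Under these assumptions, about two thirds of the patients in the cluster Y29=s1 show both symptoms, which is why the co-occurrence is described as probabilistic rather than certain.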
Note that there are three notions concerning Y29=s1: First, it is a state of the latent variable Y29; second, it denotes a probabilistic symptom co-occurrence pattern; and third, it represents the cluster of patients with the pattern.
The LTA has revealed a host of probabilistic symptom co-occurrence patterns that are meaningful from the TCM perspective. As shown in Table 1(b), the latent state Y28=s1 captures the probabilistic co-occurrence of “aching lumbus,” “lumbar painlike pressure” and “lumbar painlike warmth.” This pattern is present in 27% of the patients and it suggests the Zheng-type Kidney Deprived of Nourishment. 9, p. 192 The latent state Y27=s1 (Table 1[c]) captures the probabilistic co-occurrence of “weak lumbus and knees” and “cumbersome limbs.” This pattern is present in 44% of the patients and it suggests the Zheng-type Kidney Deficiency. 9, p. 192
The discussions, which so far have focused on the bottom right corner of the model structure, now move to the latent variables depicted on the second-last level. The latent state Y23=s1 (Table 1[d]) captures the probabilistic co-occurrence of “hypochondriac distension,” “hypochondriac pain,” and “abdominal pain.” This pattern is present in 16% of the patients and it suggests the Zheng-type Liver Qi Stagnation. 9, p. 252 The latent state Y22=s1 (Table 1[e]) captures the probabilistic co-occurrence of “gastric stuffiness” and “abdominal distension.” This pattern is present in 28% of the patients and it also suggests the Zheng-type Liver Qi Stagnation. 9, p. 252
The latent state Y21=s1 (Table 1[f]) captures the probabilistic co-occurrence of “upset and restlessness” and “irritability and bad temper.” This pattern is present in 81% of the patients and it suggests the Zheng-type Stagnant Qi Turning into Fire (also known as Liver Fire Flaming Up). 9, p. 253 The latent state Y19=s1 (Table 1[g]) captures the probabilistic co-occurrence of “clouded head,” “heavy head,” and “distention in head.” This pattern is present in 59% of the patients and it suggests the Zheng-type Qi Stagnation in Head. 9, p. 253 The latent state Y15=s1 (Table 1[j]) captures the probabilistic co-occurrence of “feeling of suffocation,” “shortness of breath,” and “sighing.” This pattern is present in 48% of the patients and it suggests the Zheng-type Qi Deficiency. 9, p. 234
The latent variable Y17 is directly connected with only one symptom variable, “palpitations,” which suggests Heart Qi Deficiency in TCM. 9, p. 264 The latent variable Y16 is directly connected with two symptom variables. However, the dependence of “yellow urine” on Y16 is only marginal. The other symptom variable is “oppression in chest,” which suggests Qi Deficiency in TCM. 9, p. 240 Those two latent variables do not reveal symptom co-occurrence patterns themselves. However, their relationships with neighboring latent variables do. Using those relationships, one can calculate, for instance, the probability distributions of relevant symptom variables in the Y16 clusters. They are shown in Table 1(i). The distributions for the cluster Y16=s1 indicate that “oppression in chest” tends to co-occur with “shortness of breath,” “feeling of suffocation,” “palpitations,” and “sighing.”
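The calculation through a neighboring latent variable relies on the tree structure: a symptom attached to Y15 is independent of Y16 once Y15 is known, so its distribution in a Y16 cluster is obtained by summing over the states of Y15. The Python sketch below shows this step. The function name and all numbers are hypothetical and are not the values behind Table 1(i).

```python
from typing import List, Sequence


def symptom_given_neighbor(
    p_symptom_given_y15: Sequence[float],
    p_y15_given_y16: Sequence[Sequence[float]],
) -> List[float]:
    """P(symptom present | Y16 = k) for each state k of Y16.

    p_symptom_given_y15[j] = P(symptom present | Y15 = j)
    p_y15_given_y16[k][j]  = P(Y15 = j | Y16 = k)
    """
    return [
        sum(p_j * p_symptom_given_y15[j] for j, p_j in enumerate(row))
        for row in p_y15_given_y16
    ]


# Hypothetical numbers: a symptom attached to Y15 (e.g., "sighing") that is
# rare in Y15=s0 and common in Y15=s1, with Y15 and Y16 strongly correlated.
p_sighing = symptom_given_neighbor(
    p_symptom_given_y15=[0.10, 0.80],
    p_y15_given_y16=[[0.90, 0.10], [0.20, 0.80]],
)
print([round(p, 2) for p in p_sighing])  # [0.17, 0.66]
```

The stronger the correlation between the two latent variables, the more closely the symptom profile of a Y16 cluster follows that of the corresponding Y15 cluster.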
It is noted that all the latent variables depicted on the second-last level except 1 are related to Qi disorders. The exception is Y18. The latent state Y18=s1 (Table 1[h]) captures the probabilistic co-occurrence of “enlarged tongue” and “tooth-marked tongue.” This pattern is present in 39% of the patients and it suggests the Zheng-type Internal Accumulation of Excessive Dampness. 9, p. 59 This is not related to Qi disorders, which explains why the connection between Y18 and Y17 is weak.
The latent state Y11=s1 (Table 1[k]) captures the probabilistic co-occurrence of “sticky and slow stool” and “constipation.” This pattern is present in 14% of the patients and it also suggests the Zheng-type Deficiency of Stomach/Spleen Yin. 9, p. 280 The latent state Y10=s1 (Table 1[l]) captures the probabilistic co-occurrence of “heat in palms and soles” and “baking heat.” This pattern is present in 35% of the patients and it suggests the Zheng-type Yin Deficiency. 9, p. 293 The latent state Y9=s1 (Table 1[m]) captures the probabilistic co-occurrence of “spontaneous sweating” and “night sweating.” This pattern is present in 66% of the patients and it suggests the Zheng-type Deficiency of Both Qi and Yin. 9, pp. 239, 293
Table 2 shows information about two other latent variables, Y5 and Y12. The latent state Y5=s1 captures the probabilistic co-occurrence of “difficulty in falling asleep,” “easy to awake during sleep,” “difficulty in falling asleep again,” and “reduced sleep time.” This pattern is present in 68% of the patients and it clearly suggests sleep disorders. The latent state Y12=s2 captures the probabilistic co-occurrence of “nausea,” “frequent urination,” and “awake early sleep again.” This pattern is present in 8% of the patients and it is clearly meaningful.
Symptom mutual-exclusion patterns
Table 3 shows information about latent variables that reveal symptom mutual-exclusion patterns. The latent variable Y1 (Table 3[a]) reveals the mutual exclusion of “white tongue coating,” “yellow tongue coating,” and “yellow-white tongue coating.” Indeed, Y1 divides the patients into three clusters. Each of the three symptoms occurs only in 1 of the clusters. No two symptoms occur in the same cluster. So the symptoms are mutually exclusive.
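The reasoning above can be expressed as a simple check on a table of conditional probabilities. The following Python sketch is illustrative only: the function name, the tolerance below which a symptom is treated as not occurring in a cluster, and the numbers in the toy table are assumptions and do not reproduce Table 3(a).

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]  # table[cluster][symptom] = P(present | cluster)


def is_mutually_exclusive(table: Table, tol: float = 0.05) -> bool:
    """True if every symptom occurs in exactly one cluster and no cluster
    contains more than one of the symptoms (probabilities <= tol count as absent)."""
    symptoms = {s for probs in table.values() for s in probs}
    for symptom in symptoms:
        clusters_with = [c for c, probs in table.items() if probs.get(symptom, 0.0) > tol]
        if len(clusters_with) != 1:
            return False
    for probs in table.values():
        if sum(1 for p in probs.values() if p > tol) > 1:
            return False
    return True


# Hypothetical three-cluster table in the spirit of Y1.
toy_table = {
    "s0": {"white coating": 0.98, "yellow coating": 0.00, "yellow-white coating": 0.01},
    "s1": {"white coating": 0.01, "yellow coating": 0.97, "yellow-white coating": 0.00},
    "s2": {"white coating": 0.00, "yellow coating": 0.02, "yellow-white coating": 0.99},
}
print(is_mutually_exclusive(toy_table))  # True
```

A table in which two symptoms both have substantial probability in the same cluster would fail the check and indicate co-occurrence instead.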
Similarly, Y2 (Table 3[b]) reveals the mutual exclusion of “thin tongue coating,” “thick tongue coating,” and “little tongue coating.” Y3 (Table 3[c]) reveals the mutual exclusion of “yellow complexion” and “white/dark-yellow complexion.” Y6 (Table 3[d]) reveals the mutual exclusion of “pale purple tongue,” “pale tongue,” and “purple tongue.” Y8 (Table 3[e]) reveals the mutual exclusion of “pale red tongue” and “red tongue.”
Y13 (Table 3[f]) indicates that “thirst with desire to drink” and “thirst with no desire to drink” are mostly mutually exclusive. However, they do co-occur in a small fraction of the patients. The reason is that those patients were not sure which of the two alternatives to check off in the survey and thus checked off both. Note that Table 3(f) also indicates that those two symptoms tend to co-occur with “dry mouth and throat.”
Y24 (Table 3[g]) reveals the mutual exclusion of “deep pulse” and “floating pulse.” It also uncovers the fact that “rapid pulse” co-occurs with only “floating pulse,” but not “deep pulse.” On the other hand, “slow pulse” co-occurs with only “deep pulse,” but not “floating pulse.” Y25 (Table 3[h]) reveals the mutual exclusion of “forceless pulse” and “forceful pulse.” “String-like pulse” tends to be mutually exclusive with both of them, while “slippery pulse” tends to co-occur with “forceful pulse.”
In summary, this analysis has identified from survey data a host of probabilistic symptom co-occurrence and mutual-exclusion patterns. Some of the symptom co-occurrence patterns have clear TCM Zheng connotations and hence are especially interesting. Those patterns are summarized in Table 4.
Discussion
Previous data-driven investigations of depression were based on the 12 symptoms disaggregated from the nine DSM-III-R criteria for major depression. 8 The objectives were either to identify symptom dimensions using factor analysis or to determine subtypes of depression using LCA.
This study is based on 86 symptoms that are of interest from the TCM perspective. The data were analyzed using a new method called LTA. The analysis reveals both symptom dimensions and interesting subclasses of patients. Specifically, the output of the analysis is an LTM that consists of 29 latent variables. Each latent variable represents a symptom dimension, and a partition of the patients along that dimension. Some of the clusters in the partitions capture probabilistic symptom co-occurrence patterns, while others capture symptom mutual-exclusion patterns.
In China, depressive patients are often treated using TCM. To do so, doctors need to first classify those patients into subtypes (called Zheng types) from the TCM perspective, and then come up with treatment plans for each subtype. Three questions arise: (1) What Zheng types are present in the population of depressive patients? (2) What are the characteristics of each type? and (3) How can one differentiate between the different types? The results of this analysis can be used as evidence for answering those questions.
For example, the authors' analysis has revealed the probabilistic co-occurrence of “hypochondriac distension,” “hypochondriac pain,” and “abdominal pain” and that the pattern is present in 16% of the depressive patients (see Y23=s1 in Table 1[d]). From those findings, one can conclude the presence of the Zheng-type Liver Qi Stagnation in the population of depressive patients.
The subsequent questions are the following: (1) What are the characteristics of the Zheng-type Liver Qi Stagnation? and (2) How can one determine whether a particular patient belongs to the type? Y23 provides some evidence for answering those questions. However, the questions cannot be answered based solely on Y23. The reason is that Y23 captures only one aspect of Liver Qi Stagnation. As shown in Table 4, the latent variables Y16, Y21, and Y22 are also related to Liver Qi Stagnation. They capture other aspects of the Zheng type. Therefore, it will be necessary to jointly consider those latent variables (and potentially others) in order to obtain an appropriate overall characterization of Liver Qi Stagnation. Future research will determine how this can be done. In this article, some of the necessary building blocks have been provided.
Conclusions
By analyzing 604 cases of depressive patient data using LTMs, a host of probabilistic symptom co-occurrence patterns and symptom mutual-exclusion patterns have been discovered. Most of the co-occurrence patterns have clear TCM Zheng connotations, while the mutual-exclusion patterns are also reasonable and meaningful. The patterns can be used as evidence for the task of classifying depressive patients into Zheng types.
Footnotes
Acknowledgments
Research on this article was supported by the China National Basic Research 973 Program under Project Nos. 2004CB517106, 2011CB505101, and 2011CB505105; the Guangzhou HKUST Fok Ying Tung Research Institute; the Innovative Team Project of Beijing University of Chinese Medicine (2011-CXTD-08); and the Research Base Development Project of Beijing University of Chinese Medicine (2011-JDJS-09).
Author Disclosure Statement
No competing financial interests exist.
*
Available at:
