Abstract

Over the past 30 years, research in Traditional Chinese Medicine (TCM) has made great strides. One area that continues to generate vigorous debate is the validity of TCM diagnosis and the inter-rater reliability among practitioners. This issue is not unique to TCM: allopathic medicine and psychology have similar problems with low inter-rater reliability. 1–5
There have been several attempts to increase the reliability of diagnoses in TCM; however, all of the methodologies, mine included, fail to capture the inter-relatedness of TCM diagnoses. 6 I have argued that TCM can be thought of as a medicine primarily organized around relationships between organ systems and substance densities, whereas allopathic medicine is primarily organized around substances and has considered the relationships between systems only in the past 30–40 years.
As such, TCM diagnoses are related to each other and, in fact, are often considered stages in a progression of disease. For example, TCM theory holds that there is a progression of “precious substances” from the densest fluid, Jing (often translated as essence and most closely related to DNA in allopathic medicine), to the least dense substance, Shen, which is closely related to consciousness. Within this paradigm, deficiency conditions worsen by moving from a balance point in the center toward the extremes (Fig. 1). Diseases with Xue deficiency often progress to Yin deficiency, and diseases with Qi deficiency progress to Yang deficiency. A patient with Kidney Qi deficiency, a pattern often seen in low-back pain, might progress to Kidney Yang deficiency, a pattern that includes low-back pain but also a pervasive sensation of cold, urinary incontinence, and edema in the lower limbs. However, statistical methods such as Fleiss' kappa, which are used to assess agreement among practitioners' diagnoses, do not account for this underlying progression of disease.
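To make the limitation concrete, the following is a minimal sketch of Fleiss' kappa in plain Python. The three pattern labels and the rating counts are hypothetical and are not drawn from any study; they only illustrate that kappa treats diagnoses as unordered categories. A disagreement between two adjacent patterns (Kidney Qi versus Kidney Yang deficiency) is scored exactly like a disagreement between two unrelated patterns.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters assigning subject i to category j.

    Assumes every subject was rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject (proportion of agreeing rater pairs).
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical columns: [Kidney Qi deficiency, Kidney Yang deficiency, Liver Qi stagnation]
# Four patients, four raters each. Only patient 2 is rated inconsistently.
near_miss = [[4, 0, 0], [2, 2, 0], [0, 4, 0], [0, 0, 4]]  # split between adjacent patterns
far_miss = [[4, 0, 0], [2, 0, 2], [0, 4, 0], [0, 0, 4]]   # split between unrelated patterns

print(fleiss_kappa(near_miss))
print(fleiss_kappa(far_miss))
```

Both calls print the same value, because the statistic has no notion of how close two categories are. A weighted agreement measure would be one way to encode the progression, but choosing the weights requires exactly the theoretical mapping discussed here.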

Fig. 1. Density of Traditional Chinese Medicine precious fluids.
In this issue are two excellent articles by Zhang et al. 7 and Liu et al. 8 Although they are not directly aimed at understanding inter-rater reliability, they use statistical techniques that help uncover the complex relationships among diagnoses. Zhang et al. 7 compare two methodologies: exploratory factor analysis-based hierarchical clustering analysis and complex-system entropy-based clustering analysis. The authors found that the entropy-based technique was the better model; the providers felt that it most closely identified the key diagnoses and their components. Liu et al. 8 take a different approach, using mutual information measures and Markov clustering analysis.
Statistically, these two techniques are different methods for reducing data into groups. There are many clustering methodologies, and each has different strengths and weaknesses. Clustering methodologies can be sorted into two types: vector clustering and graph clustering. Zhang et al. 7 use vector clustering, whereas Liu et al. 8 use graph clustering.
Factor analysis-based hierarchical clustering analysis starts with each variable on its own and progressively groups the variables together, creating a structure that is similar to animal classification taxonomies. Zhang et al. 7 use a measurement of entropy to make the groupings. Entropy, in this case, is related to information theory rather than thermodynamics and is a measurement of uncertainty. The methodology sorts variables with a high level of information (low-probability events) into unique groups. The goal is to produce groups that differ as much as possible from one another.
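As a small illustration of what "information" means here, the sketch below computes Shannon entropy and surprisal for a binary symptom indicator. The symptom vectors are invented for the example, and this is not the clustering procedure of Zhang et al.; it only shows why a low-probability event carries more information than a common one.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Entropy in bits of the empirical distribution of `values`."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def surprisal(probability):
    """Information in bits conveyed by observing an event of the given probability."""
    return -math.log2(probability)


# Hypothetical symptom indicators across eight patients (1 = present, 0 = absent).
common_symptom = [1, 0, 1, 0, 1, 0, 1, 0]  # present in half of the patients
rare_symptom = [1, 0, 0, 0, 0, 0, 0, 0]    # present in one of eight patients

print(shannon_entropy(common_symptom))  # most uncertain case for a binary variable
print(shannon_entropy(rare_symptom))    # less uncertain overall
print(surprisal(1 / 8))                 # but observing the rare symptom is highly informative
```

The rare symptom has lower entropy as a variable, yet each time it does occur it conveys three bits, compared with one bit for the common symptom.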
Markov clustering is a graph clustering methodology. In this methodology, distances between data are adjusted so that connections between weakly related items are weakened, whereas connections between strongly related items are strengthened. This is done through a mathematical technique called a random walk. Markov clustering produces groups using a nonlinear methodology and can produce truly nonlinear groupings.
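The idea can be sketched in a few lines with NumPy. This is a minimal, generic version of Markov clustering, not the implementation used by Liu et al.; the toy graph, the parameter values, and the 1e-6 threshold are assumptions chosen for illustration. The expansion step simulates the random walk, and the inflation step strengthens strong connections and weakens weak ones.

```python
import numpy as np


def markov_cluster(adjacency, expansion=2, inflation=2.0, iterations=50):
    """Minimal Markov clustering sketch on an undirected adjacency matrix."""
    m = np.asarray(adjacency, dtype=float)
    m = m + np.eye(len(m))      # add self-loops, as is customary
    m = m / m.sum(axis=0)       # make each column a probability distribution
    for _ in range(iterations):
        m = np.linalg.matrix_power(m, expansion)  # expansion: take random-walk steps
        m = m ** inflation                        # inflation: boost strong flows
        m = m / m.sum(axis=0)                     # renormalize the columns
    # Rows that retain flow are attractors; the columns they reach form a cluster.
    clusters = {
        tuple(np.nonzero(row > 1e-6)[0].tolist())
        for row in m
        if row.max() > 1e-6
    }
    return sorted(clusters)


# Hypothetical graph: two fully connected groups of four nodes (0-3 and 4-7)
# joined by a single bridging edge between nodes 3 and 4.
graph = np.zeros((8, 8))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                graph[i, j] = 1.0
graph[3, 4] = graph[4, 3] = 1.0

print(markov_cluster(graph))
```

On this toy graph the single bridging edge is progressively starved of flow, so the two densely connected groups separate. Raising the inflation parameter generally yields more, smaller clusters.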
A weakness of both of these articles and their techniques is that they are considered “unsupervised” techniques. That is to say, each analysis assumes that the true nature of TCM diagnoses can be found by looking at the structure of the data and needs no prior assumptions. There is no accounting for the underlying theoretical relationships within TCM theory. Each time a diagnosis is made, the TCM practitioner comes to the session with a vast amount of underlying theory, just as allopathic doctors do. Practitioners understand the relationships among Qi and Blood and among Spleen, Liver, and Kidneys. The theoretical mapping of these relationships is not captured in the data set and, therefore, cannot be included in the analysis. This issue can be seen in Liu et al.'s article, 8 where they report that Yang deficiency (characterized by fatigue and cold sensation) is paired with the excess condition known as Fire (characterized by sensations of heat such as fever along with constipation and dry skin). As the authors note, these two are usually considered incompatible with each other from a TCM theory perspective.
A second issue is whether the symptoms listed were binary or continuous variables. From the data descriptions, it appears that they may be binary. This would tend to give increased importance to symptoms of low interest and decreased importance to symptoms of strong concern. A patient with infrequent sweating without exertion and high levels of pain is different from one with frequent sweating without exertion and low levels of pain. These presentations would lead a practitioner toward different diagnoses. However, in an unsupervised data extraction, these distinctions are lost, which could lead to contradictory combinations.
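The loss of information can be shown with a short sketch. The 0 to 10 severity scale, the symptom names, and the two patients are hypothetical; the point is only that once symptoms are recorded as present or absent, the two presentations described above become indistinguishable to any clustering method.

```python
def binarize(scores):
    """Collapse graded symptom scores to present (1) or absent (0)."""
    return {symptom: int(score > 0) for symptom, score in scores.items()}


def manhattan_distance(a, b):
    """Sum of absolute differences across the symptoms in `a`."""
    return sum(abs(a[symptom] - b[symptom]) for symptom in a)


# Hypothetical severity scores on an assumed 0-10 scale.
patient_a = {"spontaneous_sweating": 1, "pain": 9}  # infrequent sweating, severe pain
patient_b = {"spontaneous_sweating": 9, "pain": 1}  # frequent sweating, mild pain

print(manhattan_distance(patient_a, patient_b))                      # clearly different
print(manhattan_distance(binarize(patient_a), binarize(patient_b)))  # identical after binarizing
```

With graded scores the two patients are far apart; after binarization their distance is zero, so an unsupervised method would place them in the same group.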
I want to emphasize that these are both important articles utilizing sophisticated analyses. Each represents forward progress in our field. From these articles, one can find evidence for TCM diagnoses that are unique and have key characteristics. However, even with these techniques, there are methodologic improvements that could lead to a better understanding of TCM diagnosis.
