A Novel Approach to Describing Traditional Chinese Medical Patterns: The “Traditional Chinese Medical Diagnostic Descriptor”

Abstract

Objectives:

In the first of a series of three articles by the present authors, diagnostic agreement between Traditional Chinese Medicine (TCM) practitioners was found to be low. This was the first time that TCM diagnoses had been evaluated with an open population of patients and this result is a cause for concern. In the second article, incorrect statistics were shown to have often been used to calculate chance-removed inter-rater agreement, and appropriate statistics such as Gwet's Agreement Coefficient 2 (AC2) were recommended for future studies. In this, the third article, a novel approach to recording TCM diagnostic patterns, the Traditional Chinese Medical Diagnostic Descriptor (TCMDD), is presented that allows chance-removed agreement calculation. An example of mapping TCM diagnostic patterns to the TCMDD format is given and diagnostic agreement is evaluated.

Design, Settings, Subjects:

The same 35 subjects used to report agreement in our first article were also diagnosed by additional practitioners using the TCMDD format during the same experimental sessions at the University of Technology, Sydney Clinic. TCM diagnoses from the first article were also mapped to the TCMDD format.

Outcome measures:

Linearly weighted simple agreement and the AC2 statistic were utilized and all results compared.

Results:

Linearly weighted simple agreement using the TCMDD and TCM mapped to TCMDD format averaged 0.80 ± 0.02 compared with 0.19 for TCM. TCMDD and TCM mapped to TCMDD chance-removed agreement, as calculated with AC2, ranged between 0.67 and 0.73 ± 0.03.

Conclusions:

The TCMDD allows the essence of diagnoses expressed by TCM practitioners to be appropriately compared. This was confirmed by the TCM mapped to TCMDD results. In both cases, simple agreement was significantly greater than that obtained with the TCM format. Chance-removed statistics and error estimates can be reliably calculated with the AC2 and the TCMDD in open populations.

Introduction

In the first article of a series of three by the present authors,¹ simple weighted agreement was found to be low between Traditional Chinese Medicine (TCM) practitioners diagnosing patients with unrestricted diagnostic options from an open population that involved the participation of subjects with large variations in health status. This is a significant result, as previous TCM diagnostic reliability research has focused on single disease states. Furthermore, the second article² revealed that most inter-rater reliability studies incorrectly utilized Fleiss' kappa statistic, whereas the chance-removed Agreement Coefficient 2 (AC2) statistic developed by Gwet and calculated with software³ was identified and demonstrated to be appropriate. Unlike Gwet's AC2, Fleiss' kappa is only accurate if each category is equally represented in the data analyzed (known as fixed marginality) and severely reduces the reported agreement if the categories are not.

As discussed in the first article of the series, unless diagnostic reliability in TCM practice is adequately measured and improved to a satisfactory level, it may be argued that treatments are being applied effectively in a random manner. One research team⁴ has already stated that TCM diagnoses are not suitable for inclusion as investigation variables.

The current reporting format for a TCM diagnosis makes available over 100 often semantically similar patterns,⁵⁻⁷ and any number of these patterns can be combined to provide a complex diagnosis for a patient. A diagnostic description in TCM usually consists of a combination of terms, somewhat like a phrase. When people communicate with each other, completely different combinations of diagnostic terms can be, and often are, used to describe a similar diagnostic concept. Similarly, substitutions or combinations of TCM patterns could be used to describe basically the same condition of a patient. Herein lies a major obstacle in comparing diagnoses in the current TCM framework: with so many diagnostic choices available to express what may be the same clinical presentation, exact agreement is very unlikely. The very large number of possible diagnostic combinations further decreases potential agreement.

The contemporary TCM diagnostic framework is perhaps responsible for the low probability of practitioners agreeing upon the very same patterns. This seems to have led to a failure to examine diagnostic reliability within open populations, that is, to explore diagnostic reliability in typical clinical settings. The only study known to the authors in which agreement in an open population is examined is that of Ward et al.⁸ within the field of psychiatry. Agreement studies in TCM investigate a specific prediagnosed biomedical disease condition with restriction of the TCM diagnostic options or specific diagnostic factors.⁹ This lack of investigation of diagnostic reliability in open populations, together with the poor result found in our first article¹ of the series was the motivation for the present research team to design and implement a study that developed and tested strategies to improve agreement for use in more complex, clinically realistic contexts.

Using such a plethora of options as the starting point for agreement may be setting the threshold too high to achieve realistic consensus. It would seem to be more appropriate to focus on Descriptors, that is, the diagnostic building blocks or concepts that combine to form the TCM patterns, as a basis for determining consensus. In this article, agreement determined on the basis of Descriptors is termed “Essential” agreement, whereas agreement deemed to occur only where the terms utilized are exactly the same is termed “Particular” agreement. It seems that there is a need to move away from the unattainable goal of Particular agreement and seek to determine at least Essential agreement.

In the authors' first study,¹ the prevalence of certain terminologies in raters' diagnoses was noted. The diagnostic terms Liver, Spleen, Qi Stasis, or Qi Xu were involved in a majority of diagnoses. Reducing the number of Descriptors utilized, while retaining appropriate diagnostic detail, would naturally facilitate higher levels of consensus. This reduction in diagnostic options must not lead to the possibility that vital diagnostic detail is sacrificed. With these opposing requirements in mind, a diagnostic format is presented and its potential to improve diagnostic agreement is tested.

Traditional Chinese Medical Diagnostic Descriptor

The Traditional Chinese Medical Diagnostic Descriptor (TCMDD) is based upon the TCM pattern reporting method used in the Diagnostic System of Oriental Medicine (DSOM), a diagnostic questionnaire developed by Lee et al.¹⁰ to diagnose and investigate women's reproductive health from a Korean traditional medicine perspective. The DSOM reporting format (DSOMf) was used in the second of our series of articles to demonstrate problems with the statistics generally used to determine diagnostic reliability and was found to have attributes that address the problems of the contemporary TCM diagnostic format identified in the first article. Agreement between practitioners when using the DSOMf, in a study of 42 patients and 5 practitioners, as detailed in our second article,² was found to be very high. Simple agreement of 0.78 ± 0.01 was found and chance-removed agreement calculated with AC2 was 0.60 ± 0.02. This result led to the development of the TCMDD.

In this study, the practitioners were instructed to score every Descriptor of the DSOMf, including zero scores for Descriptors that were without issue, thereby providing a complete diagnostic picture of the patient. This allowed a more holistic view of the patient's health. The holistic approach to health, a major tenet of TCM, is characterized by “the belief that the parts of something are intimately interconnected and explicable only by reference to the whole.”¹¹ Any alteration of the DSOMf had to preserve this principle.

Interestingly, the approach of focusing on agreement in components of TCM diagnostic syndromes was also adopted in part by Mist et al.¹² to improve diagnostic accord. Unfortunately, the results of Mist et al. seem to be compromised by the use of Fleiss' kappa statistics and changes in the approach used to determine diagnostic agreement after training.

After reviewing over 59,000 TCM diagnostic patient records collected in a 13-year period at the University of Technology Sydney (UTS) Chinese medicine outpatient clinic,¹³ the TCMDD was developed by editing the Descriptors of the DSOMf to accommodate all patterns reported in an Australian clinical environment, thereby extending the DSOMf system from one concerned with female reproductive health to a generalized indicator of TCM health.

Although there had been 16 Descriptors in the original DSOMf, 14 were retained, 1 deleted, 2 combined, and 1 added to arrive at the 15 Descriptors of the TCMDD. All changes between the two formats took place in the Disease Cause (bing yin) category. Phlegm and Damp were combined to become a single Descriptor Damp. This amalgamation occurred because there was minimal use of phlegm for diagnoses in the UTS clinic, although it is recognized that dampness can have either an external or internal origin and phlegm is always an internal factor.¹⁴ Dryness was also removed, as it was not recorded at the UTS Chinese medicine outpatient clinic.

Bi syndromes were not included in the DSOM. This approach was preserved in the TCMDD; as in the published article,¹⁵ they were seen as Cold, Qi Stasis, and/or Blood Stasis as appropriate. As in the DSOMf, yin and yang organ assessments were amalgamated using the “five-phase” approach in the TCMDD, and the Heart and the Pericardium were combined. Finally, Wind was added as a Descriptor to the TCMDD to allow representation of all the patterns retrieved from the UTS Chinese medical outpatient clinic database.

The Descriptors that were tested were selected only after much deliberation, as it was recognized that the rationalizations that occurred during the Descriptor selection process would be bound to trouble some sectors of the TCM community. A solution to the conundrum of possible perceived diagnostic oversimplification while attempting to attain improved diagnostic accord is presented in the Discussion of this article. The process by which the DSOMf was altered to arrive at the TCMDD after examining the UTS clinic diagnostic data is discussed in detail in the first author's PhD thesis.¹⁶

A schematic diagram of the TCMDD diagnostic reporting system is given in Figure 1, with the Descriptors arranged in three columns; the first being disease causes, the second Chinese medical factors, and the last the TCM five phase/organs.

FIG. 1.

The TCMDD. The 15 Descriptors of the TCMDD are defined using the appropriate definitions contained in the WHO International Standard Terminologies on Traditional Medicine in the Western Pacific Region.¹⁰ Quoting directly from this reference, each of the Descriptors is now briefly summarized. Heart: the organ located in the thoracic cavity above the diaphragm that controls blood circulation and mental activities; Spleen: the organ located in the middle energizer below the diaphragm, whose main function is to transport and transform food, up-bear the clear substances, keep the blood flowing within the vessels, and is closely related to the limbs and flesh; Lungs: a pair of organs located in the thoracic cavity above the diaphragm that control respiration, dominate qi, govern diffusion and depurative down-bearing, regulate the waterways, and are closely related to the function of the nose and skin surface; Kidneys: a pair of organs located in the lumbar region that store vital essence, promote growth, development, reproduction, and urinary function, and also have a direct effect on the condition of the bone and marrow, activities of the brain, hearing and inspiratory function of the respiratory system; Liver: the organ located in the right hypochondrium below the diaphragm that stores blood, facilitates the coursing of qi, and is closely related to the function of the sinews and eyes; Qi Deficiency: a general term for deficiency of qi that leads to decreased visceral functions and lowered body resistance; Blood Deficiency: any pathological change characterized by deficiency of blood that fails to nourish organs, tissues, and meridians/channels; Qi Stagnation: a pathological change characterized by impeded circulation of qi that leads to stagnation of qi movement and functional disorder of organs, manifested as distention or pain in the affected part; Blood Stagnation: a pathological product of blood stagnation, including extravasated blood and the blood circulating sluggishly or blood congested in a viscus, all of which may turn into pathogenic factor, the same as blood stasis or stagnant blood; Yin Deficiency: a pathological change marked by deficiency of yin with diminished moistening, calming, down-bearing, and yang-inhibiting function, leading to relative hyperactivity of yang qi; Yang Deficiency: a pathological state characterized by deficiency of body's yang qi that leads to diminished functions, decreased metabolic activities, reduced body reactions and deficiency-cold manifestations; Dampness: a pathogenic factor characterized by its impediment to qi movement and its turbidity, heaviness, stickiness, and downward flowing properties, also called pathogenic dampness, which also includes Internal Dampness produced in the body due to yang deficiency of the spleen and kidney with decreased fluid transportation and transformation and resultant water stagnation and Phlegm: (1) pathological secretions of the diseased respiratory tract, which is known as sputum; (2) the viscous turbid pathological product that can accumulate in the body, causing a variety of diseases; Cold: (external) as one of the six excesses that causes external cold pattern/syndrome and (internal) cold in the interior due to deficiency of yang qi or preponderance of yin cold or Cold: as a pathogenic factor characterized by the damage to yang qi, deceleration of activity, congealing and contracting actions, also called pathogenic cold; Heat: as a pathogenic factor that causes heat pattern/syndrome, also called pathogenic heat that also includes, Fire: as a pathogenic factor characterized by intense heat that is apt to injure fluid, consume qi, engender wind, inducing bleeding, and disturb the mental activities, also called pathogenic fire, and Wind: (external) as a pathogenic factor characterized by its rapid movement, swift changes, and ascending and opening actions, also called pathogenic wind or (internal) the same as liver wind, wind in the interior due to abnormal movement of body's yang qi. TCMDD, Traditional Chinese Medical Diagnostic Descriptor. Color images available online.

The 15 Descriptors of the TCMDD conform to the standardized, concretized definitions of TCM terms as set out by the World Health Organization's International Standard Terminologies on Traditional Medicine in the Western Pacific Region.¹⁷

Extra diagnostic detail is facilitated using the TCMDD rather than the contemporary TCM format, due to individual scoring of a TCM pattern's constituent Descriptors. For instance, if a patient is diagnosed with Liver Qi Stasis, the standard TCM approach would have the choice of strong, moderate, or mild expression of the pattern. Using the TCMDD, each Descriptor can be weighted independently with, for instance, a strong Liver involvement as indicated by perhaps a score of 4 or 5 and mild Qi Stasis as represented by a score of 1 or 2. The use of a linear scale also enables the utilization of weighted versions of simple and chance-removed agreement statistics such as Gwet's AC2.¹⁸,¹⁹ The TCMDD allows the recording of any patient's zang fu diagnosis, which may consist of either a combination of many patterns or a simple single pattern.
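The independent scoring described above can be sketched as a simple record that holds one score per Descriptor. This is only an illustration: the helper name `tcmdd_record` is hypothetical, the Descriptor labels follow those used in the tables of this article, and a 0 to 5 integer scale (0 meaning no issue) is assumed.

```python
# Descriptor labels follow the tables of this article; the helper name is illustrative.
DESCRIPTORS = (
    "Liver", "Kidney", "Lung", "Spleen", "Heart",
    "Qi Stasis", "Blood Stasis", "Yin Xu", "Blood Xu", "Qi Xu", "Yang Xu",
    "Cold", "Heat", "Wind", "Damp",
)
MAX_SCORE = 5  # assumed scale: 0 (no issue) to 5 (strongest expression)


def tcmdd_record(scored):
    """Return a complete 15-Descriptor record, filling unscored Descriptors with 0."""
    for name, score in scored.items():
        if name not in DESCRIPTORS:
            raise ValueError(f"unknown Descriptor: {name}")
        if not isinstance(score, int) or not 0 <= score <= MAX_SCORE:
            raise ValueError(f"score for {name} must be an integer from 0 to {MAX_SCORE}")
    return {name: scored.get(name, 0) for name in DESCRIPTORS}


# Strong Liver involvement with mild Qi Stasis, as in the Liver Qi Stasis example.
record = tcmdd_record({"Liver": 4, "Qi Stasis": 2})
```

Storing the zero scores explicitly, rather than omitting them, keeps every record the same length, which is what later allows Descriptor-by-Descriptor comparison between raters.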

Mapping diagnoses from TCM to TCMDD

An example of a TCM diagnosis mapped to the TCMDD format is now presented, and the diagnostic matching capacity of each system will be shown. This hypothetical example is specifically designed to demonstrate most of the major principles of TCM mapping to TCMDD diagnosis and agreement calculation. Three raters gave three diagnoses to a subject using the TCM format. These diagnoses are later mapped to the TCMDD format for agreement comparison between the two diagnostic formats.

The principles used for Descriptor mapping from the TCM to the TCMDD format are illustrated with the diagnoses in Table 1.

Table 1.

TCM Diagnoses

Rater 1	Rater 2	Rater 3	Matches
Lung Phlegm Cold		Lung Damp Heat
Kidney Jing Xu		Kidney Yin Xu
	Heart Phlegm Heat
Liver Qi Stasis	Liver Qi Stasis	Liver Qi Stasis	3
	Qi and Blood Stasis
Total			3

TCM, Traditional Chinese Medicine.

The number of matched patterns is displayed in the final column. If three raters produced patterns that matched exactly, for example, “Liver Qi Stasis” in Table 1, three agreements were noted, that is, A–B, A–C, and B–C, whereas if two raters matched, a single match was recorded.

As in our first article, the calculation of overall simple agreement involves determining the number of matches obtained and dividing this by the number of possible matches. Agreement in this example is calculated as 3 matched patterns of a possible 9, or 0.33.
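The match counting for Table 1 can be sketched as follows. The function name `exact_matches` is illustrative, and the denominator of 9 is taken directly from the text rather than derived.

```python
from itertools import combinations

# TCM patterns recorded by each rater in Table 1.
table1 = {
    "Rater 1": {"Lung Phlegm Cold", "Kidney Jing Xu", "Liver Qi Stasis"},
    "Rater 2": {"Heart Phlegm Heat", "Liver Qi Stasis", "Qi and Blood Stasis"},
    "Rater 3": {"Lung Damp Heat", "Kidney Yin Xu", "Liver Qi Stasis"},
}


def exact_matches(diagnoses):
    """Count identical patterns shared by each pair of raters."""
    return sum(len(a & b) for a, b in combinations(diagnoses.values(), 2))


matches = exact_matches(table1)   # Liver Qi Stasis matches for A-B, A-C, and B-C
possible = 9                      # number of possible matches quoted in the text
agreement = matches / possible
print(matches, round(agreement, 2))  # 3 0.33
```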

Table 2 provides the diagnoses from Table 1 mapped to the TCMDD format, and simple agreement is then calculated. For example, Lung Phlegm Cold in TCM maps to the Descriptors Lung, Damp, and Cold in the TCMDD; similarly, Lung Damp Heat maps to Lung, Damp, and Heat.

Table 2.

TCM Mapped to Traditional Chinese Medical Diagnostic Descriptor

Rater 1	Rater 2	Rater 3	Matches	Nonselected matches
Liver	Liver	Liver	3
Kidney	Kidney	Kidney	1
Lung	Lung	Lung	1
Spleen	Spleen	Spleen		3
Heart	Heart	Heart		1
Qi Xu	Qi Xu	Qi Xu		3
Yang Xu	Yang Xu	Yang Xu		3
Yin Xu	Yin Xu	Yin Xu	1
Blood Xu	Blood Xu	Blood Xu		3
Qi Stasis	Qi Stasis	Qi Stasis	3	3
Blood Stasis	Blood Stasis	Blood Stasis		1
Damp	Damp	Damp	3
Wind	Wind	Wind		3
Heat	Heat	Heat	1
Cold	Cold	Cold		1
			13	21

Descriptors selected by raters in bold.

TCM, Traditional Chinese Medicine.

Liver Qi Stasis consists of two Descriptors in the TCMDD format, so when two raters select it, two matches occur rather than the single match counted in the TCM format. This is a distinct difference from calculating agreement with the TCM format. The amalgamations used within the TCMDD of Phlegm to Damp and Jing to Yin are also differences.
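The mapping can be sketched as a hand-written lookup for the seven distinct patterns of Table 1. The entries for Lung Phlegm Cold and Lung Damp Heat are stated in the text; the remaining entries are inferred by applying the same principles (Phlegm to Damp, Jing to Yin) and should be read as assumptions. The names `PATTERN_TO_DESCRIPTORS` and `to_descriptors` are illustrative.

```python
from itertools import combinations

# Lookup from TCM pattern to TCMDD Descriptors (partly inferred, see lead-in).
PATTERN_TO_DESCRIPTORS = {
    "Lung Phlegm Cold": {"Lung", "Damp", "Cold"},
    "Lung Damp Heat": {"Lung", "Damp", "Heat"},
    "Kidney Jing Xu": {"Kidney", "Yin Xu"},
    "Kidney Yin Xu": {"Kidney", "Yin Xu"},
    "Heart Phlegm Heat": {"Heart", "Damp", "Heat"},
    "Liver Qi Stasis": {"Liver", "Qi Stasis"},
    "Qi and Blood Stasis": {"Qi Stasis", "Blood Stasis"},
}


def to_descriptors(patterns):
    """Union of the Descriptors implied by a rater's TCM patterns."""
    selected = set()
    for pattern in patterns:
        selected |= PATTERN_TO_DESCRIPTORS[pattern]
    return selected


raters = [
    to_descriptors(["Lung Phlegm Cold", "Kidney Jing Xu", "Liver Qi Stasis"]),
    to_descriptors(["Heart Phlegm Heat", "Liver Qi Stasis", "Qi and Blood Stasis"]),
    to_descriptors(["Lung Damp Heat", "Kidney Yin Xu", "Liver Qi Stasis"]),
]

# Matches on selected Descriptors, counted per pair of raters.
selected_matches = sum(len(a & b) for a, b in combinations(raters, 2))
print(selected_matches)  # 13
```

After mapping, Raters 1 and 3 differ only in Cold versus Heat, although their TCM diagnoses shared just one identical pattern.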

The number of matches observed, 3 from a possible 9 in Table 1 and 13 from a possible 45 in Table 2, represents a similar level of agreement in the two formats. However, it should also be noted that when the TCMDD format is used to define a patient's health status, the deliberate nonselection of a Descriptor by a practitioner is an indication of the health status of the patient. Should two or more practitioners not select a Descriptor, then an agreement is deemed to occur. In the above example this would have provided another 21 matches. Combined with the 13 matches on selected Descriptors, a total of 34 from a possible 45 is obtained, a simple agreement of 0.76.

In a situation where an individual is exceedingly healthy and has virtually no health problems whatsoever, practitioners examining such an individual would score most, if not all, Descriptors with zero to define their health status. For a healthy individual, high levels of agreement would therefore occur. The recording of absence of disease factors as represented by zero scores is central to the TCMDD approach and is a major point of difference from the conventional TCM approach.
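A minimal sketch of this counting rule follows, in which a shared selection and a shared nonselection both count as a match for a pair of raters. The number of possible matches is taken to be the number of Descriptors multiplied by the number of rater pairs, which gives 15 × 3 = 45 for three raters, as in the text. The function name and the four-Descriptor data are hypothetical and are not drawn from the study.

```python
from itertools import combinations


def presence_agreement(selections, descriptors):
    """Return (matches, possible) where a pair of raters matches on a Descriptor
    if both selected it or both left it unselected."""
    matches = possible = 0
    for a, b in combinations(selections, 2):
        for d in descriptors:
            possible += 1
            if (d in a) == (d in b):
                matches += 1
    return matches, possible


# Hypothetical data: three raters, four Descriptors.
descriptors = ["Liver", "Spleen", "Damp", "Wind"]
raters = [{"Liver", "Damp"}, {"Liver"}, {"Liver", "Damp"}]
print(presence_agreement(raters, descriptors))   # (10, 12)

# A healthy individual: no rater selects anything, so agreement is complete.
healthy = [set(), set(), set()]
print(presence_agreement(healthy, descriptors))  # (12, 12)
```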

TCMDD agreement calculation example

Using the mapping example data presented in Table 2, but now with scores allocated of 0–5 for each chosen pattern, an example of the method used to calculate agreement will now be given, with the assumption that the zeros indicate that the Descriptor was deliberately omitted as presented in Table 3. The use of scores instead of just expressing patterns as chosen or not, will in general cause agreement to be reduced, as shown in the example.

Table 3.

Agreement Calculation Data

Descriptor	Rater 1	Rater 2	Rater 3
Liver	3	4	2
Kidney	3	0	3
Lung	2	0	3
Spleen	0	0	0
Heart	0	2	0
Qi Stag	3	4	2
Blood Stasis	0	3	0
Yin Xu	3	0	3
Blood Xu	0	0	0
Qi Xu	0	0	0
Yang Xu	0	0	0
Cold	2	0	0
Heat	0	0	3
Wind	0	0	0
Damp	2	2	3
TPS	18	15	19

TPS, Total Pathogenic Score.

In this example, linearly weighted simple agreement of 0.5 ± 0.1 and Gwet's chance-removed AC2 agreement of 0.4 ± 0.1 were found. The high error is due to the small sample of a single subject.

The scoring of Descriptors within the TCMDD enables a Total Pathogenic Score (TPS) to be calculated. The TPS is the sum of scores from all Descriptors, and has potential for use as a generalized wellness measure of a patient within TCM terms. The TPS can also be used to track changes in the overall health of a patient, where a lower TPS indicates an improvement in health. Score changes in a subject's health status recorded by the same practitioner should be more useful for the determination of health changes than absolute Descriptor scores,²⁰ due to possible practitioner scoring biases.
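The TPS row of Table 3 can be reproduced by summing each rater's column. The function name is illustrative; the scores are transcribed from Table 3.

```python
# Scores from Table 3, listed per Descriptor as (Rater 1, Rater 2, Rater 3).
table3 = {
    "Liver": (3, 4, 2), "Kidney": (3, 0, 3), "Lung": (2, 0, 3),
    "Spleen": (0, 0, 0), "Heart": (0, 2, 0), "Qi Stag": (3, 4, 2),
    "Blood Stasis": (0, 3, 0), "Yin Xu": (3, 0, 3), "Blood Xu": (0, 0, 0),
    "Qi Xu": (0, 0, 0), "Yang Xu": (0, 0, 0), "Cold": (2, 0, 0),
    "Heat": (0, 0, 3), "Wind": (0, 0, 0), "Damp": (2, 2, 3),
}


def total_pathogenic_scores(scores):
    """Sum the Descriptor scores of each rater (one column per rater)."""
    return [sum(column) for column in zip(*scores.values())]


tps = total_pathogenic_scores(table3)
print(tps)  # [18, 15, 19]
```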

Having developed the TCMDD, shown that any TCM diagnosis can be mapped onto the TCMDD, and presented an example of a method for calculating agreement, it remains to be shown whether agreement is improved with its use with real practitioner data, as seemed to be the case in the hypothetical example presented previously.

Materials and Methods

Thirty-five volunteer participants were enrolled for the study (23 women, 12 men). Subjects were recruited by word of mouth. There were no exclusions based on age or health status. The mean age of the women was 50 years and the men 56 years, with both in the range of 17–78 years. In fact, the same participants who were utilized in our first article were diagnosed by extra practitioners to produce the results reported in this article. Ethical clearance was obtained from the UTS Human Research Ethics Committee before commencing the study.

Data were collected over two consecutive days at the UTS Chinese medicine outpatient clinic. Before commencing data collection, each participant was asked to read an information sheet and to agree to participate by signing a consent form. Subjects were allocated to one of the three appointment times on either day. The order in which the subjects were interviewed was determined and quasi-randomized by the order of their arrival at the clinic and the availability of practitioners for interviews. The practitioners used the diagnostic approaches they would normally use in their normal clinical settings to arrive at their diagnoses and were not restricted, limited, or guided in this process in any way. Each subject was seen by four or six practitioners; in each case, half used the TCM format and half the TCMDD format, with a changeover in the format used by the practitioners midway through the day.

This approach also assisted workflow. Practitioners conducted interviews with each subject, recorded their results on the supplied forms,¹⁶ and were given an unenforced 20-min time limit for each interview.

As in our first article, the practitioners used in the study had all graduated from either a mainland Chinese Medicine College or University, or from the Chinese medicine program at the University of Technology, Sydney. Each diagnostician had at least 5 years of clinical experience in TCM. Before the commencement of patient interviews, practitioners were given approximately 15 min of training in the use of the TCMDD. This included being made cognizant that scoring a Descriptor zero meant that there was no issue with that Descriptor, and that all their scoring choices would be used to determine inter-rater agreement.

Additional practitioners to those mentioned in the first article were utilized. On the first day, 2 further practitioners diagnosed the same 19 subjects. On the second day, 3 extra practitioners diagnosed the same 16 subjects. A total of 86 new diagnostic assessments were therefore obtained.

The TCM diagnoses recorded in our first article were mapped to the TCMDD format using the principles illustrated in the mapping example and included as a further data set.

Outcome Measures

Two methods were utilized to calculate and evaluate levels of consensus with the TCMDD and TCM mapped to TCMDD data: linearly weighted percentage agreement, and chance-removed, linearly weighted agreement calculated by Gwet's AC2 statistic, a superior method to other comparable statistics as reported in the second article of our series and by other studies.²¹ The results will be discussed using Landis' scale for kappa agreement²² as defined in Table 4.

Table 4.

Landis' Scale for Discussing Agreements

<0	Poor
0.0 < 0.2	Slight
0.2 < 0.4	Fair
0.4 < 0.6	Moderate
0.6 < 0.8	Substantial
0.8 < 1.0	Almost perfect
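The bands of Table 4 can be expressed as a small lookup, sketched here with an illustrative function name. Treating a value of exactly 1.0 as "Almost perfect" is an assumption, since the table leaves the upper bound open.

```python
def landis_label(agreement):
    """Return the Table 4 label for an agreement coefficient."""
    if agreement < 0:
        return "Poor"
    bands = [(0.2, "Slight"), (0.4, "Fair"), (0.6, "Moderate"), (0.8, "Substantial")]
    for upper, label in bands:
        if agreement < upper:
            return label
    return "Almost perfect"  # assumption: includes exactly 1.0


print(landis_label(0.67))  # Substantial
print(landis_label(0.19))  # Slight
```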

The standard error calculated by the AC2 provides important additional information as to how reliable the agreement result is and gives confidence in the result. A high standard error may mean that the sample is too small, or that agreement is unreliable.
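For readers unfamiliar with the statistic, the following is a minimal sketch of linearly weighted percentage agreement and Gwet's AC2 in their standard published form for complete data (every unit scored by every rater). It is not the software used in this study. The choice of what counts as a rated unit (one row per unit), the function names, and the example scores are assumptions, and the standard error is not computed. It is not claimed to reproduce the figures reported in this article.

```python
def linear_weights(q):
    """Linear agreement weights for q ordered categories: 1 - |k - m| / (q - 1)."""
    return [[1 - abs(k - m) / (q - 1) for m in range(q)] for k in range(q)]


def weighted_agreement_and_ac2(ratings, q=6):
    """ratings: one row per rated unit, one integer score (0..q-1) per rater.

    Returns (weighted percentage agreement, AC2) for complete data.
    """
    w = linear_weights(q)
    n = len(ratings)
    pa = 0.0
    pi = [0.0] * q
    for row in ratings:
        r = len(row)
        counts = [row.count(k) for k in range(q)]
        # Weighted number of raters whose score is close to category k.
        weighted = [sum(w[k][m] * counts[m] for m in range(q)) for k in range(q)]
        pa += sum(counts[k] * (weighted[k] - 1) for k in range(q)) / (r * (r - 1))
        for k in range(q):
            pi[k] += counts[k] / r
    pa /= n
    pi = [p / n for p in pi]
    # Chance agreement: sum of all weights, scaled by the spread of category use.
    t_w = sum(sum(row) for row in w)
    pe = t_w / (q * (q - 1)) * sum(p * (1 - p) for p in pi)
    return pa, (pa - pe) / (1 - pe)


# Hypothetical scores: four rated units, three raters each, scale 0 to 5.
example = [[3, 4, 2], [0, 0, 0], [2, 2, 3], [0, 1, 0]]
pa, ac2 = weighted_agreement_and_ac2(example)
print(round(pa, 2), round(ac2, 2))
```

With linear weights, the weighted percentage agreement for a unit equals the mean of 1 - |difference| / (q - 1) over all pairs of raters, so scores that differ by one point are penalized far less than scores at opposite ends of the scale.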

Results

Weighted simple and AC2 statistic agreement with standard errors where available are given in Table 5.

Table 5.

Weighted Simple and Agreement Coefficient 2 Agreement with Standard Errors

Format	Weighted simple	AC2
TCM	0.19	NA
TCMDD day 1	0.80 ± 0.02	0.67 ± 0.03
TCMDD day 2	0.80 ± 0.02	0.67 ± 0.03
TCM mapped to TCMDD day 1	0.78 ± 0.02	0.65 ± 0.03
TCM mapped to TCMDD day 2	0.83 ± 0.02	0.73 ± 0.03

AC2, Agreement Coefficient 2; TCM, Traditional Chinese Medicine; TCMDD, Traditional Chinese Medical Diagnostic Descriptor.

Discussion

Results

None of the practitioners raised any concerns or had difficulties in using the TCMDD format. Indeed, there were positive comments as to the ease of use and value of this new approach.

Linearly weighted simple agreement calculated from the TCMDD and the TCM mapped to TCMDD data was significantly greater than that obtained with the TCM data. Using Landis and Koch's accepted method of interpreting chance-removed agreement reported with kappa statistics, as outlined in Table 4, both the TCMDD and TCM mapped to TCMDD formats led to levels of AC2 agreement classified as substantial, whereas agreement with the TCM format was slight¹ even with chance agreement not removed.

The agreement obtained after TCM diagnoses were mapped to the TCMDD format demonstrates that the level of agreement between the raters was very high, but Essential agreement was obscured by use of the TCM diagnostic format. Indeed, the agreement obtained with the TCM diagnoses mapped to the TCMDD format was effectively the same as that obtained between practitioners who had used the TCMDD format to record their diagnoses.

As diagnosis is typically taught in Chinese medicine educational institutions and described in diagnostic textbooks,⁵ the functions and pathologies of TCM concepts such as Liver, Spleen, Qi Xu, and Qi Stasis are defined and then combined into TCM patterns. Aggregations, such as Liver Qi Stasis or Spleen Qi Xu, are made available to the practitioner for diagnosis. Practitioners are then taught to convert the diagnosis into a treatment principle. This treatment principle is then rendered into a treatment for liver, spleen, qi xu, qi stasis, and so on. The TCMDD utilizes this foundational diagnostic approach and focuses on the principal diagnostic factors rather than the expression of complex aggregations.

The TCMDD format seems to have many advantages over the existing TCM structure when used to describe a subject's health. The TCMDD allows appropriate chance-removed statistics to be applied to accurately determine levels of consensus, facilitates far more acceptable levels of diagnostic reliability than the current TCM diagnostic format, and allows a holistic representation of a patient's health in the TCM context. These factors combine to form a persuasive argument for its adoption for the description of the Traditional Chinese Medical diagnosis of a patient.

Study limitations

Sample size

Caution should be exercised when interpreting the present results in isolation, due to the small data set of only 35 subjects examined by 4 or 6 raters. However, the study carried out with the DSOMf, a format with the same theoretical properties, yielded similar agreement outcomes when 5 practitioners examined 42 subjects.² The DSOMf study strengthens the validity of the result obtained with the TCMDD. The two studies when considered together provide some validation. The validation of the concepts introduced here requires larger scale investigations as a priority.

Sensitivity

As mentioned previously in this article, not all variables of potential diagnostic interest have been included in the TCMDD. For instance, the conflation of Phlegm and Damp, and/or Yin and Jing may be problematic. It was intended that the 15 Descriptors proposed would not be the final definitive statement on this important subject and the Descriptors would be open to change. Examples of situations in which higher diagnostic resolution might be required are TCM herbal practice and the specialized studies of, say, Liver conditions. A solution is proposed in the Future research section of the Discussion.

Internal validity

The causal relationship investigated is that the TCMDD leads to increased diagnostic agreement between TCM practitioners. The experiments were conducted at the UTS clinic in an appropriate environment. There seem to be no confounding variables that would compromise the result. All care was taken to ensure that the internal validity of the experiment was not compromised.

External validity

The small size of the sample of subjects and practitioners including the potential lack of sensitivity outlined previously is a limitation on the external validity of the study.

Future research

The new information on low TCM diagnostic reliability¹ and the increase in agreement with the use of the TCMDD, reported in this article and given in greater detail in the primary author's PhD thesis,¹⁶ should be validated as a priority. Once validated, future TCM investigations would be able to use the TCMDD and thereby might include verification of the participating practitioners' diagnostic reliability.

High-quality research using meta-analysis of large numbers of subjects drawn from randomized controlled trials has recently been published²³,²⁴ confirming that acupuncture is very effective in chronic pain and depression. Furthermore, the Acupuncture Evidence Project²⁵ also reports strong evidence for effective treatment of 8 conditions and moderate evidence for a further 38 conditions. These studies, however, did not report the diagnostic reliability of the treating practitioners, which now seems to be useful information in light of the evidence presented earlier.

To address concerns raised in the Study Limitations section mentioned previously regarding TCMDD sensitivity, future investigations should also explore the use of diagnostic Subdescriptors that would form subsets of the 15 Primary Descriptors, given in Table 6. The Subdescriptors when utilized would be scored and then these values mapped to the Primary Descriptors for agreement calculation.

Table 6.

The Primary and Potential Subdescriptors of the Traditional Chinese Medical Diagnostic Descriptor

Primary descriptor	Subdescriptor
Heart	Heart, Small Intestines, Pericardium and Triple Heater
Liver	Liver, Gall Bladder
Kidney	Kidney, Bladder
Lung	Lung, Large Intestine
Spleen	Spleen, Stomach
Heat	Heat, Fire
Cold	External Cold, Internal Cold
Damp	Damp, Phlegm
Wind	External Wind, Internal Wind
Yin Xu	Yin Xu, Jing Xu
Yang Xu	Yang Xu
Qi Xu	Qi Xu
Xue Xu	Xue Xu, Dryness
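The mapping from Subdescriptors to Primary Descriptors can be illustrated with a minimal Python sketch. It uses only the rows listed in Table 6. The aggregation rule (taking the highest Subdescriptor score within each Primary Descriptor) is an assumption made for illustration, as are the names `SUBDESCRIPTORS` and `to_primary`; the text does not specify how the values would be mapped.

```python
# Rows of Table 6: Primary Descriptor -> proposed Subdescriptors.
SUBDESCRIPTORS = {
    "Heart": ["Heart", "Small Intestines", "Pericardium", "Triple Heater"],
    "Liver": ["Liver", "Gall Bladder"],
    "Kidney": ["Kidney", "Bladder"],
    "Lung": ["Lung", "Large Intestine"],
    "Spleen": ["Spleen", "Stomach"],
    "Heat": ["Heat", "Fire"],
    "Cold": ["External Cold", "Internal Cold"],
    "Damp": ["Damp", "Phlegm"],
    "Wind": ["External Wind", "Internal Wind"],
    "Yin Xu": ["Yin Xu", "Jing Xu"],
    "Yang Xu": ["Yang Xu"],
    "Qi Xu": ["Qi Xu"],
    "Xue Xu": ["Xue Xu", "Dryness"],
}


def to_primary(sub_scores):
    """Collapse Subdescriptor scores into Primary Descriptor scores.

    ASSUMPTION: each Primary Descriptor takes the maximum score of its
    Subdescriptors. Unscored Subdescriptors count as 0, so every Primary
    Descriptor receives a value and zero scores are retained.
    """
    return {
        primary: max(sub_scores.get(sub, 0) for sub in subs)
        for primary, subs in SUBDESCRIPTORS.items()
    }


# Hypothetical rating made at the Subdescriptor tier.
example = {"Gall Bladder": 2, "Phlegm": 1, "Internal Wind": 3}
primary_scores = to_primary(example)
```

Because unscored items default to zero, the result can be passed directly to an agreement calculation on the primary format.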

A definitive list of Subdescriptors can only be formed after significant research and debate and, as with the Primary Descriptor definitions, must be compatible with an evolved diagnostic classification system such as the World Health Organization's International Classification of Diseases (ICD)⁷ or the soon to be released ICD-11 coding tool for Traditional medicine.²⁶ Indeed, in the event of sufficient evidence being presented (such as extensive use of a Subdescriptor or very limited use of a Primary Descriptor), the TCMDD could be altered in the future.

The proposed 26 Subdescriptors include the 12 TCM Zang–Fu organs and another 4 items. Diagnostic agreement would still need to be calculated on the primary TCMDD format, as a central principle of this approach is the inclusion of zero scores of Descriptors for the calculation of inter-rater agreement.

An examination of the TCMDD data used for validation revealed that raters scored an average of just over 5 of the 15 Descriptors. The data were sorted by the average TPS of each patient, and the subjects above and below the median were investigated to determine their properties; the results are presented graphically in Figure 2. The broad range of TPS demonstrates the large variation in the subjects' health, which suggests the patient sample was from an open population. As one might expect, the number of Descriptors utilized tended to increase as TPS increased. Even in the lowest TPS group, just over four Descriptors were used with significant frequency.

FIG. 2.

Average number of Descriptors selected in TPS sorted groups. TPS, Total Pathogenic Score. Color images available online.
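The grouping behind Figure 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: the TPS is assumed here to be the sum of a rater's Descriptor scores, the data are invented, score vectors are shortened to five items for brevity, and the function name `summarize_by_tps` is hypothetical.

```python
from statistics import mean, median


def summarize_by_tps(ratings):
    """Average number of Descriptors used, below and above the median TPS.

    ratings maps a patient id to a list of score vectors, one per rater.
    ASSUMPTION: TPS is the sum of one rater's Descriptor scores.
    """
    per_patient = []
    for vectors in ratings.values():
        avg_tps = mean(sum(v) for v in vectors)
        avg_used = mean(sum(1 for s in v if s > 0) for v in vectors)
        per_patient.append((avg_tps, avg_used))

    cut = median(t for t, _ in per_patient)
    low = [used for t, used in per_patient if t <= cut]
    high = [used for t, used in per_patient if t > cut]
    return {
        "low_tps": mean(low),
        # No patient lies above the median if all TPS values are equal.
        "high_tps": mean(high) if high else None,
    }


# Invented data: four patients, two raters each, five Descriptors.
ratings = {
    "p1": [[1, 0, 0, 0, 0], [1, 1, 0, 0, 0]],
    "p2": [[2, 1, 0, 0, 0], [2, 0, 0, 0, 0]],
    "p3": [[3, 2, 1, 0, 0], [2, 2, 1, 1, 0]],
    "p4": [[3, 3, 2, 1, 0], [3, 2, 2, 1, 1]],
}
groups = summarize_by_tps(ratings)
```

With these invented ratings, the group above the median TPS uses more Descriptors on average than the group below it, which mirrors the trend described in the text.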

If the Subdescriptor tier of the TCMDD were used to calculate agreement, it is likely that there would be no significant increase in the number of diagnostic factors scored 1 or greater in recording a patient's diagnosis. The resulting increase in zero scores would necessitate abandoning the consideration of zero scores in agreement calculation, eliminating an important feature of the TCMDD approach.
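The role of zero scores can be made concrete with a small Python sketch of linearly weighted simple agreement between two raters. The scores are invented, the maximum score of 4 is an assumption (the scoring range is not restated in this section), and the helper names are hypothetical. The weight for each item is the standard linear weight, 1 minus the absolute score difference divided by the maximum possible difference.

```python
def linear_weighted_agreement(rater_a, rater_b, max_score):
    """Mean linear agreement weight over all items, zero scores included."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same items")
    weights = [1 - abs(a - b) / max_score for a, b in zip(rater_a, rater_b)]
    return sum(weights) / len(weights)


def pad(scores, n_items):
    """Fill the remaining items with zero scores."""
    return scores + [0] * (n_items - len(scores))


MAX_SCORE = 4  # ASSUMPTION: scores run from 0 to 4

# Invented scores for the five items that either rater scored above zero.
scored_a = [3, 2, 0, 1, 2]
scored_b = [2, 2, 1, 0, 2]

# Same scores viewed on the scored items only, on 15 items, and on 26 items.
agreement_scored_only = linear_weighted_agreement(scored_a, scored_b, MAX_SCORE)
agreement_15 = linear_weighted_agreement(pad(scored_a, 15), pad(scored_b, 15), MAX_SCORE)
agreement_26 = linear_weighted_agreement(pad(scored_a, 26), pad(scored_b, 26), MAX_SCORE)
```

With these invented scores the agreement is 0.85 on the five scored items, 0.95 over 15 items, and about 0.97 over 26 items. When the list of items grows but the number of non-zero scores does not, jointly zero items account for a growing share of the result, which is one way to read the concern raised above.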

Once validated, the TCMDD will be a useful research tool because of the higher levels of consensus it facilitates and the holistic rendering of a patient's health it makes possible. In future investigations, some mechanisms of action of interventions or treatment effects may be found to differ depending on the TCMDD-defined TCM diagnoses of subjects. The most effective interventions, and their applicability to pattern-independent symptomatic treatments or to symptoms associated with TCMDD patterns, could also be determined. Objective physiological measurements may correlate with TCMDD diagnostic factors.¹⁶

Future studies that report diagnostic reliability with the TCMDD may be able to recommend treatments tailored to the patient's TCMDD diagnoses and would therefore be of great clinical value. To measure the potentially subtle changes in TCMDD diagnostic Descriptors, Subdescriptors, and symptoms that may occur after interventions, consideration should be given to increasing the sensitivity of the scoring system from 0 to 5. The TPS should also be investigated for its potential to measure changes in the holistic wellbeing of patients as viewed from the TCM perspective, and compared with other validated wellness measures such as the SF-36.²⁰

Questionnaires, based on symptom lists, that determine the pathology in some of the Descriptors have been published.²⁷⁻³⁰ The completion of questionnaires for all Descriptors would easily integrate with the TCMDD approach, potentially further improving agreement.

Other styles of Chinese Medicine may require different lists of descriptors and should be investigated separately. The investigation of inter-rater agreement in open populations is a new field of investigation and could also be considered in some other health modalities.

The history of TCM shows many examples of the acceptance of new theories, or the reorganization of existing theories and approaches, with a convention of retaining and utilizing previous systems where appropriate.³¹ In keeping with this ideal, we envisage the TCMDD would not replace but stand beside other diagnostic approaches. The TCMDD is a proposal to provide a solution to the current problems that have been identified in TCM diagnostic reliability.

Conclusions

The TCM diagnostic format called the TCMDD was developed, validated to a reasonable extent, and found to be easy to use as a diagnostic classification system for open populations. It attempts to allow the expression of all facets of a patient's TCM condition and facilitates the evaluation of genuine inter-rater consensus, capturing Essential agreement, without losing significant details relating to the TCM diagnostic pattern. Additional diagnostic details can be captured with the TCMDD because each Descriptor is scored separately. In addition, multiple TCM patterns can be accommodated within the TCMDD, which provides a framework for a holistic representation of a patient's TCM health. The introduction of the Subdescriptor element will allow further diagnostic detail to be considered and will provide a pathway for possible change to the TCMDD if warranted.

The diagnostic definitions currently used to record a TCM diagnosis seem to provide an impediment to inter-rater agreement and do not allow use of appropriate chance-removed statistics or the calculation of error estimates, whereas the TCMDD facilitates both. Contemporary TCM diagnostic patterns can be mapped to the TCMDD to yield similarly high agreement results to those obtained with the direct use of the TCMDD.

Treatment effectiveness and mechanism of action studies could utilize the TCMDD to take advantage of its capacity to provide a holistic view of the patient's health and its inter-rater reliability capability. Benchmarking the diagnostic reliability of practitioners participating in future studies could become normal practice. Eventually, it is envisaged that the TCMDD could be used across the TCM profession, including private practices, as a universal, standardized diagnostic format.

Further studies should take place to confirm and improve the utility of the TCMDD, perhaps using a more sensitive scoring format. The Subdescriptor format also should be evaluated. Questionnaires and/or symptom lists for each of the Descriptors and Subdescriptors should also be developed and evaluated. The TPS should also be compared with previously validated wellness measures such as the SF-36 to determine whether correlations exist and to evaluate its usefulness.

The use of the TCMDD with large data sets, with the increased diagnostic consistency this new approach affords and the application of appropriate chance-removed statistics it allows, may enable the empirical evidence of TCM diagnosis and treatment practice to be properly evaluated, and perhaps reinterpreted. As mentioned in the first article of the series, some researchers have suggested that some or even all the TCM theories might be found to be without significance. However, to discard the enormous repository of historical and contemporary clinical observations we have inherited from previous and current generations of TCM practitioners without adequate evaluation would indeed be irresponsible.

The proposal for consideration of the TCMDD as a TCM diagnostic format should not be seen as an attempt to disregard the preceding and existing diagnostic theories and formats, but, in keeping with the historic advances in TCM, provide another layer and approach that add to the rich fabric that is TCM.

Footnotes

Acknowledgments

The authors thank Associate Professor Peter Meier for the supply of data from the University of Technology Sydney Chinese medicine outpatient clinic. We also acknowledge the generous contribution of UTS for making funds, staff, and the clinic facilities available for this research.

Author Disclosure Statement

No competing financial interests exist.

References

1. Popplewell, Reizes, Zaslawski. Consensus in Traditional Chinese Medical Diagnosis in open populations. J Altern Complement Med, 2018. [Epub ahead of print]; DOI: 10.1089/acm.2017.0148.

2. Popplewell, Reizes, Zaslawski. Appropriate statistics for determining chance removed inter-practitioner agreement. J Altern Complement Med, 2018. [Epub ahead of print]; DOI: 10.1089/acm.2017.0297.

3. Gwet. AgreeStat 2013.4 for Mac ed: Advanced Analytics. Statistical Analysis on the Extent of Agreement Among Multiple Raters, 2013.

4. Hua, Abbas, Hayes, et al. Reliability of Chinese medicine diagnosis variables in the examination of patients with osteoarthritis of the knee. J Altern Complement Med, 2012; 18:1028–1037.

5. Maciocia. The Foundations of Chinese Medicine. Edinburgh, Scotland, UK: Churchill Livingstone, 1989.

6. Shanghai College of Traditional Medicine. Acupuncture, a Comprehensive Text, 8th ed. Seattle: Eastland Press, 1981.

7. WHO. International Classification of Diseases. 2015; 10–11. Online document at: www.who.int/classifications/icd/en. Accessed August 1, 2018.

8. Ward, Beck, Mendelson, et al. The psychiatric nomenclature reasons for diagnostic disagreement. Arch Gen Psychiatry, 1962; 7:198–205.

9. O'Brien, Birch. A review of the reliability of Traditional East Asian Medicine Diagnosis. J Altern Complement Med, 2009; 15:353–366.

10. Lee, Kim, Ji, et al. Reliability study for upgrade of diagnosis system of oriental medicine DSOM(r) S. 1. 1. Korean J Orient Physiol Pathol, 2011; 26:88–97.

11. Oxford. Oxford English Dictionary. In: Proffitt, ed. Online Oxford English Dictionary. Oxford, United Kingdom: Oxford University Press, 2018.

12. Mist, Ritenbaugh, Aickin. Effects of questionnaire-based diagnosis and training on inter-rater reliability among practitioners of Traditional Chinese Medicine. J Altern Complement Med, 2009; 15:703–709.

13. Meier PC. Data from UTS Acupuncture Clinic Database V2. Sydney: UTS, 2012.

14. Clavey. Fluid Physiology and Pathology in Traditional Chinese Medicine. Singapore: Churchill Livingstone, 1995.

15. Zhang. Bi Syndrome (Arthralgia Syndrome). J Tradit Chin Med, 2010; 30:145–152.

16. Popplewell MC. Improving Diagnostic Reliability in Chinese Medicine. Sydney: University of Technology, 2015.

17. WHO. WHO International Standard Terminologies on Traditional Medicine in the Western Pacific Region. Manila, Philippines: WHO, 2007.

18. Gwet. Kappa Statistic is not Satisfactory for Assessing the Extent of Agreement Between Raters. Statistical Methods for Inter-Rater Reliability Assessment, 2002. http://agreestat.com/research_papers/kappa_statistic_is_not_satisfactory.pdf

19. Gwet. Handbook of Inter-Rater Reliability. Gaithersburg, Maryland: Advanced Analytics, 2014.

20. Gandek. Interpreting the SF-36 Health Survey. Winnipeg, Canada: Canadian Association of Cardiac Rehabilitation, 2002. https://pdfs.semanticscholar.org/4ce5/2fc9ef29966ab51e0a6692fe208a6f2156ea.pdf

21. Wongpakaran, Wongpakaran, Wedding, et al. Comparison of Cohen's Kappa and Gwet's AC2 when calculating inter-rater reliability coefficients: A study conducted with personality disorder. BMC Med Res Methodol, 2013; 13:61.

22. Landis, Koch. The measurement of observer agreement for categorical data. Biometrics, 1977; 33:159–174.

23. MacPherson, Vickers, Bland, et al. Acupuncture for Chronic Pain and Depression in Primary Care: A Programme of Research. Southampton: Programme Grants for Applied Research, 2017.

24. Vickers, Vertosick, Lewith, et al. Acupuncture for chronic pain: Update of an individual data meta-analysis. J Pain, 2018; 19:455–474.

25. Macdonald, Janz. The Acupuncture Evidence Project. A Comparative Literature Review (Revised edition). Acupuncture and Chinese Medicine Association (AACMA). 2017. Online document at: www.acupuncture.org.au. Accessed September 1, 2018.

26. WHO. ICD 11 (Beta). 2018. Online document at: www.who.int. Accessed September 1, 2018.

27. Park, Yang, Lee, et al. Development of a valid and reliable blood stasis questionnaire and its relationship to heart rate variability. Complement Ther Med, 2013; 21:633–640.

28. Okitsu, Iwasaki, Monma, et al. Development of a questionnaire for the diagnosis of Qi Stagnation. Complement Ther Med, 2012; 20:207–217.

29. Chen, Wong, Cao, et al. Construction of a traditional Chinese medicine syndrome-specific outcome measure: The Kidney Deficiency Syndrome questionnaire (KDSQ). BMC Complement Altern Med, 2012; 12:73–82.

30. Kang, Park, Moon, et al. Reliability and validity of the Korean standard pattern identification for stroke (K-SPI-Stroke) questionnaire. BMC Complement Altern Med, 2012; 12:55.

31. , Wiseman, Mitchell, et al. Shang Han Lun. Brookline, MA: Paradigm, 1989.