Abstract
Background:
Chinese medicine (CM) has its own diagnostic indicators that are used as evidence of change in a patient's condition. The majority of studies investigating the efficacy of Chinese herbal medicine (CHM) have utilized biomedical diagnostic endpoints. For CM clinical diagnostic variables to be incorporated into clinical trial designs, there would need to be evidence that these diagnostic variables are reliable. Previous studies have indicated that the reliability of CM syndrome diagnosis is variable. Little is known about where the variability stems from: the basic data collection level, the synthesis of diagnostic data, or both. No previous studies have systematically investigated the reliability of all four diagnostic methods used in the CM diagnostic process (Inquiry, Inspection, Auscultation/Olfaction, and Palpation).
Objectives:
The objective of this study was to assess the inter-rater reliability of data collected using the four diagnostic methods of CM in Australian patients with knee osteoarthritis (OA), in order to investigate if CM variables could be used with confidence as diagnostic endpoints in a clinical trial investigating the efficacy of a CHM in treating OA.
Methods:
An inter-rater reliability study was conducted as a substudy of a clinical trial investigating the treatment of knee OA with Chinese herbal medicine. Two (2) experienced CM practitioners conducted a CM examination separately, within 2 hours of each other, in 40 participants. A CM assessment form was utilized to record the diagnostic data. Cohen's κ coefficient was used as a measure of the level of agreement between 2 practitioners.
Results:
There was a relatively good level of agreement for Inquiry and Auscultation variables, and, in general, a low level of agreement for (visual) Inspection and Palpation variables.
Conclusions:
There was variation in the level of agreement between 2 practitioners on clinical information collected using the Four Diagnostic Methods of a CM examination. Some aspects of CM diagnosis appear to be reliable, while others are not. Based on these results, it was inappropriate to use CM diagnostic variables as diagnostic endpoints in the main study, which was an investigation of efficacy of CHM treatment of knee OA.
Introduction
CM is characterized by its adoption of an integrated view of the mind and body. There are four diagnostic methods used in CM diagnosis: Inquiry; (visual) Inspection; Auscultation/Olfaction; and Palpation. Based on the information obtained by the four diagnostic methods, by a process known as Syndrome Differentiation, CM practitioners are able to diagnose the diagnostic subcategory or “pattern of disharmony” occurring in the body.
As in Western medicine, the consultation between the doctor and patient is also the fundamental art of CM. Inquiry is important not only to find out about the patient's life and symptoms, but also to determine the perceived causes of the disease/condition and underlying disharmony or Syndrome. In general, Inspection can be divided into three components: (1) inspection of the patient's vitality and complexion; (2) inspection of body parts; and (3) inspection of the tongue (tongue diagnosis). As a traditional medicine, the methods of direct Auscultation or Olfaction (without assistance of modern instrumentation) are still used, although there is less reliance on olfaction. This kind of diagnosis is based on the view that the sound of the voice, breath sounds, coughing, and body odors are related to the various internal organs. The diagnostic methods of Palpation refer to palpation of the pulse (pulse diagnosis), the head and neck, chest and abdomen, muscles, skin, extremities, and acupoints. Pulse diagnosis has been developed to a very sophisticated level in CM, assessing the characteristics of the radial pulse and interpreting these in relation to the functional and physiologic state of the internal organs. Pulse and tongue diagnosis are considered to be the “two pillars” of the four examination methods in traditional practice.
There have been several studies that have investigated the reliability of CM Syndrome diagnosis in a variety of clinical conditions. 4–11 Overall, the majority of studies have indicated that CM Syndrome diagnosis is not particularly reliable. There was no analysis of the reliability of the clinical data collected in these studies; therefore, it is unknown where the inconsistency lies: at the basic level of data collection or in the synthesis of the data to form the Syndrome diagnosis. A few studies have investigated the consistency of data collected using two of the main CM diagnostic methods, tongue diagnosis 12,13 and pulse diagnosis. 13,14 One (1) study 13 examined comprehensively the reliability of three of the four main methods of collecting clinical information, but there have been no studies that have investigated systematically the reliability of all four diagnostic methods.
CM Syndrome diagnosis is dependent on the individual diagnostic observations (data). If the basic diagnostic data are not reliable, there will be lower confidence that the CM Syndrome diagnosis would be reliable and, therefore, lower confidence that optimal treatment is being received by the patient (because treatment follows Syndrome diagnosis). If CM diagnostic variables are consistent, then they can also be used in clinical trials as outcome indicators to assess the efficacy of CM therapies in addition to biomedical diagnostic endpoints. Thus, a study of the reliability of the CM diagnostic process was conducted as part of a clinical trial investigating the efficacy of a Chinese Herbal Medicine (CHM) in knee osteoarthritis (OA).
Materials and Methods
This study was part of a clinical trial to assess the efficacy of a CHM for treating Australian patients with knee OA. The trial was conducted in the Centre for Clinical Studies of the Nucleus Network, and the Alfred Hospital in Melbourne, Australia. Ethics approval was obtained from the Victoria University ethics committee and the Alfred Hospital ethics committee.
Practitioners and participants
Two (2) CM practitioners participated in this study. Both are registered CM practitioners in the state of Victoria, Australia, with bachelor degree–level training in CM and at least 10 years of clinical experience. One (1) practitioner was trained and was practicing in China, and the other practitioner had completed training and is in clinical practice in Australia.
Forty (40) eligible patients from the CHM study participated in the inter-rater reliability study. The inclusion criteria for the main clinical study included unilateral or bilateral OA of the knee and fulfillment of the criteria provided by the American College of Rheumatology (ACR) 1995, 15 mainly based on knee pain and presence of radiographic osteophytes. Radiographic evidence of OA was based on the Kellgren-Lawrence radiographic system, 16 either grade II or grade III severity primary tibio-femoral OA as a condition of inclusion. The exclusion criteria of the study included secondary OA or rheumatoid inflammatory or any other type of arthritis, accompanying OA of hip of sufficient severity to interfere with the functional assessment of the knee, having received intra-articular treatment of the involved joint or joint lavage in the previous 6 months (e.g., corticosteroids or hyaluronic acid) or knee surgery during the previous 3 months, and any significant systemic illnesses or medical conditions that could lead to difficulty with complying with the protocol.
There was no CM assessment made for any of the participants prior to the study.
Materials
A CM assessment form was utilized to record systematically CM diagnostic information during a CM examination. This form was developed based on standard questions and observations using the four diagnostic methods: Inquiry; Inspection; Auscultation; and Palpation, as set out in the State Administration of Traditional Chinese Medicine's Advanced Textbook on Traditional Chinese Medicine and Pharmacology, vol. 1, 17 a standard textbook used in CM curricula in China. The form was originally developed by consensus of a group of Australian researchers, and applied in a study of the reliability of CM diagnosis in hypercholesterolemic Australians, and, thus, had face validity and content validity. 13 This assessment form was modified to include specific questions related to the knee joint.
The clinical information in the form recorded clinical data in a manner that allowed it to be readily analyzed. The first section (Inquiry, case history) consisted of a series of questions related to bodily functioning, with extra questions about the knee joint added for this study. The other sections recorded information relating to Inspection, Auscultation, and Palpation. For most questions, information was recorded as categorical variables, and practitioners were required to choose one answer from a limited range of answers.
With respect to pulse diagnosis, the basic characteristics recorded in this study were pulse speed, force, and depth. Pulse speed was measured as the number of beats per breath of the patient. A normal speed is considered to be 4–5 beats per breath. A slow pulse is defined as ≤3 beats per breath, and a fast pulse is defined as ≥6 beats per breath. The CM assessment form used in this study is considered to have face validity, construct validity, and criterion validity in terms of the theoretical structure and the real practice of CM diagnosis. Diagnostic endpoints utilized in the CM assessment form are set out in Table 1.
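The pulse speed thresholds described above can be written as a simple rule. The following is a minimal sketch, not part of the study's materials: the function name `classify_pulse_speed` is hypothetical, and it assumes that beats per breath are recorded as whole numbers, since the text does not say how fractional counts would be handled.

```python
def classify_pulse_speed(beats_per_breath: int) -> str:
    """Map a beats-per-breath count to the speed category described in the text.

    Assumption: counts are whole numbers (the source does not define
    how a fractional count such as 3.5 would be categorized).
    """
    if isinstance(beats_per_breath, bool) or not isinstance(beats_per_breath, int):
        raise TypeError("beats_per_breath must be a whole number")
    if beats_per_breath < 1:
        raise ValueError("beats_per_breath must be at least 1")
    if beats_per_breath <= 3:
        return "slow"    # <=3 beats per breath
    if beats_per_breath >= 6:
        return "fast"    # >=6 beats per breath
    return "normal"      # 4-5 beats per breath


print(classify_pulse_speed(4))  # normal
```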
Study procedures
The reliability study was conducted at the first study visit of the CHM clinical trial. Both practitioners conducted the CM assessment separately on the same day, with a maximum of 2 hours between the two CM assessments (each session lasted approximately 30 minutes). There was no training or discussion between the 2 practitioners prior to or after the CM assessment, and the CM assessment forms were kept separate until data analysis. Data entry and analysis were conducted by the first author of this article.
Statistical analysis
The level of agreement (%) for the key diagnostic endpoints was calculated, along with the (Cohen's) κ coefficient. If data were missing from at least one tabulation cell (the responses of an endpoint were concentrated on a particular choice) the κ coefficient could not be calculated. The interpretation of κ values used in this study 18 is shown in Table 2.
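For readers unfamiliar with these measures, the following minimal sketch shows how percentage agreement and Cohen's κ can be computed for one categorical variable rated by two practitioners. It is illustrative only and is not the software used in the study. The function names are hypothetical, and the rule of returning `None` when either rater used a single response is an assumption that mirrors the description above. The verbal labels assume the widely used Landis and Koch bands, because Table 2 is not reproduced here.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Percentage of participants for whom both raters recorded the same response."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and paired")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters, or None when treated as not calculable."""
    n = len(rater_a)
    if n != len(rater_b) or n == 0:
        raise ValueError("ratings must be non-empty and paired")
    # Assumption: if either rater always chose the same response, kappa is
    # reported as not calculable, as described in the text.
    if len(set(rater_a)) < 2 or len(set(rater_b)) < 2:
        return None
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories.
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_observed - p_expected) / (1.0 - p_expected)


def interpret_kappa(kappa):
    """Verbal label for kappa (assumed Landis and Koch bands)."""
    if kappa is None:
        return "not calculable"
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings for four participants (not study data).
a = ["yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no"]
print(percent_agreement(a, b))            # 75.0
print(cohen_kappa(a, b))                  # 0.5
print(interpret_kappa(cohen_kappa(a, b))) # moderate
```

In this made-up example the raters agree on 3 of 4 participants (75%), chance agreement is 0.5, and κ is therefore 0.5.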
Results
A total of 40 study participants were recruited: 23 females and 17 males. The mean age of participants was 62.2 years (standard deviation [SD]=11.0) and the age range was 42–92.
Inquiry variables
There were 84 investigated endpoints in the Inquiry section. Insufficient data were collected for 9 variables, and a further 23 variables could not be analyzed using the κ coefficient because there were missing data in at least one of the tabulation cells (mainly because one of the practitioners consistently recorded one particular response). Inter-rater reliability was assessed for the remaining 52 key Inquiry variables and ranged from “poor” to “almost perfect.” Results are shown in Table 3.
Inspection variables
A total of 24 Inspection variables were assessed. Nine variables could not be analyzed using the κ coefficient because one rater consistently chose a single response. Agreement for the remaining 15 Inspection diagnostic variables ranged from “poor” to “moderate.” Results are shown in Table 4.
Auscultation variables
Agreement between 2 practitioners for the two auscultation variables was 82.5% for voice strength and 97.5% for characteristics of breath sounds. Given that one practitioner consistently chose the same responses for each participant for these variables, the κ coefficient could not be applied. Results are shown in Table 4.
Palpation variables
A total of eight Palpation variables were assessed. One variable (moisture of hands) could not be assessed using the κ coefficient because one rater consistently chose one response. Agreement ranged from “poor” to “substantial” for the remaining seven palpation endpoints. “Fair” agreement was found for the variable of location of right pulse. “Slight” agreement was found for four variables of pulse diagnosis (location and force of left pulse, force and speed of right pulse). A “poor” level of agreement was found for speed of the left pulse (level of agreement 76.9%). Results are shown in Table 4.
Discussion
In general, this study suggests that there is substantial variation in level of agreement for diagnostic information collected in a CM examination. In some cases, it appears to be quite reliable and in other cases, it appears to be unreliable. The majority of studies of reliability of CM diagnosis for a variety of clinical conditions have yielded relatively low levels of agreement among practitioners. 4–11 Little information is available about where in the diagnostic process the variability exists: at the basic level of data collection, at the stage of synthesizing the data according to various CM theories to arrive at the CM syndrome diagnosis, or both. The current study examined the reliability of the fundamental stage of the diagnostic process: data collection.
Reproducibility of Inquiry variables
Given that most of the information elicited from a case history comprises, by nature, subjective symptoms, it is common to find inconsistency in diagnostic assessments among different doctors, even in conventional medicine. 19,20 Zhang and colleagues argued that Inquiry as a diagnostic method is not a valid instrument, because of the low level of agreement of diagnosis found among 3 practitioners working with patients who had rheumatoid arthritis (RA). 4,10 The CM assessment form used in this study was an attempt to standardize the Inquiry process.
In the current study, a relatively good level of agreement for most Inquiry variables was found. For almost 50% of the Inquiry endpoints there was a “substantial” to “almost perfect” level of agreement.
The limitations of the κ statistic need to be considered when interpreting some items. A “fair” level of agreement was found for some symptoms, such as characteristics of abdominal pain; there was only a small number of valid cases for the κ analysis and therefore there were insufficient data from which to judge reliability. There were also responses to 23 questions that could not be analyzed. Reasons included missing data for κ tabulation cells, or small numbers of valid cases. For some variables where the κ coefficient indicated only “moderate” to “fair” levels of agreement, the actual percentage agreement may have been quite high. Caution should be exercised in interpreting the level of agreement using the κ coefficient when a high proportion of responses falls on one particular response option. This is one of the limitations of the κ coefficient. 21
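This limitation can be illustrated with a worked example. The counts below are invented for illustration and are not study data: suppose that, among 40 participants, both raters record "absent" for 36, rater A alone records "present" for 2, rater B alone records "present" for 2, and neither pair agrees on "present". Here $p_o$ denotes observed agreement and $p_e$ the agreement expected by chance from each rater's marginal proportions.

```latex
\begin{align*}
p_o &= \frac{36 + 0}{40} = 0.900 \\
p_e &= \frac{2}{40}\cdot\frac{2}{40} + \frac{38}{40}\cdot\frac{38}{40}
     = \frac{4 + 1444}{1600} = 0.905 \\
\kappa &= \frac{p_o - p_e}{1 - p_e}
        = \frac{0.900 - 0.905}{1 - 0.905} \approx -0.05
\end{align*}
```

In this hypothetical case the raters agree on 90% of participants, yet κ is slightly below zero, because almost all responses fall on one option and chance agreement is therefore already very high.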
Given the nature of the response options (fixed, categorical) and the use of a fixed form of words, it is not surprising that there was a reasonably high level of agreement on Inquiry variables. This contrasts with clinical practice in which asking an open-ended question may elicit more varied responses. Other research found the use of objective questionnaires instead of conventional case notes for Inquiry can increase agreement on CM diagnosis among practitioners for RA. 22
Reproducibility of Inspection variables
Inspection in CM is based on the concept of correspondence between the Internal Organs and their external manifestations. A “slight” to “fair” level of agreement was found for the majority of the 15 Inspection variables assessed, indicating a relatively low level of agreement for Inspection variables overall. The results are similar to those of other studies that found variable levels of agreement for Inspection endpoints. 12,13 For example, although not directly comparable to the current study because their study used three practitioners, O'Brien and colleagues found that the level of agreement between 3 practitioners for color and size of tongue body was “slight,” and for thickness of tongue coating it was “fair” (the level of agreement was higher when measured for at least 2 practitioners). 13 The low level of agreement is indicative of the subjective nature of Inspection, which can require quite subtle observations. Clear definitions of these characteristics in CM are needed. Work is underway in China to develop objective methods for tongue diagnosis, including color detection instruments and computerized image analysis systems based on different color models. 23–26 These models are an attempt to quantify characteristics of the tongue body and tongue coating color. Although studies have found that tongue color identification systems can reflect the characteristics of tongue color and record similar judgments to those of CM practitioners, there are also several factors that could contribute to errors in measurement, such as the shape of tongue, the structure of its curved surface, and the sampling area. 24–26 In addition, studies in Western medicine have also indicated considerable variation among observers in physical examinations. 27–31
Reproducibility of Auscultation variables
The level of agreement between 2 practitioners on the strength of voice and character of breathing sounds was reasonably high in the current study: 82.5% and 97.5%, respectively (although κ could not be calculated). This was a similar result to that of O'Brien and colleagues, who found a “moderate” to “almost perfect” level of agreement. 13 However, in CM clinical practice, not much significance is attached to Auscultation because the hearing ability of practitioners varies from person to person. Some research has attempted to quantify and interpret the voice using a spectrogram. 32,33 However, Auscultation in CM also includes many other sounds, such as eructation, groaning, and crying, which have not been studied with respect to reliability. More studies are needed to establish the reproducibility of Auscultation variables.
Reproducibility of Palpation variables
Palpation in CM includes palpation of the chest, abdomen, other body parts as indicated, meridians, and acupoints, but the most important aspect in clinical practice is pulse diagnosis. The results of this study indicated “poor” agreement for left pulse speed, “slight” agreement for four aspects (left pulse location and force, and right pulse speed and force), and “fair” for one aspect (right pulse location). This is not dissimilar to O'Brien and colleagues' study that found the level of agreement among 3 practitioners was “slight” for location and “fair” for force for the right pulse. 13 The results support the notion that pulse diagnosis is the most difficult part of the art of CM diagnosis, which requires extensive clinical experience to master. 3
Despite its crucial role in the diagnostic process in CM, it is difficult to objectify and standardize pulse diagnosis. Some attempts have been made in China to objectify pulse diagnosis through development of pulse-measuring apparatuses. 34,35 However, it has been argued that those apparatuses are simply pulse-tracing devices based on anatomy and physiology as described in conventional medicine rather than in CM theory. 36 Current research indicates that unambiguous definitions of pulse characteristics are critical in pulse research. 37,38 Standardizing the pressure applied (i.e., measurement of finger strength) in pulse detection is the most difficult part of pulse research. 36 A review of reliability studies of pulse diagnosis 38 showed that the level of agreement varies from “low” to “very good.”
Study limitations
There were a number of factors that may have affected the results of this study. First, the use of the CM assessment form may have interrupted or sidetracked the clinical thinking process by requiring the practitioner to ask about symptoms/signs that were not necessarily relevant. This may be likened to breaking a whole picture into too many (diagnostic) pieces. Some of the knee-related questions were more general, and it emerged that they may not have captured the experience of the condition of knee OA sufficiently. For example, some patients reported that they did not have pain or stiffness, but had “discomfort.”
Second, the difference in CM training and experience between the 2 practitioners may have contributed to the variability in observations and diagnosis. Some signs are open to interpretation, which is likely to be influenced by clinical training and experience.
Third, there was no training prior to the study. Other studies have found that prior training improved the level of agreement between practitioners. 22
Finally, there are limitations of the κ coefficient, which have already been discussed. κ can become unstable or even inappropriate as a statistic. 39
To improve the reliability of CM data collected using the four diagnostic methods, clear definitions for all outcome variables should be established and prior training of the examiners (as some other studies have done) should be incorporated into future study designs.
Conclusions
This study investigated the reliability of the CM diagnostic process in a comprehensive and systematic way, investigating all four diagnostic methods. The level of agreement was not sufficiently high to justify inclusion of CM diagnostic variables in the current authors' main study investigating the efficacy of CHM for treating symptoms of knee OA (manuscript in progress). Nonetheless, the study in this article contributed important information about the reliability of the fundamental stage of reaching a diagnosis of a CM syndrome: the first step of data collection. The reader is cautioned, however, that this is only 1 study conducted for one condition (knee OA); therefore, no definitive generalizations about the reliability of data collection in CM can be made at this point. Further studies for a range of conditions are needed.
Footnotes
Acknowledgments
This study formed part of the PhD candidacy of the first author and was supported by Victoria University and Nucleus Network (Baker Medical Research Institute), Melbourne, Australia. The support of these institutions is gratefully acknowledged. This study formed part of a clinical trial into the efficacy of a Chinese herbal medicine in the treatment of symptoms of knee OA, registered with the Australian and New Zealand Clinical Trials Registry (ACTRN12608000468325).
Disclosure Statement
The authors state that no competing financial interests exist.
