Abstract
Objectives:
The objective of this study was to determine the efficacy of individually designed herbal formulas according to the rules of Traditional Chinese Medicine (TCM) in patients with osteoarthritis of the hip and knee.
Design:
This was a randomized, controlled, double-blind study with two parallel groups.
Settings/location:
This study was conducted at the University-centre in Gars am Kamp/Austria and was organized by the Institute of TCM and Complementary Medicine of the Danube University Krems /Austria.
Subjects:
The study comprised female and male patients with osteoarthritis of hip or knee aged between 45 and 75 years.
Interventions:
Patients were randomized to treatment with individualized, water-based herbal decoctions prepared in a standardized cooking process (Verum group) or to treatment with nonspecific, presumably ineffective, water-based herbal decoctions (Control group).
Outcome measures:
The primary outcome was the comparison of change between the intervention groups in the Western Ontario and McMaster Universities lower limb global index questionnaire (WOMAC global index) between baseline and week 20. Secondary outcomes included subscales of WOMAC for pain (A), stiffness (B), and functional impairment (C) and general quality of life in the form of the SF-36 questionnaire.
Results:
Altogether, 102 patients were randomized in this trial. The demographic and medical baseline characteristics were comparable in the 2 groups. The change of the WOMAC global index and all three subscales was significant in both groups between baseline and week 20 (verum group, global WOMAC: at baseline 47 [SD ± 11.8] and at week 20 24 [SD ± 18.3]; change of mean 23; p < 0.001; control group, global WOMAC: at baseline 48 [SD ± 14.7] and at week 20 25 [SD ± 18.3]; change of mean 23; p < 0.001). However, there was no significant difference (p = 0.783) between the treatment groups. There were significant changes in the subscales “physical functioning,” “bodily pain,” “vitality,” “social-functioning,” and “role-physical” of the SF-36 in both study groups between baseline and week 20, but again no significant difference between the groups. There were no drug-related serious adverse events.
Conclusions:
While the individual prescription of medicinal herbs according to TCM diagnosis investigated in this trial tended to improve osteoarthritis, the same effect was also achieved with the nonspecific prescription.
Introduction
In view of this evidence situation, it was decided to perform a prospective, randomized clinical trial with Chinese herbs, which were administered to patients as teas. The special feature of this trial was also an attempt to individualize the treatment on the basis of the principles of TCM. The main question was whether the selective, individualized prescription according to diagnostic patterns of TCM is more effective compared to nonspecific herbal treatment.
Materials and Methods
Design
This randomized trial in patients with OA compared herbal medicine prescribed according to the rules of TCM with a nonspecific herbal remedy. Approval of the protocol was obtained from the local ethics committee of the government in Lower Austria (St. Pölten, Austria). Patients were recruited via newspaper advertisement, radio announcement, and referral from other physicians. All patients gave their informed consent in writing before participating in the trial.
Patients
For inclusion in this study, patients had to meet the following criteria: Patients of both genders, ages from 45 to 75 years, were included, with clinical and radiological diagnosis of OA of hip 17 or knee 18 and a normalized Western Ontario and McMaster Universities (global WOMAC) score 19 between 30 and 80 on admission. 20 The exclusion criteria were one or more of the following diseases or conditions: joint surgery before the study or arthroscopic intervention in the target joint in the past year, and intra-articular injection or systemic corticosteroids during 8 weeks prior to study. Other exclusion criteria were rheumatoid arthritis or inflammatory disease, marked by elevation of C-reactive protein or positive screening for autoimmune disease, abnormal renal or hepatic function, serious psychiatric disease, severe alcohol abuse, uncontrolled type 1 or type 2 diabetes with glycosylated hemoglobin over 8 and body mass index over 35.
Randomization
Patients were randomly assigned to either the treatment group or the control group. Randomization was performed with computer-generated randomization lists, which produced blocks of 12 patients. The randomization code was given to the pharmacist, who prepared the study medication. There was no contact between the pharmacist and the patients, nor between the doctor and the pharmacist, except for e-mails ordering medication.
Study intervention
The study intervention was planned by the investigator in cooperation with experienced TCM herbalists (Yün Xiao Chen, Kassel, Germany and Andreas Höll, Vienna, Austria). Four (4) expected “clinical patterns” and matching herbal formulae were defined (Table 1). The single herbs used in this trial are listed in the literature 21 –25 to treat joint pain and inflammation. Of these herbs, combinations of two or three herbs were used, which are cited as standard combinations.
This formula is a variation of “Sophora Tang” by Dr. Fritz Friedl, Clinic Silima, Riedering, Germany.
TCM, Traditional Chinese Medicine.
By putting together these standard combinations, the basic formulae were constituted (7–10 herbs). All herbs used are listed as effective for the respective pattern. The exact combination and dosage of herbs in the formulae reflected the clinical opinion and experience of the investigator and the supervising doctors; also, due to safety reasons, the drugs were used in low dosage. Several drugs with potential efficacy for OA that might be more difficult to control were not used. During follow-up visits, the basic formulae could be substituted for one another or be adapted according to symptoms by adding predefined herbs to the basic formulae (Table 1).
The herbs were imported from China by an herbal medicine import company (Plantasia, Oberndorf, Austria). The products were analyzed by an independent qualified laboratory (Institut für Rückstands und Spurenanalytik der Sebastian-Kneipp-Forschung, Bad Wörishofen, Germany) and were found to be free of biocides, heavy metals, contaminant microorganisms, and aflatoxins.
Control formulae and blinding
The flavor of each verum formula was identified by tasting. Three (3) formulas with “control herbs” were then designed by a pharmacist (C.K.-D., Vienna, Austria) to imitate the prominent flavors of the verum prescriptions. The “control” formulas were classified in “taste categories,” sufficiently similar to the tastes of the verum formulae (four “taste categories”: A/B = bitter, C = sour-cinnamon, D = sweet) (Table 2). Herbs of the control group were routinely examined to be free of contaminants by a professional company (Mag. Ph.R. Kottas–Heldenberg und Söhne, Drogenhandel GesmbH, Vienna, Austria).
Intervention procedure
For each patient, the TCM pattern and the conforming individual formula were assessed by a physician with 10 years of training and experience in herbal medicine (M.L.). The formula was forwarded via e-mail to a pharmacy in Vienna (Apotheke zu unserer lieben Frau, Mag. Pharm. A. Höbinger, Wien 1010). The pharmacy prepared the verum or control formulae through a predefined cooking procedure (“decoction”) in hot water, allocating it to a verum or control preparation according to the randomization list. Bottles containing enough herbal extract for 16 days were sent to the patients. There were clinical follow-up visits every 2 weeks. Pain medication with nonsteroidal anti-inflammatory drugs (NSAIDs) was allowed and current physiotherapy was continued. The use of other pain therapies, such as drugs acting through the central nervous system or corticosteroids, was prohibited.
Safety measures
Laboratory tests were performed before enrollment, at weeks 3–4, and at the endpoint. All adverse events were documented by the physicians, and patients were advised to report any change in health. At the endpoint, blinding to treatment was assessed and patients were asked about their personal opinion on the tolerability of the herbal treatment (visual analogue scale [VAS]: 0 = no tolerability and 100 = optimal tolerability).
Outcome measures
All investigators were blinded to treatment. The primary endpoint was the comparison of change of the German version of the WOMAC global index 26 after 20 weeks of therapy between both intervention groups. The most affected joint was chosen as the target joint. Secondary endpoints were the difference of change between groups of the WOMAC global index after 8 weeks of intervention, of the WOMAC subscales for pain, stiffness, and functional impairment, the sum and the subscales of the general quality of life questionnaire SF-36 27 after 8 and 20 weeks of intervention, as well as the comparison of the above parameters in the intervention groups between endpoint and baseline. For each patient, diary data were analyzed: compliance was defined by the quotient between the number of days with study medication (T) and the number of days in the study (N), (T/N); pain medication was defined by the quotient between the number of days with pain medication (P) and the number of days in the study (N), (P/N). The weekly VAS was measured by the strongest pain in the target joint each week.
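The two diary quotients defined above (T/N for compliance and P/N for pain medication) can be illustrated with a minimal sketch. The function name and the example day counts are hypothetical and serve only to show the arithmetic; they are not taken from the study data.

```python
def diary_quotient(days_with_event: int, days_in_study: int) -> float:
    """Return the fraction of study days on which an event was recorded.

    With days_with_event = T (days with study medication) this is the
    compliance quotient T/N; with days_with_event = P (days with pain
    medication) it is the pain medication quotient P/N.
    """
    if days_in_study <= 0:
        raise ValueError("days_in_study (N) must be positive")
    if not 0 <= days_with_event <= days_in_study:
        raise ValueError("event days must lie between 0 and N")
    return days_with_event / days_in_study


# Hypothetical patient: 112 study days, 84 days with study medication,
# 14 days with pain medication (illustrative numbers only).
compliance = diary_quotient(84, 112)       # T/N
pain_medication = diary_quotient(14, 112)  # P/N
print(f"compliance = {compliance:.0%}, pain medication = {pain_medication:.1%}")
```

Expressed as percentages, these quotients are on the same scale as the group means for compliance and pain medication reported in the Results.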
Statistics
Power calculation
For sample size calculation, 2 × 50 OA patients were planned to be included in this study. Typical literature 28,29 shows that the variability of the WOMAC enables the evaluation of efficacy between our study treatment groups with sample sizes of 50 per group. The sample size was calculated for comparing the two study groups with respect to the primary endpoint (global WOMAC). The detectable difference between the mean values of the two groups at the primary endpoint was 75% of the standard deviation (SD) from the above-quoted literature. All statistical tests were planned two-sided at the 5% level of significance, and a power of 95% was used. As software, the SPSS Predictive Analytics Software (PASW) Statistics program, 17.0 Version, 17.0.2 (11.03.2009) of the PASW statistics license server of the Danube University Krems was used.
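As a rough plausibility check of these figures, the following sketch applies the standard normal-approximation formula for comparing two means, n per group = 2 × ((z for 1 − α/2 plus z for the power) / d)², where d is the detectable difference in SD units. This is an assumption about the kind of calculation involved, not a reproduction of the computation performed in PASW; the function names are illustrative, and an exact t-based calculation would give a slightly larger number.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Approximate patients per group for a two-sided two-sample comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def approx_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power with n patients per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n / 2) - z_alpha)


# Detectable difference of 0.75 SD, two-sided alpha 5%, power 95%
print(n_per_group(0.75))                 # patients needed per group
print(round(approx_power(0.75, 50), 3))  # power with 50 per group
```

Under this approximation, about 47 patients per group are required, so the planned 2 × 50 patients are consistent with the stated power of 95%.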
Biometrical analysis
The statistical analysis was based on the intention-to-treat (ITT) population. The primary endpoint was analyzed on a confirmatory-test basis using the two-sided 5% level of statistical significance and a statistical power of 95%. All available data were evaluated using descriptive statistics consistent with the scale level (i.e., percentages for categorical and ordinal data; mean, median, standard deviation, and minimum and maximum for metric quantitative data). For quantitative data, comparisons between intervention arms were based on t tests for independent samples, and all intrapatient comparisons were based on t tests for paired samples. Categorical data were compared between therapy arms using χ2 tests in contingency tables, while intra-individual data were compared using the McNemar or Bowker test for symmetry. The population was stratified into one group of patients with a global WOMAC score from 30 to 60 (stratum I) and another group with a global WOMAC score of 60 to 80 (stratum II). No interim analysis was performed.
Results
Between October 2007 and May 2008, 368 patients with OA showed interest in participating in the study and were screened. A total of 102 patients were randomized (Fig. 1). Every patient who took at least 1 day of study medication was included in the ITT population. Patients who withdrew during the trial and had invasive treatment on the target joint were not evaluated at endpoint. Eligible patients were randomized and received the allocated treatment; 94 patients completed the study (ITT population); 72 patients fulfilled the protocol as planned and were counted as the per protocol (PP) population; 8 patients withdrew from the study (2 from the study group and 6 from the control group). The most common reason for withdrawal was the use of nonallowed treatments at the patient's request (details in Fig. 1).

Patient flow chart. ITT, intention to treat; ind., individual.
The demographic and medical baseline characteristics were comparable in the two intervention groups (Table 3). The assignment to the various formulas in the verum and the control group is shown in Table 4.
n, all randomized patients.
Western Ontario and McMaster Universities (WOMAC) scales were normalized to a scale of 0–100 mm by dividing the sum subscale by the number of questions of each subscale score.
Note: No multiple entries were permitted.
SD, standard deviation; NSAIDs, nonsteroidal anti-inflammatory drugs; ASA, acetylsalicylic acid.
The most commonly prescribed formula in each patient was counted as the assigned formula. Nominally statistically significant between treatments with p < 0.001.
Outcome of primary endpoint
In the ITT analysis, there was no statistically significant difference in change of global WOMAC when comparing the verum group and the control group after 20 weeks (p = 0.783; Table 5).
Mean changes of WOMAC at endpoint are calculated in relation to baseline.
Population at week 8 was reduced due to compliance reasons.
General life quality, SF-36 questionnaire: Higher value indicates better status.
WOMAC, Western Ontario and McMaster Universities; SD, standard deviation; SF-36, 36-Item Short Form Health Survey.
In the verum group, the global WOMAC score decreased from a mean baseline value of 47 (SD ± 11.8) to a post-treatment mean value of 24 (SD ± 18.3) at study end (change −23). Comparing the endpoint to baseline, the difference was highly significant (p < 0.001). In the control group, the global WOMAC score decreased from a mean baseline value of 48 (SD ± 14.7) to a mean value at endpoint of 25 (SD ± 18.3) (change −23); this within-group difference was also highly significant (p < 0.001). These results are consistent with the analysis of the PP population. No significant differences were observed among the substrata.
Secondary outcome measures
The single scales of WOMAC for pain (A), stiffness (B), and functional impairment (C) after 8 and 20 weeks of intervention followed the above trend. There was no statistically significant difference between the two intervention groups, but there was a definite improvement when comparing baseline and endpoint in each group (Table 5).
The subscales of the SF-36 questionnaire for quality of life showed similar findings compared to the primary endpoint. Again, there was no statistical difference between the two groups (Table 5). At endpoint, the verum group showed a significant improvement compared to baseline in the subscales “physical functioning” (p = 0.001), “bodily pain” (p = 0.001), “vitality” (p = 0.001) and “role-physical” (p = 0.001). In contrast, the control group showed an improvement at endpoint compared to baseline in “bodily pain” (p = 0.001), “physical functioning” (p = 0.001), “role-physical” (p = 0.016), and “social functioning” (p = 0.021; Table 5).
The mean compliance with the decoction, measured from patient diaries, was 75% (SD ± 11.8) in the verum group and 68% (SD ± 13.5) in the control group. Thus, compliance was statistically significantly higher in the verum group (p < 0.03).
Patient compliance in providing diaries for pain medication and VAS was low in both groups. About half of the patients in the ITT population provided medication diaries (verum group 58%, control group 45%). In those who provided diaries, the mean consumption of pain medication was unexpectedly low (7% [SD ± 10.1] in the verum group and 6% [SD ± 8.9] in the control group). Owing to lack of compliance, the weekly VAS data were not evaluated, as they could be obtained from <50% of the patients.
Adverse events
There was no difference in the incidence of adverse events (AEs) between the groups (Table 6). The most common drug-related AEs were gastrointestinal symptoms such as epigastric discomfort, diarrhea, constipation, and flatulence. These reactions occurred more often in the verum group, but without any statistically significant difference (p > 0.05). AEs were generally transient and not severe. One (1) patient required hospital treatment due to a strong migraine episode, 1 patient was admitted to the hospital due to a syndrome of the thoracic spine and concomitant palpitations (both control group), and another patient had a bronchopulmonary infection (control group). These events were not considered drug related by the investigator. Two (2) patients suffered from incidents 1 month after having stopped the study medication: In the recovery phase of pneumonia, 1 patient (control group) had clinical signs suggesting pulmonary embolism, but this diagnosis could not be verified; the other patient (control group) suffered from deep vein thrombosis. These cases were not considered drug related and were clearly beyond the washout phase of 14 days (Table 6).
Hospital admissions: one strong migraine episode, one syndrome of the thoracic spine (both control group).
Laboratory findings
Pathologic serum laboratory AEs were rare in this study. One (1) patient (verum group) had slightly elevated alanine transaminase (GPT 68 IU/L; normal 0–55 IU/L) during antibiotic treatment (Zithromax) of a bronchopulmonary infection. This was considered not to be related to study medication by the investigator. One (1) patient (control group) had a slight and reversible elevation of creatinine (1.96 mg/dL; normal 1.2 mg/dL); this was attributed to lack of hydration and not to study medication.
Tolerability of interventions
Individual tolerability of the decoctions, measured by a VAS (0 = worst tolerability, 100 = optimal tolerability), was 92 mm in both intervention groups (SD ± 13.0 in the verum group, SD ± 11.2 in the control group).
Control of blinding
There was no lack of blinding in the study. The proportion of patients who believed they had received verum did not differ significantly between the groups (p > 0.05): 75% of the patients in the verum group and 69% of the patients in the control group believed they had taken verum.
Discussion
In this trial, it has been shown that high-quality, randomized, double-blind studies with herbal medicines are feasible. There was a reasonable, and according to sample size calculation, sufficient number of patients in the active and control group; the dropout rate was small. The study medication was given in decoction form, being the traditionally preferred method of herb preparation in TCM. 30 The blinding procedure was sophisticated and included a “control” preparation with a taste practically indistinguishable from the verum.
One limitation of the study was that a number of patients did not take the full dosage of the tea preparations. In addition, it was not possible to evaluate the daily dose of pain medication in the diary, because the patients failed to fill in the diaries adequately. The demonstration of a reduced need for pain medications would have been an interesting and useful secondary measure of efficacy of the verum preparation.
Further possible limitations result from properties of the herbal formulas. The control formulae were chosen, because they were considered as not clinically effective. However, in experimental settings anti-inflammatory activity has been shown in some components of the control preparations, as in Agrimonia herba, Erythraea centaurium, and Cinnamoni zeylanicum cortex. 31,32,33,34 Thus, the control preparations were not true placebos. In order to facilitate safety controls and to minimize adverse events, there was a deliberate and strict limitation regarding the use of more toxic and also stronger “blood moving” formulae in the verum. For similar reasons, the herbal preparations were used in low dosage.
The results show that in both groups, there was a significant improvement of clinical symptoms as defined by the WOMAC score and other parameters between baseline and week 20. However, there was no difference in the efficacy between the “active” and the control group in all predefined parameters. Therefore, it must be concluded that the individualized herbal mixtures were not superior to a standard tea preparation. The beneficial effect in both groups may be mainly explained by nonspecific effects, which might have been enhanced by the intensive patient care. It is known that there are strong placebo effects in all treatments of osteoarthritis. 35 Also, the so-called “regression to the mean” might have played an important part. It is not out of the question that the tea preparation that the control group received has shown some efficacy on OA of the hip and knee. The above mentioned measures of putting restrictions on dosage and the use of certain potentially effective herbs due to safety issues might also have been a reason for the failure of the active preparation to provide greater effects.
In retrospect, it must be realized that the expectations of a significant result in favor of the specific herbal formulae in this trial were too high. The selection of the trial herbs in the mixtures was not based on phase II trials, but rather on the clinical experience of experts in TCM. On the basis of such results, it will be necessary to test the single components as well as the combinations in phase II trials. It is certainly reasonable to combine herbs, but a personalized drug treatment according to the Chinese clinical “patterns” may be indicated only as the last step, once some efficacy has been demonstrated. For future trials, the clinical TCM diagnosis must be more refined, and recent studies have shown that the reproducibility of these clinical diagnoses needs considerable improvement. 36,37,38 On the basis of safety data in this study, trials with higher doses of these herbs are justified. Also, in future trials there should be more emphasis on “blood-moving” herbs.
The need for effective drug treatments without side-effects is enormous, because of the large number of patients with OA and the long duration of symptoms. Despite the failure of this trial, the authors still believe that many Chinese (or European) herbs contain useful substances for the treatment of OA and should be tested in further randomized trials.
Conclusions
The individual prescription of Chinese herbs according to TCM diagnosis investigated in this trial reduced symptoms of OA. However, the individualized treatment with low-dose Chinese herbs for OA of the hip and knee is of no measurable additional value compared with standardized herb teas, which have no known specific effect on OA. Because of this result, nonspecific beneficial effects of the herbal treatment are likely, and treatment options with herbal preparations for OA should be evaluated in further trials.
Footnotes
Acknowledgments
The authors would like to thank the following persons for their contributions to this study: Mag. Erich Stöger, Dr. Andreas Höll, Astrid Gantschacher, Dr. Johanna Karminski-Pielsticker, Sabine Pichler, Elisabeth Schneider, Monika Zaiser, Prof. Dr. Christoph Male, Dr. Gerda Höfer, Prof. Dr. Michael Gottsauner, and Dr. Fritz Friedl. Financial support was provided by external funding. Grants were provided by the “Health and Social Fund of Lower Austria” (“NÖGUS”) and the “Fund of the Austrian National Bank” (“ÖNB”). The authors would like to thank the company “Plantasia,” (Oberndorf, Austria) and the pharmacy “Zu unserer lieben Frau bei den Schotten” (Vienna, Austria) for their contributions.
Disclosure Statement
No competing financial interests exist.
