Abstract
Objectives:
The study objectives were to establish Ojeok-san (Five Accumulation powder: Wu Ji San) administration criteria and a questionnaire to evaluate the holistic effects of Ojeok-san on patients with low back pain (LBP).
Methods:
Texts and literature recommended by specialists were searched to gather Ojeok-san-related symptoms. Then, Oriental Medicine Doctors (OMDs) practicing in Seoul were surveyed about which symptoms they consider most in clinical practice. Potential items for the questionnaire were selected on the basis of this survey. The final version was established from the survey results and a Delphi process among specialists in musculoskeletal diseases. To evaluate the reliability and validity of the newly developed assessment tool (Ojeok-san Low Back Questionnaire: OLQ), patients with chronic LBP were recruited. The OLQ and other tools, namely the visual analogue scale, numeric rating scale, Roland-Morris Disability Questionnaire, Modified-Modified Schober test, and 36-Item Short Form Health Survey, were administered to the subjects twice at a 2-week interval. Test-retest reliability, internal consistency, and convergent and discriminant validity were assessed.
Results:
A total of 90 potential items were generated by the research team. One hundred and two (102) OMDs completed the survey. Based on the survey results, 34 items were initially selected as potential items. Through the Delphi process among experts, the 10 top items, rated more than 5 points on a scale of 10, were finally established. Each of the 10 items was given a response scale of 0–10 (0 as no symptom and 10 as the most severe form of the symptom). Based on the above stages, an initial OLQ was established and used in the evaluation phase. In that evaluation, the OLQ showed high test-retest reliability, intraclass correlation coefficient, and internal consistency.
Conclusions:
The newly developed Ojeok-san administration criteria and questionnaire may be a promising tool for future Oriental medicine clinical study protocols.
Introduction
OM practitioners do not rely on biomedical and radiological examinations for the diagnosis and treatment of patients. Instead, they have accumulated valuable experience in capturing very detailed and accurate information about the signs, symptoms, and feelings of their patients, and they rely heavily on observable signs and the patients' subjective feelings. The four fundamental techniques that OM practitioners have used for thousands of years for diagnosis and treatment are observation (looking for visible signs), auscultation and olfaction (listening and smelling), interrogation (asking questions), and palpation (feeling the pulse). 1 However, because this process relies heavily on the subjective judgment of the doctor, it has been hard to establish objective evidence and standardize the diagnostic process. 3
Surveys and questionnaires are common tools used by clinicians and other professionals to objectively explain a subjective assessment. 4 Questionnaires in particular are frequently applied in medical examinations to cover areas that are not visible or concrete and are difficult to measure. 5 When a questionnaire is used to ask about a patient's various symptoms, the range of data that a doctor has to obtain becomes uniform. Also, the results of the questionnaire can be quantified, making it possible to objectify and standardize the diagnosis. 6 Therefore, to standardize OM assessment and diagnosis, a valid and reliable questionnaire based on OM theories must be established.
Low back pain (LBP) and Ojeok-san (Five Accumulation Powder: Wu Ji San) were chosen for study because, according to 2008 statistics, LBP was the condition most frequently treated by OM in the Republic of Korea, and Ojeok-san was the herbal medicinal extract most frequently administered for LBP. 7
In this study, administration criteria for Ojeok-san and a questionnaire to evaluate LBP and related symptoms were established. The validity and reliability of the newly established questionnaire (Ojeok-san Low Back Questionnaire: OLQ) were also evaluated.
Materials and Methods
Study design and overview
In order to establish questions for the questionnaire, a research team was organized. The research team was composed of 2 professors of Kyung Hee University, 2 professors of Dongguk University, and 1 fellow staff and 3 residents from each university, all of whom majored or are majoring in acupuncture and moxibustion. The procedure for establishing questions comprised (1) initial generation of potential items based on a review of articles recommended by specialists, (2) item selection through a survey of Oriental Medicine doctors (OMDs), and (3) final item selection and modification through the Delphi method by experts.
Item generation (generation of potential items based on review study)
The research team sent e-mails to OM specialists (Korean professors of OM whose specialty is OM Classics, Meridian and AcuPoints, or Acupuncture and Moxibustion) asking them to recommend traditional literature and modern texts to use as references for item review. The recommended references were hand searched for symptoms treated by Ojeok-san and for citations on LBP caused by Wind, Cold, Dampness, or Stagnation of Blood.
The results were translated into Korean in a manner that both physicians and patients could understand and easily use. Overlapping terms that described the same symptom but were expressed differently in different articles were unified. Through 5 discussion sessions, 90 symptoms that were unanimously selected by the 4 OM professors of the research team were chosen as potential items. They were then reviewed and modified by a statistician to confirm that they were in a statistically analyzable form.
Initial item selection
Among the 3931 members of the Seoul Korean Medicine Doctors' Association, the 2906 who had obtained licensure more than 5 years earlier and who were practicing in local clinics were selected as the survey population. A statistician performed stratified random sampling to obtain representative opinions of this population, and 306 was derived as the appropriate sample size. The selected 306 members were interviewed by telephone from March 21, 2009 to April 4, 2009. They were asked whether or not they prescribe Ojeok-san for LBP, how they would rate the treatment effect, and whether or not they were willing to take part in an in-person interview.
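The sampling step can be illustrated with a minimal sketch of stratified random sampling under proportional allocation. The study does not report which strata or allocation rule the statistician used, so the three strata, their sizes (chosen only so that they sum to 2906), the largest-remainder rounding, and the function names below are all assumptions made for illustration.

```python
import random


def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to stratum size.

    The largest-remainder method is used so the quotas sum exactly
    to sample_size.
    """
    population = sum(stratum_sizes.values())
    quotas = {s: n * sample_size // population for s, n in stratum_sizes.items()}
    remainders = {s: n * sample_size % population for s, n in stratum_sizes.items()}
    leftover = sample_size - sum(quotas.values())
    # Hand the remaining slots to the strata with the largest remainders.
    for s in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[s] += 1
    return quotas


def draw_stratified_sample(members_by_stratum, sample_size, seed=0):
    """Draw a simple random sample without replacement inside each stratum."""
    rng = random.Random(seed)
    sizes = {s: len(m) for s, m in members_by_stratum.items()}
    quotas = proportional_allocation(sizes, sample_size)
    return {s: rng.sample(members_by_stratum[s], quotas[s]) for s in members_by_stratum}


# Hypothetical strata: names and sizes are NOT from the study.
hypothetical_sizes = {"stratum_A": 1200, "stratum_B": 1000, "stratum_C": 706}
members = {s: [f"{s}-{i}" for i in range(n)] for s, n in hypothetical_sizes.items()}
sample = draw_stratified_sample(members, sample_size=306)
print({s: len(ids) for s, ids in sample.items()})
```

Proportional allocation keeps each stratum's share of the sample equal to its share of the population, which is one common way to obtain a representative sample; other allocation rules would give different quotas.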
One hundred and six (106) subjects who answered that they do prescribe Ojeok-san for LBP and who rated the treatment effect as above normal were selected for in-person interviews. The subjects were visited and interviewed from May 6, 2009 to May 23, 2009. For each of the 90 potential items, they were asked to answer “yes” or “no” on whether or not they consider that symptom when prescribing Ojeok-san for LBP patients. One hundred and two (102) subjects successfully completed the interview. Frequency analysis of the obtained results was done using a computer-based statistical package, STATA/SE (Stata/SE 9.2 for Windows, StataCorp LP, College Station, TX, USA).
Items endorsed by more than 30% of the survey population were retained as potential items, a threshold taken from a similar study. 8 A total of 34 items were retained.
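The frequency analysis and the 30% cutoff can be sketched as follows. This is only an illustration of the selection rule described above, not the Stata analysis that was actually run; the item identifiers and the yes/no answers are invented placeholders.

```python
def endorsement_rates(responses):
    """Return the share of respondents answering 'yes' for each item.

    responses: list of dicts, one per respondent, mapping item id -> bool.
    """
    n = len(responses)
    items = responses[0].keys()
    return {item: sum(r[item] for r in responses) / n for item in items}


def select_items(rates, threshold=0.30):
    """Keep items endorsed by strictly more than the threshold share."""
    return [item for item, rate in rates.items() if rate > threshold]


# Hypothetical answers from 102 respondents for three made-up items:
# item_01 gets 61 'yes', item_02 gets 30, item_03 gets 31.
respondents = [
    {"item_01": i < 61, "item_02": i < 30, "item_03": i < 31}
    for i in range(102)
]
rates = endorsement_rates(respondents)
selected = select_items(rates)
print(selected)
```

With 102 respondents, an item needs at least 31 "yes" answers to exceed 30%, which is why the made-up item with 30 endorsements is dropped.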
Final item selection and modification
Professors of the 11 OM colleges in Korea, working at the department of Acupuncture & Moxibustion (n=48) or OM rehabilitation (n=29), and who were all specialists of musculoskeletal diseases, were invited to contribute to the development process as members of a board of experts. Sixteen (16) professors, including the 4 professors of the research team, accepted the invitation and agreed to participate as the board of experts.
In order to derive a consensus of the experts, the Delphi method was adopted. The Delphi technique has been used widely in health research within the fields of technology assessment, education and training, and priorities and information, and in developing nursing and clinical practice. 9
The 16 experts were informed about the results of preceding surveys and asked through e-mail to rank each of the 34 potential items on a scale from 0 (not important) to 9 (extremely important). The minimum, median, and maximum values of the results were analyzed and announced to the experts.
Then, the experts were gathered for a face-to-face conference. After receiving an anonymous summary of the previous rankings, together with the reasons given for the decisions, the experts were encouraged to revise their earlier rankings in light of the replies of other members of the panel 10 and through thorough debate of the various opinions. A final secret ranking of the items was then made. Ten (10) final items, rated more than 5 points on average on a scale of 10, were selected. Based on these decisions and under the editorial supervision of the Delphi group members, the items were converted into response scales and modified.
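The two Delphi steps described above, feeding back the minimum, median, and maximum of each item's ratings and then keeping items whose mean rating exceeds 5, can be sketched as below. The item names and ratings are hypothetical, and the function names are introduced only for this illustration.

```python
from statistics import mean, median


def summarize_round(ratings):
    """Per-item feedback for the panel: minimum, median, maximum, and mean."""
    return {
        item: {
            "min": min(scores),
            "median": median(scores),
            "max": max(scores),
            "mean": mean(scores),
        }
        for item, scores in ratings.items()
    }


def select_final_items(ratings, cutoff=5.0):
    """Keep items whose mean rating is strictly greater than the cutoff."""
    return [item for item, scores in ratings.items() if mean(scores) > cutoff]


# Hypothetical ratings from four experts on the 0 to 9 importance scale.
hypothetical_ratings = {
    "relieved_by_warmth": [8, 7, 9, 6],
    "thirst": [2, 3, 4, 1],
    "cold_feet": [5, 6, 5, 6],
}
summary = summarize_round(hypothetical_ratings)
final_items = select_final_items(hypothetical_ratings)
print(final_items)
```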
Field test (validation of the derived OLQ)
In order to evaluate the reliability and validity of the derived assessment tool, OLQ, a clinical study was conducted in three medical centers (Dongguk University Ilsan Oriental Medicine Hospital, Dongguk University Bundang Oriental Medicine Hospital, and Kyung Hee University Oriental Medicine Hospital). One hundred and eighty (180) patients who had suffered from LBP for more than 3 months were recruited based on the established inclusion and exclusion criteria (Table 1). 11,12
Test-retest reliability was established using patients who were seen twice during the study period, at an interval of 2 weeks. Intraclass correlation coefficient (ICC) values were measured. Internal consistency of OLQ was tested by examining the Cronbach α value.
External validity was established by correlating the scores of the OLQ with (1) pain measured by Visual Analogue Scale (VAS), (2) pain measured by numeric rating scale (NRS), (3) Roland-Morris Disability Questionnaire (RMDQ), (4) Modified-Modified Schober test (MMST), 13 and (5) 36-Item Short Form Health Survey (SF-36). To evaluate the construct validity of the questionnaire, exploratory factor analysis was done (Fig. 1).

Fig. 1. Study design and outline of the Ojeok-san Low Back Questionnaire. LBP, low back pain.
In order to measure pain using VAS, the patients were asked to mark the relevant amount of pain they were experiencing at every evaluation visit on a 100-mm line where 0 represents “no pain” and 100 “unbearable pain.”
As for NRS, the patients were asked to rate the pain verbally at every visit on a scale of 0–100.
To measure MMST, the examiner put his/her thumb on the inferior margin of the patient's posterior superior iliac spine (PSIS). A line along the midline of the lumbar spine horizontal to the PSIS (lower landmark) was drawn. While the examiner held a tape firmly against the patient's back, a second line 15 cm above the original one (higher landmark) was marked. Then the patients were asked to lean forward as far as they could without increasing the pain. The new distance between the lower and higher landmarks was measured. The patients were then asked to return to the neutral position. The difference between the initial distance between the skin markings in the neutral position and the new distance measured in the flexion position was used to indicate the amount of lumbar flexion. 13
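The MMST arithmetic is simple enough to state as a short sketch: lumbar flexion is the distance between the two skin marks in flexion minus the 15 cm distance marked in the neutral position. The function name and the example measurement are hypothetical.

```python
NEUTRAL_DISTANCE_CM = 15.0  # distance between the two skin marks in neutral stance


def mmst_flexion_cm(flexed_distance_cm, neutral_distance_cm=NEUTRAL_DISTANCE_CM):
    """Return lumbar flexion as the increase in distance between the marks."""
    if flexed_distance_cm < 0 or neutral_distance_cm <= 0:
        raise ValueError("distances must be positive")
    return flexed_distance_cm - neutral_distance_cm


# Hypothetical patient: the marks are 21.5 cm apart at full forward flexion.
print(mmst_flexion_cm(21.5))  # 6.5
```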
Statistical analysis
Data were presented as means (standard deviation) or medians (range), and baseline characteristics between treatment groups were compared using one-way analysis of variance (ANOVA) or χ2 test.
To evaluate the construct validity of the questionnaire, exploratory factor analysis, which could be used to categorize the 10 items based on similar characteristics, was performed.
For factor selection, factors with an eigenvalue (which measures the amount of variation accounted for by the factor) >1 were retained. Varimax rotation, an orthogonal rotation, was applied to simplify the loading pattern and make the factors easier to interpret.
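The eigenvalue rule can be sketched as a small filter. The eigenvalues below are invented for a 10-item scale (they sum to 10, as the eigenvalues of a 10-item correlation matrix do) and are not the study's values; extracting eigenvalues from real data would be done with a statistics package.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """Return (factor number, eigenvalue) pairs with eigenvalue above the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    return [(rank, ev) for rank, ev in enumerate(ordered, start=1) if ev > threshold]


# Hypothetical eigenvalues of a 10-item correlation matrix (illustration only).
hypothetical_eigenvalues = [3.1, 1.9, 1.2, 0.9, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2]
retained = kaiser_retained(hypothetical_eigenvalues)
print(len(retained), "factors retained:", retained)
```

An eigenvalue above 1 means the factor accounts for more variance than a single standardized item, which is the rationale usually given for this cutoff.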
For external validity of the developed questionnaire, Pearson's or Spearman's rank correlation tests were conducted by assessing its correlation with the other well-known pain assessment scales including VAS, NRS, RMDQ, MMST, and SF-36. Correlation r of 0–0.25 was considered as lack of correlation, 0.25–0.5 as fair, 0.5–0.75 as moderate to good, and >0.75 as very good to excellent correlation. 14
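The interpretation bands for the correlation coefficient can be written as a small lookup. The text leaves the shared boundaries (0.25, 0.5, 0.75) ambiguous; because the top band is stated as ">0.75", this sketch assumes every band includes its upper bound, and it applies the bands to the absolute value of r so that negative correlations are graded by strength. Both choices are assumptions.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the bands quoted in the text.

    Assumption: bands are upper-inclusive and apply to |r|.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    magnitude = abs(r)
    if magnitude <= 0.25:
        return "lack of correlation"
    if magnitude <= 0.5:
        return "fair"
    if magnitude <= 0.75:
        return "moderate to good"
    return "very good to excellent"


for value in (0.22, 0.40, -0.39, 0.84):
    print(value, correlation_strength(value))
```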
Cronbach coefficient α was obtained to evaluate internal consistency reliability for the overall scale and individual subscales. Test-retest reliability was assessed over the 2-week interval. Moreover, for the inter-rater reliability test between 2 raters, the ICC was determined.
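For readers unfamiliar with the internal consistency statistic, Cronbach's α for k items is k/(k−1) × (1 − sum of item variances / variance of the total score). A minimal sketch follows; the study computed α in a statistics package, and the scores below are invented.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of scores.

    scores: list of rows, one per respondent, each a list of k item scores.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0-10 responses from five respondents on three items.
hypothetical_scores = [
    [7, 6, 8],
    [3, 4, 2],
    [5, 5, 6],
    [9, 8, 9],
    [2, 3, 1],
]
print(round(cronbach_alpha(hypothetical_scores), 3))
```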
The statistical analysis was carried out using NCSS2007 (NCSS, Kaysville, UT) and SPSS 12 (SPSS, Inc., Chicago, IL) software.
Results
Potential item generation
The generation of items started from organizing a research team. The research team consulted experts through e-mail to establish a list of traditional literatures and modern texts to search citations concerning Ojeok-san and LBP-related symptoms. There were 18 items on LBP, 9 items on radiating pain, 10 on other forms of pain, 10 on digestion, 19 on general symptoms, 4 on gynecology, 14 on pulse, and 6 on tongue diagnosis. Thus, a total of 90 potential items was generated by the research team.
Initial item selection
One hundred and two (102) of the 106 interviewees (96%) completed the interview. Thirty-four (34) items, each selected by more than 30% of the survey population, were retained as potential items.
Final item selection and modification
On September 5, 2009, a meeting of the Delphi group members was held in order to discuss and select the final items. Items concerning tongue and pulse diagnosis, which are inappropriate for patient self-assessment, were excluded. The 10 top items, each rated more than 5 points, were finally established (Fig. 2).

Fig. 2. Results of Delphi group's rating of item importance. Experts were asked to rank the items from 0 (not important) to 9 (extremely important); thus, higher mean scores represent higher-ranked items.
The 10 items were each established with a response scale of 0–10 (0 as no symptom and 10 as the most excessive form of symptom). Based on the above stages, an initial OLQ was established and used in the evaluation phase.
Field test (validation of the derived OLQ)
One hundred and eighty (180) patients (male, 58; female, 122) aged from 24 to 75 years were enrolled at three investigative sites. The median duration of pain (range) was 7 (0.3–40) years.
The overall mean pain scores by VAS and NRS were 44.7 and 44.5, respectively, and the distribution of the overall questionnaire scale followed a normal distribution pattern.
Test-retest reliability and internal consistency
Internal consistency of OLQ was tested by examining the Cronbach α value. The internal consistency reliability estimate for the 10 items (Cronbach α, 0.81) exceeded 0.7, the recommended score for good reliability. The ICC between 2 independent raters showed relatively good agreement (ICC=0.77, 95% confidence interval: 0.93–0.97). The test-retest results assessed at the 2-week interval showed a significant correlation (r=0.84, p<0.01) (Fig. 3).

Fig. 3. Test-retest reliability of Ojeok-san Low Back Questionnaire (OLQ). OLQ-0 is the score measured at baseline and OLQ-2 is the score measured 2 weeks after baseline.
Factor analysis
Of the 180 participants who received the 10-item OLQ, complete data were available for 178 participants. With respect to sample size, exploratory factor analysis is error prone even at relatively large sample sizes. One (1) simulation analysis reported that even at a 20:1 subject-to-item ratio, error rates are well above the field standard α=0.05 level. 15 With 178 participants and 10 items, the present study had about 18 subjects per item, close to that ratio.
Exploratory factor analysis of the OLQ was performed. When the appropriateness of factor analysis was examined for the 10 items, Bartlett's test of sphericity (480.60, p<0.01) rejected the hypothesis that the items were uncorrelated; thus, there was some correlation structure in this data set that could be modeled by factor analysis.
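As background on the sphericity test, the standard Bartlett statistic for p items and n subjects is χ² = −(n − 1 − (2p + 5)/6) · ln|R|, with p(p − 1)/2 degrees of freedom, where |R| is the determinant of the item correlation matrix. The sketch below computes only the statistic and degrees of freedom for a made-up 2-item correlation matrix; it is not a reproduction of the study's value of 480.60, which would require the original data.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_subjects):
    """Return (chi-square statistic, degrees of freedom) for a correlation matrix."""
    p = len(corr)
    det = determinant(corr)
    if det <= 0:
        raise ValueError("correlation matrix must be positive definite")
    chi_square = -(n_subjects - 1 - (2 * p + 5) / 6) * math.log(det)
    dof = p * (p - 1) // 2
    return chi_square, dof


# Hypothetical example: two items correlated at r = 0.5, 100 subjects.
chi_square, dof = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n_subjects=100)
print(round(chi_square, 3), dof)
```

For a 10-item scale such as the OLQ the degrees of freedom would be 10 × 9 / 2 = 45; a large statistic relative to that reference distribution indicates the correlation matrix differs from an identity matrix.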
The first three eigenvalues of the varimax-rotated model were >1 (2.81, 2.04, 1.08), suggesting a potential three-factor solution according to the Kaiser criterion (eigenvalue >1.0). In addition, examination of the scree plot indicated that the optimal number of factors was three.
For the retained three factors, five items were heavily loaded on the first factor. Three (3) of the remaining items loaded heavily on the second factor, and one loaded heavily on the third factor. One item (item 5) was cross-loaded across the first and second factors. The three factors explained 59.35% of the total variance (Table 2).
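The reported variance explained can be cross-checked from the three reported eigenvalues. For standardized items the total variance equals the number of items (10 here), so the proportion explained is the sum of the retained eigenvalues divided by 10. The function name is introduced only for this check.

```python
def proportion_of_variance(retained_eigenvalues, n_items):
    """Share of total variance carried by the retained factors."""
    return sum(retained_eigenvalues) / n_items


reported_eigenvalues = [2.81, 2.04, 1.08]  # values reported in the text
share = proportion_of_variance(reported_eigenvalues, n_items=10)
print(f"{share:.1%}")
```

This gives about 59.3%, which agrees with the reported 59.35% once the rounding of the eigenvalues to two decimals is taken into account.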
OLQ, Ojeok-san Low Back Questionnaire.
The items were grouped into three groups according to the factor analysis. Each group was named according to its characteristics. Item 1, loaded heavily on factor 3, was categorized into group 1, which was named “Heat Effect” because the item asks whether or not heat relieves LBP. Items 2, 5, 8, and 9, heavily loaded on factor 2, were categorized into group 2, named “Character of Low Back Pain” because the items concern descriptions of pain type. Items 3, 4, 6, 7, and 10, heavily loaded on factor 1, were categorized into group 3, named “Cold Symptoms” because the items concern cold sensations in the legs, hands and feet, whole body, or lower abdomen.
Convergent and discriminant validity
Convergent and discriminant validity were assessed by correlating the scores of the OLQ with (1) VAS, (2) NRS, (3) RMDQ, (4) MMST, and (5) SF-36. The OLQ showed a fair correlation with RMDQ (r=0.40, p<0.01) and a fair negative correlation with SF-36 (r=−0.39, p<0.01). Correlations with VAS (r=0.22, p<0.01) and NRS (r=0.21, p<0.01) were lower, although all of these correlations were statistically significant (Fig. 4).

Fig. 4. Validation results of the OLQ compared with Visual Analogue Scale (VAS)
Discussion
OLQ was developed through a large-scale survey of OM doctors in Korea and through discussion and consensus among experts.
In order to have confidence in the results of a study, it is necessary to establish that the questionnaire is both valid and reliable. 16 Validity is usually defined as the extent to which a test measures what it is intended to measure, while reliability refers to consistency in obtaining the same results again. 17 The validity and reliability must be considered not only in standardized assessment tools but also in newly developed ones.
In order to measure the validity and reliability of OLQ, test-retest reliability, ICC, and internal consistency were measured by administering it to patients with LBP. The results showed high test-retest reliability, ICC, and internal consistency.
Ergil stated that a general pattern of diagnosis and treatment can be established through a randomized survey of practitioners. 18 This approach was adopted in this study to secure the validity of the developed scale, through a large-scale survey of OM doctors in Korea. Experts were consulted to decide which references should be considered in establishing the first draft of the questionnaire; a telephone survey and interview visits were conducted to develop this into the second draft; and finally a discussion among experts using the Delphi method took place to complete the third draft.
To evaluate the external validity of OLQ, it was compared with other scales. OLQ showed low correlation with scales measuring pain or range of motion, such as VAS, NRS, and MMST, but fair correlations with RMDQ and SF-36. Also, the factor analysis of the 10 items indicated that one global factor can encompass all 10 items.
The fact that OLQ was not particularly correlated with any of the standard measures suggests that it is suitable for measuring symptoms of interest, because the questionnaire was developed to look at the complex of symptoms that could be addressed by an herbal treatment.
Various OM assessment and outcome scales 1,6,19–24 are being developed. These scales are developed based on literature research and the opinions of experts and focus on the signs and symptoms of the patients.
The newly developed OM assessment tools might be more sensitive to measure the efficacy of treatment in the OM perspective, providing a unique, ecologically valid perspective for designing trials, assessing clinical outcomes, formulating evidence-based clinical recommendations, and developing future study protocols. 2
However, until now, only a few formal attempts have been made to evaluate the validity and reliability of OM diagnosis and assessment tools. 1,25 If a systematic evaluation of validity and reliability, as done in this study, is applied and modifications are made based on the results, more diagnosis and assessment tools may become applicable in practice.
This study has the following limitations. OLQ was established in the form of a questionnaire, and it could therefore capture only information (i.e., signs, symptoms, emotions, and feelings) that the subjects were consciously aware of. Other important information that can be assessed only by OM practitioners (e.g., the pulse/tongue diagnosis) might not be included. Also, the recruitment of patients for the clinical trial may have been subject to selection bias. Even though efforts were made to reduce this possibility by conducting a multicenter study in three medical centers, it is possible that only patients who were interested in and favorable to OM treatment were recruited. Finally, the experts who participated in this study were OM and statistics experts; it would have been more appropriate if Western medicine doctors and other health care providers had also taken part.
Conclusions
The authors have developed an OM diagnosis and assessment tool, OLQ, through a systematic survey, expert consensus, and evaluation of validity and reliability in a clinical trial. This is one of the very few systematically developed OM diagnosis and assessment instruments. It is hoped that this study provides a guide for future studies developing OM diagnosis and assessment instruments.
Footnotes
Acknowledgments
This study was supported by the Korean Ministry of Health & Welfare as part of a supporting project to promote Oriental Medical Technology (B082011).
Disclosure Statement
No competing financial interests exist.
