Abstract
Background:
There have been few previous reports of the reliability of acupuncture-point location and no previous reports of reliability at points that are near anatomical landmarks.
Objective:
The objective of this research was to determine the internal agreement of 2 junior acupuncturists for marking surface acupuncture point locations (SAPLs) at 6 points bilaterally and to determine the agreement between an expert acupuncturist and each of the 2 junior acupuncturists at the same points.
Design:
This research was designed to measure intrarater and between-rater agreement of SAPL marking on several volunteers.
Outcome measures:
Total agreement probability (TAP) for intrarater and between-rater markings.
Setting:
This research took place at an academic rehabilitation center.
Subjects:
The subjects were 22 healthy volunteers and the researchers were 3 physician–acupuncturists.
Interventions:
For intrarater assessment, 2 junior acupuncturists marked 6 SAPLs bilaterally 4 times on 11 volunteers, using different colors of invisible ink. The points from each acupuncturist within a 0.5-cm radius circle placed to “best fit” were counted as agreements. For between-rater assessment, 2 junior acupuncturists marked the same SAPLs with invisible ink on 15 volunteers. An expert acupuncturist then placed an index mark at each SAPL. The marks that fell within a 0.5-cm radius of the index mark were counted as agreements.
Results:
The TAPs within the observations of the 2 junior acupuncturists were 0.92 (95% confidence interval [CI] 0.89–0.94, p<0.05) and 0.96 (95% CI 0.94–0.97, p<0.05). The TAPs between the expert acupuncturist and the junior acupuncturists were 0.48 (95% CI 0.41–0.55, p=not significant [NS]) and 0.56 (95% CI 0.49–0.63, p=NS).
Conclusions:
This study demonstrated that there was excellent intrarater agreement in surface marking at the selected acupuncture point locations. Agreement between the junior acupuncturists and the expert acupuncturist was much less robust. Further research is needed to determine if these findings are reproducible, what factors influence consistency of point location identification, and whether variance in acupuncture-point location affects treatment outcomes.
Introduction
Previous investigators have documented low accuracy of marks placed by different acupuncturists at surface acupuncture point locations (SAPLs). 2 Precision of marking “fictitious points” varied widely in a similar study depending on the method of measurement. 1 Reliability measures of SAPLs among participating acupuncturists are not routinely reported in published acupuncture trials.
The current authors hypothesized that an acupuncturist is able to mark SAPLs many times at the same location consistently, and that, after participating in calibration sessions, an acupuncturist is able to mark SAPLs close to the marks made by another acupuncturist. To test this hypothesis, a study was designed to assess intrarater agreement and between-rater agreement of SAPLs.
Methods
Inclusion, Exclusion, and Withdrawal Criteria
The inclusion criterion for participation was ability to give consent. Exclusion criteria were embarrassment by exposure of or by palpation of arms and legs, or known hypersensitivity to ink. Withdrawal criteria were self-reported embarrassment or hypersensitivity occurring during participation. Volunteers were allowed to participate in many parts of the study.
Approval
This study was approved by the Northwestern University Institutional Review Board.
Intervention
The acupuncture points evaluated for marking were KI 3, HT 3, BL 60, LI 4, LR 3, and PC 7. These points were chosen because they are commonly used and can be found in proximity to anatomical landmarks. The locations used were based on the textbook and course materials from the Helms Medical Acupuncture Course (the Western medical tradition) and characterized further for the purpose of this protocol. 6 See Table 1 for the verbal descriptions used for these acupuncture-point locations. The verbal descriptions were used to find the area of the acupuncture points, and palpation was used to finalize the localizations. Two calibration sessions were held with all participating acupuncturists before data collection to standardize the verbal descriptions and operational definitions for the SAPLs (Table 1), to practice locations of these points, to resolve discrepancies in point location, and to assess the process of marking and data collection. The 3 physician–acupuncturists reported consensus on the verbal descriptions and the physical locations of the SAPLs after these sessions.
Descriptions adapted from: Helms JM, Claraco AE, Ng A. Point Locations and Functions. 3rd ed. Berkeley, CA: Medical Acupuncture Publishers; 2001. 6
The physician–acupuncturists had varying levels of experience. F.Z., an attending physician, had 8 years of experience as a practicing acupuncturist, and is referred to in this article as the “expert acupuncturist.” W.E.R. had 2 years of experience as a resident physician after training in acupuncture. C.C., an attending physician, had 1 year of experience after training in acupuncture during the course of a fellowship program. W.E.R. and C.C. are referred to as “junior acupuncturists” in this article. All of the participating acupuncturists trained in the Helms Medical Institute Medical Acupuncture for Physicians course. 7
For this study, the following assumptions were made:
(1) SAPL marking is a meaningful surrogate for the process of acupuncture point location in a clinical setting.
(2) A SAPL was considered to be a point on the surface of the skin superficial to the acupuncture points that would be reached if a needle were to penetrate the skin and advance to the appropriate depth.
(3) SAPLs are stable in location over a period of 15–20 minutes.
(4) An acceptable degree of variability in SAPL is a 0.5-cm radius circle, as a “best fit” during intrarater assessment, or centered on the experienced acupuncturist's mark during between-rater assessment.
(5) An acceptable degree of agreement probability is 80%, and 60% agreement could be expected for untrained people who were marking points, using the descriptions.
The skin of volunteer models was marked by the 2 junior acupuncturists using pens (a Rapidograph Koh-I-Noor #4/1.20 mm) filled with nontoxic ink that was visible only under ultraviolet (UV) light (All-Purpose Blue Invisible Ink and Invisible Red Ink). The expert acupuncturist used a visible-ink felt-tipped pen (Sanford Liquid Expresso Medium Point Blue).
The different colors of invisible ink were used to distinguish the marks made by the junior physician–acupuncturists. None of the physician–acupuncturists were able to see these marks on the skin under the standard fluorescent lighting used during point-marking sessions. A 6″ UV light was used to visualize the marks in a darkened room by a research assistant who was blinded to the identity of the marking acupuncturist; this assistant had no acupuncture experience but was trained in the assessment technique at each of the surface locations used in the study. A transparent template with a single point surrounded by a 0.5-cm radius circle was overlaid against the skin under the UV illumination to determine how many points fell within the circle, as described in the protocols below. If the ink from marks “bled” together, the interpreting researcher assessed how many marks were present using the number of points visible on the skin as a guide (Figs. 1 and 2).

Fig. 1. Illustration of the LI 4 surface acupuncture point location (SAPL) during the data collection process of the intrarater evaluation, with the circle template placed to maximize the number of included dark gray marks. In this example, no disagreements would be recorded for dark gray. When the circle template is then placed to maximize the number of included light gray marks, one disagreement would be recorded. (The actual points were illuminated under ultraviolet light to fluoresce blue and red.)

Fig. 2. Illustration of the LI 4 surface acupuncture point location during the data collection process of the between-rater evaluation, with the circle template centered over the “gold standard” mark made by the expert acupuncturist (represented with asterisk). In this example, the light gray mark would be considered an agreement and the dark gray mark would be considered a disagreement. (The actual points were illuminated under ultraviolet light to fluoresce blue and red.)
Data collection was performed in two sessions: intrarater evaluation and between-rater evaluation.
Intrarater Evaluation
The 2 junior acupuncturists used invisible ink to mark the SAPLs bilaterally on volunteers. Each acupuncture-point location was marked by each junior acupuncturist 4 times. All of the point locations were marked only once by the junior acupuncturists during each pass, and 4 passes were made. The circle template was placed to include the most points possible for each of the colors. There was no “gold standard” acupuncture point marked for this part of the study. Any mark that was on or within the circle placed to “best fit” at each point location was counted as an agreement, and any mark that was outside of the circle was counted as a disagreement. Best fit was determined by the investigator who was using the UV light by placing the circle to include as many marks from the junior acupuncturist as possible (Fig. 1).
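The best-fit placement described above has a simple computational analogue, sketched here in Python. This is not the procedure used in the study, where an investigator positioned a transparent template by eye under UV light. The sketch assumes the marks have been digitized as (x, y) coordinates in centimeters, and the function names are hypothetical. It relies on the geometric fact that a fixed-radius circle covering the most points can always be moved until it is centered on one mark or has two marks on its boundary, so only those candidate centers need to be checked.

```python
import math
from itertools import combinations

RADIUS_CM = 0.5  # template radius stated in the protocol


def candidate_centers(points, r):
    """Centers worth testing: each mark, plus both circles through each close pair."""
    centers = list(points)
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dx, dy = x2 - x1, y2 - y1
        d = math.hypot(dx, dy)
        if d == 0 or d > 2 * r:
            continue  # identical marks, or too far apart to share one circle
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        h = math.sqrt(max(r * r - (d / 2) ** 2, 0.0))  # midpoint-to-center distance
        ux, uy = -dy / d, dx / d  # unit vector perpendicular to the pair
        centers.append((mx + h * ux, my + h * uy))
        centers.append((mx - h * ux, my - h * uy))
    return centers


def best_fit_agreements(points, r=RADIUS_CM, eps=1e-9):
    """Largest number of marks on or within a circle of radius r (the agreements)."""
    best = 0
    for cx, cy in candidate_centers(points, r):
        inside = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= r + eps)
        best = max(best, inside)
    return best


# Hypothetical example: 4 passes by one rater at one point location (cm).
marks = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.4), (2.0, 2.0)]
agreements = best_fit_agreements(marks)
disagreements = len(marks) - agreements
```

In this made-up example three marks fit inside one circle and the fourth lies far outside, so one disagreement would be recorded for that point location.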
Between-Rater Evaluation
Between-rater evaluation of agreement between each of the junior acupuncturists and the expert acupuncturist was determined as follows: The 2 junior acupuncturists used invisible ink to mark the SAPLs bilaterally on 15 volunteers. The expert acupuncturist then used visible ink to mark the acupuncture point locations on the same volunteers, and these marks were then considered the “gold standard” to which the junior acupuncturists' marks were compared. Any mark that was on or within the circle template centered on the expert acupuncturist's mark was counted as an agreement, and any mark that was outside was counted as a disagreement (Fig. 2).
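The between-rater rule reduces to a distance check against the index mark. The following Python sketch illustrates that rule only; it assumes digitized (x, y) coordinates in centimeters, which the study did not collect, and the function name and the use of `None` for a mark that could not be found are hypothetical conventions.

```python
import math

RADIUS_CM = 0.5  # template radius stated in the protocol


def count_between_rater(junior_marks, expert_marks, radius=RADIUS_CM, eps=1e-9):
    """Return (agreements, evaluated) for paired junior and expert marks.

    A junior mark on or within the circle centered on the expert's mark is an
    agreement. Pairs whose junior mark is None (not found) are not evaluated.
    """
    if len(junior_marks) != len(expert_marks):
        raise ValueError("each junior mark needs a matching expert mark")
    agreements = 0
    evaluated = 0
    for junior, expert in zip(junior_marks, expert_marks):
        if junior is None:
            continue  # mark could not be located, so it is excluded
        evaluated += 1
        if math.dist(junior, expert) <= radius + eps:
            agreements += 1
    return agreements, evaluated


# Hypothetical example: four point locations, one junior mark not found.
junior = [(0.1, 0.2), (0.7, 0.0), (0.0, 0.5), None]
expert = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
agreements, evaluated = count_between_rater(junior, expert)
proportion = agreements / evaluated
```

Unlike the intrarater case, the circle is fixed on the expert's mark rather than moved to a best fit, so a junior rater who is consistent but offset from the expert scores as a disagreement every time.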
Data on age, self-reported race, self-reported height, and self-reported weight were collected from each participant, and body mass index was calculated from these data.
Data Analysis
Primary outcomes: Intrarater agreement was expressed as total agreement probability (TAP) for the aggregated point locations. Intraclass correlations based on time of rating were not calculated because the marks placed at different passes were indistinguishable. Between-rater agreement was expressed as TAP between each of the junior physician–acupuncturists and the expert physician–acupuncturist. A Cohen's κ statistic could not be calculated using this design, because the expert acupuncturist only marked each point once. Therefore, the “gold standard” marking had no intrinsic variability. Given that interrater reliability implies the use of a κ statistic in common statistical parlance, this article uses the term “between-rater” to describe this study's interrater assessment. Confidence intervals (CIs) were calculated using the Agresti-Coull method.
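As an illustration of the quantities named above, the sketch below computes a TAP and a textbook Agresti-Coull interval in Python. It is not the authors' analysis code. It assumes that TAP is the proportion of marks counted as agreements and that each mark is treated as an independent binomial trial; the function names are hypothetical, and the example counts are taken from the Results section of this article.

```python
from math import sqrt
from statistics import NormalDist


def total_agreement_probability(agreements, n_marks):
    """Proportion of marks counted as agreements (assumed definition of TAP)."""
    return agreements / n_marks


def agresti_coull_interval(agreements, n_marks, confidence=0.95):
    """Agresti-Coull interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n_marks + z ** 2                      # adjusted sample size
    p_adj = (agreements + z ** 2 / 2) / n_adj     # adjusted proportion
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Intrarater example: 528 marks with 42 disagreements.
intra_tap = total_agreement_probability(528 - 42, 528)
intra_ci = agresti_coull_interval(528 - 42, 528)

# Between-rater example: 179 evaluable marks with 93 disagreements.
between_tap = total_agreement_probability(179 - 93, 179)
between_ci = agresti_coull_interval(179 - 93, 179)
```

Under these assumptions the two examples round to a TAP of 0.92 with an interval of about 0.89 to 0.94, and a TAP of 0.48 with an interval of about 0.41 to 0.55. Because marks on the same volunteer are unlikely to be fully independent, intervals computed this way may be narrower than a clustered analysis would give.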
Power calculations indicated that 11 and 15 subjects would be more than adequate to calculate intrarater and between-rater reliability, respectively, if 80% agreement could be achieved, assuming a 60% agreement for untrained raters, and an α of 0.05.
The number of discrepancies at specific point locations was examined in post-hoc analysis to determine if specific point locations contributed disproportionally to disagreement. This study was not designed to evaluate statistical significance of differences in agreement between individual point locations.
Results
See Table 2 for participation and demographic information on subjects whose data were included in the analysis. Data from 6 subjects were withdrawn before statistical analysis because of incomplete consent documentation. No serious adverse events were noted (1 participant had a wheal and flare response without pruritus at the site of marking that faded within minutes; this subject's data set was a part of the excluded data). Subjects were allowed to participate in both data collection sessions. Twenty-two subjects participated in data collection; 11 subjects participated in the intrarater evaluation, 15 participated in the between-rater evaluation, and 4 subjects participated in both evaluations.
SD, standard deviation; BMI, body mass index.
Intrarater Evaluation Results
Intrarater agreement was very high for both junior acupuncturists (Table 3). Each acupuncturist made 528 marks. The junior acupuncturists had 42 and 20 discrepancies with TAPs of 0.92 (95% CI 0.89–0.94, p<0.05) and 0.96 (95% CI 0.94–0.97, p<0.05), respectively. Post-hoc analysis demonstrated that, of all the point locations, HT 3 had the greatest number of disagreements, alone accounting for >50% of the total disagreements for each of the acupuncturists. Disagreements were more common on the right side of the body than on the left side of the body for both raters.
TAP is reported as proportion followed by 95% CI and significance level in parentheses.
TAP, total agreement probability; CI, confidence interval.
Between-Rater Evaluation Results
Agreement between each junior acupuncturist and the expert acupuncturist was much less robust (Table 4). Each junior acupuncturist made 180 marks, but one mark from each acupuncturist could not be found during the UV light examination, leaving 179 marks from each available for evaluation. The junior acupuncturists had 93 and 78 disagreements, with TAPs of 0.48 (95% CI 0.41–0.55, p=NS) and 0.56 (95% CI 0.49–0.63, p=NS), respectively. Post-hoc analysis demonstrated that the HT 3 location was the source of the greatest number of discrepancies, followed by BL 60 and KI 3. Disagreements were again more common on the right side of the body than on the left for both raters, but not as notably as in the intrarater evaluation.
TAP is reported as proportion followed by 95% CI and significance level in parentheses.
One mark at each of these locations could not be located.
TAP, total agreement probability; CI, confidence interval; NS, not significant.
Discussion
The high degree of agreement in the intrarater assessment of the junior acupuncturists supports the hypothesis that an acupuncturist can mark SAPLs consistently at the same location. This may indicate that a treatment from a single acupuncturist, using a consistent combination of points, is essentially the same across treatment sessions.
The lesser degree of agreement in the between-rater assessment contradicts the hypothesis that an acupuncturist can mark SAPLs close to the mark made by another acupuncturist. This finding is clear despite the proximity of anatomical landmarks to each of the selected points, the similar training of the acupuncturists, and the calibration sessions. The question remains whether this variability comes from the process of anatomical location or from palpation. This challenges the assumption that treatments given by different acupuncturists are identical, unless between-rater reliability has been measured and documented.
The finding of lower-than-expected between-rater agreement is consistent with the poor accuracy and precision reported when many raters trained at the same institution mark traditional or fictitious point locations.1,2 This finding also adds to the literature that offers constructive criticism of the methods of point location that are currently in use.1–3,8,9 The cun system uses anatomical landmarks combined with standardized measurements to localize acupuncture points, but it may not address anthropometric variability caused by gender, race, or obesity.3,8,9 Researchers have speculated that imprecision of SAPL increases for points distant from anatomical landmarks.1–3,8,9 The present study calls attention to the variability in SAPL that exists among acupuncturists, even for points that are close to anatomical landmarks, and suggests an additional possible reason for the poor precision and accuracy reported in the studies cited above.1,2
This study raises several important issues. Finding positive effects of acupuncture despite unreliable point locations among acupuncturists could be interpreted as indirect support of the hypothesis that specific effects cannot be attributed to specific acupuncture points. 10 However, poor reliability of point location may also be invoked to help explain why research has not uniformly demonstrated a difference between “active” and “sham” acupuncture.1,2,10 Furthermore, if a specific combination of points is demonstrated to have a positive effect on a particular condition through rigorous research with a uniform point-location method, is it reasonable to believe that acupuncturists in the community, using many point-location techniques, will achieve the same treatment success?
To best examine whether acupuncture-point stimulation has specific effects, researchers should report detailed descriptions of the point locations used in acupuncture research and the methods used to find these points, as recommended in the STRICTA guidelines, 11 together with an assessment of the reliability of acupuncture-point location (and sham-acupoint location, if used) by the investigators who perform the acupuncture treatments being tested.
The results of the current post-hoc analysis were notable, although the statistical significance could not be assessed. For the intrarater assessment, most of the disagreements occurred at HT 3, and most occurred on the right side of the body. It is difficult to explain why the right side of the body would be prone to disagreement in point location, but the direction of lighting and/or the dominant hand (right-handed for both junior acupuncturists) may have played a role. Another consideration may be that, according to Traditional Chinese Medicine (TCM), feminine energy predominates on the right hemibody, and male energy predominates on the left hemibody. This difference may affect the energetic character—and thus the localization—of points on each side of the body. For the between-rater assessment, the least agreement occurred at HT 3, but most of the other point locations also had low agreement. The possibility that handedness played a role in the between-rater assessment was considered, given that the expert acupuncturist was left-handed. HT 3 is described as a “spiritual” point in the TCM tradition, and, thus, a contribution of the spiritual understanding of the acupuncturist or the spiritual state of the participants could conceivably confound efforts to locate that point within or between raters consistently.
The assumptions that were made to perform the study can be challenged. It should be emphasized that there is no validated measure of acupuncture-point size, location, or depth. Surface marking may be an inadequate representation of the phenomenology of acupuncture-point location, but landmark-based surface location is the norm in standardized point-location descriptions and in teaching point locations. In this study, surface location was followed by palpation to isolate the tactile and energetic qualities of each acupuncture point, which is consistent with standard clinical practice. Positional skin deformation and subcutaneous soft tissue could result in translation of the marks relative to the intended structures, and the amount of tension on the skin could also affect the distances between the marks. The effects of skin deformation were minimized by assessing the marks with the extremities in the same positioning that was used for placement. Marking SAPLs many times in a single session is not similar to clinical practice, but was the only feasible way to perform this study with the tools available.
Some practitioners of acupuncture are likely to assert that there can be no true standard acupuncture-point locations, sizes, or depths, because the characteristics of acupuncture points change constantly. Furthermore, from a Western scientific perspective, we do not know enough about the anatomy and physiology of acupuncture points to assert with confidence what should be the qualities of an acupuncture-point location. The most extensively studied physiological attribute of an acupuncture point, altered electrical resistance, still has not yielded consistent results.12,13 Other proposed physiological characteristics of acupuncture points have not yet provided consistent bases for identification. The assertion that there can be no standard for acupuncture-point locations contradicts efforts to standardize point locations and provide consistent scientific reporting.4,11
The 0.785-cm² area chosen as an acceptable SAPL in this protocol was carefully considered. This area is the approximate size of a palpating fingertip used to identify the site of needle placement after landmark-based localization and minimizes the overlap between tested points and other nearby points. Some researchers have hypothesized that an acupuncture point has a 0.25-cm radius,14–16 so the 0.5-cm radius field size used in this protocol would have allowed for a reasonable degree of flexibility in clinical location, which could have been corrected by altering the angle of needle insertion. Another group of researchers has found a standard deviation of approximately 0.5 cm in the distance from the midline to points on the inner Bladder meridian of the back. 17 Nevertheless, a single size was chosen for all the points in the current study, which does not reflect clinical reality. For example, it is easy to argue that larger areas must be considered for the inner Bladder meridian points of the back or the Conception Vessel points of the ventral body than for the Jing-well points found at the borders of the nails on the fingertips. Furthermore, points found deeply in soft tissue can be contacted from a relatively wide radius by alteration of needle angle, but points found more superficially will not allow the same flexibility.
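For reference, the area quoted above follows directly from the 0.5-cm template radius. The short calculation below also shows, as an illustrative comparison that is not made in the source, the area implied by the hypothesized 0.25-cm point radius; the symbols are introduced here only for this calculation.

```latex
\begin{align*}
A_{\text{template}} &= \pi r^{2} = \pi\,(0.5\ \text{cm})^{2} \approx 0.785\ \text{cm}^{2} \\
A_{\text{point}}    &= \pi\,(0.25\ \text{cm})^{2} \approx 0.196\ \text{cm}^{2} \\
\frac{A_{\text{template}}}{A_{\text{point}}} &= \left(\frac{0.5}{0.25}\right)^{2} = 4
\end{align*}
```

On these figures, the accepted marking area is four times the area of a point with the hypothesized radius.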
The assumption that an 80% rate of agreement is acceptable either between or within acupuncturists may be challenged, as well as the assumption that an untrained marker would have a 60% rate of agreement. These assumptions were rigorous but did not affect statistical significance in either evaluation, because there were many more subjects than required in the intrarater group, and there was not an 80% agreement in the between-rater evaluation.
Three acupuncturists who were trained by the same program were tested in this protocol, limiting generalizability. This protocol did not include an intrarater assessment of the expert acupuncturist, which should be included in future iterations of this study. A between-rater assessment comparing the junior acupuncturists to one another would have added valuable information to this inquiry. In retrospect, this information could probably have been collected as a part of the interrater assessment. It was not possible to evaluate variation in the expert acupuncturist's marks using this protocol because of the limitations of the marking technique. If this technical limitation can be surmounted, it will allow calculation of a κ statistic.
It remains to be investigated if further refinement of the calibration process could improve agreement in SAPL between acupuncturists. It would also be important to understand better what factors may influence consistency of SAPL identification by designing a study specifically to clarify: if side/handedness really do matter; if distance from a bony landmark is relevant; and/or if other specific factors may play roles in accuracy of SAPL. Such a study would, therefore, be a valuable next step. Another important unanswered question worth further attention is whether variance in SAPL is clinically relevant (i.e., does it affect the clinical outcome of acupuncture treatments or the results of controlled studies?).1–3,9
Conclusions
This study demonstrated that there was excellent intrarater agreement in SAPLs. Agreement between the junior acupuncturists and an expert acupuncturist was much less robust. Further research is needed to determine if these findings are reproducible, what factors influence consistency of SAPLs, and whether variances in SAPLs affect treatment outcomes.
Footnotes
Acknowledgments
The authors express deep appreciation to the Rehabilitation Institute of Chicago for providing the physical facilities, and for the intramural grant used to fund this study. Earnest thanks are due to Cherina Cyborski, MD, for her participation as one of the junior acupuncturists, Danielle Zelnick, MD, for her role in data collection, and to Jungwha Lee, PhD, for providing statistical consulting services and performing the statistical analysis.
Disclosure Statement
No competing financial interests exist for either of the authors.
