Abstract
Background:
A reliable manual examination has not been validated as a diagnostic tool for nociplastic visceral pain.
Aims:
To establish the interrater reliability of a manual examination protocol for functional pancreatic visceral dysfunction and the clinical criteria for a manual palpatory diagnosis based on the clinical features of the nociplastic visceral pain.
Methods:
This double-blind cross-sectional study involved 60 participants assessed by three raters using a manual protocol for diagnosing functional pancreatic visceral dysfunction. Five palpation-based criteria were evaluated: (1) local pain, (2) referred pain, (3) neurovegetative symptoms, (4) hyperalgesia or allodynia, and (5) tissue resistance/density. Interrater agreement was measured using percentage agreement and Fleiss’ kappa. The reliability of the Verbal Numerical Rating Scale (vNRS) was assessed using the intraclass correlation coefficient (ICC). Repeated measures analysis of variance and Cochran’s Q test (with Bonferroni correction) were used to analyze vNRS scores and categorical outcomes, respectively. Significance was set at p < 0.05.
Results:
Criteria 1, 3, and 4 showed particularly high levels of agreement, with overall agreement percentages of 93.3%, 95.6%, and 95.6%, respectively. The corresponding Fleiss’ kappa values were 0.863, 0.880, and 0.908, indicating almost perfect agreement. In contrast, Criteria 2 and 5 demonstrated substantial, but comparatively lower, agreement, with overall percentages of 86.7% and 87.8% and Fleiss’ kappa values of 0.679 and 0.755. The vNRS demonstrated excellent reliability across all three pancreas regions, with ICC values well >0.90: head (ICC = 0.943, 95% confidence interval [CI] = 0.913–0.964), body (ICC = 0.950, 95% CI = 0.923–0.968), and tail (ICC = 0.963, 95% CI = 0.944–0.977).
Conclusions:
Three blinded raters reached an almost perfect pair-wise interrater agreement on the presence or absence of functional visceral dysfunction in the topographic projection of the pancreas. This study provides preliminary evidence that a manual diagnostic protocol is a reliable and potentially useful diagnostic tool in diagnosing nociplastic pain in the topographic projection of the pancreas. Future research should prioritize evaluating the validity of the nociplastic visceral pain diagnosis.
Introduction
Visceral referred pain results from visceral nociception, where the pain is referred to somatic tissues.1,2 Sensitization of sensory afferents constitutes a significant underlying mechanism for visceral hypersensitivity and hyperalgesia. 3
A novel terminology for pain conditions characterized by increased nociception without clear tissue damage was proposed in 2016. 4 The International Association for the Study of Pain (IASP) introduced the term “nociplastic pain” as a third mechanistic pain descriptor alongside nociceptive and neuropathic pain. 5 Nociplastic pain is another term for functional pain. 6 The clinical criteria and grading system for nociplastic pain affecting the musculoskeletal system have been recently described. 7 However, the proposed criteria are intended for identifying chronic nociplastic pain (lasting at least 3 months). They are not applicable in cases of acute pain or patients presenting with sensitization without self-reported pain. In such cases, a palpation protocol could prove highly beneficial.
Regarding the clinical characteristics, gastrointestinal pain is diffusely located, accompanied by autonomic symptoms, and associated with decreased pain thresholds (hyperalgesia).2,6
Studies on nociplastic pain have employed advanced methodologies to accurately identify the associated dysfunctions. Nevertheless, these methodologies may not always be accessible for clinical application or in all research environments. 7 Research is needed to examine the reliability and validity of the different clinical criteria for nociplastic pain. 5
Visceral osteopathy techniques have gained widespread adoption among osteopaths and are integral components of osteopathy training programs worldwide.8,9 The World Health Organization incorporated visceral techniques into its benchmarks for teaching osteopathy in 2010. 10
In the United Kingdom, 2.6% of osteopaths, compared with 28% of Spanish osteopaths, reported using visceral techniques during the first consultation. In Portugal, 23% of osteopaths frequently incorporate these techniques in clinical practice.11–13 Despite their widespread use, evidence supporting the reliability and efficacy of visceral osteopathy remains limited. 8
Clinical guidelines recommend physical examination to rule out the possibility of visceral referred pain in individuals with neck pain.14 Nevertheless, most clinical trials assessing neck pain did not consider visceral referred pain in their eligibility criteria.15
Although no gold standard exists and the IASP has established clear clinical criteria for nociplastic pain, no reliable manual examination grounded in the clinical features of nociplastic pain has been validated in controlled studies as a diagnostic tool for nociplastic visceral pain.
This article aims to establish (1) the interrater reliability of a manual examination protocol for functional pancreatic visceral dysfunction and (2) the interrater reliability of the clinical criteria for a manual palpatory diagnosis based on the clinical features of the nociplastic visceral pain.
Methods
Design
Double-blind cross-sectional study.
Sample
Eligible participants were individuals aged ≥18 years who had provided signed informed consent and did not meet any exclusion criteria.
Individuals were excluded from the study if they had a history of previous abdominal surgery, peptic ulcer, previous or present gastrointestinal cancer, systemic or neurological diseases, pregnancy, or recent trauma. 16
Sixty volunteers were recruited from a consecutive sample of patients diagnosed with pain referred by their physicians to a private physical therapy practice, from the family members of these patients, and first- and second-year students from the Osteopathy School of Madrid.
To mitigate rater bias, this study included individuals with and without pain. 17 Before the examination, each subject completed a questionnaire about their clinical history to identify possible exclusion criteria.
Ethical considerations
All participants signed an informed consent to participate in the study.
The study was approved by the Research and Ethics Committee of the School of Health, Polytechnic Institute of Bragança (P521566-R634901-D1919267), and followed the ethical standards in human experimentation contained in the World Medical Association Declaration of Helsinki. 18
Raters
Three raters participated in the study. Two were experienced osteopaths with over 10 years of practice, while the third had graduated just a year prior but had >6 years of experience in manual therapy.
Before the study, all three underwent training and conducted assessments on several subjects not participating in the subsequent research to minimize discrepancies in the interpretation of physical findings related to visceral dysfunction, until unanimity was achieved among the examiners.
To assess pancreatic functional dysfunction, the examiners collectively agreed to apply a pressure level they judged would not be painful in a normal situation. Sufficient but firm pressure was applied to elicit the desired features or to confirm their absence. Each diagnostic criterion was categorized dichotomously as either “A” (absence) or “P” (presence). Ultimately, a decision was made to categorize pancreatic functional visceral dysfunction as either positive or negative.
The sequence in which each examiner conducted the test was determined using the “Research Randomizer” program. The first evaluator explained the study protocol to the patient and provided additional explanations if it remained unclear.
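The study generated the examiner order with the "Research Randomizer" program. As an illustration only, the same idea can be sketched in a few lines of Python; the rater labels, participant numbering, and seed below are assumptions, not details from the study.

```python
import random

# Hypothetical rater labels; the study only states that three raters took part.
RATERS = ("R1", "R2", "R3")


def assign_rater_orders(participant_ids, seed=None):
    """Return a dict mapping each participant to a random ordering of the raters."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    orders = {}
    for pid in participant_ids:
        order = list(RATERS)
        rng.shuffle(order)  # in-place random permutation of the three raters
        orders[pid] = tuple(order)
    return orders


# Illustrative call: 60 participants numbered 1..60, arbitrary seed.
orders = assign_rater_orders(range(1, 61), seed=2024)
print(orders[1])
```

Drawing an independent permutation for each participant means no rater is systematically first, which matters here because repeated palpation may change the tissue response.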
To ensure a double-blind design, participants were not informed of their results and were instructed not to discuss their diagnoses. Evaluators were blinded to the participants’ clinical histories and group affiliations (patients, family members, or students), and they did not share information with one another. 19
The variable being tested in a reliability study may change due to the application of the test. 20 Consequently, the palpation process was conducted for the shortest duration required to ascertain clinical characteristics. To mitigate accommodation effects and ensure a consistent procedure across all participants, a standardized interval of 5 min was established between assessments by different evaluators.
Testing procedure
The patient was comfortably positioned on the table in a supine position, with the lower limbs flexed and the abdomen exposed.
The evaluator positioned himself on the right side of the patient. With overlapping hands, the clinician applied firm and gradual pressure with the palmar regions of the index, middle, and ring fingers in the epigastric region, on the topographic projections of the head, body, and tail of the pancreas, assessing density, tissue tension, and sensitivity (Fig. 1). 21 Additional information on palpatory anatomy can be found elsewhere. 22

Five palpation-based diagnostic criteria were established to identify visceral functional dysfunction of the pancreas:
(1) Local pain upon palpation reported by the patient with an intensity of ≥5 on the Verbal Numerical Rating Scale (vNRS) in one or more zones5;
(2) Referred pain or sensations of discomfort, tension, or paresthesia in an area remote from or contiguous to the palpation site2,6,23;
(3) Neurovegetative symptoms (e.g., vagal reflexes, heart and respiratory rhythm alterations, perspiration, palpitations, nausea, headache, shortness of breath/dyspnea) or emotional signs6;
(4) Hyperalgesia or allodynia (i.e., the patient reaching the maximum pressure pain threshold tolerated before the therapist reaches a pressure level that he considers to be tolerable in the absence of dysfunction); palpation elicits a “jump sign” in the patient2,3,5,7; and
(5) Abnormal perception of tissue resistance or density.
Criteria 2 through 5 were scored dichotomously: present, if the rater was certain of the presence of the clinical feature; and absent, if the rater was certain of the absence of the clinical feature or was unsure of its presence or absence.
A diagnosis of functional visceral dysfunction in the topographic projection of the pancreas was considered positive under either of the following conditions: when criterion 1 or criterion 2 was satisfied on its own, or when two or more criteria were concurrently present (regardless of which ones).
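The scoring and decision rule can be sketched in code. This is one reading of the rule as described above, not the authors' implementation: the function names, the zone labels, and the treatment of the two conditions as alternatives are assumptions made for illustration.

```python
def score_feature(judgement):
    """Dichotomize a rater's judgement for criteria 2-5.

    Only a confident 'present' counts as present; 'absent' and 'unsure'
    are both scored as absent, as described in the text.
    """
    if judgement not in {"present", "absent", "unsure"}:
        raise ValueError(f"unexpected judgement: {judgement!r}")
    return judgement == "present"


def local_pain_present(vnrs_by_zone, cutoff=5):
    """Criterion 1: vNRS at or above the cutoff in one or more zones."""
    return any(score >= cutoff for score in vnrs_by_zone.values())


def is_positive(criteria):
    """Assumed decision rule: criterion 1 or 2 alone, or any two criteria together.

    `criteria` maps the criterion number (1-5) to True (present) or False (absent).
    """
    n_present = sum(1 for present in criteria.values() if present)
    return criteria[1] or criteria[2] or n_present >= 2


# Hypothetical participant: local pain of 6 in the head zone, nothing else certain.
criteria = {
    1: local_pain_present({"head": 6, "body": 3, "tail": 2}),
    2: score_feature("absent"),
    3: score_feature("unsure"),
    4: score_feature("absent"),
    5: score_feature("absent"),
}
print(is_positive(criteria))  # True, because criterion 1 is met on its own
```

Under this reading, criteria 3 to 5 can only produce a positive diagnosis in combination, whereas criterion 1 or criterion 2 is sufficient alone.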
The vNRS is a self-report measure of pain intensity. Its application consists of verbally requesting an estimated value of pain intensity using numbers from 0 (no pain) to 10 (maximum pain). 25
This study did not attempt to relate the presence of nociplastic pain to any particular clinical diagnosis or medical condition, focusing solely on the reliability of identifying a clinical feature.
The ambient temperature was maintained at a stable 23°C to create uniform environmental conditions.
Statistical analysis
The statistical analysis utilized IBM SPSS software, version 29, for Windows. 26
Mean and standard deviation (SD) were employed to characterize continuous variables, while frequencies (n and %) were utilized for categorical variables.
The agreement among the three raters regarding the overall diagnosis (positive vs. negative) and for each criterion (present vs. absent) was assessed with the overall percentage of agreement, Cohen’s kappa, Light’s kappa, and Fleiss’ kappa. The classification proposed by Landis and Koch (1977) was used to assess the strength of agreement with kappa statistics: ≤0 poor, 0.01–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect.27
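For readers who want to see how these statistics fit together, the following is a minimal sketch of Fleiss' kappa and the Landis and Koch labels in plain Python. The study used SPSS; the function names and the small count table here are hypothetical and serve only to show the calculation.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned subject i to category j (same number of raters per subject)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    n_categories = len(counts[0])
    total_ratings = n_subjects * n_raters

    # Proportion of all ratings falling in each category.
    p_j = [sum(row[j] for row in counts) / total_ratings for j in range(n_categories)]
    # Observed agreement per subject: agreeing rater pairs / all rater pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    p_bar = sum(p_i) / n_subjects        # mean observed agreement
    p_e = sum(p * p for p in p_j)        # agreement expected by chance
    # Undefined (division by zero) if every rating falls in a single category.
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Label a kappa value with the bands listed in the text
    (values at or below 0 are treated as 'poor')."""
    if kappa <= 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical table: 5 subjects, 3 raters, columns = [absent, present].
toy_counts = [[0, 3], [3, 0], [1, 2], [3, 0], [0, 3]]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 3), landis_koch(kappa))
```

Light's kappa, also reported in the study, is the mean of the pairwise Cohen's kappa values and would require the individual ratings rather than the per-subject counts used here.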
The reliability of the vNRS for pain evaluated by the three raters was assessed through the intraclass correlation coefficient (ICC[2,3]) for average measures, based on a two-way random effects model with absolute agreement. The following ICC thresholds were used to determine the strength of the interrater agreement: <0.50 poor, 0.50–0.75 moderate, 0.76–0.90 good, and >0.90 excellent.28 A repeated measures analysis of variance was used to compare the scale scores among the three pancreas regions (head, body, and tail). Cochran’s Q test was used to compare, among the three regions, the percentages of individuals with scores <5 versus ≥5. Multiple comparison tests with the Bonferroni correction were used in both analyses to identify pancreas regions that differed significantly from each other. The significance level was set at 5%.
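The average-measures, two-way random effects, absolute-agreement ICC can be written in terms of the mean squares of a subjects-by-raters table. The sketch below follows the standard Shrout and Fleiss formulation, ICC(2,k) = (BMS − EMS) / (BMS + (JMS − EMS)/n); it is not the SPSS routine used in the study, and the score matrix is hypothetical.

```python
def icc_2k(scores):
    """ICC(2,k): two-way random effects, absolute agreement, average of k raters.

    `scores` is a list of rows, one per subject, each holding one score per rater.
    """
    n = len(scores)        # subjects
    k = len(scores[0])     # raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                   # residual

    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (jms - ems) / n)


def classify_icc(icc):
    """Label an ICC with the thresholds quoted in the text."""
    if icc < 0.50:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical vNRS scores: 5 participants (rows) rated by 3 raters (columns).
toy_scores = [[6, 6, 7], [2, 3, 2], [8, 8, 8], [4, 5, 4], [0, 1, 0]]
value = icc_2k(toy_scores)
print(round(value, 3), classify_icc(value))
```

Because the rater mean square enters the denominator, systematic differences between raters lower this coefficient, which is what distinguishes absolute agreement from consistency.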
Results
Sample
The sample included 60 individuals, mostly females (n = 36, 60.0%), aged between 22 and 79, with a mean age of 40.6 years (SD = 14.9). On average, the individuals weighed 69.2 kg (SD = 12.8), were 169.0 cm tall (SD = 8.9), and had a body mass index (BMI) of 24.1 (SD = 3.0).
Diagnosis
Table 1 shows almost perfect agreement on the diagnosis among the three raters, with a Fleiss’ kappa of 0.812 (95% confidence interval [CI] = 0.807–0.817), a Light’s kappa of 0.812 (Cohen’s kappa ranged from 0.787 to 0.863), and an overall agreement percentage of 91.1%. The three raters agreed on 19 positive diagnoses (31.7%) and 33 negative diagnoses (55.0%). In the remaining eight cases (13.3%), the three raters did not reach a unanimous diagnosis.
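The reported overall agreement is consistent with a mean pairwise agreement computed from these counts. The short check below assumes that definition (the article does not spell it out) and uses the fact that, with three raters and a dichotomous outcome, a non-unanimous case always contains exactly one agreeing pair out of three.

```python
def overall_pairwise_agreement(n_unanimous, n_split):
    """Mean pairwise percentage agreement for three raters and a binary outcome.

    Unanimous cases contribute 3 agreeing rater pairs out of 3;
    split (2 vs. 1) cases contribute 1 agreeing pair out of 3.
    """
    pairs_per_case = 3
    agreeing_pairs = n_unanimous * pairs_per_case + n_split * 1
    total_pairs = (n_unanimous + n_split) * pairs_per_case
    return 100 * agreeing_pairs / total_pairs


# Counts reported for the diagnosis: 19 + 33 unanimous cases, 8 split cases.
agreement = overall_pairwise_agreement(n_unanimous=19 + 33, n_split=8)
print(round(agreement, 1))  # 91.1
```

That is, 164 of the 180 rater pairs agreed, which reproduces the 91.1% given above.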
Table 1. Pain Diagnosis (N = 60)
a Cohen’s kappa for pairwise agreement between raters: R1–R2, agreement between Raters 1 and 2; R1–R3, agreement between Raters 1 and 3; R2–R3, agreement between Raters 2 and 3.
b Average of the three Cohen’s kappa values.
CI, confidence interval.
Diagnostic criteria
Table 2 presents the overall agreement and kappa statistics for the five criteria evaluated by the three raters. Agreement varied across criteria. Criteria 1, 3, and 4 showed particularly high levels of agreement, with overall agreement percentages of 93.3%, 95.6%, and 95.6%, respectively, and Fleiss’ kappa values indicating almost perfect agreement (0.863, 95% CI = 0.859–0.868; 0.880, 95% CI = 0.875–0.884; and 0.908, 95% CI = 0.903–0.912). Furthermore, the Light’s kappa values (0.863, 0.880, and 0.908) and the pairwise Cohen’s kappa values were all higher than 0.80. Criteria 2 and 5 had lower overall agreement percentages (86.7% and 87.8%), Fleiss’ kappa values (0.679, 95% CI = 0.674–0.684; 0.755, 95% CI = 0.750–0.759), Light’s kappa values (0.680 and 0.755), and Cohen’s kappa values (from 0.649 to 0.708 for Criterion 2; from 0.732 to 0.800 for Criterion 5), which still indicate substantial agreement (Table 2).
Table 2. Palpation-Based Diagnostic Criteria (N = 60)
a Cohen’s kappa for pairwise agreement between raters: R1–R2, agreement between Raters 1 and 2; R1–R3, agreement between Raters 1 and 3; R2–R3, agreement between Raters 2 and 3.
b Average of the three Cohen’s kappa values.
CI, confidence interval.
Verbal Numerical Rating Scale
Overall, the vNRS demonstrates excellent reliability across all three pancreas regions, with ICC values well >0.90: head (ICC = 0.943, 95% CI = 0.913–0.964), body (ICC = 0.950, 95% CI = 0.923–0.968), and tail (ICC = 0.963, 95% CI = 0.944–0.977). This indicates that the three raters were highly consistent in their assessments of pain. The pain levels were assessed using the mean score from the evaluations of the three raters. Results are presented in Table 3.
Table 3. Verbal Numerical Rating Scale (N = 60)
a,b Pancreas regions sharing the same superscript letter do not differ significantly (p > 0.05 in the multiple comparison tests with Bonferroni correction).
SD, standard deviation.
* p-Value of repeated measures analysis of variance.
** p-Value of Cochran’s Q test.
The intensity score differed significantly among the three pancreas regions (p < 0.001). The score was significantly higher in the head (4.09 ± 2.4) and in the body (4.30 ± 2.46) than in the tail (2.98 ± 2.54). Considering a cutoff of 5 points on the scale, 45.0% of the individuals had a score ≥5 in the body, 38.3% in the head, and 26.7% in the tail. The differences were only significant between the tail and the body (p < 0.05) (Table 3).
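The comparison of the proportions scoring ≥5 across the three regions relies on Cochran's Q test. A minimal sketch of the statistic is given below; the study ran this analysis in SPSS, and the binary table, function names, and the shortcut for the p-value (valid only for three regions, i.e., two degrees of freedom) are illustrative assumptions.

```python
import math


def cochran_q(table):
    """Cochran's Q for a subjects-by-conditions table of 0/1 outcomes.

    Here each row is a participant and each column a pancreas region,
    with 1 meaning a vNRS score at or above the cutoff.
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all 0s or all 1s")
    return numerator / denominator


def p_value_df2(q):
    """Upper-tail chi-square probability for 2 degrees of freedom (three regions).

    For df = 2 the chi-square survival function reduces to exp(-q / 2).
    """
    return math.exp(-q / 2)


# Hypothetical data: 6 participants, columns = [head, body, tail].
toy_table = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]
q = cochran_q(toy_table)
print(q, round(p_value_df2(q), 4))
```

A significant Q only indicates that at least one region differs; the pairwise comparisons with Bonferroni correction described in the Methods are what identify the tail versus body difference.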
Concerning criterion 2, “Referred pain or sensations of discomfort, tension, or paresthesia in an area remote from or contiguous to the palpation site,” the symptoms most frequently reported by patients were difficulty breathing/shortness of breath, a sensation of tension referred to the throat and contiguous regions of the abdomen, nausea, and heat in the face (data not shown).
Discussion
Interrater reliability refers to the agreement between two or more raters who observe the same phenomenon. Diagnostic accuracy refers to the level of agreement between an index test and a reference standard. However, estimating diagnostic accuracy can be challenging when no established reference standard exists. Reliability is a crucial indicator of the test’s potential accuracy in such cases. 20
This study focuses on identifying the palpatory clinical features of functional visceral dysfunction, rather than on their prevalence in clinical conditions, such as low-back pain or neck pain. Results show an almost perfect agreement in the diagnosis among the three raters.
Criteria 1, 3, and 4 show particularly high levels of agreement, with overall agreement percentages of 93.3%, 95.6%, and 95.6%, respectively, and Fleiss’ kappa values indicating almost perfect agreement (0.863, 0.880, and 0.908). Criteria 2 and 5 presented lower percentages (86.7% and 87.8%) and Fleiss’ kappa values (0.679 and 0.755), indicating substantial agreement. One possible explanation for the slightly lower reliability of criterion 2 (referred pain or sensations of discomfort, tension, or paresthesia in an area remote from or contiguous to the palpation site) compared with criteria 1, 3, and 4 is that desensitization may occur as the tissues are repeatedly evaluated. As a result, when the second or third evaluator performs the test, the tissues may no longer produce referred pain. Also with regard to criterion 2, subjects frequently reported a sensation of tension in the throat. This is likely due to the phrenic nerve containing sensory nerves from the gallbladder, pancreas, and diaphragmatic peritoneum.29
Criterion 5 (Abnormal perception of tissue resistance or density) also demonstrated slightly lower overall agreement than criteria 1, 3, and 4. Criterion number 5 required a longer training period to reach a consensus since each evaluator’s subjective perception of “normal tension” and “abnormal tension” differed. To increase the reliability of the test, it was decided to categorize this item as absent whenever the rater was sure of the absence of the clinical feature or if the rater was unsure of the presence or absence of the clinical feature.
The skin overlying the upper abdominal region can be used to assess sensitization of the central pain system by nociceptive input from the pancreas, owing to the convergence of afferent fibers in the spinal cord. Sensitization will manifest as a segmental lowering of the pain threshold to quantitative sensory testing of the skin and deep tissue. 6
Pain evaluation is problematic because there is no uniform pressure that can be applied to all individuals at all times, due to differences in tissue compliance and BMI. In the examination of fibromyalgia tender points, a 4 kg pressure is recommended. 30 For this study, examiners collectively agreed to apply a pressure level they judged would not be painful in a normal situation. Sufficient but firm pressure was applied to elicit the desired features or to confirm their absence. No attempt was made to measure the degree of force applied by the evaluator. Overall, the vNRS demonstrated excellent reliability across all three pancreas regions, with ICC values well >0.90: head (ICC = 0.943, 95% CI = 0.913–0.964), body (ICC = 0.950, 95% CI = 0.923–0.968), and tail (ICC = 0.963, 95% CI = 0.944–0.977). This indicates that the three raters were highly consistent in their assessments of pain.
Using a cutoff of 5 points on the vNRS scale, 45.0% of the individuals exhibited scores equal to or higher than 5 in the body, 38.3% in the head, and 26.7% in the tail of the pancreas. This indicates that the head and body of the pancreas were the most commonly sensitized areas. The sensitization of primary sensory afferents represents a critical underlying mechanism contributing to visceral hypersensitivity and hyperalgesia. 3 The 5-point cutoff on the vNRS scale was selected based on its established use in clinical practice, where researchers observed that most patients perceived stimuli rated ≤4 as nonpainful, making this threshold a practical reference for identifying hyperalgesia.
Prior studies on the reliability of visceral diagnostic tests in osteopathy have demonstrated poor outcomes. Comparing this study’s findings with those of earlier research is complex due to the diverse methodologies employed.
In one of those studies, two trained osteopaths evaluated the intraobserver reliability of a visceral tension test performed in the abdominal region. The results were classified as “normotension,” “hypertension,” and “hypotension.” No intraobserver reliability was found. 31
In a systematic review of the reliability of the diagnosis and the clinical efficacy of techniques used in visceral osteopathy, seven of the eight studies focused on assessing reliability. Among these, three studies examined visceral mobility, all of which yielded unreliable results.8 Another study used a test of “Global Listening” by feeling the traction of the fascia. Neither intrarater nor interrater reliability was found.32 The remaining four studies were designed to evaluate different outcomes. In one of these studies, two osteopaths evaluated diaphragm tension through the movement of the costal arch, determining whether the tension was greater on one side or equal on both sides. The raters’ judgments did not surpass the level of agreement expected by chance.33 In a different study, three osteopaths administered the “Sotto Hall test.” The test demonstrated good intraexaminer reliability and moderate to good interexaminer reliability in determining whether the test results were positive or negative. However, the authors found no interrater reliability in identifying the specific viscera in dysfunction.34 The last study included in this systematic review sought to evaluate the intra- and interobserver reproducibility of different palpation parameters, such as the depressibility of the stomach, sigmoid colon, cecum, and transverse colon. No evidence of reliability was found.8
The results of our study demonstrated an almost perfect agreement in the diagnosis among the three raters. A possible explanation for the poor results in previous reliability studies is that they were based on the subjective perceptions of the evaluators, such as organ mobility or the perception of tension. In contrast, this study relied on five palpation-based diagnostic criteria for identifying visceral functional dysfunction, thus reducing the risk of bias attributed to subjective assessment by the evaluators.
Although this study includes criterion number 5 based on “Abnormal perception of tissue resistance or density,” all other criteria are based on the patient’s perceived pain and symptoms that are easier to reproduce and interpret. In this way, the risk of bias can be substantially reduced.
Conclusions
In this study, three blinded raters reached an almost perfect pair-wise interrater agreement on the presence or absence of functional visceral dysfunctions in the topographic projection of the pancreas. This study provides preliminary evidence that a manual diagnostic protocol is a reliable and potentially useful diagnostic tool in diagnosing nociplastic pain in the topographic projection of the pancreas. Future research should prioritize evaluating the validity of the nociplastic visceral pain diagnosis.
Limitations
In this study, the diagnosis of functional pancreatic dysfunction was made dichotomously (positive vs. negative). However, from a clinical perspective, it would be beneficial in future studies to categorize patients according to the degree of sensitization.
The manual examination protocol for functional pancreatic visceral dysfunction described in this study was designed to diagnose functional dysfunctions in nociplastic pain. It should not be extrapolated to medical conditions such as pancreatitis.
Although the study included a substantial sample, the absence of a formal sample size calculation may be considered a limitation.
One potential limitation of this study is that the exclusion criteria did not account for whether participants had engaged in recent physical activity, particularly involving the abdominal muscles, in the days preceding the evaluation. As a result, some reported pain may have reflected nociceptive or myofascial origins rather than visceral dysfunction.
Clinical Relevance
This study establishes the clinical criteria for a manual diagnostic protocol for functional visceral dysfunction.
Researchers studying nociplastic pain and functional visceral dysfunctions need to define the criteria by which the presence of visceral dysfunction is determined, so that the reliability of their studies can be interpreted properly.
Studies using osteopathic visceral manipulations must demonstrate that the clinicians performing the physical examination can agree on the presence or absence of the clinical features.
Authors’ Contributions
M.F.: Writing—original draft (lead), writing—review and editing (lead), conceptualization (lead), methodology (lead), resources (equal), investigation (equal), and project administration (equal). N.M.: Writing—review and editing (supporting), conceptualization (supporting), resources (equal), investigation (equal), and project administration (equal). M.C.d.S.: Writing—review and editing (supporting), conceptualization (supporting), investigation (equal), and project administration (equal). P.M.P.: Writing—review and editing (supporting), formal analysis (equal), and project administration (equal). T.P.: Writing—review and editing (supporting), formal analysis (equal), and project administration (equal).
Footnotes
Author Disclosure Statement
The authors declare that they have no conflicts of interest.
Funding Information
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
