Abstract
Objectives:
Chronic atrophic gastritis (CAG) is the precancerous stage of gastric carcinoma. Traditional Chinese Medicine (TCM) has been widely used in treating CAG. This study aimed to reveal the core pathogenesis of CAG by validating TCM syndrome patterns and to provide evidence for optimizing treatment strategies.
Design:
This is a cross-sectional study conducted in 4 hospitals in China. Hierarchical clustering analysis (HCA) and complex system entropy clustering analysis (CSECA) were performed, respectively, to achieve syndrome pattern validation.
Results:
Based on HCA, 15 common factors were assigned to 6 syndrome patterns: liver depression and spleen deficiency and blood stasis in the stomach collateral, internal harassment of phlegm-heat and blood stasis in the stomach collateral, phlegm-turbidity internal obstruction, spleen yang deficiency, internal harassment of phlegm-heat and spleen deficiency, and spleen qi deficiency. By CSECA, 22 common factors were assigned to 7 syndrome patterns: qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, yang deficiency, and yin deficiency.
Conclusions:
A combination of qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, yang deficiency, and yin deficiency may play a crucial role in CAG pathogenesis. Accordingly, treatment strategies using TCM herbal prescriptions should target regulating qi, activating blood, resolving turbidity, clearing heat, removing toxin, nourishing yin, and warming yang. Further explorations are needed to verify and expand the current conclusions.
Introduction
Chronic atrophic gastritis (CAG) is an inflammatory disease of the gastric mucosa arising from various etiologies and is defined as the precancerous stage of gastric carcinoma (GC). 1–3 Global cancer statistics for 2012 estimated nearly 1,000,000 new GC cases worldwide annually, and the incidence and mortality of GC have also been increasing in China. 4 The transition from CAG to GC is acknowledged as a typical disease model of uncontrolled inflammation leading to malignant transformation, the treatment of which is costly to individuals. 5,6
Active treatment of CAG arrests the transition from gastric mucosal atrophy to GC. Based on the theory of holism and the time-honored principle of syndrome pattern differentiation, Traditional Chinese Medicine (TCM) has been widely applied in treating digestive disorders among Chinese people and has distinguished itself with favorable cost and efficacy since ancient times. Modern-day TCM practitioners continue to perform syndrome pattern differentiation to identify the pathogenesis of CAG and thus treat the root.
The TCM concept of four diagnostic information subsumes manifold signs and symptoms. This information is collected systematically during the syndrome pattern differentiation process from the four diagnoses (symptoms, physical signs, tongue appearance, and pulse reading) and is regarded as having implications for disease pathogenesis. After assessing all the information collected, TCM practitioners can determine the potential link between pathologic nature and disease at the current stage. Herbal prescriptions can then be formulated in accordance with conclusions drawn from the syndrome pattern differentiation process.
A growing body of literature suggests that TCM herbal prescriptions can treat CAG effectively, including relieving symptoms, inhibiting mucosal inflammation, and reversing atrophy. 7–13 However, no consensus exists across these studies on syndrome patterns and treatment strategies. 14–16 The authors conducted the present study to ascertain the core pathogenesis of CAG by validating the TCM syndrome patterns. It will provide evidence and potential translation for optimization of treatment strategies from the perspective of TCM.
Clustering analysis is one of the most important research fields of exploratory data mining. It has been used in many fields as a form of unsupervised learning. In this study, syndrome pattern validation was achieved using exploratory factor analysis (EFA) based hierarchical clustering analysis (HCA) and association rule analysis (ARA) based complex system entropy clustering analysis (CSECA). Original TCM four diagnostic variables contain specific combinations and complex transactions of disease features, which give the data high-dimensional characteristics. 17,18 The abovementioned unsupervised analytic methods can detect syndrome features by achieving multicollinearity elimination and dimensionality reduction. 19,20 Using EFA and ARA, large numbers of variables can be remodeled as combinations of linear and simpler common factors (variables with specific combination of features), which lay the foundation for further clustering analysis. HCA and CSECA can then be applied, respectively, based on the common factors revealed in previous steps to develop dimensional taxonomies and detect the relationships among variables.
Materials and Methods
Study population
This cross-sectional study was conducted in 4 medical centers in China: Dongzhimen Hospital Affiliated to Beijing University of Chinese Medicine; Peking Union Medical College Hospital Affiliated to the Chinese Academy of Medical Sciences; Beijing Hospital Affiliated to the National Health and Family Planning Commission of China; and Xiamen Traditional Chinese Medicine Hospital. The study began in September 2010 based on participant survey and ended in October 2012. Patients 20–75 years old who visited the aforementioned hospitals for upper gastrointestinal endoscopy, met the diagnostic criteria of CAG, and signed informed consent were recruited.
Ascertainment of CAG
Diagnostic criteria of CAG were based on guidelines issued by the Chinese Medical Association in 2012. 2
Case report form
A case report form (CRF) was designed based on literature research, expert advice, and clinical guidelines. 21–23 CRF content included general information (name, age, sex), disease features and history (chief complaint, history of present illness, past medical history, family medical history), modern medicine diagnosis, TCM diagnosis, the TCM four diagnostic information, and a copy of the upper gastrointestinal endoscopy report. To reduce measurement bias, all the investigators who filled in the CRFs were trained in standard operating procedures (SOP). Each study participant was examined and followed up by at least two resident physicians or graduate students. At least two senior staff physicians supervised interview sessions to ensure consistency and authenticity of data collection.
Data analysis
HCA and CSECA were performed on data collected from the four diagnostic methods, based on EFA and ARA, respectively, to achieve syndrome pattern extraction (Fig. 1). All analyses in this study were processed by SPSS Statistics software (version 17.0; SPSS, Inc., Chicago, IL) and SPSS Clementine software (version 12.0; SPSS, Inc.).

Fig. 1. Flow chart of clustering analysis based CAG syndrome pattern extraction. CAG, chronic atrophic gastritis.
Results
Characteristics of participants
One hundred thirty-five CRFs were distributed, and a total of 131 were gathered. One hundred twenty forms were deemed eligible after eliminating 15 CRFs with incomplete information. Of the 120 eligible participants, 58 were female and 62 were male, with an average age of 52.56 years. All the four diagnostic variables detected were preanalyzed, and 40 variables with more than 21% frequency of occurrence were chosen for subsequent analysis. The abovementioned variables were tabulated based on the distribution frequency (Table 1).
Table 1. Four Diagnostic Frequency for Chronic Atrophic Gastritis (N = 120)
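The frequency-based preselection described above can be illustrated with a minimal sketch. The 21% cutoff comes from the text; the function name `frequent_variables` and the toy patient records are hypothetical and are not the study data.

```python
from collections import Counter

def frequent_variables(records, threshold=0.21):
    """Return {variable: frequency} for variables observed in more than
    `threshold` of the records, sorted by descending frequency."""
    n = len(records)
    counts = Counter(var for rec in records for var in set(rec))
    kept = {var: c / n for var, c in counts.items() if c / n > threshold}
    return dict(sorted(kept.items(), key=lambda kv: (-kv[1], kv[0])))

# Hypothetical toy data: each record lists the signs and symptoms of one patient.
records = [
    {"epigastric fullness", "fatigue"},
    {"epigastric fullness", "belching"},
    {"epigastric fullness", "fatigue", "dark tongue"},
    {"fatigue"},
    {"epigastric fullness"},
]
selected = frequent_variables(records, threshold=0.21)
print(selected)
```

In this toy example, only the variables present in more than 21% of the five records are retained for later analysis.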
EFA based HCA for syndrome pattern extraction
EFA based assessment of common factors
Before analysis, the Kaiser-Meyer-Olkin test and Bartlett's test of sphericity were used, respectively, to assess the suitability of the collected diagnostic variables for EFA. Principal component analysis was then performed to extract common factors from the original variables. Through consultation with TCM experts, the diagnostic variables, nature of disease, and disease location obtained from the four diagnostic information were ultimately assigned to 15 common factors (Table 2).
Table 2. Exploratory Factor Analysis Based Common Factor Extraction and Their Corresponding Four Diagnostic Variables, Disease Nature, and Disease Locations
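As an illustration of the suitability check, the statistic of Bartlett's test of sphericity can be computed from the correlation matrix R of p variables measured on n subjects as chi-square = -(n - 1 - (2p + 5)/6) ln|R|, with p(p - 1)/2 degrees of freedom. The sketch below is not the SPSS procedure used in the study; the helper names and the toy binary data are assumptions made for demonstration, and the p-value lookup is omitted.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det

def bartlett_sphericity(columns):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    n, p = len(columns[0]), len(columns)
    det_r = determinant(correlation_matrix(columns))
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det_r)
    return chi2, p * (p - 1) // 2

# Hypothetical presence/absence data for three variables in eight patients.
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
chi2, dof = bartlett_sphericity(columns)
print(round(chi2, 4), dof)
```

A large statistic relative to the chi-square distribution with the stated degrees of freedom indicates that the variables are correlated enough for factor extraction to be meaningful.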
HCA based extraction of syndrome patterns
Based on HCA, 15 common factors were combined and 6 syndrome patterns were extracted. After consultation with experts, name of syndrome patterns, nature of disease, and disease locations were assigned (Fig. 2 and Table 3). The six syndrome patterns extracted by HCA were: (1) Liver depression and spleen deficiency and blood stasis in the stomach collateral; (2) Internal harassment of phlegm-heat and (blood) stasis in the stomach collateral; (3) Phlegm-turbidity internal obstruction; (4) Spleen yang deficiency; (5) Internal harassment of phlegm-heat and spleen deficiency; and (6) Spleen qi deficiency.

Fig. 2. Tree diagram of HCA based syndrome pattern extraction. HCA, hierarchical clustering analysis.
Table 3. Hierarchical Clustering Analysis Based Syndrome Pattern Extraction and Their Corresponding Four Diagnostic Variables, Disease Nature, and Disease Locations
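The clustering step can be sketched as agglomerative clustering of common factors. The linkage rule and distance measure used in SPSS are not reported in the text, so the sketch below assumes average linkage and the distance 1 - |r| between factors; the factor names F1 to F4 and their correlations are hypothetical.

```python
def average_linkage(names, dist, n_clusters):
    """Repeatedly merge the two closest clusters (average linkage)
    until only n_clusters remain."""
    clusters = [[name] for name in names]

    def cluster_distance(a, b):
        return sum(dist[x][y] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

# Hypothetical correlations between four common factors.
names = ["F1", "F2", "F3", "F4"]
corr = {
    ("F1", "F2"): 0.80, ("F1", "F3"): 0.10, ("F1", "F4"): 0.20,
    ("F2", "F3"): 0.15, ("F2", "F4"): 0.10, ("F3", "F4"): 0.70,
}
# Assumed distance: strongly correlated factors are "close".
dist = {a: {b: 0.0 for b in names} for a in names}
for (a, b), r in corr.items():
    dist[a][b] = dist[b][a] = 1 - abs(r)

patterns = average_linkage(names, dist, n_clusters=2)
print(patterns)
```

The number of clusters is passed in explicitly, mirroring the fact that in HCA the cut level is chosen by the analyst rather than derived from the data.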
ARA based CSECA for syndrome pattern extraction
Based on ARA and CSECA, 22 common factors were combined, and 7 syndrome patterns were extracted. After consultation with experts, name of syndrome patterns, nature of disease, and disease locations were assigned (Tables 4, 5). Core combinations of four diagnostic variables were also revealed (Fig. 3). The 7 syndrome patterns extracted by ARA and CSECA were: (1) qi deficiency; (2) qi stagnation; (3) blood stasis; (4) phlegm turbidity; (5) heat; (6) yang deficiency; and (7) yin deficiency.

Fig. 3. Network diagram of core combinations of four diagnostic variables.
Table 4. Association Rule Analysis and Complex System Entropy Clustering Analysis Based Common Factor Extraction and Their Corresponding Four Diagnostic Variables, Disease Nature, and Disease Locations
Table 5. Association Rule Analysis and Complex System Entropy Clustering Analysis Based Syndrome Pattern Extraction and Their Corresponding Disease Nature, Disease Locations, Common Factors, and Four Diagnostic Variables
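Association rule analysis rests on two quantities: the support of a variable combination (the share of patients showing all of its members) and the confidence of a rule A to B (the share of patients with A who also show B). The following minimal sketch computes both for pairs of variables. It is not the Clementine procedure used in the study, whose algorithm and thresholds are not reported; the function name, thresholds, and toy records are assumptions.

```python
from itertools import combinations

def pairwise_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for every pairwise rule
    lhs -> rhs that meets both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for a, b in combinations(items, 2):
        both = count({a, b})
        support = both / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / count({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical records: the four diagnostic variables observed in each patient.
transactions = [
    {"fatigue", "pale tongue", "weak pulse"},
    {"fatigue", "pale tongue"},
    {"fatigue", "weak pulse"},
    {"belching", "wiry pulse"},
    {"belching", "wiry pulse", "fatigue"},
]
rules = pairwise_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules are directional: in the toy data "pale tongue" implies "fatigue" with full confidence, while the reverse rule is dropped because fatigue also occurs without a pale tongue.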
Discussion
The holistic theory on which TCM is based determines the complex multivariate nonlinear relationship among the variables of the four diagnostic information. A TCM syndrome can be regarded as a giant complex system with temporal dynamics and nonlinear high-dimensional interfaces. The fundamental concept of EFA based HCA is projecting high-dimensional TCM clinical information onto a lower-dimensional space. In the present study, HCA was first carried out based on EFA to measure the closeness of variables (common factors) under a certain definition of “distance.” 24–27 In this classical process, the number of clusters must be set manually by the data analyst and TCM practitioners. Ultimately, common factors with sufficient similarity were assigned to the same clusters through dimensionality reduction, by which syndrome patterns of CAG can be ascertained.
Complex systems represent new perspectives and solutions for investigating how relationships between variables contribute to the collective behaviors of a system and how the system interacts with each variable. 19,20 As a typical complex system, the TCM concept of syndrome is composed of various clinical information sets. Entropy can be applied to measure precisely the uncertainty and variety of information of a self-adjusted complex system in different dimensions and clusters. 19,20 Because it is better suited than EFA based HCA to measuring the correlation between multidimensional and nonlinear variables, ARA based CSECA was also applied to the data mining of TCM syndrome related variables in this study. 28,29
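To make the role of entropy concrete, the sketch below computes Shannon entropy and the mutual information between two presence/absence variables, a standard entropy-based measure of association that also captures nonlinear dependence. It is offered only as an illustration of the underlying quantities and is not the CSECA algorithm itself; the variable values are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two equal-length sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

# Hypothetical presence (1) / absence (0) of three variables in four patients.
x = [1, 1, 0, 0]
y = [1, 1, 0, 0]   # always co-occurs with x
z = [1, 0, 1, 0]   # independent of x in this toy sample

mi_xy = mutual_information(x, y)
mi_xz = mutual_information(x, z)
print(mi_xy, mi_xz)
```

Variables that share much information, such as x and y here, are candidates for grouping into the same cluster, whereas variables with near-zero mutual information are not.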
According to TCM theory, the syndrome pattern is considered the most important unit for evaluating pathogenesis. 17,18 Results of emerging research have not been consistent on CAG syndrome features, especially in disease nature, disease location, and differentiation of primary syndrome patterns, which indicates the complexity of the pathogenesis. 19–23 However, the inconsistency may also be due to a lack of established reliability of observations and judgments in previous studies. It was therefore critical for the present study to provide a more detailed assessment by exploring the core syndrome patterns of CAG with HCA and CSECA. The assignments of common factors, syndrome patterns, disease location, and disease nature were appraised and determined based on professional knowledge.
HCA showed a total of 15 common factors assigned to 6 syndrome patterns (liver depression and spleen deficiency and blood stasis in the stomach collateral, internal harassment of phlegm-heat and blood stasis in the stomach collateral, phlegm-turbidity internal obstruction, spleen yang deficiency, internal harassment of phlegm-heat and spleen deficiency, and spleen qi deficiency). CSECA showed a total of 22 common factors assigned to 7 syndrome patterns (qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, yang deficiency, yin deficiency). Taken together, the HCA and CSECA results indicated that the TCM concept disease nature of CAG was a combination of qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, yang deficiency, and yin deficiency, with disease location in the stomach, spleen, and liver.
Compared with HCA, CSECA results were more concordant with previous findings. 19–23 CSECA suggests that a combination of qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, yang deficiency, and yin deficiency may form the core pathogenesis of CAG. Therefore, these elements should be taken into account during clinical treatment as the crucial diagnostic elements. The target organs of treatment should be the stomach, spleen, and liver. Treatment strategy by herbal prescriptions should hence be targeted to regulating qi, activating blood, resolving turbidity, clearing heat, removing toxin, nourishing yin, and warming yang. However, in terms of HCA results, yin deficiency was not included in the disease nature of CAG and, thus, may not be the most vital syndrome pattern in CAG patients. Given these seemingly contradictory results, the findings add support to the notion that herbs that nourish yin may receive less prominence or be prescribed in lesser amounts than herbs that address the other abovementioned aspects when formulating a prescription, but they should not be disregarded when treating CAG.
The present study design has several strengths and limitations of note. Multidimensional and full-scale information was gathered in this research with strict quality control, which makes the data suitable for multivariate data mining models. In addition, the whole study process was closely monitored and controlled, with all the investigators trained in SOP, and therefore has relatively high data quality. Several limitations are also noteworthy. Selection bias may exist, so that conclusions may be generalized only to CAG patients in TCM hospitals in Beijing and, therefore, are likely not representative of the disease features in patients in modern medicine hospitals and the rest of China. Moreover, the study has a relatively small sample size. Given these limitations, the current findings need to be replicated in a larger population and with guaranteed inter-rater reliability of all observations and judgments before the results can be accepted in clinical practice. Further explorations through multicentric and full-scale studies are needed to verify and expand the conclusions.
The study has demonstrated the suitability of using the new analytic methods as tools for development of TCM syndrome patterns and associated treatments of CAG. In terms of data mining methodology, there are also some potential advantages and limitations to be addressed. The results from both HCA and CSECA need to be validated and expanded with supervised data mining methods. In HCA, each TCM variable can be assigned to only one common factor, resulting in an inability to describe multiple correlations between variables. Compared with conventional models, HCA can derive rules more objectively and avoid the interference of noise variables to some extent. However, the aforementioned flaws may still result in failure to describe the internal property and external relevance efficiently in this dimension-reducing process. In addition, a series of parameters must be set manually in HCA, so analyst blinding can hardly be achieved. 26 In terms of CSECA, strengths include good algorithmic adaptation for describing relationships among multidimensional and nonlinear TCM variables. Flexible data presplitting in CSECA allows each variable to be assigned to different common factors repeatedly, which is more in line with TCM theory. 30–32 Application of additional methods is needed to address the broader problem of generalizability in future studies.
Conclusions
HCA and CSECA are important methodologies for clustering multidimensional and nonlinear TCM syndrome-related variables, by which evidence for both pathogenesis and core treatment principles of CAG can be revealed. The findings support that the TCM concept pathogenesis of CAG may be a combination of qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, yang deficiency, and yin deficiency. Treatment of CAG by herbal prescriptions should therefore focus on regulating qi, activating blood, resolving turbidity, clearing heat, removing toxin, nourishing yin, and warming yang. Further explorations are needed to verify and expand the current conclusions.
Footnotes
Acknowledgments
Generous support for this study was provided by the National Natural Science Foundation of China (No. 81630080, 91129714, 81270466, 81173424, 81673793, and 81373796) and Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20120013110014). Yin Zhang was additionally supported by National Undergraduates Innovating Experimentation Project of the China Ministry of Education (No. 081002609) and Science Research Foundation of Beijing University of Chinese Medicine (No. 2014-JYBZZ-XS-134).
Xia Ding and Yin Zhang conceived and designed the study. Yin Zhang, Yue Liu, Li Zhang, Shiyu Du, and Daming Liu trained investigators on SOP of this study. Yannan Li, Zeqi Su, and Cen Chen performed methodology development and data analysis. Yin Zhang, Yue Liu, Yannan Li, Xia Zhao, and Lin Zhuo performed writing and revision of the article with contributions from all other authors. All authors carried out the clinical investigation and approved the final article. Yin Zhang, Yue Liu, and Yannan Li contributed equally to this work. The authors thank all the participants for their contribution to the research.
Author Disclosure Statement
No competing financial interests exist.
