Abstract
How to detect anomalies in combination information more effectively has long been an important issue for scholars in various fields. To identify and extract the geochemical anomaly information related to polymetallic mineralization in the Hunjiang area, this article uses a hybrid method that combines multivariate canonical harmonic trend analysis (MCHTA), singularity analysis with radius-areal metal amount, and an improved adaptive fuzzy self-organizing map (IAFSOM). First, multiple sets of combination feature information with multi-dimensional variables are obtained through the MCHTA method and serve as the initial information for the subsequent analysis. Next, the singularity analysis method is applied to the combination concentration values to calculate the singularity indexes. Finally, the singularity indexes are classified by the IAFSOM method, and nine groups of sample data are obtained. The results show that the samples in the fourth group cover most of the low α-values. The main conclusions of this study are as follows: (1) the MCHTA method can effectively detect the combination information related to geochemical anomalies; (2) singularity analysis with radius-areal metal amount can reveal the significant characteristics of the mineralization-related element combination; (3) IAFSOM can be used as an effective tool for classifying and identifying geochemical anomalies with combination information; (4) the hybrid method that combines MCHTA, singularity analysis and IAFSOM is a useful indicator in the search for geochemical anomalies and could provide a good method for geochemical prospecting.
Keywords
Introduction
In the big data era, with the rise of artificial intelligence and machine learning research and their broad prospects for industrial application, the development of machine learning methods has brought new opportunities for anomaly detection [4, 47]. Machine learning methods do not rely on assumptions about the data distribution, do not require a linear correlation between the variables and the predicted values, and have a strong ability to characterize non-linear relationships. They have therefore been applied successfully in many fields, such as image processing, semantic recognition, and mineral resources exploration.
Exploration geochemical methods have always played a very important role in the practice of mineral exploration [5, 53]. By studying the distribution and concentration changes of elements in natural materials (rock, soil, sediments, water and so on), these methods mine the geochemical measurement data for anomalies that deviate from the normal geochemical model in order to discover prospecting information [34]. Methods for geochemical anomaly recognition (that is, for separating background from anomaly) [2, 54] have developed from simple to complex, from coarse to fine, from single-element to integrated multi-element anomaly recognition, and from mathematical statistics to pattern recognition [3, 37]. Fluctuations in the content of regional geochemical elements result from the long-term interaction of many processes, such as primary geological processes, secondary geological processes, and human activities [19, 50]. They are affected by many factors, such as regional geological background, mineralization type, mineralization intensity, weathering intensity, and overburden. Especially in a complex geological setting, the superposition of these factors mixes the geochemical background with the mineralization information, which makes the mineralization information difficult to identify in the geochemical data and limits the effectiveness of traditional data processing methods.
On the other hand, years of regional geochemical surveys have yielded a large amount of high-quality, multi-element regional geochemical data. The richness and completeness of these data provide strong support for applying machine learning to the identification of geochemical anomalies. With the dual support of application technology and rich data, more and more machine learning methods have been applied to multivariate exploration geochemical data processing and anomaly identification, such as neural networks [46], support vector machines [1, 52], Bayesian networks, random forests [18, 44], restricted Boltzmann machines, and extreme learning machines, and they have achieved better results [48–51, 56]. Machine learning methods have stronger applicability than traditional statistical methods and can better describe the multiple and complex non-linear relationships between mineralization points and evidence elements. At the same time, machine learning methods are not limited to discovering background information; they are also good at mining the background patterns of geochemical elements. Together, these advantages lead to a better overall forecasting effect, and using machine learning methods for multivariate data mining has therefore become one of the important research directions [41, 54].
In this article, a hybrid method that combines the multivariate canonical harmonic trend analysis (MCHTA) method, singularity analysis with radius-areal metal amount, and the IAFSOM method is proposed for the anomaly detection of combination information. Combination information is usually more significant for anomaly detection than individual information, which is the advantage of this hybrid method. The research in this article could provide a useful method and research direction for geochemical prospecting.
Methods
Multivariate canonical harmonic trend analysis model (MCHTA)
For n samples obtained in the study area, suppose each sample has m variables. The original data matrix is $X = (\beta_{ij})_{n \times m} = (\beta_1, \beta_2, \cdots, \beta_m)$, and $\alpha_1$ is the variable coefficient.
In Equation 2, v is the canonical variable of the harmonic trend of the coordinates; k and l are the harmonic order numbers in the x and y directions, respectively; M and N are the highest harmonic order numbers in the x and y directions, respectively; $a_{kl}$, $b_{kl}$, $c_{kl}$, $d_{kl}$ are the harmonic coefficients; and L and H are half of the sampling length in the x and y directions, respectively. Among the following terms, suppose:
The canonical coefficients of u and v satisfy:
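As a rough illustration of the idea behind MCHTA, the sketch below builds a two-dimensional harmonic trend basis from the sample coordinates and finds the first pair of canonical coefficients between the element matrix X and that basis. It assumes the usual double Fourier form of a harmonic trend surface (products of cosines and sines of kπx/L and lπy/H) and a standard QR/SVD solution of canonical correlation; the function names `harmonic_basis` and `first_canonical_pair` are hypothetical, and the details may differ from the formulation used in this study.

```python
import numpy as np

def harmonic_basis(x, y, M, N, L, H):
    """Harmonic trend terms for orders k = 0..M and l = 0..N.

    Assumed form: products of cos/sin(k*pi*x/L) and cos/sin(l*pi*y/H).
    Constant and identically zero columns are dropped.
    """
    cols = []
    for k in range(M + 1):
        for l in range(N + 1):
            cx, sx = np.cos(k * np.pi * x / L), np.sin(k * np.pi * x / L)
            cy, sy = np.cos(l * np.pi * y / H), np.sin(l * np.pi * y / H)
            for term in (cx * cy, cx * sy, sx * cy, sx * sy):
                if term.max() - term.min() > 1e-12:
                    cols.append(term)
    return np.column_stack(cols)

def first_canonical_pair(X, B):
    """Canonical correlations between X and B, plus the first coefficient pair.

    Both matrices are centred; both must have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Bc = B - B.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)
    Qb, Rb = np.linalg.qr(Bc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qb, full_matrices=False)
    a = np.linalg.solve(Rx, U[:, 0])   # coefficients of the element variables (u)
    b = np.linalg.solve(Rb, Vt[0])     # coefficients of the harmonic terms (v)
    return s, a, b
```

Here `(X - X.mean(axis=0)) @ a` plays the role of the canonical variable u of the elements, and the corresponding product with `b` plays the role of the harmonic trend variable v; the first entry of `s` is their correlation.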
From the perspective of geological application, mineralization processes, especially those related to hydrothermal activity, are characterized by multi-stage repetition, and each mineralization stage may lead to the enrichment or depletion of trace elements in the rock. The spatial superposition of multiple mineralization phases eventually produces a spatial distribution pattern of trace elements with multifractal characteristics [10–13, 55]. Therefore, the multifractal model can be used to describe the spatial distribution characteristics and enrichment rules of trace elements related to mineralization. A singularity process releases a huge amount of energy or enriches matter within a small interval of time or space; the ore-forming process, as a singularity process, leads to the huge accumulation and enrichment of useful minerals in the ore body. The local spatial structure patterns of element concentrations often show strong similarity, which helps to identify anomalies and can be estimated quantitatively through singularity analysis. This singularity can be described by fractal and/or multifractal theory, as in the following expression:
Assume that μ(ε) is a quantity or field measured at the scale ε, namely the total amount of metal within area ε; ρ(ε) is the density of metal within area ε; and α is termed the singularity index. The singularity index α, defined through multifractal theory, can be used to measure the local singularity. Local singularity analysis in effect measures the field strength in the fractal space to determine the fractal density (ρ) and the fractal dimension (α). When α < 2, elements are enriched by mineralization in the area, and the element density increases as the distribution range decreases. When α > 2, elements are depleted by mineralization, and the element density decreases as the distribution range decreases. When α ≈ 2, mineralization has little effect on the area and the element density does not change significantly; that is, there is no geochemical singularity [10–13].
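The interpretation of α can be illustrated with a common window-based estimator. The sketch below assumes the usual two-dimensional power-law relation ρ(ε) ∝ ε^(α−2), computes the mean concentration in nested square windows around a grid cell, and recovers α from the slope of a log-log fit. The function name `singularity_index` and the window sizes are hypothetical, and this is the generic estimator rather than the radius-areal metal amount variant used in this study.

```python
import numpy as np

def singularity_index(grid, row, col, half_widths=(1, 2, 3, 4, 5)):
    """Estimate alpha at (row, col) from mean concentration in nested windows.

    Assumes rho(eps) = c * eps**(alpha - 2), so alpha = slope + 2.
    All concentrations must be positive because logarithms are taken.
    """
    sizes, densities = [], []
    for h in half_widths:
        r0, r1 = max(row - h, 0), min(row + h + 1, grid.shape[0])
        c0, c1 = max(col - h, 0), min(col + h + 1, grid.shape[1])
        window = grid[r0:r1, c0:c1]          # window is clipped at the grid border
        sizes.append(2 * h + 1)              # nominal window side length
        densities.append(window.mean())      # mean concentration = density
    slope, _ = np.polyfit(np.log(sizes), np.log(densities), 1)
    return slope + 2.0
```

With this estimator, a uniform grid gives α close to 2, a cell that is much richer than its surroundings gives α below 2 (enrichment), and a cell that is much poorer than its surroundings gives α above 2 (depletion).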
The following algorithm is used to implement this method:
Fuzzy self-organizing mapping (FSOM), also known as FKCN (fuzzy Kohonen clustering network), was proposed by Bezdek in 1994 [22, 29]. This algorithm overcomes some of the shortcomings of the SOM algorithm. The FSOM network is a neural network with only two layers, an input layer and an output layer, which is the same as the SOM network structure (Fig. 1). The difference is that the output layer of FSOM is a one-dimensional or two-dimensional grid of fuzzy neuron nodes. Each input node has a weighted connection to every neuron node in the output layer. The number of input nodes is equal to the dimensionality of the sample points, and the number of fuzzy neuron nodes in the output layer is equal to the number of grid nodes given initially.
IAFSOM builds on the FSOM method and uses a truncation threshold and fuzzy convergence operators to define an adaptive learning rate, which speeds up the movement of the neuron node weights towards the actual class centres. IAFSOM not only accelerates the convergence of the network but also improves the accuracy of clustering. The IAFSOM calculation steps are as follows (Fig. 2):
Assume that there are initially n input patterns (samples), each pattern being $x_j = \{x_{j1}, x_{j2}, \ldots, x_{jm}\}$, $j = 1, 2, 3, \cdots, n$.
The initial iteration number is t = 1 and the maximum number of iterations is $T_{\max}$. The weights of the neuron nodes are randomly initialized as $w_i = \{w_{i1}, w_{i2}, \ldots, w_{im}\}$, with each component in (0, 1), $i = 1, 2, 3, \cdots, c$.
If the termination condition is satisfied, the algorithm stops; otherwise, set t = t + 1 and return to Step 2.
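The overall shape of this iteration can be sketched with the basic fuzzy Kohonen clustering network on which IAFSOM is built. The code below is a minimal sketch of standard FKCN only: the fuzziness exponent shrinks towards 1 over the iterations, the fuzzy memberships define per-sample learning rates, and training stops when the weights change by less than a tolerance. The truncation threshold and fuzzy convergence operators of IAFSOM are not reproduced, and the function name `fkcn` and the parameters `m0` and `tol` are assumptions made for illustration.

```python
import numpy as np

def fkcn(X, c, m0=2.0, t_max=100, tol=1e-5, seed=0):
    """Basic fuzzy Kohonen clustering network (not the improved IAFSOM)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.uniform(0.0, 1.0, size=(c, d))            # random weights in (0, 1)
    U = np.full((n, c), 1.0 / c)
    for t in range(1, t_max + 1):
        m_t = m0 - (t - 1) * (m0 - 1.0) / t_max       # fuzziness decreases, stays > 1
        diff = X[:, None, :] - W[None, :, :]          # shape (n, c, d)
        dist2 = (diff ** 2).sum(axis=2) + 1e-12       # squared distances, shape (n, c)
        # Memberships u_ij proportional to dist2**(-1/(m_t - 1)), computed in
        # log space so that large exponents do not overflow.
        logits = -np.log(dist2) / (m_t - 1.0)
        logits -= logits.max(axis=1, keepdims=True)
        U = np.exp(logits)
        U /= U.sum(axis=1, keepdims=True)             # each row sums to 1
        A = U ** m_t                                  # per-sample learning rates
        step = (A[:, :, None] * diff).sum(axis=0) / (A.sum(axis=0)[:, None] + 1e-12)
        W_new = W + step
        converged = np.linalg.norm(W_new - W) < tol   # termination condition
        W = W_new
        if converged:
            break
    return W, U
```

Calling `fkcn(X, c=9)` on an array of α-values reshaped to one column would return nine node weights and the membership of each sample in each node; a hard grouping follows from `U.argmax(axis=1)`.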
Geological setting
Hunjiang County in the Baishan region is located in the centre of the Changbai Mountains, in the southeast of Jilin Province. The Baishan region is an important production base for raw materials such as steel, energy, uranium, rich iron ore, high-quality manganese ore, copper, lead, zinc, tungsten, tin, bismuth, molybdenum, antimony, gold, niobium, tantalum, geothermal resources, and mineral water. There are two main groups of structures in the Baishan area: an NNE-trending fold structure and an NW-trending fault structure. The outcrops in this region are mainly Archean and Proterozoic, with some Paleozoic and Cenozoic strata. The magmatic rocks include early Yanshanian quartz diorite, porphyritic biotite granite, diorite porphyrite, and quartz porphyry veins. Regional minerals include magnetite, gold, lead, zinc, and hematite. At present, the Banshigou iron ore, Hangou iron ore, Yaolin talc ore, and Badaojiang limestone ore deposits, among others, are being developed. Abundant natural resources and a favourable geographical location give the Baishan region excellent advantages for development, and the region has therefore been studied and explored by many geologists and geological institutes. The study area in this paper is located at E126°06′49″–127°00′00″, N41°21′40″–42°00′00″. The regional geological sketch map is shown in Fig. 3.

Fig. 1. Self-organizing mapping neural network structure.

Fig. 2. Training flowchart of IAFSOM.

Fig. 3. The regional geological sketch map.
A total of 900 samples were collected across the whole study area, at a sampling density of about 1 site per 4 km². The concentrations of nine components, namely Ag, Au, Co, Cu, Mn, Mo, Pb, Zn and Fe₂O₃, were analyzed. All samples were pretreated, which included eliminating invalid data values, after which 895 samples remained. Table 1 shows the statistical summary for the elimination of invalid data. Table 2 shows the statistical summary of the valid samples.
Table 1. Statistical summary for the elimination of invalid data
Table 2. Statistical summary of the valid samples
Note: Units of relative content: major (constant) elements are in wt%, Au and Ag are in ×10⁻⁹, and other elements are in ×10⁻⁶.
Dimensionality reduction
The purpose of this research is to apply the hybrid method, comprising the MCHTA method, singularity analysis with radius-areal metal amount, and the IAFSOM method, to the anomaly detection of combination information in the study area. Geochemical anomalies of multi-dimensional elements are usually considered to indicate mineralization more significantly than those of individual elements.
First, the MCHTA method was used to extract combination information related to iron polymetallic mineralization from the multivariate geochemical data. Table 3 shows the statistical parameters of the element combinations. Table 4 and Figure 4 show the loadings of the MCHTA method with nine elements.
Table 3. Statistical parameters of the element combinations
Note: In the test, 1 indicates significant and 0 indicates not significant.
Table 4. Loadings of the MCHTA method with nine elements

Fig. 4. Loading maps of the MCHTA method with nine elements.

Fig. 5. Map showing the target areas delineated by the hybrid method of MCHTA, singularity analysis and IAFSOM. (Red dots are the discovered iron polymetallic deposits.)
Table 4 illustrates the relative importance of the five combinations, which together account for almost 65% of the total variance. Among them, C1 carries more information and is interpreted as a Fe, Co and Mn combination, which may represent a high-temperature element combination related to iron polymetallic mineralization. In addition, cross-checking against the improved weighted principal component analysis (IWPCA) [4, 28] shows that the IWPCA results are generally similar to those of the MCHTA method.
Therefore, in this study, the MCHTA method can be regarded as effective for the anomaly detection of combination information.
Because multi-dimensional elements carry more information than a single element, the singularity analysis method with radius-areal metal amount was then applied to the concentration values of the C1 combination to enhance its local geochemical anomalies and to calculate the singularity indexes. These singularity indexes are used to select the best group in the subsequent classification. Table 5 shows the descriptive statistics of the α-values.
Table 5. Descriptive statistics of α-values
Finally, the IAFSOM method was used to classify and identify the multi-element combinations related to geochemical anomalies. In IAFSOM modelling, the parameters affect the accuracy and efficiency of the training results, and poor choices can even lead to a substantial reduction in various indicators. Two parameters, the number of training steps and the grid size of the IAFSOM layer, have the greatest impact on the experimental results. To improve the accuracy of the classification results and obtain the optimal parameters, different parameter settings were tested and the experimental results were compared.
In this experiment, the optimal parameters of the IAFSOM training model are a learning rate between 0.5 and 0.8 × 10⁻³, a clustering radius between 0.2 and 2.0, 10,000 iterations, and an IAFSOM layer size of 5 × 5. The IAFSOM model finally divides the α-values into nine groups. Table 5 shows the classification results of the α-values. The average and median values of the samples in the fourth group are lower than 2, and the distribution of the samples in the fourth group contains about 88.89% of the discovered mineral points in the study area, as shown in Fig. 5. These results show that the samples in the fourth group better reflect the enrichment of iron polymetallic mineralization.
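The reasoning used to single out the fourth group (mean and median α below 2, together with a high share of known mineral points) can be written as a small per-group summary. The sketch below is illustrative only: the function name `summarise_groups` and the inputs `alpha`, `labels` and `is_deposit` are hypothetical, and no values from this study are reproduced.

```python
import numpy as np

def summarise_groups(alpha, labels, is_deposit):
    """Per group: (mean alpha, median alpha, share of known deposits in the group).

    alpha      : singularity index of each sample
    labels     : group number assigned to each sample by the classifier
    is_deposit : boolean flag, True where a sample coincides with a known deposit
    """
    total = is_deposit.sum()
    summary = {}
    for g in np.unique(labels):
        mask = labels == g
        summary[int(g)] = (
            float(alpha[mask].mean()),
            float(np.median(alpha[mask])),
            float(is_deposit[mask].sum() / total),
        )
    return summary
```

A group whose mean and median α are below 2 and whose deposit share is high would be the natural candidate for the enrichment-related group.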
In this study, a hybrid method that combines the multivariate canonical harmonic trend analysis (MCHTA) method, singularity analysis with radius-areal metal amount, and the IAFSOM method was proposed for the anomaly detection of combination information. Combination information is usually more significant for anomaly detection than individual information, which is the advantage of this hybrid method.
First, the MCHTA method was used for dimensionality reduction and feature extraction, and the C1 combination related to Fe polymetallic mineralization was detected. Next, the singularity index α-values of the element combination were obtained through the singularity analysis method. Finally, the fourth group obtained by the IAFSOM method identified the geochemical anomalies related to iron polymetallic mineralization.
The following conclusions can be drawn from this research: (1) the MCHTA method can effectively detect the combination information related to geochemical anomalies; (2) the singularity analysis method reveals the remarkable characteristics of the Fe, Co and Mn element combination related to the mineralization; (3) the IAFSOM method can be used as an effective tool to classify and identify geochemical anomalies related to iron polymetallic mineralization; (4) combining the MCHTA method, singularity analysis and the IAFSOM method offers some guidance for further prospecting work in this area and can serve as a good method for geochemical prospecting.
With the rapid development of artificial intelligence, complex optimization and simulation for deep learning and big data analysis will be a focus of research. In future work, we will use artificial intelligence to solve complex real-world problems and to promote intelligent development in all fields of society.
Contributions by authors
C.M. and L.L. assisted with the study design and the interpretation of the results; Z.Y. improved the algorithm; C.M. compiled and wrote the paper.
Acknowledgments
This research was supported by the National Natural Science Foundation of China under Grant No. 41802245.
