Abstract
The complexity of the coalface environment gives the drum height adjustment non-linear and fuzzy characteristics. To address this challenge, this study proposes an adaptive fuzzy reasoning Petri net (AFRPN) model based on fuzzy reasoning and the fuzzy Petri net (FPN), and applies it to the intelligent height adjustment of the shearer drum. Two algorithms are constructed: an adaptive algorithm, which optimizes the AFRPN parameters, and a reasoning algorithm, which runs the AFRPN model. AFRPN can represent rules that have non-linear and attribute mapping relationships and can adjust its parameters adaptively to improve the accuracy of the output. A drum height adjustment model was then established and compared with three models: neural network (NN), classification and regression tree (CART), and gradient boosting decision tree (GBDT). The experimental results showed that the proposed method outperformed these drum height adjustment methods and that AFRPN can achieve intelligent adjustment of the shearer drum height by constructing fuzzy inference rules.
Introduction
In underground coal mining operations, frequent coal mining disasters seriously threaten the lives of coal miners [1, 2]. The harsh mining environment and the tightly correlated production schedules have severely impacted the safety and stability of mining processes [3–5]. Unmanned, intelligent coal mining is currently an important research direction. The intelligent height adjustment technology of the shearer drum is one of the key technologies that can increase productivity and reduce mortality [6].
One solution is to obtain data from multiple sensors in real time, building on memory cutting technology, and to use machine learning and other models to establish a drum height adjustment controller. Tiemei Y. et al. used the wavelet neural network to establish the identifier and controller, employing the time-frequency characteristics of the wavelet transform together with the parameter optimization ability of the neural network to learn the relationship between the height change of the drum and multiple variables and thus precisely control drum movement [7]. Zhongbin W. et al. treated the data collected by the sensor as an individual and ran the artificial immune algorithm with the drum raising decision as the antibody; the resulting control accuracy met production needs [8]. Li W. et al. considered the change in rock strata as a Markov process and combined the Markov matrix and the gray prediction theory to predict the height of the next sampling point based on the memory sampling point [9]. Si L. et al. proposed an intelligent multi-sensor data drum height adjustment method based on the parallel quasi-Newton neural network and the Dempster–Shafer theory [10]. This method improved the accuracy of the judgment of the cutting state and reduced the judgment time, and it showed that performance depended strongly on the recognition results from multiple sensors. Although the coupling of multiple sensor data can effectively increase the robustness of the controller, the improvements in robustness and accuracy that sensor data can introduce are limited. This is because the height adjustment methods based on these sensors only verify whether the drum cuts into the rock wall [11, 12]; these studies do not take into account the position, attitude information, and movement status of the shearer.
On the one hand, the relationship between these variables and the drum height is ambiguous, whereas on the other hand, the value of these variables is irrelevant to the amount of drum height adjustment. This mapping relationship between the variable attributes and the control output is called attribute mapping. Therefore, enabling the controller to adapt to non-linearity and attribute mapping is the key solution to improving the robustness and accuracy of the controller.
For the attribute mapping relationship, it is difficult to predict the output probability distribution of a set of input data; thus, non-linear fitting ability alone struggles to express the fuzzy mapping relationship satisfactorily. The fuzzy set theory can be used to reduce the ambiguity and uncertainty of object extension [13]. In some studies, researchers have attempted to use the fuzzy set theory to reduce the ambiguity of the input data. Zhipeng X. et al. fuzzified the cutting current and vibration signal, constructed a fuzzy control space, and used the fuzzy control principle to optimize the memory cutting technology [14]. However, when the sensor data dimension increased, the fuzzy control space became very complicated and difficult to construct, requiring optimization algorithms to construct the controller. Qiang Z. et al. proposed a method of drum height adjustment based on the fuzzy neural network with multi-sensor data association [15], whereby the fuzzy neural network learned the fuzzy rules mapping the sensor data to the drum height. After training, the fuzzy neural network model could simulate the drum height change track more accurately. Based on particle swarm optimization and minimum fuzzy entropy, Wang H. et al. proposed a method to obtain the amount of drum height adjustment [16]. This method could make accurate drum height adjustment decisions at higher decision speeds. The drum height adjustment methods proposed in previous studies have mainly based their decisions on judging whether the drum has cut into the rock wall, which makes it difficult to improve the decision accuracy. Moreover, when the number of input variables of the decision-making method is increased, it is difficult to achieve the desired effect with the original method. Therefore, the main motivation of this study was to find a method of drum height adjustment that could express the non-linear and attribute mapping relations.
The Petri net (PN) is a mathematical tool used to describe systems [17] through directed graphs constructed by connecting places and transitions with directed arcs. PNs are used in modeling, simulation, state analysis, reachability analysis, fault diagnosis, system analysis, and other fields [18–22]. PN has a discrete and logical operation mode, which gives it the ability to express knowledge. The fuzzy Petri net (FPN) is an advanced extension of PN that inherits this operation mode and combines it with fuzzy set matrix operations [23]; its operation mode can also be continuous. FPN can express non-linear relationships and attribute mapping relationships and is especially suited to modeling and simulating knowledge systems [24, 25]. The results of Yuan-jiang C. et al., Ivasic-Kos M. et al., Fei-xiang M. et al., and Hu-chen L. et al. showed that FPN, as a modeling tool, can accurately represent relations such as attribute mapping [26–29]. In an FPN whose transitions and places have already been established, the most important step for improving its knowledge expression ability is to determine the weights of its directed arcs. For a network structure, gradient descent is a well-suited method for weight training, and among gradient descent variants the Adadelta method [30] is well suited to the weight adjustment of FPN because it adapts the learning rate dynamically. Tuan Linh G. et al. employed a variety of optimizers to optimize the U-Net convolutional network for interpreting land cover types, among which Adadelta achieved an accuracy greater than 83% [31]. This result suggests that the Adadelta algorithm has some advantages when applied to adaptive weight adjustment.
Fuzzy reasoning is a reasoning process that draws possible imprecise conclusions from a set of imprecise premises. This process is similar to the reasoning of the human brain. Garibaldi Jonathan M. et al. believe that it is necessary to use fuzzy methods to express and reason out knowledge uncertainty [32]. José Roberto R. et al. used the fuzzy reasoning theory to describe the fuzzy relationship between the environment and the risk to improve the accuracy of safety assessment [33]. Sarkheyli-Hägele A. et al. revealed that fuzzy reasoning can extract logical relationships in the case to reproduce the fuzzy reasoning process [34]. The above research also showed that the fuzzy reasoning theory can perform reasoning calculations on attribute mapping relations with inaccurate properties in knowledge systems. Combining it with FPN can strengthen the expression of its attribute mapping relationship and its reasoning calculation ability. Therefore, based on FPN and fuzzy reasoning, this study proposes an adaptive fuzzy reasoning Petri net (AFRPN) to solve the issue of shearer drum height adjustment.
There are three main contributions of this study. First, the fuzzy reasoning PN model proposed in this research can simulate the process by which the human brain draws conclusions under certain conditions, and it has both the characteristics of fuzzy reasoning and the computing characteristics of PN. Second, based on the Adadelta method, a network parameter adaptive algorithm was constructed so that the model can adaptively optimize its parameters. Finally, this study verified through experiments the application of the AFRPN model to the height adjustment of the shearer drum.
The rest of this paper is structured as follows. Section 2 outlines the technical background of shearer memory cutting. Section 3 details the model theory and the adaptive and reasoning algorithms of the AFRPN. Section 4 establishes the data set from real coal mining data and introduces its attributes. In Section 5, an AFRPN model is used to model the shearer’s memory cutting process, and this model is compared with the neural network (NN), classification and regression tree (CART), and gradient boosting decision tree (GBDT) models to verify the effectiveness of the proposed method. Section 6 presents the conclusions of this study.
Memory cutting background
In the mid-1980s, West German scholars first proposed a memory cutting automatic height adjustment system that was successfully applied in production practice. The shearer memory cutting system collects data from the manual cutting process and then controls shearer cutting based on these data during the automated cutting process. Most fully mechanized mining faces nowadays use memory cutting technology.
In the memory cutting process, the starting point of the shearer is taught to the system, and normal cutting then begins. The control information is recorded during the working cycle of the normal cutting process; combined with the information about drum height change, it can be used as the cutting track of the teaching cycle. After teaching, the next phase is the memory cutting stage, where the shearer performs repeated operations according to the control information from the teaching stage. However, manual intervention is required for exceptions during repeated operations.
Figure 1 shows the working state of memory cutting of the shearer-loader in a mine. During production, the shearer has the pitching angle α and the rolling angle β. Although memory cutting technology is widely used and provides many benefits to the coal mining industry, it still has some drawbacks. First, the ideal cutting trajectory during memory cutting always differs from the predicted cutting trajectory, necessitating frequent manual drum height adjustment. Second, as the number of memory cutting cycles increases, the error between the ideal cutting trajectory and the predicted cutting trajectory gradually increases, and consequently teaching needs to be repeated. Finally, dust and signal interference are inevitable during the coal mining process. The former interferes with the judgment of workers, while the latter causes missing memory points during the teaching process. This in turn indirectly increases the error between the ideal cutting trajectory and the predicted cutting trajectory.

Figure 1. Shearer-loader memory cutting.
Fuzzy Petri net
The fuzzy Petri net (FPN) theory was first proposed by Carl G. Looney in 1988 [35]. He extended the PN with a fuzzy reasoning technique in which a fuzzy rule matrix transforms the fuzzy truth state vector. After more than 30 years of development, FPN has become one of the most widely used advanced PN extension networks. Current research generally focuses on complex system modeling, fault diagnosis, knowledge representation, and other fields [24, 36–38]. FPN can be defined as an eight-tuple ∑:
P : P = {p_1, p_2, …, p_n} is a finite non-empty set of place nodes
T : T = {t_1, t_2, …, t_m} is a finite non-empty set of transition nodes
I : I → P × T is the input matrix; I(p_i, t_j) indicates whether there is a directed arc p_i → t_j (1 if the arc exists, 0 otherwise), where i = 1, 2, …, n; j = 1, 2, …, m
O : O → T × P is the output matrix; O(t_j, p_i) indicates whether there is a directed arc t_j → p_i (1 if the arc exists, 0 otherwise), where i = 1, 2, …, n; j = 1, 2, …, m
W : W → I is the weight of the directed arc of the input matrix
Ω : Ω = {ω_1, ω_2, …, ω_{m×n2}}, Ω → O(t_j, p_i) → [0, 1] is the threshold value of the directed arc of the output matrix
M : M = {m_1, m_2, …, m_n}, M → P is the token vector for all places at time t
CF : O → [0, 1] is the confidence degree of the directed arc of the output matrix
A classical PN consists of places, transitions, directed arcs, and tokens. When a PN is used to describe a discrete system, they represent the conditions, events, flow relations, and resources (information) of the system, respectively. A transition (event) fires when a token (resource) exists in the input place (condition) of the transition (event). The directed arc (flow relationship) then passes the token (resource) to the output place (condition) of the transition (event).
For a transition t, •t (t•) represents the set of input (output) places of t. For a place p, •p (p•) represents the set of input (output) transitions of p. The necessary and sufficient condition for a transition t to fire is that the threshold of its output arc is less than the inner product (•t · w) of the tokens in the input places of the transition and the weights on the input arcs. Several transitions can be enabled under a marking M, and a new marking M′ is obtained after any one of them fires, written M[t > M′, where “[” indicates that M′ is reachable from M.
For ∀p ∈ P, if M[t > M′,

Figure 2. FPN example: (a) before FPN transition enablement, (b) after FPN transition enablement.
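The firing condition described above can be illustrated with a minimal Python sketch. It assumes a common FPN convention in which a fired transition writes the weighted input truth degree, scaled by the confidence degree CF, into its output place; that output rule, the function name `fire_fpn_transition`, and the sample numbers are illustrative assumptions rather than values from this study.

```python
def fire_fpn_transition(tokens, weights, threshold, cf):
    """Fire one fuzzy Petri net transition if it is enabled.

    tokens    -- truth degrees M(p) of the input places, each in [0, 1]
    weights   -- weights W on the input arcs (same order as tokens)
    threshold -- threshold on the output arc
    cf        -- confidence degree CF of the output arc
    Returns the token written to the output place, or None when the
    transition is not enabled.
    """
    # Inner product of the input tokens and the input-arc weights.
    activation = sum(m * w for m, w in zip(tokens, weights))
    # The transition fires only when the threshold is below the activation.
    if threshold < activation:
        # Assumed output rule: activation scaled by the confidence degree.
        return activation * cf
    return None


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show both outcomes.
    print(fire_fpn_transition([0.8, 0.6], [0.5, 0.5], threshold=0.4, cf=0.9))
    print(fire_fpn_transition([0.2, 0.1], [0.5, 0.5], threshold=0.4, cf=0.9))
```

In the first call the weighted input exceeds the threshold, so a token is produced; in the second it does not, and the marking would stay unchanged.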
In 1965, L. A. Zadeh first proposed the fuzzy set [39]. After decades of development, fuzzy set theory has become a relatively complete branch of mathematics with many important applications in other fields. Reasoning is an essential human behavior that abstracts and summarizes existing conditions. When binary logic is used for reasoning, the conditions and conclusions are discrete and exact, but most of the information is lost in the process of reasoning; thus, the process becomes inaccurate. When multi-valued logic is used for reasoning, less information is lost and the accuracy of the reasoning process can be ensured, but conditions with continuous values may cause a state explosion. Fuzzy reasoning is closer to the reasoning of the human brain than either of these two methods. It is a reasoning process that draws potentially imprecise conclusions from imprecise conditions.
P = {x ∈ X | (p) is true for x}
If P = X, then (p) is always true; if P = ∅, then (p) is always false.
∀x ∈ X, P(x) = T((p)(x))
This sentence pattern can be called (p) → (r), where x ∈ X, y ∈ Y, (p) is the condition of the reasoning sentence, and (r) is the conclusion of the reasoning sentence. We think that,
Adaptive fuzzy reasoning Petri net
Definition
The AFRPN proposed in this study is a model theory that combines fuzzy reasoning and FPN and draws on the advantages of both. AFRPN can be defined as an eight-tuple ∑:
P : P = {p_1, p_2, …, p_m} is a finite non-empty set of place nodes
D : D → P is the set of propositions represented by the places
T : T = {T_fuzz, T_reasoning}, where T_fuzz = {t_f1, t_f2, …, t_fn1} is a finite non-empty set of fuzzy transition nodes and T_reasoning = {t_r1, t_r2, …, t_rn2} is a finite non-empty set of reasoning transition nodes, with n1 + n2 = n
F : F → T × P is the set of directed arcs linking place nodes and transition nodes
Ω : Ω = {ω_1, ω_2, …, ω_{m×n2}}, Ω → F(p_i (p_ij), t_ri) → [0, 1] is the threshold of the directed arcs linking reasoning place nodes and transition nodes
Λ : Λ = {λ_1, λ_2, …, λ_{m×n2}}, Λ → F(p_i (p_ij), t_ri) → [0, 1] is the confidence degree of the directed arcs linking reasoning place nodes and transition nodes, with ∑Λ(•t_ri, t_ri) = 1 and ∑Λ(p_i, •p_i) = 1
M : M = {m_1, m_2, …, m_m}, M → P is the token vector for all places at time t
According to Definition 4, a control rule can be expressed as a fuzzy reasoning sentence:
If the condition “X is P” is true, then the inference sentence is true, and the conclusion “Y is R” can be drawn.
The judgment sentence “X is P” can be expressed as the intersection of several clauses:
For the premise “X is P” to be true, any of the clauses of
where f(p_i)(x_i) is the membership degree function of p_i.
“Y is R” can be expressed as the union of multiple clauses:
When “Y is R” is established, at least one of
Because different propositions have different confidence degrees, Equation (5) can be changed to:
When the membership degree function of a fuzzy transition is a Gaussian function, t_fi is enabled under marking M if and only if M(p_i) > 0, and M[t_fi > M′; for ∀p ∈ P:
After the fuzzy transition is enabled, the process is equivalent to calculating the truth value T((d_ij)(d_i)) of the fuzzy judgment sentence “d_i is d_ij”.
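A minimal Python sketch of a fuzzy transition follows. It assumes the standard Gaussian membership form exp(−(x − μ)² / (2σ²)); the function names, the class labels, and the (mean, standard deviation) pairs are hypothetical and serve only to show how one input token is mapped to several membership degrees.

```python
import math


def gaussian_membership(x, mean, std):
    """Standard Gaussian membership degree of x, a value in (0, 1]."""
    return math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))


def fire_fuzzy_transition(token, classes):
    """Fuzzify the token of one input place.

    token   -- marking M(p_i) of the input place
    classes -- dict mapping a fuzzy label to (mean, std); assumed values
    Returns a dict label -> membership degree, or None when the
    transition is not enabled (token <= 0).
    """
    if token <= 0:
        return None
    return {
        label: gaussian_membership(token, mean, std)
        for label, (mean, std) in classes.items()
    }


if __name__ == "__main__":
    # Hypothetical class parameters for an input number, for example m1 = 4.
    classes = {"small": (2.0, 2.0), "medium": (5.0, 2.0), "large": (8.0, 2.0)}
    for label, degree in fire_fuzzy_transition(4.0, classes).items():
        print(label, round(degree, 4))
```

Each returned degree plays the role of the truth value of one judgment sentence such as “the input is medium”.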
For a reasoning transition in net ∑, t_ri is enabled if and only if ∀ •t_ri ≥ Ω(•t_ri → t_ri) and •t_ri · Λ(•t_ri → t_ri) ≥ Ω(t_ri → t_ri•), and M[t_ri > M′, for ∀p ∈ P. According to Eq. (6):
After the reasoning transition is enabled, its function is equivalent to executing the fuzzy reasoning sentence
Figure 3 is an example of an AFRPN, where D(p1) represents an input number. If m1 = 4, D(p11) represents the membership degree of a “small” number, D(p12) represents the membership degree of “medium”, D(p13) represents the membership degree of “large”, D(p2) represents the membership degree that deduces the figure to be “not large”, and D(p3) represents the membership degree that deduces the figure to be “not small”. For tf1, M(p1) > 0, that is, M(p1)[tf1 > M′(p1). According to Eq. (7):

Figure 3. AFRPN example: (a) before AFRPN transition enablement, (b) after AFRPN transition enablement.
For tr1, M(p11) > ω1 and M(p12) > ω2, that is, M(p2)[tr1 > M′(p2). According to Eq. (8):
The confidence degree Λ and threshold Ω of the directed arcs in the AFRPN network are the keys to the whole directed graph, and these parameters directly determine the accuracy of the whole network. Therefore, optimizing these parameters to minimize the reasoning error of the AFRPN is vital to modeling. In the traditional Petri net, the confidence degree is set from the experience of experts, which is inefficient and not applicable to larger models.
As one of the most famous optimization algorithms, the gradient descent method is often used for parameter optimization in black box systems, and Adadelta is one such method. Adadelta is an extension of Adagrad that addresses the problem of the latter's learning rate decreasing monotonically toward zero. In this algorithm, gradients are accumulated over a window ω rather than over the whole history. Since storing all gradients in ω is inefficient, an exponentially decaying average of the previous squared gradients is used instead in the implementation (through its root mean square value).
1:
2:
3: Initialize Λ
4: Construct the error formula loss = (pre − f)^2
5: Initialize the accumulators E[g^2]_0 = 0, E[Δx^2]_0 = 0
6:
7: Run AFRPN and work out pre
{Calculate the derivative}
8: g_Λ(p•→p) = 2 (pre − f) (•t · Λ(•t → t))
9: g_Λ(•t→t) = 2 (pre − f) (p · Λ(p• → p))
{Accumulate the derivative}
10: E[g^2]_i = ρ E[g^2]_{i−1} + (1 − ρ) g^2
{Calculate the amount of update}
11:
{Update the confidence degree}
12: Λ = Λ − Δx
{Accumulate the amount of update}
13: E[Δx^2]_i = ρ E[Δx^2]_{i−1} + (1 − ρ) Δx^2
14: Normalize Λ to (0, 1)
15: i = i + 1
16:
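The update loop of the adaptive algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the update amount uses the standard Adadelta ratio RMS[Δx] / RMS[g] with the sign convention Λ = Λ − Δx from step 12, the model is reduced to a single confidence degree with pre = λ · x, and normalization to (0, 1) is approximated by clamping. The names `adadelta_step` and `fit_confidence` are hypothetical.

```python
import math


def adadelta_step(value, grad, state, rho=0.9, eps=0.1):
    """Apply one Adadelta update to a scalar parameter.

    state holds the running averages E[g^2] ('eg2') and E[dx^2] ('edx2').
    """
    # Accumulate the squared gradient (step 10).
    state["eg2"] = rho * state["eg2"] + (1.0 - rho) * grad ** 2
    # Update amount: RMS of past updates over RMS of gradients, times the
    # gradient (standard Adadelta form, assumed here).
    dx = math.sqrt(state["edx2"] + eps) / math.sqrt(state["eg2"] + eps) * grad
    # Update the parameter (step 12).
    value -= dx
    # Accumulate the squared update (step 13).
    state["edx2"] = rho * state["edx2"] + (1.0 - rho) * dx ** 2
    return value


def fit_confidence(x, target, lam=0.9, steps=50):
    """Fit one confidence degree so that pre = lam * x approaches target."""
    state = {"eg2": 0.0, "edx2": 0.0}
    for _ in range(steps):
        pre = lam * x                      # toy stand-in for running AFRPN
        grad = 2.0 * (pre - target) * x    # derivative of (pre - target)^2
        lam = adadelta_step(lam, grad, state)
        # Keep the confidence degree inside (0, 1); clamping is an assumption.
        lam = min(max(lam, 1e-6), 1.0 - 1e-6)
    return lam


if __name__ == "__main__":
    print(fit_confidence(x=0.8, target=0.4))
```

The defaults ρ = 0.9 and ε = 0.1 follow the settings reported for the experiments; with a large ε the effective step size starts close to 1, which suits parameters bounded in (0, 1).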
The reasoning process of AFRPN is similar to fuzzy reasoning. First, the input variables are fuzzified by the fuzzy transitions. The conditional closeness degree of the fuzzy reasoning rules is then calculated. The result of the conditional closeness degree determines whether the transition is enabled (that is, whether the conclusion of the fuzzy reasoning rule is drawn). When multiple transitions share the same output place, their calculation results are combined by logical OR. The pseudocode of the reasoning algorithm is shown in Algorithm 2:
1:
2:
3: Initialize the place matrix P
4: Initialize the fuzzy transition matrix T fuzz
5: Initialize the reasoning transition matrix T reasoning
{Fuzzify the condition variables}
6:
7:
8:
9:
10:
{Reasoning}
11: N2 = T reasoning
12:
13: m = ⋀ • tr1
{To judge reasoning transition enablement}
14:
15: mediate (t ri • , q) = • tr1 · Λ (• tr1 → tr1) ·
{Store the intermediate variable of reasoning transition}
16: q = q + 1
17:
18: N2 = N2 - tr1
19:
{Output conclusion}
20: D = Max (mediate) · Λ (t ri → t ri •)
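The reasoning stage can be sketched in Python as below, assuming the fuzzy transitions have already produced membership degrees for the feature places. The dictionary layout, the key names, and the sample thresholds and confidence degrees are hypothetical; logical OR across transitions that share an output place is taken as the maximum, matching step 20.

```python
def run_reasoning(memberships, transitions):
    """Evaluate reasoning transitions and combine results per output place.

    memberships -- dict: feature place -> membership degree in [0, 1]
    transitions -- list of dicts with the (hypothetical) keys
        'inputs'         feature places feeding the transition
        'thresholds'     threshold of each input arc
        'confidence'     confidence degree of each input arc
        'out_threshold'  threshold of the output arc
        'out_confidence' confidence degree of the output arc
        'output'         name of the decision place
    """
    results = {}
    for t in transitions:
        tokens = [memberships[p] for p in t["inputs"]]
        # Every input token must reach the threshold of its arc.
        if any(m < w for m, w in zip(tokens, t["thresholds"])):
            continue
        # Confidence-weighted sum of the input tokens.
        weighted = sum(m * c for m, c in zip(tokens, t["confidence"]))
        if weighted < t["out_threshold"]:
            continue
        value = weighted * t["out_confidence"]
        # Logical OR over transitions with the same output place (maximum).
        out = t["output"]
        results[out] = max(results.get(out, 0.0), value)
    return results


if __name__ == "__main__":
    memberships = {"small": 0.61, "medium": 0.88, "large": 0.14}
    transitions = [
        {"inputs": ["small", "medium"], "thresholds": [0.3, 0.3],
         "confidence": [0.5, 0.5], "out_threshold": 0.3,
         "out_confidence": 1.0, "output": "not large"},
        {"inputs": ["medium", "large"], "thresholds": [0.3, 0.3],
         "confidence": [0.5, 0.5], "out_threshold": 0.3,
         "out_confidence": 1.0, "output": "not small"},
    ]
    print(run_reasoning(memberships, transitions))
```

In this example only the first transition is enabled, because the membership degree of “large” falls below its arc threshold; the final decision would be the output place with the largest value.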
Establishing the experimental data set
The experimental data set was derived from the 43101 fully mechanized mining face in Yujialiang Mine, Shaanxi, China. The face of the mine is 351.4 m, the average thickness of the coal seam is 1.47 m, and the average height of the working face is 1.4 m, with an inclination of 3°–5°. The comprehensive mining equipment is as follows: the hydraulic support is the ZY9200/09/18D double-pillar shield type, the shearer is the MG2x200/890-WD1 type, and the scraper conveyor is the SGZ800/1400 type. Data were retrieved from the workers’ operations during the memory cutting process at the coal mining working face. The data were collected from March 1 to March 17, 2020, with a sampling period of 1.0 s.
The original data set comprised 28 condition attributes and one decision attribute (the condition attributes were continuous, and the decision attribute was discrete). After deleting duplicate samples, the total number of samples was 26,343. The condition attributes were further divided into four categories: shearer status attributes, drum status attributes, shearer status change attributes, and drum status change attributes. The shearer status attributes represented the position and speed information of the shearer at the sampling time; the drum status attributes represented the position and cutting status information of the drum at the sampling time; and the shearer status change attributes and drum status change attributes represented the degree of change of an attribute between the sampling time and the previous sampling time. The conditional attributes are shown in Table 1. According to the decision attribute, the data set could be divided into three categories. Right drum decision data set: the decision attribute was the decision of the right drum at that moment, namely rest, rise, or drop. Left drum decision data set: the decision attribute was the decision of the left drum at that moment, namely rest, rise, or drop. Left and right drum co-decision data set: the decision attribute was the decision of the left and right drums at that moment, with rest, rise, and drop as the decisions for each drum.
Table 1. The meaning of the conditional attributes (input places)
For the missing values and outliers in the data set, we used Lagrangian interpolation. However, a sampling point with multiple missing or abnormal attributes was seriously distorted, so this type of data was eliminated first. The data imbalance in this data set was relatively serious: samples in which the drum did not change far outnumbered samples in which it did. Therefore, the unchanged samples were downsampled so that their number was roughly the same as that of the other samples.
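As an illustration of the interpolation step, the following Python sketch fills isolated gaps in one attribute series with a Lagrange polynomial through nearby valid samples. The neighbour window of two samples on each side and the function names are assumptions; the study does not state which neighbours were used.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_missing(series, window=2):
    """Replace None entries using up to `window` valid neighbours per side."""
    filled = list(series)
    for k, value in enumerate(series):
        if value is not None:
            continue
        left = [i for i in range(k - 1, -1, -1) if series[i] is not None]
        right = [i for i in range(k + 1, len(series)) if series[i] is not None]
        idx = sorted(left[:window] + right[:window])
        if len(idx) < 2:
            continue  # too few neighbours: leave the gap untouched
        filled[k] = lagrange_interpolate(idx, [series[i] for i in idx], k)
    return filled


if __name__ == "__main__":
    # Hypothetical attribute samples with one missing value.
    print(fill_missing([1.0, 4.0, None, 16.0, 25.0]))
```

A small window keeps the polynomial degree low, which limits the oscillation that high-degree Lagrange interpolation can introduce.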
After the above processing, the data set had no data imbalance, missing values, or outliers. The numbers of samples for each decision attribute value in the right drum decision data set were 2,063, 2,129, and 1,473, and the data set size was 5,665; those in the left drum decision data set were 2,063, 2,276, and 1,662, and the size was 6,001. The numbers of samples for each decision attribute value in the left and right drum co-decision data set were 516, 528, 516, 488, 550, 603, 549, 530, and 470, and the data set size was 4,750.
Finally, we used the k-means method to cluster the data of each attribute, calculated the mean and variance of each resulting class as the parameters of the fuzzy transitions, and discretized the data according to these classes.
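The clustering step can be sketched for a single attribute as follows. This is a plain Lloyd-style k-means on scalar data with evenly spread initial centres; the initialization, the use of the population variance, and the sample values are assumptions made for the illustration.

```python
def kmeans_1d(values, k, iters=100):
    """Cluster scalar values into k classes; return (centres, labels)."""
    lo, hi = min(values), max(values)
    # Evenly spread initial centres over the data range (assumed choice).
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: abs(v - centres[c])) for v in values
        ]
        for c in range(k):
            members = [v for v, l in zip(values, new_labels) if l == c]
            if members:
                centres[c] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return centres, labels


def fuzzy_parameters(values, k):
    """Return per-class (mean, variance) and the discretized labels."""
    _, labels = kmeans_1d(values, k)
    params = []
    for c in range(k):
        members = [v for v, l in zip(values, labels) if l == c]
        if not members:
            params.append(None)  # empty class: no parameters available
            continue
        mean = sum(members) / len(members)
        var = sum((v - mean) ** 2 for v in members) / len(members)
        params.append((mean, var))
    return params, labels


if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
    print(fuzzy_parameters(data, 3))
```

The class means and variances would serve as the centre and spread of the Gaussian membership functions, and the labels give the discretized attribute values.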
Establishing the model and model evaluation method
From the fuzzy transition parameters obtained in the previous section, fuzzy transitions were created whose input places corresponded to the input attributes of the data set (the meaning of the input places is shown in Table 1) and whose output places were the feature places of each attribute. To remove input attributes irrelevant to the decision and thus simplify the modeling, Rosetta software was used to reduce the discretized data. Subsequently, feature attribute sequences were extracted from the discrete data, and a reasoning transition was created for each feature sequence. The input places of a reasoning transition were the extracted sequence of feature places, while the output place was the decision place (the meaning of the decision places is shown in Table 2). Since the reduction produced more than one result, the data were divided into multiple subsets, and a subnet was built for each reduction result. Each subnet was denoted by a Roman numeral, and the last subnet of each data set was a synthesis of the previous ones. Figure 5 shows the partial AFRPN network model of the right drum decision data set I (network parameters and some transitions are omitted).
Table 2. Decision place meaning for each dataset
p_i represents only the marking of the place under the current dataset

Figure 4. Structure and function of AFRPN.

Figure 5. Right drum decision data set I AFRPN partial net structure model.
To verify the performance of this method, three commonly used classification methods were used to construct drum height adjustment models, which were compared with the method proposed in this study. Among these methods, NN has outstanding non-linear fitting ability and performs strongly on problems with obvious non-linear mapping, while CART and GBDT show strong generalization ability for attribute mapping problems. Table 3 shows the hyperparameter settings for the different classifiers. To prevent overfitting, all the generated CART and GBDT models were pruned based on the principle of error. Preliminary tests found that the larger the number of GBDT subtrees, the higher the accuracy of the model. Therefore, to give full play to the potential of the algorithm, we set the number of generated subtrees to 28. For NN, choosing a deeper network structure or more nodes impaired the generalization ability of the model or made it overfit, so the topology of the NN was determined through preliminary tests. ReLU was selected as the activation function for the hidden layer to achieve more efficient gradient descent and back propagation and to mitigate exploding and vanishing gradients. Sigmoid was chosen as the activation function for the output layer because it is smooth and easy to differentiate.
Table 3. Hyperparameters of different classifiers
This study used 10-fold cross-validation to evaluate the performance of the proposed model. The data set was first divided into a training set (90%) and a test set (10%). In each epoch, one tenth of the training set was held out for testing and the rest was used for training, and the accuracy under the principles of average error and maximum membership was calculated and recorded for each 1/10 epoch. The attenuation rate of the adaptive algorithm of each model was set to ρ = 0.9, and the constant was set to ɛ = 0.1. Ten epochs were trained to ensure the convergence of the model. After training, the test set was used to test the model output, and the recall (Recall), precision (Precision), and F1 values were calculated for the threshold ζ ∈ [0, 1] at a resolution of 1%. Accuracy is defined as the proportion of correctly predicted instances among all instances.
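The data partitioning described above can be sketched in Python with index lists only. The shuffling seed, the function names, and the interleaved fold assignment are assumptions for the illustration; the sample count of 5,665 is the size of the right drum decision data set.

```python
import random


def hold_out_split(n_samples, test_fraction=0.1, seed=0):
    """Shuffle sample indices and split them into (train, test) lists."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    n_test = int(n_samples * test_fraction)
    return order[n_test:], order[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists; each fold validates once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


if __name__ == "__main__":
    train_idx, test_idx = hold_out_split(5665)
    print(len(train_idx), len(test_idx))
    for fold_train, fold_val in k_fold(train_idx):
        print(len(fold_train), len(fold_val))
```

Keeping the test indices outside the folds ensures that the final Recall, Precision, and F1 values are computed on samples never seen during training.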
We take the right drum decision data set as an example to briefly explain the calculation of Recall, Precision, and F1-Score. Table 4 is a confusion matrix with ζ as the threshold. In the table, p29p30 indicates that the actual right drum decision was static while the model output was the right drum rising under the threshold ζ. Recall, Precision, and F1-Score are calculated as follows (where n is the number of output places):
Table 4. Confusion matrix with ζ as the threshold
This study judged the merits and demerits of the AFRPN model from five perspectives: (i) the maximum precision P, that is, the maximum over all thresholds of the proportion of correctly predicted positive examples among all predicted positive examples; (ii) the recall R at the threshold giving the maximum precision P, where recall is the proportion of actual positive examples that are predicted as positive; (iii) F1 at the maximum precision P, the harmonic mean of P and R, representing the predictive ability of the model; (iv) the accuracy, calculated by taking the decision with the maximum membership degree among the multiple output decisions as the final decision; and (v) the running time, which represents the time needed to make a decision. The shorter the running time, the higher the sampling frequency the model can adapt to and the more responsive its decisions. The results for all the data sets are shown in Tables 5, 6, and 7. Boldface indicates the optimal values of the four indicators under a data set.
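A minimal Python sketch of these indicators is given below. It uses the standard one-vs-rest definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as their harmonic mean) and sweeps ζ from 0 to 1 in steps of 0.01; the data layout, the function names, and the sample outputs are hypothetical, and the exact multi-class formulas used in the study may differ.

```python
def precision_recall_f1(scores, labels, positive, zeta):
    """One-vs-rest metrics for one decision place at threshold zeta.

    scores -- list of dicts: decision -> output membership degree
    labels -- list of true decisions
    """
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        predicted = s.get(positive, 0.0) >= zeta
        if predicted and y == positive:
            tp += 1
        elif predicted:
            fp += 1
        elif y == positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_precision(scores, labels, positive):
    """Sweep zeta over 0.00..1.00 and keep the first maximum precision."""
    best = (0.0, 0.0, 0.0, 0.0)  # (precision, recall, f1, zeta)
    for step in range(101):
        zeta = step / 100
        p, r, f = precision_recall_f1(scores, labels, positive, zeta)
        if p > best[0]:
            best = (p, r, f, zeta)
    return best


def max_membership_accuracy(scores, labels):
    """Accuracy when the decision with the largest membership is chosen."""
    hits = sum(1 for s, y in zip(scores, labels) if max(s, key=s.get) == y)
    return hits / len(labels)


if __name__ == "__main__":
    scores = [
        {"rest": 0.9, "rise": 0.2, "drop": 0.1},
        {"rest": 0.4, "rise": 0.7, "drop": 0.1},
        {"rest": 0.6, "rise": 0.5, "drop": 0.2},
        {"rest": 0.1, "rise": 0.3, "drop": 0.8},
    ]
    labels = ["rest", "rise", "rise", "drop"]
    print(best_precision(scores, labels, "rise"))
    print(max_membership_accuracy(scores, labels))
```

Reporting the recall and F1 at the threshold of maximum precision, rather than their own maxima, keeps the three values consistent with a single operating point.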
Table 5. Decision model performance for data sets I
Table 6. Decision model performance for data sets II
Table 7. Decision model performance for data sets III
It can be seen from the tables that the indices of the different models on the same data set varied greatly; AFRPN showed the best performance, followed by GBDT and CART. Because all three data sets contain many attribute mapping relations, CART and GBDT generally performed better than NN. There was no significant difference for the same model across data sets, except that the NN model performed better than CART and GBDT on the left and right drum co-decision data set. This could be because this data set has more output decisions than the other data sets, which makes its mapping more non-linear.
The running times of the different models are shown in Fig. 6. The running time of the same model differs little across data sets. GBDT had the longest running time, generally more than 100 ms. The running time of NN was the shortest, usually within 1 ms. The main factors that restrict the operation speed of the AFRPN model are the efficiency of the algorithm and the number of transitions. The more transitions there are, the slower the AFRPN model becomes, so controlling the size of the model is a key step in applying it.

Comparison of running times between different models.
CART is one of the most widely used decision tree algorithms, and its time complexity and decision time are low. However, since the model is essentially a partition of the data set, the number of rules that can be obtained is limited.
GBDT comprises multiple CART subtrees. Unlike the parallel subtrees of a random forest, the GBDT subtrees partition the data set repeatedly, so GBDT can achieve higher accuracy than a single CART. Increasing the number of subtrees can improve the accuracy, but the calculation time also increases.
For NN, the performance on the single drum decision was relatively poor and differed greatly from the result on the co-decision. This could be because the single drum decision is strongly fuzzy, which affects the training of the neurons.
The high accuracy of the AFRPN model stems from the accurate description of the decision rules rather than from the number of rules. The confidence degree of each state event is obtained by fuzzifying the input variables in the fuzzy layer, and the decision is then made by confidence degree competition in the reasoning layer.
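The two-stage flow described here, fuzzification followed by confidence degree competition, can be sketched as follows. This is a simplified illustration and not the AFRPN implementation: the triangular membership functions, the input names `roof_gap` and `pitch`, the three decisions, the certainty factors, and the use of the minimum for conjunctions are all hypothetical choices made for the example.

```python
from typing import Dict, Tuple


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership function (an assumed shape, for illustration)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets shared by all (normalized) input variables.
FUZZY_SETS = {
    "low": (-1.0, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 2.0),
}

# Hypothetical rules: (antecedent state events, certainty factor, decision).
RULES = [
    ((("roof_gap", "high"),), 0.9, "raise"),
    ((("roof_gap", "low"),), 0.9, "lower"),
    ((("roof_gap", "medium"), ("pitch", "medium")), 0.8, "hold"),
]


def fuzzify(inputs: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Fuzzy layer: confidence degree of every (variable, fuzzy set) state event."""
    return {
        (name, label): triangular(value, *params)
        for name, value in inputs.items()
        for label, params in FUZZY_SETS.items()
    }


def infer(inputs: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Reasoning layer: fire every rule, then let the decisions compete."""
    confidence = fuzzify(inputs)
    decision_conf: Dict[str, float] = {}
    for antecedents, certainty, decision in RULES:
        # Assumption: a conjunction takes the minimum antecedent confidence.
        fired = min(confidence[a] for a in antecedents) * certainty
        decision_conf[decision] = max(decision_conf.get(decision, 0.0), fired)
    winner = max(decision_conf, key=decision_conf.get)
    return winner, decision_conf
```

For example, `infer({"roof_gap": 0.9, "pitch": 0.5})` selects `"raise"`, because the rule for a high roof gap fires with the largest confidence degree. The sketch also shows why a few well-described rules can suffice: the outcome depends on how accurately each rule's membership functions and certainty factor describe the situation, not on how many rules take part in the competition.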
The track diagram of the drum height, plotted as the Y-axis offset, is shown in Fig. 7. Circles mark the points in the trajectory where the decision was wrong. Comparing the trajectories generated by the different models shows that the trajectories generated by the AFRPN model are relatively smooth and deviate little from the real trajectories, whereas the trajectories produced by the other models are rough and deviate greatly from the real trajectories. To minimize the impact of the decision error points in actual use, additional mechanisms should be introduced. A soft sensor is a model for industrial process monitoring, control, and optimization that can estimate parameters that cannot be measured or are difficult to measure. It can be integrated with other models to enhance their performance [40, 41]. A soft sensor model could optimize the decision results to some extent, and this will be the subject of our next research.

Y-axis offset diagrams. The three panels show the height track of the coal seam cut by the drum. (a) Left drum data set track. (b) Right drum data set track. (c) Left and right drum co-decision data set track.
To sum up, the AFRPN model could effectively predict the adjustment height of the shearer drum, achieve higher accuracy, respond faster, and adapt reasonably to changes in the coal seam roof. The method can therefore be applied in production practice and is of practical significance.
In this study, an AFRPN was proposed that could express non-linear and attribute mapping relations. An adaptive algorithm based on Adelta was also proposed to optimize the model parameters and reduce the decision error. Finally, the reliability of the proposed AFRPN model was verified through experiments by applying it to drum height adjustment. The experimental results are summarized as follows. AFRPN can represent the fuzzy rules in the process of drum height adjustment and can thus be used to construct a controller with both high robustness and high decision accuracy. The maximum accuracy of the drum height adjustment decision could reach 96.5%, and the output results could be obtained faster. Increasing the complexity of the model does not necessarily increase the accuracy of the output; it may instead cause the model to overfit and reduce the accuracy. An increase in model complexity is also accompanied by an increase in model training time and model running time.
To conclude, AFRPN is effective for the intelligent height adjustment of the shearer drum. The theory can potentially be applied as both a knowledge representation tool and a modeling tool. In future research, we will expand the application scenarios of this theory and optimize it to address more challenges.
Acknowledgment
The authors would like to express their gratitude for the support from the National Key R&D Program of China (No. 2017YFC0804310) and the Xi'an Science and Technology Plan Project (2019113913CXSF017SF027). We thank Wordvice (www.wordvice.cn) for its linguistic assistance during the preparation of this manuscript.
