Research on data mining system based on artificial intelligence and improved genetic algorithm

Abstract

With the explosive growth in data volume, traditional database management technology can no longer store these data adequately or analyze them. Data mining is a tool that can process such data effectively. Research on data mining has produced many new concepts and methods, which have enriched and improved the technology and established its theoretical system. Association rule mining is an important branch of data mining and one of its most important research fields. Genetic algorithms have been widely used to mine association rules, but traditional genetic algorithms are prone to premature convergence. Therefore, how to apply an improved genetic algorithm to association rule mining is the key problem addressed in this paper.

Keywords

Artificial intelligence, improved genetic algorithm, genetic BP neural network, data mining

1 Introduction

With the rapid development of information technology, the ability to collect, store and process data continues to increase. As a result, databases are growing larger and more diverse, and their scope of application is widening. The maturity of data management technology has contributed to the computerization of society and the modernization of public and business services, and the rapid growth of information on the network has produced large amounts of data. To analyze these data, databases have been expanded and are widely used in business and science. Because of this explosive growth, however, traditional database management techniques can store and query the data but cannot uncover the knowledge hidden in them, and they are no longer adequate for processing large data sets. Data mining is therefore the tool needed to process large volumes of data effectively and to understand the data better.

Data mining is usually defined as the process of extracting information from a large amount of collected data. At present, data mining plays an increasingly important role in all sectors, not only to describe how past data developed, but also to build models that can predict future results. Knowledge is information that is implicit in the data, and extracting it requires the data to be classified. Therefore, as science and technology develop and the amount of information grows, the classification of massive data is becoming more and more important.

Research on data mining has produced many new concepts and methods, which have enriched and improved the technology and established its theoretical system. Association rule mining is an important branch of this research. Genetic algorithms have been widely used to mine association rules, but traditional genetic algorithms often converge prematurely to a local optimum. Therefore, how to apply an improved genetic algorithm to association rule mining is the key issue addressed in this paper.

2 Related work

Regarding improved genetic algorithms, literature [1] analyzed the premature convergence of traditional genetic algorithms. The causes are mainly related to the selection operator, the population distribution and the problem itself. It identified individuals that can significantly affect the convergence of genetic algorithms and introduced the concept of the joint evolution of these individuals and their environment. Literature [2] analyzed the use of traditional genetic algorithms for static optimization through the selection, crossover and mutation operators, and showed that the schema theorem cannot guarantee that traditional genetic algorithms reach the optimum when solving optimization problems. Literature [3] studied the inherent parallelism of genetic algorithms, summarized the options for realizing it, and discussed the possible application fields of parallel genetic algorithms. Literature [4] designed a selection operator with control parameters, introduced the concept and mathematical basis of the design, analyzed the relationship between the selection operator and the control parameters, compared the weaknesses of fitness scaling and global convergence in existing selection operators, and adopted the “power gradient method” to control the selection operator. Literature [5] put forward improvements based on the information accumulated in the early stage of a traditional genetic algorithm, which is used to recognize the direction of evolution and to guide the adaptation of chromosomes.

Literature [6] developed a modified genetic algorithm that combines particle swarm optimization with a numerically coded genetic algorithm: the initial population is produced from a chaotic sequence, the selection process uses nonlinear ranking together with competitive selection, and new individuals are created by particle swarm optimization. Literature [7] analyzed particle swarm, ant colony and artificial immune algorithms and combined the genetic algorithm with the particle swarm and artificial immune algorithms to form a hybrid genetic algorithm whose convergence speed does not decline easily; a test function was used to verify the effectiveness of the combination. Literature [8] explored the relationship between the crossover and mutation probabilities and individual fitness, and designed crossover and mutation probability functions for individuals; throughout the run, highly fit individuals are protected from destruction, and the test results are good. Literature [1] improved the formulas for the crossover and mutation probabilities to address their shortcomings, in particular that the mutation probability evolves relatively slowly in the early stage. Literature [9] described other genetic operations and macro-operations that genetic algorithms borrow from biology, including mutation, dominant genes, and diploid and polyploid structures.

Regarding data mining systems for association rules, literature [1] focused on the use of association rule algorithms in large databases; through analysis and improvement, these algorithms were successfully applied to the mining of massive data. Literature [10] presented the basic concepts and processes of data mining, as well as the usual methods and techniques. Literature [11] described the programming of association rule algorithms and how to mine association rules by computer. Literature [12] provided detailed information on the use of the WEKA software (a set of machine learning algorithms for data mining tasks) for data preprocessing, classification, regression and clustering. Literature [13] introduced the methods and techniques used in data mining and analyzed them in combination with biological data mining cases. Literature [14] gave an overview of new research areas in data mining and a detailed description of the data mining process and related methods. Literature [1] studied association rule mining with a genetic algorithm, improving the structure of the fitness function and the data coding to avoid premature convergence. Literature [1] proposed a mining algorithm based on a multi-level association genetic algorithm; according to the common characteristics of multi-level data, a preliminary self-defined threshold was proposed to improve the efficiency and accuracy of mining. Literature [1] proposed a new association rule mining algorithm based on an improved simulated annealing genetic algorithm, in which the crossover and mutation probabilities are adjusted dynamically. Literature [15] suggested improvements to the genetic algorithm that raise its efficiency. Literature [16] incorporated the simulated annealing algorithm into the genetic algorithm and introduced a new fitness function for association rules that takes support and confidence into account. Literature [17] adopted the degree of interest as an index in the association rule algorithm, which effectively avoids misleading rules, and adjusted the crossover and mutation probabilities dynamically.

In short, as the amount of data in a database grows, the straightforward application of association rule algorithms becomes inefficient, and searching with a genetic algorithm can greatly increase efficiency. Almost all researchers improve the global search ability of the genetic algorithm before applying it to association rule mining.

3 Basic analysis of traditional algorithm

3.1 Analysis of common algorithm for data mining

Data mining methods are as follows:

(1) Statistical methods: statistical analysis of large-scale data in databases using statistical principles. Statistics plays a very important role in the whole data mining process and provides a series of traditional methods, including regression analysis (multiple regression, autoregression, etc.), discriminant analysis (Bayes criterion, Fisher criterion, nonparametric discrimination, etc.), cluster analysis (hierarchical clustering, dynamic clustering, etc.) and exploratory analysis (meta-analysis, correlation analysis), etc.

(2) Neural network method: a nonlinear prediction model that imitates the structure of biological neural networks and is obtained through learning. The usual algorithms include the feedforward neural network (BP algorithm) and the self-organizing neural network.

(3) Decision tree method: the decision tree method uses the information gain of information theory to find the attribute field that carries the most information in the database and splits on it. It is a very important classification and mining method; the best-known algorithms include ID3, IBE and so on.

(4) Fuzzy mathematical method: fuzzy mathematics is an important newer branch of mathematics. In data mining it is mainly used for fuzzy comprehensive evaluation, fuzzy association, fuzzy clustering and fuzzy classification.

(5) Genetic algorithms: algorithms that simulate natural selection in biological populations, mainly by iterating the selection, crossover and mutation operators. They overcome the tendency of nonlinear optimization to become trapped in local extrema.

3.1.1 An overview of association rule algorithm

Association rules have long been an important field of data mining research, because they reveal the relationships between attributes in a database, and they were among the first problems considered in data mining. The application of association rules is not limited to sales data; they are increasingly used to discover the relationships among many kinds of things and what those relationships mean. The basic concepts are listed in Table 1.

Table 1
Basic concepts of association rules

Concept	Description
Item	A field in a transaction database; each item is an attribute value
Itemset	A set of items
Support	The frequency with which an itemset occurs in the database
Confidence	The conditional probability that a transaction contains B given that it contains A
Minimum support	Threshold; the user is only interested in itemsets and rules whose support is not less than it
Minimum confidence	Threshold; the user is only interested in rules whose confidence is not less than it
Frequent itemset	An itemset whose support is not less than the minimum support
Strong association rule	An association rule that satisfies both the minimum support and the minimum confidence

The support of the itemset X = {A, B} is the ratio of the number of transactions containing both A and B to the total number of transactions |D| in the database D. Let Sup(X) denote the number of occurrences of itemset X. The support is then the probability P(A∪B): $Support (A \Rightarrow B) = P (A \cup B) = \frac{Sup (X)}{| D |} = \frac{Sup (A \cup B)}{| D |}$ (1)

The confidence of the rule A=>B is the ratio of the number of transactions containing both A and B to the number of transactions containing A, that is, the support of the itemset {A, B} divided by the support of {A}. It is the conditional probability P(B|A): $Confidence (A \Rightarrow B) = P (B | A) = \frac{Sup (A \cup B)}{Sup (A)}$ (2)

For the rule A=>B, the degree of interest is defined as follows, where Support(B) = Sup(B)/|D|:

$\begin{matrix} Inte (A \Rightarrow B) = \frac{P (A \cup B)}{P (A) P (B)} = \frac{| D | \, Sup (A \cup B)}{Sup (A) \, Sup (B)} \\ = \frac{Confidence (A \Rightarrow B)}{Support (B)} \end{matrix}$ (3)

In formula (3), an interest of 1 indicates that transaction A does not affect transaction B. An interest greater than 1 means that A promotes B, so the rule is meaningful. An interest less than 1 means that A inhibits B.
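As a small illustration of these three measures, the following sketch computes support, confidence (the share of transactions containing A that also contain B) and interest (confidence divided by the relative support of B) on a toy transaction database. The function names and the four-transaction database `D` are hypothetical and chosen only for this example; the code assumes that the antecedent occurs at least once.

```python
def support_count(transactions, itemset):
    """Sup(X): number of transactions that contain every item of X."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, itemset):
    """Relative support: Sup(X) / |D|."""
    return support_count(transactions, itemset) / len(transactions)


def confidence(transactions, a, b):
    """Sup(A and B together) / Sup(A); assumes Sup(A) > 0."""
    both = set(a) | set(b)
    return support_count(transactions, both) / support_count(transactions, a)


def interest(transactions, a, b):
    """Confidence of A => B divided by the relative support of B."""
    return confidence(transactions, a, b) / support(transactions, b)


# Hypothetical toy database with four transactions.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(D, {"bread", "milk"}))    # 2 of 4 transactions
print(confidence(D, {"bread"}, {"milk"}))
print(interest(D, {"bread"}, {"milk"}))
```

In this toy database the interest of bread => milk is below 1, so under the interpretation above, buying bread slightly inhibits buying milk.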

3.1.2 BP neural networks

The BP learning process is as follows: training samples are presented to the network and propagated forward, and the error, that is, the difference between the actual output and the expected output, is propagated backward. Through learning, the network weights are adjusted step by step and the network gradually stabilizes. The actual output of the network approaches the ideal output, and the error is kept within the prescribed limit.

A BP network is composed of an input layer, an output layer and intermediate (hidden) layers. Usually there is only one hidden layer; in special cases, the number of hidden layers is increased as needed. The data are processed as follows. Input layer: the input vector is usually the feature values of a sample. Because no special processing of the data is required, the input layer passes the data it receives directly to the hidden layer. The input is expressed as follows: $\vec{X} = [x_{0}, x_{1}, \dots, x_{n}]$ (4)

Hidden layer: this is the effective processing layer of the network. It receives the weighted data from the input layer and processes them with an activation function and a threshold. The net input of hidden neuron i is as follows: $x_{i} = \vec{X} \vec{W}_{i}$ (5)

The connection weights between hidden neuron i and the input layer are as follows: $\vec{W}_{i} = \begin{bmatrix} w_{i 0} \\ w_{i 1} \\ w_{i 2} \\ \vdots \\ w_{in} \end{bmatrix}$ (6)

Expanding the product, the net input of neuron i is: $x_{i} = x_{0} w_{i 0} + x_{1} w_{i 1} + \dots + x_{n} w_{in}$ (7)

When neuron i receives its input, the activation function constrains it to the expected range. The activation function is the S-type (sigmoid) function shown below: $f (x) = \frac{1}{1 + e^{- ax}}, \quad 0 < f (x) < 1$ (8)

The output of neuron i after the activation function f(·) is therefore: $o_{i} = f (x_{i}) = f (\vec{X} \vec{W}_{i})$ (9)

Hence the output vector of the whole hidden layer, where k is the number of neurons in the hidden layer, is: $\vec{O} = [o_{1}, o_{2}, \dots, o_{k}]$ (10)

Output layer: the input of the output layer is the weighted sum of the outputs of the hidden layer; it is processed by the activation function to give the actual output of the whole network. The net input of output neuron j is as follows: $x_{j} = \vec{O} \vec{W}_{j}$ (11)

The output layer also uses the S-type activation function, which constrains the output data. The output of neuron j is: $o_{j} = f (x_{j}) = f (\vec{O} \vec{W}_{j})$ (12)

So the output vector of the whole network is: $\begin{matrix} \vec{OO} = [o_{0}, o_{1}, \dots, o_{m}] \\ = F_{2} (F_{1} (\vec{X} \vec{W}_{1}) \vec{W}_{2}) \end{matrix}$ (13)

To train the network, the mean square error is calculated first, where P is the number of training samples and t_{pk} and o_{pk} are the expected and actual outputs of output neuron k for sample p: $E = \frac{1}{2 P} \sum_{p} \sum_{k} (t_{pk} - o_{pk})^{2}$ (14)

Once the error of the network has been obtained, it is first propagated back to the output layer, whose weights are adjusted. The weight adjustment is as follows, where lr is the learning rate, oh_{i} is the output of hidden neuron i, and oo_{j} and y_{j} are the actual and expected outputs of output neuron j: $\begin{matrix} Δ w_{ij} = lr * δ_{j} * {oh}_{i} \\ w_{ij} = w_{ij} + Δ w_{ij} \\ δ_{j} = {oo}_{j} * (1 - {oo}_{j}) * (y_{j} - {oo}_{j}) \end{matrix}$ (15)

After substitution, the above formula can be written as follows: $\begin{matrix} W_{jk} (t + 1) = W_{jk} (t) + Δ W_{jk} \\ Δ W_{jk} = - η \frac{\partial E}{\partial W_{jk}} = η δ_{k} O_{j} \\ δ_{k} = (t_{k} - O_{k}) O_{k} (1 - O_{k}) \end{matrix}$ (16)

The expected output of the hidden layer is not known directly, so its error is obtained from the errors of the output layer. The adjustment of the hidden-layer weight matrix is as follows: $\begin{matrix} δ_{j} = {oh}_{j} * (1 - {oh}_{j}) * \sum_{i = 1}^{m} (w_{ji} * δ_{i}^{o}) \\ Δ v_{ij} = lr * δ_{j} * x_{i} \\ v_{ij} = v_{ij} + Δ v_{ij} \end{matrix}$ (17)

The formula can be written as follows: $\begin{matrix} W_{ij} (t + 1) = W_{ij} (t) + Δ W_{ij} \\ Δ W_{ij} = - η \frac{\partial E}{\partial W_{ij}} = η δ_{j} x_{i} \\ δ_{j} = O_{j} (1 - O_{j}) \sum_{k} δ_{k} W_{jk} \end{matrix}$ (18)

The next sample is then presented and the process is repeated until the network performance is optimal; see Fig. 1 for details.

Fig.1

BP Network training process.
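The forward pass and the delta rules described above can be sketched as follows for a network with a single hidden layer and a single training sample. This is a minimal illustration and not the implementation used in the paper: thresholds (biases) are omitted, the sigmoid slope a is fixed at 1, and the names `forward`, `deltas`, `train_step` and the starting weights are assumptions made for the example.

```python
import math


def sigmoid(x, a=1.0):
    """S-type activation function, bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a * x))


def forward(x, V, W):
    """V[j][i]: weight from input i to hidden j; W[k][j]: hidden j to output k."""
    hidden = [sigmoid(sum(v * xi for v, xi in zip(row, x))) for row in V]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in W]
    return hidden, output


def error(x, t, V, W):
    """Squared error for one sample: 0.5 * sum_k (t_k - o_k)^2."""
    _, output = forward(x, V, W)
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, output))


def deltas(x, t, V, W):
    """Error terms of the output layer and of the hidden layer."""
    hidden, output = forward(x, V, W)
    # Output layer: delta_k = (t_k - o_k) * o_k * (1 - o_k)
    d_out = [(tk - ok) * ok * (1.0 - ok) for tk, ok in zip(t, output)]
    # Hidden layer: delta_j = o_j * (1 - o_j) * sum_k delta_k * w_kj
    d_hid = [
        h * (1.0 - h) * sum(d_out[k] * W[k][j] for k in range(len(W)))
        for j, h in enumerate(hidden)
    ]
    return hidden, output, d_out, d_hid


def train_step(x, t, V, W, lr=0.5):
    """One gradient-descent update of both weight matrices (in place)."""
    hidden, _, d_out, d_hid = deltas(x, t, V, W)
    for k, row in enumerate(W):
        for j in range(len(row)):
            row[j] += lr * d_out[k] * hidden[j]
    for j, row in enumerate(V):
        for i in range(len(row)):
            row[i] += lr * d_hid[j] * x[i]


# Hypothetical 2-2-1 network and one training sample.
V = [[0.1, -0.2], [0.4, 0.3]]
W = [[0.5, -0.6]]
x, t = [1.0, 0.5], [1.0]
for _ in range(100):
    train_step(x, t, V, W)
print(error(x, t, V, W))
```

The hidden-layer deltas are computed before the output weights are updated, so both updates use the weights of the same iteration.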

3.2 Basic principles of genetic algorithms

A GA encodes each candidate solution as a vector called a chromosome, and each element of the vector is called a gene. All chromosomes are evaluated with the objective function and assigned fitness values accordingly. Starting from a randomly generated set of chromosomes, the algorithm computes their fitness and applies selection, crossover and mutation, eliminating chromosomes with low fitness and retaining those with high fitness, so that the new population is generally fitter than the population that produced it. This is repeated until the optimization goal is achieved. Fig. 2 illustrates the basic principles of genetic algorithms.

Fig.2

Schematic diagram of genetic algorithm.

(1) Selection operator: The basic operations of genetic algorithms are selection, crossover and mutation, also known as genetic operators. The selection operator determines whether an individual is eliminated or retained in the next generation, choosing the best parents according to their fitness. Several selection schemes are in common use; proportional (roulette) selection works as follows:

Let the population size be N and the fitness of individual i be f(i). The selection probability of individual i is P_i and its cumulative probability is Q_i. The cumulative probabilities are compared with a random number r drawn uniformly from [0, 1] to determine which individual is copied into the next generation. $P_{i} = \frac{f (i)}{\sum_{j = 1}^{N} f (j)}, \quad Q_{i} = \sum_{j = 1}^{i} P_{j}$ (19)

The selection probability therefore reflects the share of an individual's fitness in the total fitness of the population: the greater the fitness of an individual, the greater its probability of being selected, and conversely, the lower the fitness, the smaller the probability.
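Proportional selection with cumulative probabilities can be sketched as below. The sketch assumes non-negative fitness values with a positive sum; the function names are hypothetical, and the random number r can be passed explicitly so that the behavior is reproducible.

```python
import bisect
import random
from itertools import accumulate


def selection_probabilities(fitness):
    """P_i = f(i) / sum of all fitness values."""
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(fitness, r=None):
    """Return the index of the first individual whose cumulative
    probability Q_i is at least r, with r uniform in [0, 1)."""
    if r is None:
        r = random.random()
    cumulative = list(accumulate(selection_probabilities(fitness)))
    index = bisect.bisect_left(cumulative, r)
    # Guard against rounding that leaves the last Q_i slightly below 1.
    return min(index, len(fitness) - 1)


fitness = [1, 2, 3, 4]
print(selection_probabilities(fitness))
print([roulette_select(fitness) for _ in range(10)])
```

Calling `roulette_select` N times yields the mating pool for the next generation; fitter individuals tend to appear in it more often.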

(2) Crossover operator: Crossover, also called recombination, takes two paired chromosomes and exchanges some of their genes in a given way, producing two new chromosomes. Crossover plays a central role in genetic algorithms and largely determines their global search ability.

In single-point crossover, two parent chains are first selected at random, and a crossover point is then chosen at random; for a chain of length L there are L-1 possible crossover points. The parts of the two chains after the crossover point are exchanged. See Fig. 3 for details.

Fig.3

Examples of single-point crossover.
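A minimal sketch of single-point crossover on string chromosomes follows. The function name and the option of passing the cut position explicitly are assumptions made for the illustration.

```python
import random


def single_point_crossover(parent_a, parent_b, point=None):
    """Swap the tails of two equal-length chromosomes after `point`.

    For a chromosome of length L there are L - 1 valid cut positions
    (1 .. L - 1); one is drawn at random when `point` is not given.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    length = len(parent_a)
    if length < 2:
        raise ValueError("chromosomes need at least two genes")
    if point is None:
        point = random.randint(1, length - 1)
    if not 1 <= point <= length - 1:
        raise ValueError("crossover point out of range")
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


print(single_point_crossover("11111", "00000", point=2))
```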

(3) Mutation operator: Selection and crossover provide most of the search capability of a genetic algorithm. Mutation replaces the values of some genes of a chromosome with other values, thus creating a new individual, and helps the algorithm maintain the diversity of the population.
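Mutation can be sketched as follows for string chromosomes: each gene is replaced, with probability p_m, by a different value from the allowed alphabet. The function name, the `alleles` parameter and the per-gene mutation scheme are assumptions made for the illustration.

```python
import random


def mutate(chromosome, p_m, alleles, rng=random):
    """Replace each gene with a different allele with probability p_m."""
    genes = []
    for gene in chromosome:
        alternatives = [a for a in alleles if a != gene]
        if alternatives and rng.random() < p_m:
            genes.append(rng.choice(alternatives))
        else:
            genes.append(gene)
    return "".join(genes)


print(mutate("342214", p_m=0.1, alleles="01234"))
```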

3.3 Simulated annealing algorithm

The simulated annealing algorithm was originally proposed in 1953. It is aimed mainly at NP-hard combinatorial optimization problems and seeks a near-optimal solution. The optimization process is compared with the thermal equilibrium problem of statistical thermodynamics, and the physical picture and statistical properties of the solid annealing process are used as the physical background that lets the algorithm avoid local optima. In solid annealing, the solid is heated to a sufficiently high temperature that its molecules are arranged randomly and is then cooled, so that the molecules finally settle in a low-energy state. Table 2 compares the combinatorial optimization problem with solid annealing.

Table 2
Comparison of similarities

Combinatorial optimization problem	Solid annealing
Solution	Particle state
Optimal solution	The lowest energy state
Set initial temperature	Melting process
Metropolis sampling process	Isothermal process
Decline in control parameters	Cooling
Objective function	Energy

The basic concepts of the simulated annealing algorithm are as follows:

(1) Objective function: the algorithm generally seeks the minimum of the objective function; when a maximum is desired, the function is multiplied by -1 so that the problem becomes one of minimization.

(2) Temperature: temperature is an important parameter of the simulated annealing algorithm and corresponds to the temperature in the cooling of a solid. First, it controls the distance between a new solution produced by the algorithm and the current solution. Second, it determines the probability that the algorithm accepts a new solution that is worse than the current one.

(3) Annealing schedule: the annealing schedule is the rule by which the algorithm lowers the temperature. The more slowly the temperature decreases, the better the solution found by the simulated annealing algorithm tends to be. The schedule includes parameters such as the initial temperature and the temperature update function.

(4) Metropolis criterion: the Metropolis criterion is the rule by which the simulated annealing algorithm decides whether to accept a new solution. With f_i the value of the current solution and f_{i+1} the value of the new solution, the acceptance probability is: $P = \begin{cases} 1 & f_{i + 1} ⩾ f_{i} \\ \exp (\frac{k (f_{i + 1} - f_{i})}{T}) & f_{i + 1} < f_{i} \end{cases}$ (20)

It can be seen from formula (20) that when the new solution is worse than the current one, the higher the temperature, the greater the probability of accepting the inferior solution. The simulated annealing algorithm can therefore escape local optima more easily, and the probability of accepting an inferior solution decreases as the temperature decreases. The Metropolis criterion is the core of the simulated annealing algorithm; see Fig. 4 for details.

Fig.4

Flowchart of simulated annealing algorithm.
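The acceptance rule and the overall loop can be sketched as follows. The sketch follows the sign convention of formula (25), in which a larger value f is better and a worse candidate is accepted with probability exp(k(f_new - f_old)/T). The geometric cooling, the parameter defaults and the names `acceptance_probability` and `simulated_annealing` are assumptions made for the illustration and are not taken from the paper.

```python
import math
import random


def acceptance_probability(f_old, f_new, temperature, k=1.0):
    """Metropolis criterion for a value f that is being maximized."""
    if f_new >= f_old:
        return 1.0
    return math.exp(k * (f_new - f_old) / temperature)


def simulated_annealing(objective, neighbour, x0, t0=1.0, alpha=0.95,
                        t_min=1e-3, steps_per_temp=20, rng=random):
    """Maximize `objective`, cooling geometrically from t0 down to t_min."""
    current, f_cur = x0, objective(x0)
    best, f_best = current, f_cur
    t = t0
    while t > t_min:
        # Sampling at a fixed temperature (the isothermal process).
        for _ in range(steps_per_temp):
            candidate = neighbour(current, rng)
            f_cand = objective(candidate)
            if rng.random() < acceptance_probability(f_cur, f_cand, t):
                current, f_cur = candidate, f_cand
                if f_cur > f_best:
                    best, f_best = current, f_cur
        t *= alpha  # cooling step
    return best, f_best


# Hypothetical one-dimensional example with its maximum at x = 3.
best, f_best = simulated_annealing(
    objective=lambda x: -(x - 3.0) ** 2,
    neighbour=lambda x, rng: x + rng.uniform(-1.0, 1.0),
    x0=0.0,
    rng=random.Random(0),
)
print(best, f_best)
```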

4 An improved genetic algorithm: simulated annealing genetic algorithm

4.1 Ideas for genetic algorithm improvement

The basic idea of the genetic algorithm is as follows: selection passes the best schemata of the current individuals on to the next generation, and crossover recombines the structure of the schemata, so that poor schemata are gradually phased out, good schemata are retained, and the best result is gradually obtained.

In practice, however, the number of schemata to be processed limits the efficiency of the genetic algorithm. With limited resources, it is necessary to keep only the best candidates; for example, the less fit half of the schemata is eliminated at each genetic operation. Genetic algorithms favor highly fit schemata, and because the population size is limited, individuals whose fitness is above the population average reproduce more in the next generation. Once some individuals have gained an absolute advantage in the population, the genetic algorithm keeps strengthening this advantage: the population begins to converge, the individuals become more and more similar, and less fit individuals have little chance to reproduce. Finally the population stagnates, which is the premature convergence of the genetic algorithm. There are two strategies for improving genetic algorithms:

The first is to maintain as much population diversity as possible, or to ensure that diversity is not lost before the evolution is complete, as in niche genetic algorithms.

The second accepts that population diversity may be lost during evolution, but provides a mechanism that generates new individuals to take part in the evolution, thereby increasing diversity again. Such methods often combine the genetic algorithm with other algorithms to produce the new individuals.

In this paper, the traditional genetic algorithm is improved: chromosomes are selected with an immune mechanism, and adaptive crossover and mutation are carried out according to the simulated annealing model to overcome the premature convergence of the genetic algorithm.

4.2 Operation of simulated annealing genetic algorithm

The simulated annealing genetic algorithm is a combination of the genetic algorithm and the simulated annealing algorithm. Its main operations are selection, crossover and mutation.

(1) Selection based on the immune mechanism: selection picks the individuals best adapted to the environment from the population and uses them to breed the next generation.

(2) Adaptive crossover: crossover exchanges genes at the same positions between different individuals to create new individuals, which is an important step in protecting the diversity of the population. This paper adopts an adaptive method that dynamically adjusts the crossover probability P_c and the mutation probability P_m, which further reduces the probability of premature convergence.

(3) Adaptive mutation: mutation alters specific genes of an individual. It is another important operation for population diversity and an important component of the simulated annealing genetic algorithm. New individuals are accepted according to the Metropolis criterion, and the annealing function affects the convergence behavior of the whole algorithm. The usual annealing functions are as follows:

Fast cooling: $t_{k} = \frac{α}{1 + k}$

Exponential decrease: $t_{k} = α t_{k - 1}$

Linear decrease (K is the number of cooling steps): $t_{k} = (1 - \frac{k}{K}) t_{0}$

Logarithmic decrease: $t_{k} = \frac{α}{\log (k + 1)}$
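The four annealing functions can be written as small helpers, as sketched below. The sketch reads them as fast cooling α/(1 + k), exponential decrease α·t_{k-1}, linear decrease (1 - k/K)·t_0 and logarithmic decrease α/log(k + 1); the fast-cooling form in particular is an assumed reading of the source formula, and the function names are hypothetical.

```python
import math


def fast_cooling(alpha, k):
    """t_k = alpha / (1 + k); assumed reading of the fast-cooling rule."""
    return alpha / (1 + k)


def exponential_cooling(t_prev, alpha):
    """t_k = alpha * t_(k-1), with 0 < alpha < 1."""
    return alpha * t_prev


def linear_cooling(t0, k, total_steps):
    """t_k = (1 - k / K) * t_0, where K is the number of cooling steps."""
    return (1 - k / total_steps) * t0


def logarithmic_cooling(alpha, k):
    """t_k = alpha / log(k + 1), defined for k >= 1."""
    return alpha / math.log(k + 1)


# Temperatures of the first five steps under each schedule (alpha = t0 = 100).
t = 100.0
for k in range(1, 6):
    t = exponential_cooling(t, 0.9)
    print(k, fast_cooling(100.0, k), t,
          linear_cooling(100.0, k, 10), logarithmic_cooling(100.0, k))
```

For the same starting value, the logarithmic schedule cools most slowly and the fast schedule most quickly.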

5 Application of simulated annealing genetic algorithm in association rule mining

The genetic algorithm is a random search method based on biological natural selection and genetic mechanisms. It operates on all individuals in the population and searches the encoded parameter space efficiently. Its search ability is used here to find frequently occurring patterns in the data.

First, the user's request is preprocessed: the relevant records are retrieved from the database with SQL queries into a temporary table, the values of each attribute are mapped to codes, and the information is encoded in a finite form that the algorithm can process.

5.1 Application steps

5.1.1 Coding

The coding of a genetic algorithm describes the feasible solutions of the problem; that is, it maps the solution space of the problem into a search space that the genetic algorithm can process. In this paper, decimal coding is adopted: the value of each attribute is represented by one decimal digit, each digit is a gene, and the genes of all attributes are concatenated into a decimal string.

5.1.2 Design of fitness function

Fitness measures how close an individual in the population is to the optimal solution and is the basis on which the genetic algorithm operates. The fitness function used to evaluate it is: $F (X) = \begin{cases} α Supp (X) + β Conf (X) & Inte (X) ⩾ 1 \\ \frac{1}{α} Supp (X) + \frac{1}{β} Conf (X) & Inte (X) < 1 \end{cases}$ (21)
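Formula (21) can be sketched as a small function, reading Inte(X) ⩾ 1 and Inte(X) < 1 as the conditions of the two branches. The default weights α = β = 2 are assumptions chosen for the example; the paper does not state the values it uses.

```python
def fitness(supp, conf, inte, alpha=2.0, beta=2.0):
    """Piecewise fitness of a rule X from its support, confidence and interest.

    Rules with interest >= 1 are weighted by alpha and beta; the others
    are weighted by 1/alpha and 1/beta.
    """
    if inte >= 1:
        return alpha * supp + beta * conf
    return supp / alpha + conf / beta


print(fitness(supp=0.4, conf=0.8, inte=1.2))
print(fitness(supp=0.4, conf=0.8, inte=0.9))
```

With weights greater than 1, a rule whose interest is below 1 receives a lower fitness than an otherwise identical rule whose interest is at least 1, so uninteresting rules are less likely to survive selection.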

5.1.3 Selection operator of immune mechanism

The traditional roulette strategy often leads to premature convergence. Therefore, this paper adopts a selection strategy based on the immune mechanism. The selection probability is as follows: $P_{d} = \begin{cases} \frac{1}{M} (1 - d) & \text{individuals with the highest concentration in the population} \\ \frac{1}{M} (1 + \frac{d^{2}}{1 - d}) & \text{other individuals in the population} \end{cases}$ (22)

The advantages of this approach are as follows: the higher the fitness of an individual, the greater its probability of being selected (promotion), which accelerates the convergence of the algorithm; the higher the concentration of an individual, the smaller its probability of being selected (suppression), which preserves the diversity of the population.

5.1.4 Adaptive crossover operator

Crossover is a process in which two paired chromosomes exchange part of their genes in some way, producing two new individuals. In this paper, the crossover probability P_c and the mutation probability P_m are adjusted dynamically. When the fitness values in the population differ widely, diversity is high and the crossover and mutation probabilities are kept small; when population diversity is low and the fitness values tend to converge or to a local optimum, the crossover and mutation probabilities are increased, which effectively prevents premature convergence. The adaptive crossover probability used in this paper is as follows, where f_{avg} and f_{\max} are the average and maximum fitness of the population: $P_{c} = \begin{cases} P_{c 1} & f^{'} < f_{avg} \\ P_{c 1} - P_{c 2} \frac{f^{'} - f_{avg}}{f_{\max} - f_{avg}} & f^{'} ⩾ f_{avg} \end{cases}$ (23)

5.1.5 Adaptive mutation operator

Mutation simulates gene mutation in biological evolution. In this paper, the following adaptive mutation probability is used: $P_{m} = \begin{cases} P_{m1} - P_{m2}\frac{f_{\max} - f}{f_{\max} - f_{avg}}, & f' \geqslant f_{avg} \\ P_{m1}, & f' < f_{avg} \end{cases}$ (24)
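The following sketch shows how Eq. (24) might drive a per-gene mutation on a coded chromosome. Several points are assumptions made for illustration: a single fitness value `f` is used both in the condition and in the fraction (Eq. (24) writes $f'$ in the condition), the constants `pm1` and `pm2` are hypothetical, and the helper `mutate` with its `max_values` argument is not from the paper.

```python
import random


def mutation_probability(f, f_avg, f_max, pm1=0.1, pm2=0.05):
    """Adaptive mutation probability following Eq. (24)."""
    if f < f_avg or f_max == f_avg:
        return pm1
    return pm1 - pm2 * (f_max - f) / (f_max - f_avg)


def mutate(chromosome, pm, max_values, rng):
    """With probability pm, replace each gene by a different value.

    chromosome: list of integer genes, where 0 means "attribute not used".
    max_values: largest code allowed for each gene (must be at least 1).
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    mutated = []
    for gene, max_value in zip(chromosome, max_values):
        if rng.random() < pm:
            choices = [v for v in range(max_value + 1) if v != gene]
            gene = rng.choice(choices)
        mutated.append(gene)
    return mutated


if __name__ == "__main__":
    rng = random.Random(0)
    pm = mutation_probability(f=0.8, f_avg=0.5, f_max=1.0)
    print(pm, mutate([0, 0, 2, 0, 1, 0], pm, [4] * 6, rng))
```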

5.1.6 Simulated annealing crossover and mutation operation

The basic idea of the simulated annealing algorithm comes from statistical physics: as the temperature decreases, the energy of a material gradually approaches a lower state and finally settles at a stable level. The corresponding probability of accepting a new individual with fitness $f_{i+1}$ in place of one with fitness $f_{i}$ at temperature $T$ is: $P = \begin{cases} 1, & f_{i+1} \geqslant f_{i} \\ \exp\left(\frac{k(f_{i+1} - f_{i})}{T}\right), & f_{i+1} < f_{i} \end{cases}$ (25)
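Eq. (25) has the form of a Metropolis acceptance rule, which can be sketched as below. The default `k = 1.0` and the helper `accept` are assumptions for illustration; the paper does not give a value for $k$ or a cooling schedule at this point.

```python
import math
import random


def acceptance_probability(f_new, f_old, temperature, k=1.0):
    """Probability of keeping the new individual, following Eq. (25)."""
    if f_new >= f_old:
        # An individual that is at least as fit is always accepted.
        return 1.0
    # A worse individual is accepted with a probability that shrinks
    # as the fitness loss grows or the temperature falls.
    return math.exp(k * (f_new - f_old) / temperature)


def accept(f_new, f_old, temperature, rng, k=1.0):
    """Draw one accept/reject decision using the probability above."""
    return rng.random() < acceptance_probability(f_new, f_old, temperature, k)


if __name__ == "__main__":
    rng = random.Random(1)
    for T in (1.0, 0.3, 0.1):
        p = acceptance_probability(0.5, 0.8, T)
        print(T, round(p, 4), accept(0.5, 0.8, T, rng))
```

At high temperature a worse offspring is still accepted fairly often, which helps the search leave local optima; as $T$ decreases the rule approaches plain greedy replacement.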

5.1.7 Extraction and evaluation of rules

The algorithm stops when the difference in average fitness between adjacent generations falls below a given threshold, after which the rules are extracted and evaluated. The flow of association rule mining with the improved simulated annealing genetic algorithm is shown in Fig. 5.

Fig.5

Flowchart of association rules of simulated annealing genetic algorithm.

5.2 Empirical analysis

In the actual coding process, the code “0” is added for each attribute to indicate that the attribute does not take part in the rule (the user does not care about it), so that the genetic algorithm can generate rules randomly.

The attribute values have been converted to numeric codes and the appropriate attributes selected as needed. The resulting database table is shown in Table 3, and Figs. 6–9 show the attribute value mapping results, Breakdown1, Breakdown2 and Breakdown3.

Table 3
Attribute value mapping results

Month	Temp	RH	Wind	Rain	Area
3	4	2	2	1	4
3	4	2	2	1	4
3	3	3	2	1	4
3	2	2	1	1	4
3	2	2	1	1	4
3	2	2	2	1	4
3	4	2	2	1	4
3	3	2	2	1	4
3	2	1	2	1	4
3	3	2	2	1	4
3	2	2	1	1	4
3	4	2	2	1	3
3	4	1	2	1	3
3	4	2	2	1	3
3	2	2	2	1	3
3	2	2	2	1	3
3	3	2	2	1	3
2	3	2	2	1	3
3	4	2	2	1	3
2	2	2	4	1	3
3	3	2	2	1	3
3	3	2	1	1	3
3	3	3	2	1	3
1	2	3	1	1	3
3	4	2	2	1	3
4	2	2	2	1	3
3	3	2	2	1	3
3	2	3	3	1	3
3	4	2	2	1	3
3	3	3	1	1	3
...	...	...	...	...	...

Fig.6

Attribute value mapping results.

Fig.7

Breakdown1.

Fig.8

Breakdown2.

Fig.9

Breakdown3.

Coding with real-valued arrays makes it easier to operate on the chromosomes. The correspondence between array elements and attributes is shown in Table 4.

Table 4

Table of correspondence between arrays and attributes

A[1]	A[2]	A[3]	A[4]	A[5]	A[6]
Month	Temp	RH	Wind	Rain	Area

Applying the algorithm described above yields the association rules listed in Table 5.

Table 5

Selected association rules mined

Rule code	Support	Confidence	Interest
002010	61.7%	100%	1.01
320011	16.4%	51%	1.07
332210	13.7%	51%	1.65
002122	11.9%	54%	0.94
320210	17.9%	56%	0.97
300013	11.6%	98%	1.00
302110	17.8%	84%	1.17
...	...	...	...
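The rule codes in Table 5 can be read against the array layout of Table 4. The sketch below assumes that the six digits of a rule code correspond positionally to A[1]..A[6] and that a digit of 0 means the attribute is not part of the rule. The `support` helper simply counts matching rows; it is run here on the first five rows of Table 3 only, so its output is not expected to reproduce the percentages in Table 5.

```python
# Attribute order taken from Table 4 (A[1]..A[6]).
ATTRIBUTES = ["Month", "Temp", "RH", "Wind", "Rain", "Area"]


def decode(rule_code):
    """Map a six-digit rule code to {attribute: value}, skipping 0 genes."""
    return {
        name: int(digit)
        for name, digit in zip(ATTRIBUTES, rule_code)
        if digit != "0"
    }


def support(rule_code, rows):
    """Fraction of rows that match every attribute value in the rule."""
    items = decode(rule_code)
    matches = sum(
        all(row[name] == value for name, value in items.items())
        for row in rows
    )
    return matches / len(rows)


# First five rows of Table 3, used only as a small illustration.
SAMPLE_ROWS = [
    dict(zip(ATTRIBUTES, values))
    for values in [
        (3, 4, 2, 2, 1, 4),
        (3, 4, 2, 2, 1, 4),
        (3, 3, 3, 2, 1, 4),
        (3, 2, 2, 1, 1, 4),
        (3, 2, 2, 1, 1, 4),
    ]
]

if __name__ == "__main__":
    print(decode("002010"))
    print(support("002010", SAMPLE_ROWS))
```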

6 Conclusion

This paper systematically introduces the basic theory and development of data mining, summarizes the methods, tools and techniques used in data mining, describes association rule mining techniques in more detail, classifies the techniques for mining association rules, and presents the steps of the top-down algorithm for traditional association rules in detail.

Acknowledgment

Inner Mongolia Science and Technology Agency: BeefNet-Construction of cloud platform for precision breeding and breeding system of beef cattle (NO: 2019GG350); Project Supported by Basic Foundation of Inner Mongolia Agricultural University (NO: JC2013001).

Table 2
Comparison of similarities

Combination optimization problem	Solid
Solution	Particle state
Optimal solution	The lowest energy state
Set initial temperature	Melting process
Metropolis sampling process	Isothermal process
Decline in control parameters	Cooling
Objective function	Energy