Abstract
Cities are the products of human social and economic development, and urbanization is the inevitable result of productivity improvement and social progress. The urban green space network is the only living infrastructure in the city and provides essential space for protecting urban biodiversity. This paper therefore proposes a heuristic decision tree model for constructing an ecological urban green space network, studied on the basis of Geographical Information Systems (GIS) concepts. GIS and data mining technology are combined to build the decision tree model, with monitoring and evaluation at its core; the model explores texture scale and texture characteristics and classifies green space types. The experimental results show that the proposed model can classify the types of green space in the natural geographical environment and plays a decisive role in guiding judgments in environmental monitoring and assessment. The proposed algorithm therefore has a positive effect on the monitoring and evaluation of the natural environment.
Introduction
With the rapid development of the economy, the number and scale of cities have been increasing, industrial scale and output value have been growing, and the living standard of residents has improved remarkably [1]. However, while urbanization brings people a rich material and spiritual life, it also confronts them with increasingly severe problems such as ecological degradation, dense population, traffic congestion, environmental pollution, and scarcity of green land, which have become limiting factors for the sustainable development of cities [2, 21–24].
This paper proposes a heuristic decision tree algorithm for constructing an ecological urban green space network, studied on the basis of GIS concepts. GIS and data mining technology are combined to build the decision tree model, with monitoring and evaluation at its core; the model explores texture scale and texture characteristics and classifies green space types.
Compared with the results obtained from spectral data alone, the proposed decision tree classification method based on spectral and texture features improves the accuracy of individual ground classes, the overall accuracy, and the kappa coefficient.
This paper consists of five parts. The first part introduces the background and significance of the work. The second part reviews the previous literature, which lays the theoretical foundation for this article. The third part discusses the research methods, including the fuzzy decision tree algorithm and the decision tree pruning algorithm, and constructs the decision tree model. The fourth part verifies the constructed model and presents the evaluation results. The fifth part summarizes the full text.
Related work
Since the 1990s, ecological networks have been a focus of attention in ecological protection, landscape ecology, and urban planning and design. Interdisciplinary understanding of ecological networks has converged, and such networks have gradually come to serve ecological, leisure, aesthetic, and landscape functions [6]. However, the planning and practice of ecological networks in China started late; theoretical work on urban green space structure and ecological networks is still in its infancy, and practical applications are extremely scarce [7]. Without an ecological network as its spatial structure, green space cannot fully exert its ecological function, and the optimization of that function remains largely on paper [8]. Optimizing the green space ecological network is therefore the key to improving the efficiency of the urban green space system [9]. This study takes as its background the national natural fund project “based on high resolution remote sensing image evaluation index optimization study of ecological garden city” and the topic “building green ecological function optimization based on ecological network is the key technology research” within the “11th five-year plan” national science and technology support plan project “urban green space and ecological construction control key technology research and demonstration” [10]. As part of that research project, the study targets the technology for building the urban green space ecological network and the problems in its evaluation system. In addition, GIS technology is examined from the perspective of decision trees and, guided by landscape ecological network theory and graph theory, applied to the construction and evaluation of green space ecological networks [11–13].
Decision tree, Bayesian model, and pruning algorithm
Fuzzy decision tree algorithm
With the widespread application of fuzzy set theory to decision tree algorithms, many excellent algorithms have been proposed in succession [14]. Two representative fuzzy decision tree algorithms are introduced in this section. Whichever algorithm is used, the resulting decision tree structure is roughly similar: each node is an attribute name, and each edge is a fuzzy subset of the attribute values obtained after fuzzy processing. The heuristic decision tree algorithm based on Min-Ambiguity, referred to as the Min-Ambiguity algorithm, is introduced first. The method uses classification uncertainty as the basis for selecting the split attribute, which is closer to people’s way of thinking and makes the result highly intelligible [15, 16]. In the definition, for each optional attribute value
For non-leaf node S, for each optional attribute value
Where
Where $p_i$ is consistent with the meaning stated in the definition. The Min-Ambiguity decision tree algorithm selects, at each step, the attribute with the minimum average classification uncertainty as the split attribute [17]. When the data are preprocessed, the significance level α is applied to filter the data, and the truth-level threshold β is applied to decide when a node becomes a leaf [18]. The process is otherwise the same as that of the FuzzyID3 algorithm; only the criterion for selecting split attributes differs [19].
The most prominent feature of the CART algorithm is that it builds a binary tree: every internal node has exactly a left and a right child node [20]. Because the algorithm constructs the tree by recursive binary division, it does not simply branch on every distinct attribute value; it first finds the best binary division of each attribute and then splits on that division to produce new branches. The algorithm uses the Gini index as the criterion for selecting the split attribute. The Gini index measures the impurity, or degree of disorder, of a data set: the smaller the Gini index, the purer the data set. Assuming that the random variable X takes the values $\{x_1, x_2, \dots, x_n\}$ with probabilities $\{p_1, p_2, \dots, p_n\}$, the Gini index of X is:
$$\mathrm{Gini}(X) = \sum_{i=1}^{n} p_i (1 - p_i) = 1 - \sum_{i=1}^{n} p_i^{2}$$
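Let the data set X be divided into n categories, with the category C taking the values $\{c_1, c_2, \dots, c_n\}$, let the attribute A take the values $\{a_1, a_2, \dots, a_m\}$, and let $\{S_1, \dots, S_w\}$ be the division of X into w subsets according to the values of A. If A is used as the split attribute, the conditional Gini index of the system is defined as:
$$\mathrm{Gini}(X \mid A) = \sum_{j=1}^{w} \frac{|S_j|}{|X|}\,\mathrm{Gini}(S_j) \tag{5}$$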
In the general case of (5), w = 2. The formula of the Gini index and the formula of information entropy have similar structures, and both measure the degree of disorder of a data set; each serves as the criterion for choosing the split attribute in its respective decision tree algorithm. For a two-class sample in which one class has probability p, both measures indicate the highest purity when p is 0 or 1 and the greatest disorder when p is 0.5. To select the split attribute, the CART algorithm first obtains the binary partitions of the data set from all possible groupings of the values of each attribute. Assuming that attribute A takes three values $\{a_1, a_2, a_3\}$, it has three possible binary divisions: $\{\{a_1, a_2\}, \{a_3\}\}$, $\{\{a_1, a_3\}, \{a_2\}\}$, and $\{\{a_1\}, \{a_2, a_3\}\}$. The Gini index of each division is calculated, the division with the minimum conditional Gini index is selected as the division of A, and its value is taken as the conditional Gini index of attribute A. Finally, the attribute with the smallest conditional Gini index among all decision attributes is selected as the split attribute, and the sample space is divided according to the corresponding division.
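The split selection just described can be illustrated with a minimal Python sketch. It assumes categorical attributes stored as dictionaries; the function names, the attribute key `"A"`, the class key `"cls"`, and the four sample rows are hypothetical and are not taken from the paper's data.

```python
from collections import Counter
from itertools import combinations


def gini(labels):
    """Gini index 1 - sum(p_i^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def conditional_gini(subsets):
    """Size-weighted Gini index over the subsets of a partition."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * gini(s) for s in subsets)


def binary_partitions(values):
    """Yield every split of the attribute values into two non-empty groups."""
    values = sorted(values)
    first, rest = values[0], values[1:]
    # Keeping the first value on the left avoids listing mirrored splits twice.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = {first, *extra}
            yield left, set(values) - left


def best_binary_split(rows, attr, target):
    """Return (conditional Gini, left group, right group) of the best split,
    or None when the attribute has fewer than two distinct values."""
    best = None
    for left, right in binary_partitions({row[attr] for row in rows}):
        in_left = [row[target] for row in rows if row[attr] in left]
        in_right = [row[target] for row in rows if row[attr] in right]
        score = conditional_gini([in_left, in_right])
        if best is None or score < best[0]:
            best = (score, left, right)
    return best


# Hypothetical toy data: attribute A with three values, two classes.
rows = [
    {"A": "a1", "cls": "water"},
    {"A": "a1", "cls": "water"},
    {"A": "a2", "cls": "reed"},
    {"A": "a3", "cls": "reed"},
]
score, left, right = best_binary_split(rows, "A", "cls")
```

For these rows the division {{a1}, {a2, a3}} separates the two classes completely, so its conditional Gini index is 0 and it is the division chosen for A. Repeating the call for every attribute and keeping the smallest score gives the split attribute.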
The estimation and application of the MF-VAR model are generally based on a state space model that includes the time-dependent aggregate variables, with the Kalman filter used for recursive estimation. The state space form is easy to construct, but maximizing the likelihood function is more difficult. Therefore, a Bayesian method is used to perform MCMC sampling from the posterior distribution of the mixed-frequency VAR parameters; it applies to data with irregular mixing or time intervals and handles unobserved values with the missing-data approach of Bayesian econometrics. Specifically, the BMF algorithm constructs a Gibbs sampler that draws substitute values for the missing data and the unknown parameters of the model. Estimates of the missing data are obtained from a Gaussian conditional distribution under the assumption of normally distributed exogenous shocks, and the parameter values of the model are obtained from Gaussian and inverse Wishart (hereafter referred to as IW) conditional posterior distributions. A simple first-order mixed-frequency vector autoregressive model can be expressed as:
Where $\varepsilon_t \sim N(0, \Sigma)$ and $y_t = (x_t, z_t)$. Here $x_t$ denotes the $N_x$ high-frequency variables, which can be fully observed, and $z_t$ denotes the $N_z$ observable low-frequency variables, so equation (6) can be regarded as a first-order mixed-frequency VAR model. In macroeconomic applications, $x$ is often used to represent monthly variables and $z$ to represent observable quarterly variables, with the quarterly data treated as monthly data that have regular missing values. In this case, the missing value of the quarterly data in equation (6) has
Because there are many types of variables and they contribute differently to the results, attribute-oriented reduction is applied before the decision tree is constructed: the data are reduced attribute by attribute. The reduction uses threshold control of relevance; that is, a correlation analysis is performed on each attribute, and any attribute whose correlation degree is below a previously specified threshold is eliminated, which reduces repeated subtrees and simplifies the tree. Next, some discrete data are quantized: elements such as grades and ages are converted into numeric class labels, and the option elements of the judgment class are converted into Boolean values. After this processing, the decision tree is built from the collected sample set by spectral clustering, and the tree is pre-pruned. Spectral clustering is introduced below. Spectral clustering, as used here, is based on the ID3 algorithm. From information theory it is known that the smaller the expected information, the greater the information gain and the higher the purity. The core idea of the algorithm is therefore to use information gain as the attribute selection measure and to split on the attribute that has the largest information gain after splitting. A few concepts are defined first. Let D be the division of the training tuples into m categories; the entropy of D is then expressed as:
$$\mathrm{info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$
Where $p_i$ is the probability that the $i$-th category appears among the training tuples, estimated as the number of tuples belonging to that category divided by the total number of training tuples. Entropy is the average amount of information needed to identify the class label of a tuple in D. Assuming that the training tuples D are divided according to attribute A into the subsets $D_1, \dots, D_v$, the expected information of the division of D by A is the size-weighted sum of the subset entropies $\mathrm{info}(D_j)$:
$$\mathrm{info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{info}(D_j)$$
The information gain is the difference between the two, that is, the entropy $\mathrm{info}(D)$ of D minus the expected information $\mathrm{info}_A(D)$ of the division by A:
$$\mathrm{gain}(A) = \mathrm{info}(D) - \mathrm{info}_A(D)$$
Each time it needs to split, the decision tree algorithm calculates the information gain of each attribute and selects the attribute with the largest gain for splitting. A problem with this criterion is that it is biased towards multi-valued attributes. For example, if there is a unique identifier attribute ID, the decision tree will select it as the split attribute; this makes the division perfectly pure, but the division is almost useless for classification. C4.5, the successor of this decision tree algorithm, uses the gain rate, an extension of information gain, to try to correct for this bias. Following C4.5, spectral clustering first defines the “split information” over the subsets $D_1, \dots, D_v$ that the values of A produce, which can be expressed as:
$$\mathrm{split\_info}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}$$
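The symbols have the same meaning as in the definitions above. The gain rate is then defined as the information gain divided by the split information:
$$\mathrm{gain\_ratio}(A) = \frac{\mathrm{gain}(A)}{\mathrm{split\_info}_A(D)}$$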
The algorithm selects the attribute with the largest gain rate as the split attribute; its application is otherwise the same as that of the decision tree described above and is not repeated here.
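The entropy, information gain, split information, and gain rate defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the attribute keys `"id"`, `"texture"`, and `"cls"`, and the four sample rows are hypothetical, and attributes are treated as categorical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy -sum(p_i * log2(p_i)) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def partition(rows, attr, target):
    """Group the class labels of the rows by the value of one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    return list(groups.values())


def expected_info(rows, attr, target):
    """Size-weighted entropy of the subsets produced by the attribute."""
    n = len(rows)
    return sum(len(g) / n * entropy(g) for g in partition(rows, attr, target))


def info_gain(rows, attr, target):
    """Entropy of the whole set minus the expected information."""
    return entropy([r[target] for r in rows]) - expected_info(rows, attr, target)


def split_info(rows, attr, target):
    """Entropy of the subset sizes themselves, ignoring class labels."""
    n = len(rows)
    return -sum(len(g) / n * log2(len(g) / n) for g in partition(rows, attr, target))


def gain_ratio(rows, attr, target):
    """Information gain divided by split information (0 if no split occurs)."""
    si = split_info(rows, attr, target)
    return info_gain(rows, attr, target) / si if si > 0 else 0.0


# Hypothetical toy data: "id" is unique per row, "texture" is informative.
rows = [
    {"id": 1, "texture": "fine", "cls": "water"},
    {"id": 2, "texture": "fine", "cls": "water"},
    {"id": 3, "texture": "coarse", "cls": "reed"},
    {"id": 4, "texture": "coarse", "cls": "reed"},
]
gain_id, gain_texture = info_gain(rows, "id", "cls"), info_gain(rows, "texture", "cls")
ratio_id, ratio_texture = gain_ratio(rows, "id", "cls"), gain_ratio(rows, "texture", "cls")
```

In this toy example both attributes have an information gain of 1 bit, so information gain alone cannot tell the identifier from the useful attribute. The split information of `id` is 2 bits against 1 bit for `texture`, which gives gain rates of 0.5 and 1.0 and makes `texture` the preferred split attribute.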
Data source and parameter settings
The decision tree model above is run to obtain preliminary classification results for the SPOT-5 images; post-classification processing is then performed by recoding, clustering, and removal analysis to obtain the wetland classification map of the relevant area (Fig. 1). For the accuracy assessment, random sampling is used: with the resolution-fused image as reference and in combination with field investigation, 30 sample points are selected for each land type, giving 150 samples in total. The accuracy evaluation is shown in the figure. To adequately test the above geospatial map acquisition algorithm, it is compared with the algorithm in the literature. In addition, as described in the section above, when line segments are grown until closed polygons are formed, the number k of line segment growth steps has a significant influence on the extraction results and the efficiency of the algorithm. If k is too large, line segment growth forms many unnecessary shapes and the running time of the algorithm becomes too long; if k is too small, the line segments may not grow far enough to form a room candidate set, which reduces the number of extraction results.
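The sampling design above (30 random points per land type, 150 in total) can be sketched as follows. The sketch assumes five land types, as implied by 150 / 30; the class names are those discussed in the results, while the function name, the fixed seed, and the placeholder pixel coordinates are hypothetical.

```python
import random


def stratified_sample(pixels_by_class, per_class=30, seed=0):
    """Draw per_class random sample points, without replacement, from each class."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return {
        land_type: rng.sample(pixels, per_class)
        for land_type, pixels in pixels_by_class.items()
    }


# Placeholder data: every class gets the same 10 x 10 block of (row, col) pixels.
land_types = ["water body", "woodland", "reed beach", "mudflat", "mossy beach"]
pixels_by_class = {
    name: [(row, col) for row in range(10) for col in range(10)]
    for name in land_types
}

samples = stratified_sample(pixels_by_class)
total_points = sum(len(points) for points in samples.values())
```

`random.Random.sample` raises `ValueError` if a class has fewer candidate pixels than `per_class`, so sparse classes would need a smaller quota or a different design.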

Fig. 1. Classification using multiscale texture.
The spatial units for which the algorithm's extraction failed are then analyzed. The failures originate mainly from complex polygons, such as relatively large entities like corridors. As discussed in the previous section, a spatial unit that is a complex polygon generally has to be divided into several smaller convex polygons in applications; that is, even if a complex spatial unit of this kind were correctly identified, it would still have to be subdivided. Units for which extraction fails can therefore be passed to a correction step that carries out this subdivision.
It can be seen from Figs. 1 and 2 that the various types of ground objects are less fragmented. Among them, the water body, woodland, and reed beach are classified best, with user's accuracy above 90%. The producer's accuracy of mudflats and mossy beaches is the lowest; the former is easily confused with water bodies and reeds, which are in turn easily confused with reed beaches. When only single-scale texture information is combined, the accuracy for the mossy beach and the reed is insufficient because the textures of both are relatively small.

Fig. 2. Comparison of the extraction number and accuracy of the CSGP algorithm.
Figure 3 shows the spectral information of the image. Because classification based on multispectral data alone ignores the texture information of the features, its overall accuracy is much worse than that of the first two classifications. Forest is classified best, but its user's accuracy is only 89.29%, just short of 90%, and the producer's accuracy of the reed land is only 72.97%. Comparing the three classification results shows that the wetland classification combined with multi-scale texture information reaches an accuracy of 78.57% with a kappa coefficient of 0.7558, which is 1.81 percentage points higher than the classification combined with single-scale texture information and 6.59 percentage points higher than the spectral classification (Fig. 5). Attention to ecology in the classification of natural spaces is a response to increasing environmental pollution, and it also places higher demands on classifiers. An empirical experiment is conducted on the ecological classification of office space based on the theory of residential ecology. It focuses on the multi-objective and dynamic hybrid control problems that are common in practical optimization problems of office space, and it studies in depth the principles and characteristics of three conventional pseudo-ecological system algorithms: genetic algorithms, ant colony algorithms, and immune algorithms.

Fig. 3. Image spectrum.

Fig. 4. Comparison of the extraction number and accuracy of the GLGP algorithm.

Fig. 5. Confusion matrix of classification based on DTC combined with multispectral.
According to the comparative analysis above, the decision tree classification method based on combined features such as spectrum and texture improves on pure spectral data in single ground class accuracy, overall accuracy, and kappa coefficient. The higher accuracy can meet the data accuracy requirements of practical work. By selecting the optimal combination of texture scales and using the decision tree to classify the spectral data together with the multi-scale texture data, a classification accuracy of 78.57% is obtained, whereas the classification accuracies of the spectral data alone and of the spectral data combined with single-scale texture data are 71.98% and 76.76%, respectively.
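The accuracy measures used in this comparison (overall accuracy, kappa coefficient, user's accuracy, and producer's accuracy) can all be derived from a confusion matrix, as the following Python sketch shows. The function name and the 2 x 2 matrix are hypothetical; the matrix is not the paper's confusion matrix, and rows are assumed to hold the classified labels and columns the reference labels.

```python
def accuracy_report(matrix):
    """matrix[i][j]: samples classified as class i whose reference class is j.

    Returns (overall accuracy, kappa, user's accuracies, producer's accuracies).
    Every class is assumed to have at least one classified and one reference sample.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(k)]
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    overall = sum(diag) / n
    # Agreement expected by chance, from the row and column marginals.
    chance = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    kappa = (overall - chance) / (1 - chance)
    users = [d / r for d, r in zip(diag, row_tot)]       # correct / classified
    producers = [d / c for d, c in zip(diag, col_tot)]   # correct / reference
    return overall, kappa, users, producers


# Hypothetical two-class matrix with 100 samples.
overall, kappa, users, producers = accuracy_report([[45, 5], [10, 40]])
```

For this matrix the overall accuracy is 0.85 and the chance agreement is 0.5, so kappa is (0.85 - 0.5) / (1 - 0.5) = 0.7; the user's accuracies are 0.9 and 0.8.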
GIS and data mining techniques are used for the monitoring and evaluation of the natural geographic environment. SPOT-5 high-resolution images are used to classify wetland land cover, with the panchromatic band selected as the data source for texture feature calculation, and the JM distance of the selected samples determines the optimal texture scale for each wetland type. A decision tree algorithm then performs data mining on the data set composed of the spectral and texture information of the remote sensing images, a decision tree model is constructed, and the high-resolution images are classified. The results show that the accuracy of high-resolution image classification combined with multi-scale optimal texture information is 78.57%, while the accuracies of classification with spectral data alone and with single-scale texture data combined are 71.98% and 76.76%, respectively. Multi-scale texture describes the texture of ground features better and resolves the isomorphic phenomena in the classification results more effectively, which helps to improve the classification accuracy of high-resolution images. It can therefore be judged that the designed algorithm contributes to the judgment of the geographical environment and has a positive effect on the monitoring and evaluation of the natural environment.
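The use of the JM (Jeffries-Matusita) distance to choose a texture scale can be sketched as below. The sketch rests on several assumptions that the text does not state: each class sample is treated as a univariate normal distribution with non-zero variance, the JM distance is taken in its common form 2(1 - e^(-B)) with B the Bhattacharyya distance, and separability is compared for a single pair of classes. The function names, window sizes, and texture values are hypothetical.

```python
from math import exp, log, sqrt
from statistics import mean, pvariance


def jm_distance(a, b):
    """JM distance, in [0, 2], between two samples of one texture feature.

    Each sample is modeled as a univariate normal distribution.
    """
    m1, m2 = mean(a), mean(b)
    v1, v2 = pvariance(a), pvariance(b)
    # Bhattacharyya distance between two univariate normal distributions.
    bhatt = (m1 - m2) ** 2 / (4 * (v1 + v2)) + 0.5 * log(
        (v1 + v2) / (2 * sqrt(v1 * v2))
    )
    return 2 * (1 - exp(-bhatt))


def best_scale(texture_by_scale):
    """Return the scale whose two class samples are most separable."""
    return max(texture_by_scale, key=lambda s: jm_distance(*texture_by_scale[s]))


# Hypothetical texture values of two classes at window sizes 3 and 7.
texture_by_scale = {
    3: ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
    7: ([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]),
}
chosen = best_scale(texture_by_scale)
```

A value near 2 means the two classes are almost fully separable at that scale and a value near 0 means they overlap, so the window size 7 is chosen in this toy example.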
Acknowledgments
The study was supported by “National Social Science Fund, China (Grant No. 15BJY059)”, and “Scientific Foundation for the Introduction of Researcher of Hebei College of Industry and Technology, China (BZ2016003)”.
