Abstract
With the rise of the green economy, objectively measuring the green innovation efficiency of China's provincial industrial enterprises is of great significance for achieving sustainable economic development. This paper analyzes regional technological innovation and green economic efficiency based on the DEA model and fuzzy evaluation. Building on recent developments in traditional efficiency and productivity analysis theory, the study calculates the green innovation efficiency of industrial enterprises in 30 provinces using the SBM model, which accounts for slack in inputs and outputs. The results show that the SBM model improves the accuracy and authenticity of the evaluation of green innovation efficiency. The efficiency of the green economy in most provinces is rising. At the same time, R&D intensity and industrial structure play a positive role in improving green economic efficiency. Cluster analysis is used to examine the differences in green economic efficiency among regional industrial enterprises and their causes. In addition, the provinces should consider the factors affecting the green economic efficiency of industrial enterprises, implement the innovation-driven development strategy in an all-round way, and promote the development of the green economy.
Introduction
With the progress of reform, China has achieved significant economic and social growth. However, a development model of high input, low output and high pollution seriously constrains China's social sustainability [1]. In addition, the world is paying more attention to global warming and carbon emissions. China faces a shortage of domestic resources and serious environmental pollution [2]. Therefore, China must take the impact of environmental pollution into account during its rapid development [3, 4].
Building an innovative country has become the inevitable choice for China on the road to healthy and sustainable development [5]. As the world pays more attention to ecological and environmental problems, scholars have started to study the green innovation efficiency of various industries. Industry is gradually becoming a significant force in promoting the development of China's economy and society [6, 7]. With the rising tide of the green economy, objectively calculating the green innovation efficiency of China's provincial industrial enterprises is of great importance for realizing sustainable economic development.
Green innovation efficiency, as an important part of green innovation, is mainly studied by Data Envelopment Analysis (DEA). The DEA method was first proposed by Charnes et al. (1978), who used a mathematical programming model to assess the relative efficiency of a decision-making unit (DMU) with multiple inputs and multiple outputs. The DEA method has promoted the development of production function theory and efficiency theory [8].
Related work
Scholars have obtained many research results with the DEA efficiency evaluation model. Han (2012) used the DEA method and Tobit regression to estimate green innovation efficiency and its influencing factors [9]. Feng (2013) analyzed the industrial enterprises of 30 provincial regions and 8 economic zones by using the DEA-SBM method. Zhang (2015) used the SE-DEA model and the Malmquist index model to analyze and compare the green innovation efficiency of 30 provinces in China. In addition, some scholars have used other methods to evaluate green innovation efficiency. Hua (2011) used the factor method to study the green innovation efficiency of Northeast China. On the one hand, input-output slack and undesirable outputs were rarely considered in previous research. On the other hand, the traditional DEA model assumes that all outputs are desirable and ignores that some outputs must be reduced to improve efficiency; these are called undesirable outputs [10–13].
The above research on innovation activities rarely considers environmental factors when exploring innovation efficiency [14]. Research on regional differences in the green innovation efficiency of industrial enterprises is clearly insufficient. Moreover, existing work ignores inter-regional correlation and spatial effects. Therefore, this study extends previous studies in three ways [15–17]. First, using data from 2009 to 2013 for 30 provinces, municipalities and autonomous regions of China, and building on recent developments in traditional efficiency and productivity analysis theory, it calculates the green innovation efficiency of industrial enterprises in these regions [18] with the SBM model, which accounts for input-output slack and reflects the concept of sustainable development from the perspective of green innovation. The results are compared with the traditional innovation efficiency calculated by the DEA model without considering undesirable outputs [19]. Second, because analyzing the differences in green innovation efficiency among regional industrial enterprises and their causes is of great theoretical and practical significance [20, 21], this study uses cluster analysis to classify the green innovation efficiency of the 30 provinces, municipalities and autonomous regions. Third, as China's innovation-driven development policy deepens, the spatial mobility of regional innovation elements is enhanced [22], and spatial effects become increasingly important. Therefore, the spatial effect of the green innovation efficiency of Chinese provincial industrial enterprises is measured from the perspective of green GDP [23], using spatial econometric models. This will help improve the green innovation efficiency of China's regional industrial enterprises and promote regional ecological civilization construction.
Theoretical analysis
Similarity-based link prediction method
Link prediction in social networks was first modeled with Markov chains, which guided later studies of link prediction. Methods based on node similarity indicators include path-based indicators, indicators based on common neighbor information, and indicators based on random walks [24].
(1) Path-based indicators
The probability of a link between nodes in a network can be calculated using the length of the path between nodes.
The smaller the path length between two nodes, the greater the similarity of nodes. The formula for calculating the similarity of nodes is as follows:
Compared with the shortest path algorithm, the Katz algorithm considers the path information between nodes more comprehensively and uses more of the network topology. The algorithm weights all paths between two nodes, exponentially damping the contribution of longer paths, and the node similarity score is calculated as follows:
Among them, paths(x, y, l) is the set of all paths of length l between nodes x and y, and |paths(x, y, l)| is its size. β (0 ≤ β ≤ 1) is a parameter that controls the influence of the path length on the node similarity.
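For reference, the Katz score is usually written as $S_{xy} = \sum_{l \ge 1} \beta^{l}\,|paths(x, y, l)|$, where the number of length-l paths (counted as walks) is the (x, y) entry of the l-th power of the adjacency matrix. The following is a minimal sketch under these assumptions, not the authors' implementation: the sum is truncated at a hypothetical `max_len`, and the function names and the four-node example graph are invented for illustration. The infinite sum only converges when β is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix, which is why the sketch truncates.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_scores(adj, beta, max_len):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * A**l."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]            # A**1
    for length in range(1, max_len + 1):
        weight = beta ** length                # exponential damping of longer paths
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = mat_mul(power, adj)            # advance to A**(length + 1)
    return scores


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
S = katz_scores(ADJ, beta=0.1, max_len=3)
print(S[0][1], S[0][2], S[0][3])
```

On this example graph the score decreases with distance: about 0.102 for the adjacent pair (0, 1), 0.01 for (0, 2) and 0.001 for (0, 3).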
The LHN2 algorithm is the Leicht-Holme-Newman index, a variation of the Katz algorithm [25]. Its idea is that two nodes are similar if their neighbor nodes are themselves similar. The algorithm is recursive, and the similarity calculation formula of nodes is shown in Equation (3):
The local path similarity algorithm is abbreviated as LPA. Its idea is very simple: the nodes at path length 1 from the target node and the nodes at path length 2 from it are counted. The similarity calculation formula of the node is shown in Equation (4):
In the formula, A is the adjacency matrix describing the network, and ɛ is a parameter ranging from −1 to 1. The (x, y) entry of the k-th power of A is the number of paths of length k between nodes x and y. The local path algorithm is a local algorithm, and its time complexity is low.
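Equation (4) is not reproduced in this text. In the link prediction literature the local path index is commonly written as $S = A^{2} + \varepsilon A^{3}$, and the sketch below assumes that form, which may differ in detail from the paper's own equation. The function names and the example graph are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def local_path_scores(adj, epsilon):
    """Local path index in the assumed form S = A^2 + epsilon * A^3."""
    a2 = mat_mul(adj, adj)                     # counts paths of length 2
    a3 = mat_mul(a2, adj)                      # counts paths of length 3
    n = len(adj)
    return [[a2[i][j] + epsilon * a3[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
LP = local_path_scores(ADJ, epsilon=0.01)
print(LP[0][2], LP[0][3])
```

Because only the second and third powers of the matrix are needed, the cost stays low compared with global indices such as Katz, which is the point the text makes about time complexity.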
(2) Indicators based on common neighbor information
The common neighbors of a node pair x, y are the nodes adjacent to both. For a given node, the remaining nodes in the network are sorted in descending order according to the number of common neighbors they share with it. The more common neighbors two nodes have, the greater the probability of an edge between them. τ(x) and τ(y) are used to indicate the degrees of nodes x and y, respectively. The similarity is calculated according to the following formula [24]:
The main idea of the Jaccard algorithm is to use the ratio of the sizes of the intersection and the union of the neighbor sets of nodes x and y as the similarity score of the two nodes. The similarity score is calculated as follows:
The Adamic-Adar algorithm was originally designed to calculate the similarity between two web pages. When calculating the similarity of web pages, the algorithm first finds all the keywords shared by the two pages, then weights each shared keyword by its importance and sums the weights. If the word frequency of a keyword is F and its importance is W, then W is proportional to 1/log F. For link prediction, let C denote a common neighbor of the two nodes and D its degree. According to the algorithm, the formula for calculating the similarity score is obtained:
The formula for calculating the similarity score in the RA algorithm is as follows:
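The equations for the neighbor-based indices are not reproduced in this text. As a hedged illustration, the sketch below computes the four scores in their standard literature forms: the number of common neighbors, the Jaccard ratio, the Adamic-Adar sum of 1/log(degree) over common neighbors, and the resource allocation (RA) sum of 1/degree over common neighbors. The function name and the five-node graph are hypothetical, and the code assumes a simple undirected graph with x different from y.

```python
import math


def neighbor_scores(graph, x, y):
    """Return common-neighbor, Jaccard, Adamic-Adar and RA scores for (x, y)."""
    common = graph[x] & graph[y]
    union = graph[x] | graph[y]
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # A common neighbor of two distinct nodes has degree >= 2, so log > 0.
        "adamic_adar": sum(1.0 / math.log(len(graph[z])) for z in common),
        "resource_allocation": sum(1.0 / len(graph[z]) for z in common),
    }


# Hypothetical undirected graph stored as node -> set of neighbors.
GRAPH = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
SCORES = neighbor_scores(GRAPH, "a", "b")
print(SCORES)
```

Adamic-Adar and RA differ only in how strongly they penalize high-degree common neighbors: the logarithm penalizes them less than the plain reciprocal does.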
Formula (8) is extended, and after considering more information, the improved formula is:
(3) Indicators based on random walks
Random walk algorithms define similarity through random walks. The basic assumption is that the similarity between the starting node and the target node is determined by the average number of steps from the starting node to the target node.
Link prediction algorithms based on random walks include Average Commute Time (ACT), Random Walk with Restart (RWR), Local Random Walk (LRW) and SimRank [25].
Assuming that m(x, y) is the average number of steps a walker needs to go from node x to node y, the average commute time (ACT) between nodes x and y is defined as:
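The equation is missing from this text. In the standard definition, which is assumed here, the commute time is the sum of the two one-way averages, and the similarity is usually taken as its reciprocal (the symbol $s^{ACT}_{xy}$ is introduced here for illustration only):

$$
n(x, y) = m(x, y) + m(y, x), \qquad s^{ACT}_{xy} = \frac{1}{n(x, y)}
$$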

[Figure: Particle Swarm Optimization.]
The basic idea of the random walk with restart (RWR) algorithm is to assume that, each time the random walker takes a step, it returns to the initial position with a certain probability. The formula for calculating the similarity score between nodes is as follows:
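The equation is not reproduced in this text. A common way to write it, assumed here, uses the transition matrix $P$ (with $P_{xy} = a_{xy}/k_x$ for adjacency entries $a_{xy}$ and degree $k_x$), a probability $c$ of continuing the walk (so the walker restarts with probability $1 - c$), and the unit vector $e_x$ for the starting node. All of these symbols are assumptions introduced for illustration:

$$
q_x = c\,P^{T} q_x + (1 - c)\,e_x \;\;\Longrightarrow\;\; q_x = (1 - c)\,(I - c\,P^{T})^{-1} e_x, \qquad s^{RWR}_{xy} = q_{xy} + q_{yx}
$$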
The local random walk (LRW) algorithm only considers random walks with a finite number of steps. In the algorithm, a walker starts from node x at time 0. Let $\pi_{xy}(t)$ be the probability that the walker is at node y at time t, and let $\pi_x(t)$ be the vector of these probabilities. The distribution then evolves as $\pi_x(t+1) = P^{T}\pi_x(t)$, $t \ge 0$, where $P$ is the transition matrix of the walk. The formula for calculating the similarity score is as follows:
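The similarity formula itself is missing from this text. In the literature the LRW score after t steps is usually given as $s_{xy}(t) = \frac{k_x}{2|E|}\pi_{xy}(t) + \frac{k_y}{2|E|}\pi_{yx}(t)$, with $k_x$ the degree of node x and $|E|$ the number of edges. The sketch below assumes that form and a connected undirected graph without isolated nodes; the function name and the three-node example are hypothetical.

```python
def local_random_walk(graph, steps):
    """LRW scores s_xy(t) = (k_x * pi_xy(t) + k_y * pi_yx(t)) / (2|E|)."""
    nodes = sorted(graph)
    two_m = sum(len(graph[v]) for v in nodes)   # 2|E| for an undirected graph
    # pi[x][y]: probability that a walker started at x is at y (time 0: at x).
    pi = {x: {y: 1.0 if x == y else 0.0 for y in nodes} for x in nodes}
    for _ in range(steps):
        nxt = {}
        for x in nodes:
            dist = {y: 0.0 for y in nodes}
            for u in nodes:
                if pi[x][u] == 0.0:
                    continue
                share = pi[x][u] / len(graph[u])  # uniform step to a neighbor
                for v in graph[u]:
                    dist[v] += share
            nxt[x] = dist                         # pi_x(t+1) = P^T pi_x(t)
        pi = nxt
    return {
        (x, y): (len(graph[x]) * pi[x][y] + len(graph[y]) * pi[y][x]) / two_m
        for x in nodes for y in nodes if x < y
    }


# Hypothetical example: the path graph 0 - 1 - 2.
GRAPH = {0: {1}, 1: {0, 2}, 2: {1}}
ONE_STEP = local_random_walk(GRAPH, steps=1)
TWO_STEPS = local_random_walk(GRAPH, steps=2)
print(ONE_STEP[(0, 1)], TWO_STEPS[(0, 2)])
```

After one step only adjacent pairs receive a positive score; after two steps the non-adjacent pair (0, 2) does, which is how a finite-step walk can rank unconnected pairs.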
Algorithm introduction
The Path Ranking Algorithm (PRA) is an algorithm for random walks on graphs proposed by Lao and Cohen. It is mainly applied to knowledge reasoning and link prediction in knowledge bases, and it is also applicable to link prediction in the small knowledge base of a student online learning system. PRA is similar to distant supervision. It starts from a set of entity pairs connected by a relation p and performs random walks on the graph, starting from all source nodes. Paths that reach the target node are considered successful, and the quality of these paths can be determined by measuring their support and precision, as in association rule mining. The paths learned by PRA can be treated as rules. Because multiple rules or paths may apply to any given pair of entities, they can be fused through a binary classifier (logistic regression). In the PRA algorithm, the features are the probabilities of reaching the target node from the source node along these different paths.
The PRA algorithm ranks the nodes y associated with a query node x. It begins by enumerating a large number of length-limited, edge-labeled path types, which are treated as ranking "experts". Each expert performs a random walk on the graph constrained to follow its sequence of edge types, and the result nodes y are ranked by their weights in the resulting distribution. Finally, the PRA algorithm combines these "experts" with logistic regression.
Algorithm solving
The path ranking algorithm PRA is specified as follows: a relation path $p$ is defined as a sequence of relations $(R_1, R_2, \dots, R_i)$.
Moreover, in order to highlight the type of each step, p can also be described as:
Among them, $T_i$ is defined as
Nodes can be connected to each other through different types of relationships, which can be represented by paths:
For any relation path $p = (R_1, R_2, \dots, R_i)$, a seed node $s \in \mathrm{domain}(p)$ is selected. A path-constrained random walk is recursively defined as a distribution $h_{s,p}$. If $p$ is an empty path, then
If $p = (R_1, R_2, \dots, R_i)$ is non-empty, then, with $p' = (R_1, R_2, \dots, R_{i-1})$, $h_{s,p}$ is defined as follows:
Among them,
The algorithm uses the gradient descent method to calculate the path weight parameter $\theta$. Given a relation $R$ and a set of node pairs $\{(s_i, t_i)\}$, a training data set $D = \{(x_i, r_i)\}$ can be constructed. Among them, $x_i$ is the vector of path features for the node pair $(s_i, t_i)$, whose j-th component is the feature for the j-th path, $r_i$ indicates whether $R(s_i, t_i)$ is true, and $\theta$ is estimated by maximizing the regularized objective function below:
In Equation (17), $\lambda_1$ controls L1 regularization, which helps structure selection, and $\lambda_2$ controls L2 regularization, which prevents overfitting. $w_i$ is the importance weight of each instance, and $O_i(\theta)$ is the objective function of each instance, which is defined as follows:
Among them, $p_i$ is the predicted relevance value and is defined as follows:
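The equations of this section are not reproduced in this text. As a hedged illustration, the sketch below follows the standard PRA formulation rather than anything shown here: for an empty path all probability mass sits on the seed node, each further relation spreads the mass uniformly over the edges of that type, and the predicted value is the logistic function $p_i = \exp(\theta^{T}x_i) / (1 + \exp(\theta^{T}x_i))$ of the weighted path features. The function names and the tiny learning-system graph are hypothetical.

```python
import math


def path_walk(edges, source, path):
    """Distribution h_{s,p} over nodes reached from `source` by following `path`."""
    dist = {source: 1.0}                       # empty path: all mass on the seed
    for relation in path:
        nxt = {}
        for node, prob in dist.items():
            targets = edges.get((node, relation), [])
            for target in targets:             # uniform step over typed edges
                nxt[target] = nxt.get(target, 0.0) + prob / len(targets)
        dist = nxt                             # mass with no typed edge is dropped
    return dist


def pra_score(edges, source, target, paths, theta):
    """Logistic combination of the path features of the pair (source, target)."""
    features = [path_walk(edges, source, p).get(target, 0.0) for p in paths]
    z = sum(w * f for w, f in zip(theta, features))
    return 1.0 / (1.0 + math.exp(-z))          # equals exp(z) / (1 + exp(z))


# Hypothetical typed graph: (node, relation) -> list of target nodes.
EDGES = {
    ("alice", "enrolled"): ["math", "physics"],
    ("math", "requires"): ["algebra"],
    ("physics", "requires"): ["algebra", "calculus"],
}
PATHS = [("enrolled", "requires")]
H = path_walk(EDGES, "alice", PATHS[0])
print(H, pra_score(EDGES, "alice", "algebra", PATHS, theta=[2.0]))
```

In a full PRA pipeline the weights `theta` would be learned by maximizing the regularized objective described above; here they are fixed by hand to keep the example short.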

[Figure: Algorithm calculation process diagram.]
This study uses the CCR model to measure the traditional innovation efficiency of China's industrial enterprises. The CCR model uses linear programming to estimate the production frontier of the decision making units and to evaluate the relative efficiency of each one. Assume that there are n decision making units (DMUs), each with m input variables and s output variables. For a particular decision making unit, the efficiency evaluation formula can be expressed as:
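The formula is missing from this text. The standard ratio form of the CCR model, assumed here, evaluates a unit with index 0 by choosing non-negative output weights $u_r$ and input weights $v_i$. The number of outputs is written $s$ to keep it distinct from the number $n$ of DMUs, and all symbols are introduced for illustration:

$$
\max_{u, v}\; h_0 = \frac{\sum_{r=1}^{s} u_r\, y_{r0}}{\sum_{i=1}^{m} v_i\, x_{i0}}
\quad \text{s.t.} \quad
\frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1,\;\; j = 1, \dots, n; \qquad u_r \ge 0,\; v_i \ge 0
$$

A unit is efficient when $h_0 = 1$. The ratio form is normally converted to a linear program before solving.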
Assume that there are n decision making units (DMUs). Each DMU has three vectors: $X$ (input), $Y^{g}$ (expected output) and $Y^{b}$ (non-expected output), which can be expressed as:
The traditional SBM model considering the undesirable outputs is as follows:
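The model is not reproduced in this text. The SBM model with undesirable outputs is usually stated as below, and that standard form is assumed here. $s^{-}$, $s^{g}$ and $s^{b}$ are the slacks in inputs, desirable outputs and undesirable outputs, $s_1$ and $s_2$ are the numbers of desirable and undesirable outputs, and $\lambda$ is the intensity vector; these symbols are assumptions introduced for illustration:

$$
\rho^{*} = \min \frac{1 - \dfrac{1}{m}\sum_{i=1}^{m} \dfrac{s_i^{-}}{x_{i0}}}{1 + \dfrac{1}{s_1 + s_2}\left(\sum_{r=1}^{s_1} \dfrac{s_r^{g}}{y_{r0}^{g}} + \sum_{r=1}^{s_2} \dfrac{s_r^{b}}{y_{r0}^{b}}\right)}
$$

$$
\text{s.t.}\quad x_0 = X\lambda + s^{-},\quad y_0^{g} = Y^{g}\lambda - s^{g},\quad y_0^{b} = Y^{b}\lambda + s^{b},\quad s^{-}, s^{g}, s^{b}, \lambda \ge 0
$$

A DMU is efficient only when $\rho^{*} = 1$, that is, when all slacks are zero. Any input excess, shortfall in desirable output or excess in undesirable output lowers the score.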

[Figure: Carbon emission in China.]
Index selection and data description
The sample covers 30 provinces, municipalities and autonomous regions of China, excluding Tibet. Because innovation output lags behind innovation input, the time lag between input and output is set to 1. Based on the connotation of green innovation, this study selects two input factors, labor and financial resources, measured by the number of research staff and the amount of research funds devoted to innovation activities. On the output side, the production process generates both expected and non-expected outputs. For the desirable output, the number of patent applications of industrial enterprises reflects the true level of innovation to a great extent, so this study uses it as the desirable output index. For the undesirable outputs, carbon dioxide emissions, waste water emissions, and solid waste emissions represent the environmental impact of the innovation activities of industrial enterprises in China.
The overall average green innovation efficiency analysis in the two cases
According to Tables 1 and 2, in both cases (with and without non-expected output), the overall average efficiency of the innovation activities of Chinese provincial industrial enterprises shows a steady upward trend. The average traditional innovation efficiency, calculated by the DEA method, increases from 0.564 in 2009 to 0.694 in 2013. The average green innovation efficiency considering non-expected output, calculated by the SBM method, increases from 0.504 in 2009 to 0.622 in 2013. Overall, the traditional innovation efficiency without the undesirable output is higher than the green innovation efficiency that includes it, but the efficiencies in both cases need to be improved. This means that environmental pollution affects innovation efficiency. Therefore, the SBM model with undesirable output helps increase the accuracy and authenticity of the efficiency evaluation.
Innovation efficiency without considering the undesirable output based on the DEA model
The green innovation efficiency considering the undesirable output based on the SBM model
Table 1 shows that the average traditional innovation efficiencies of industrial enterprises, without considering the undesirable outputs, in Beijing, Tianjin, Jilin, Heilongjiang, Shanghai, Jiangsu, and Zhejiang are higher than 0.800 during the sample period. Table 2 shows that from 2009 to 2013 the average green innovation efficiencies of industrial enterprises, considering the undesirable outputs, in Beijing, Tianjin, Jilin, Shanghai, Jiangsu, Zhejiang, and Guangdong also exceed 0.800. Figure 5 shows that the provinces (municipalities and autonomous regions) with high efficiency in both cases are mostly in the eastern coastal areas. The average efficiencies of Guizhou, Shanxi, Qinghai, Ningxia and Xinjiang are all lower than 0.400 in both cases; these provinces are mostly in the remote western areas.
Figure 4 compares the provincial efficiencies in the two cases. For the majority of provinces, the green innovation efficiency, which includes the non-expected output, is lower than the traditional innovation efficiency, which does not. However, in Guangdong and Ningxia the average green innovation efficiency considering the undesirable outputs is higher than the traditional innovation efficiency without them.

[Figure: The change tendency of green innovation efficiencies of China's industrial enterprises based on the SBM model.]

[Figure: The change tendency of green innovation efficiencies.]
Among the average innovation efficiencies of industrial enterprises in the 30 provinces, municipalities and autonomous regions of China from 2009 to 2013, the highest is 0.924, the lowest is 0.226, and the variance is 0.093. This indicates a clear gap among provincial green innovation efficiencies. To further analyze the regional differences and their causes, this study systematically classifies the provincial green innovation efficiencies with SPSS 22 and divides them into a first, second and third group according to the level of efficiency. The results are shown in Table 3.
Cluster analysis of green innovation efficiencies of provincial industrial enterprises
According to Table 3, the average green innovation efficiency is 0.904 in the first group, 0.613 in the second group, and 0.354 in the third group. The first group consists mainly of provinces in the eastern coastal regions, such as Beijing, Tianjin and Shanghai, which have higher efficiency and better economic development. The second and third groups consist mainly of provinces in the central and western regions. The low efficiency of these areas is mainly due to lower green innovation output and serious environmental pollution.
This study uses spatial econometric models to measure the spatial effect of the green innovation efficiency of China's industrial enterprises. According to the regression results in Table 4, the spatial autoregressive coefficient passes the significance test at the 1% level, which shows that the green innovation efficiencies of Chinese provincial industrial enterprises have a spatial effect. The spatial regression coefficient is 0.245, while the spatial error coefficient is -0.133. This indicates that the spatial spillover of green innovation efficiency from adjacent areas is stronger than the error impact transmitted from adjacent areas. A comparison of the two spatial econometric models shows that the spatial lag model and the spatial error model give very similar results.
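The model equations are not given in this text. For orientation, the spatial lag model and the spatial error model are usually written as below. This is a sketch of the textbook forms, with $W$ the spatial weight matrix, $\rho$ the spatial autoregressive coefficient and $\lambda$ the spatial error coefficient; the notation is assumed for illustration and may differ from the paper's own:

$$
\text{Spatial lag model:}\quad y = \rho W y + X\beta + \varepsilon
$$

$$
\text{Spatial error model:}\quad y = X\beta + \mu, \qquad \mu = \lambda W \mu + \varepsilon
$$

In the lag model the efficiency of neighboring regions enters directly through $Wy$, which is what a spillover effect refers to. In the error model the neighbors matter only through correlated unobserved shocks.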
According to Table 4, R&D intensity, industrial structure, government support, and laborers' quality have a significant effect on green innovation efficiency. Meanwhile, the R² values for the original and lagged variables exceed 0.85 and the likelihood ratio exceeds 200, which indicates that the spatial econometric models fit well.
The regression analysis of spatial econometric models
Based on the SBM model, which accounts for input-output slack, this study calculates the green innovation efficiencies of industrial enterprises in 30 Chinese provinces (municipalities, autonomous regions) from 2009 to 2013 and compares them with the traditional innovation efficiencies measured by the DEA model without considering the non-expected outputs. The results indicate that the overall green innovation efficiency, which includes the non-expected outputs, is significantly lower than the traditional innovation efficiency, which does not. This suggests that environmental factors should be considered in efficiency evaluation and that the SBM model improves the accuracy and authenticity of the green innovation efficiency evaluation. The results also show that the green innovation efficiencies of the majority of provinces in China maintain an upward trend from 2009 to 2013.
This study analyzes the differences in the green innovation efficiency of regional industrial enterprises and their causes through cluster analysis, and divides the 30 provinces, municipalities and autonomous regions into three groups according to the efficiency level. The first group consists mainly of provinces in the eastern coastal region, while the second and third groups consist mainly of provinces in the central and western regions.
The regression analysis of the spatial econometric models shows that there is a spatial spillover effect in the provincial green innovation efficiencies of China's industrial enterprises. R&D intensity, industrial structure, and laborers' quality play a positive role in promoting green innovation efficiency, while government support has a negative effect. Based on these conclusions, this study presents some suggestions to promote the green innovation efficiency of Chinese industrial enterprises:
The innovation-driven development strategy should be fully carried out. The momentum of the long-term development of the world economy comes from innovation, so Chinese industrial enterprises must adhere to the innovation-driven development strategy, focusing on both advanced domestic and foreign technologies and independent innovation. Green innovation activities should consider resources, economy and environment at the same time; it is necessary to reduce non-renewable energy investment and to focus on the quality of innovation output and on environmental pollution.
The spatial effect of green innovation should be fully used. While exerting their own advantages in technology and location and improving their green innovation efficiency, the eastern regions should actively help the central and western regions move toward low input, high output, and low pollution. The central and western regions need to rely on their advantages in production, industrial structure adjustment, and policy support, and actively make use of the spatial spillover effect of green innovation efficiency. Furthermore, they should actively cooperate with the eastern regions in technology, management, personnel training and other areas, and strive to develop local industries. In addition, all provinces should analyze the factors influencing the green innovation efficiency of industrial enterprises, so as to create a favorable environment for enhancing it according to their actual situations.
Industrial enterprises should play a central role in the green innovation system. They should introduce advanced equipment and excellent personnel, and learn advanced management methods from home and abroad, which improves the utilization efficiency and allocation capability of innovation resources. In addition, enterprises should strengthen cooperation with universities, research institutes, and governments to create a green innovation system including governments, enterprises, universities, and research institutes.
