Study on online demand-supply matching of electric power technology based on technical field feature

Abstract

As the volume of technical demand and supply text in electric power technology transfer grows rapidly, traditional matching methods can no longer meet the need. To address this problem, this paper proposes a technical demand-supply matching method based on electric power technical field features. First, an improved TF-IDF algorithm is used to extract technical field features. Then, scientific and technological achievements with high relevancy are retrieved on the basis of these features to build a background library of electric power technological achievements. Finally, a word-vector-based matching degree is used to verify the matching efficiency of the method. The method was applied to the texts of electric power technical demands and technological achievements from the China smart grid technical transaction service platform and Keyi net, and the results indicate that it raises matching efficiency by 23.4% compared with traditional methods. Taking technical field features as the basis of demand-supply matching on electric power technology transfer platforms offers a new perspective for research on matching efficiency, helps to match technical demands with technological achievements more efficiently, and can further raise the success rate of electric power technology transfer.

Keywords

Technology transfer; technical demand-supply matching; technical field feature

1. Introduction

With the development of science and technology and the continuous emergence of scientific and technological achievements, technology transfer has become one of the hot topics of science and technology management in China [1], and many online technology transfer platforms have sprung up. Because technical demand and technological achievement information is largely unstructured and technical demand texts are often described too colloquially, technical demands are very difficult to match with technological achievements. As a result, the search cost rises [2], the demand and supply parties struggle to connect with each other, and many potential technology transfer opportunities go undiscovered [3]. By July 2020, the online technology transaction platform Keyi net had released a total of 287,764 scientific and technological achievements, of which 11,070 were signed projects, accounting for less than 4% of the total. Numerous scientific and technological achievements cannot be transferred into application, and technical demands cannot find the corresponding technological achievements. If demand-supply matching relies only on the technical field as the retrieval word, the workload is huge and the time needed to match technical demand with supply increases significantly. Therefore, for the unstructured, natural-language text of technical demands and technological achievements, studying matching models and methods for technical demand-supply text has important practical significance for connecting the demand and supply parties, guiding the direction of technical research, and improving the success rate of technology transfer.
The electric power industry, which is both capital-intensive and technology-intensive, plays an important role in the national economy, and electric power technology transfer has become a vital driving force for its development. This paper studies the demand-supply matching problem in electric power technology transfer, with the aim of increasing transfer efficiency.

Early studies of technical demand-supply matching focused mainly on matching preference and matching decision. In 1989, Brownlie et al. [4] identified the preference relation between technical demand and product research and development, and put forward the management idea of demand-supply matching. To address the preference-order problem in two-sided matching, Liang et al. [5] established four decision models oriented to preference satisfaction and obtained the optimal demand-supply matching scheme. Considering the different preferences of demanders and suppliers in the market, Klerkx et al. [6] proposed an innovation intermediary to meet the needs of both parties. On matching decision, Le [7] transformed intuitionistic fuzzy sets into score and matching matrices and solved the resulting model to demonstrate the feasibility of the matching decision. Building on intuitionistic fuzzy set theory, Yang et al. [8] applied the theory to the transformation of scientific and technological achievements and built a matching decision model, providing an idea for solving the matching problem in that setting. Kong et al. [9] combined matching preference and matching decision in a decision model based on the preferences of demanders and suppliers, so as to obtain the most stable matching scheme.

As the volume of technical demand-supply text increases, traditional keyword-based matching methods cannot cope with the redundancy of retrieval results on large-scale data, so word frequency and semantic similarity learned from a training corpus have become a focus of research on technical demand-supply matching. Wu et al. [10] proposed a semantic matching method based on Wikipedia to improve matching accuracy. Yang et al. [11] used the vector space model and the TF-IDF algorithm to calculate the similarity of technical demand-supply texts. To better reflect the relation between keywords and their context, matching methods that use the positional relations of keywords to determine text similarity have emerged. Xie et al. [12] designed a thesis recommendation system based on the TextRank algorithm, which calculated the cosine similarity of keyword vectors for similarity matching; Benedetti et al. [13] proposed a knowledge-based technique for computing inter-document similarity; building on the TF-IDF and TextRank algorithms, Yan et al. [14] put forward a graph-based keyword extraction method to solve the corpus dependency problem.

Semantic similarity matching based on word vectors effectively addresses the polysemy issue, but these methods only capture the contextual relations between words rather than deeply understanding the semantics of the text. Ferreira and George [15] proposed a paraphrase identification system combining syntactic and semantic similarity, which expressed every pair of sentences as a combination of different similarity measures, and demonstrated its accuracy in sentence paraphrase identification by experiment. To fully exploit inter-word properties such as correlation unit and type, Jiang et al. [16] proposed a neural-network-based word vector model, which trained on the corpus with inter-word grammar and context relations and verified the feasibility of the model. Wang et al. [17] put forward a text similarity calculation method based on a topic model, which built a topic model for the corpus, mined the similarity relations between words, and obtained text features for computing the similarity between texts. Building on the traditional topic model, Li [18] put forward an SL-LDA-based field feature acquisition method that introduces word frequency features, and used this model to extract the “theme-phase” of the context for keyword extraction. In 2013, Google developed the Word2Vec model for producing word vectors, which has been used for feature extraction [19], emotion analysis [20], text classification [21], text similarity [22], etc. Shun Yao et al. investigated the BM25 model [23], which uses a probability model to score and rank matching results and has been widely applied to the scoring and ranking of matching relevance.

To address the low efficiency in matching the technical demands of power enterprises with the supply of scientific and technological achievements, this article proposes taking electric power technical field features as the basis of demand-supply matching and extracting these features to build a background library of scientific and technological achievements. By measuring the matching degree between the technical field features and the background library, we validate that the technical demand-supply matching method based on electric power technical field features significantly improves matching efficiency.

2. Feature-based electric power technical online demand-supply matching method

The proposed feature-based method for online demand-supply matching of electric power technology proceeds as follows. First, the “China smart grid technical transaction service platform” (https://zndw.ctex.cn/) and “Keyi net” (https://www.1633.com/) are taken as the information sources of technical demands and achievements. Their technical demand and achievement texts are preprocessed (word segmentation and stop-word removal) to obtain an electric power technical corpus, from which the electric power technical field features are extracted with the improved TF-IDF algorithm to form an electric power technical field feature set. Next, an index database of electric power scientific and technological achievements is built from the achievement texts of the “Keyi net” platform, the achievements strongly relevant to the electric power technical field features are retrieved to construct a background library of electric power scientific and technological achievements, and the text of the background library is preprocessed. Finally, the word-vector-based electric power technical demand-supply matching algorithm is used to obtain the matching results of electric power technical demand and supply.

The technical requirements of electric power enterprises refer to the specific needs of these enterprises, arising in production and operation, for technologies that benefit their development. Scientific and technological achievements refer to achievements with practical value produced through scientific research and technological development.

2.1 Technical field feature extraction based on improved TF-IDF algorithm

A technical field feature (TFF) is a professional term that represents a technical field and reflects the characteristics of the field as a whole. The set of all technical field features in a technical field is called the technical field feature set (TFFS).

TF-IDF is a statistical method for assessing the importance of a word to a document in a corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases as the number of documents in the corpus that contain it grows. If a word appears frequently in one article (high TF) and rarely in other articles (high IDF), it is considered to distinguish well between categories and to be suitable for classification.

2.1.1 Improvement to TF-IDF algorithm

The traditional TF-IDF algorithm takes into account only how often a word occurs and in how many texts it appears [24, 25], which results in slow computation and poor feature extraction.

In a text, much additional information can help feature extraction, for example the part of speech of a word and the position at which it appears. In technical demand-supply text, nouns denote real entities and carry more key information, so nouns can be given higher weight in feature extraction. In addition, the beginning and ending paragraphs of a technical demand-supply text are more important than the other paragraphs, so words in these positions should also be given higher weight. On this basis, the paper proposes an improved TF-IDF algorithm that divides the corpus into a foreground corpus and a background corpus to speed up word-frequency calculation, and adds related indexes such as technical field relevancy, technical field evenness, impact factor and technical field membership.

2.1.2 Related indexes and meanings

The paper involves multiple indexes such as technical field feature, technical field membership, etc. Here, related indexes and their meanings are introduced, as shown in Table 1.

Table 1
Related indexes

Abbreviation	Index	Meaning
TFF	Technical field feature	Key words used to represent technical fields
TFFS	Technical field feature set	Refer to the set of all technical field features in a technical field
TFM	Technical field membership	Refer to the degree to which a word belongs to this technical field
$\textit{Cf}_{k}$	Foreground corpus	Refer to the field corpus including abundant field features
$\textit{Cb}_{k}$	Background corpus	Refer to the field corpus including abundant field features of other fields
TFR	Technical field relevancy	Refer to the degree to which a word is relevant to this technical field
TFE	Technical field evenness	Refer to the evenness degree that the words with positive technical field relevancy (TFR $>$ 0) distribute in each text of technical field corpus
$\lambda$	Impact factor	The ratio of the probability that a word appears in foreground corpus and the maximum probability that a word appears in all corpora

2.1.3 Improved TF-IDF algorithm

In the improved TF-IDF algorithm, the weight of a word represents how important the word is in its text. Taking a word $i$ in a technical demand-supply text $j$ of an electric power enterprise as an example, the TF value of the word is:

$\displaystyle\textit{tf}_{ij}=\frac{n_{ij}}{\sum\limits_{k=0}^{n}{n_{kj}}}$ (1)

in which $\textit{tf}_{ij}$ is the TF value of the word $i$ in the technical demand-supply text $j$, $n_{ij}$ is the number of times the word $i$ appears in text $j$, and $n_{kj}$ is the number of times the word $k$ appears in text $j$, so the denominator is the total number of word occurrences in text $j$.

For every word $i$ , we can calculate its IDF value following the formula below:

$\displaystyle\textit{idf}_{i}=\log\left({\frac{\left|D\right|}{1+\left|{D_{i}}\right|}}\right)$ (2)

in which $\textit{idf}_{i}$ is the IDF value of the word $i$ in the corpus, $|D|$ is the total number of texts in the corpus, and $|D_{i}|$ is the number of texts in the corpus in which the word $i$ appears. Adding 1 to the denominator is an add-one (Laplace-style) smoothing that prevents the denominator from becoming zero for new words that have never appeared in the corpus, which makes the algorithm more robust.

The TF-IDF algorithm combines the TF and IDF values, so the TF-IDF value is calculated as:

$\displaystyle\textit{tf}\times\textit{idf}\left({i,j}\right)=\textit{tf}_{ij}\times\textit{idf}_{i}=\frac{n_{ij}}{\sum\limits_{k=0}^{n}{n_{kj}}}\times\log\left({\frac{\left|D\right|}{1+\left|{D_{i}}\right|}}\right)$ (3)
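As an illustration only, Eqs. (1)–(3) can be sketched in a few lines of Python. The tokenized toy corpus and the function names `tf`, `idf` and `tf_idf` are hypothetical, and the natural logarithm is assumed for $\log$, since the base is not stated above.

```python
import math
from collections import Counter

def tf(word, doc_tokens):
    """Eq. (1): occurrences of `word` divided by all word occurrences in the text."""
    counts = Counter(doc_tokens)
    return counts[word] / len(doc_tokens)

def idf(word, corpus):
    """Eq. (2): log(|D| / (1 + |D_i|)); natural log is an assumption."""
    containing = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / (1 + containing))

def tf_idf(word, doc_tokens, corpus):
    """Eq. (3): product of the TF and IDF values."""
    return tf(word, doc_tokens) * idf(word, corpus)

# Hypothetical, already segmented texts (stop words removed).
corpus = [
    ["power", "distribution", "equipment", "power"],
    ["power", "grid", "planning"],
    ["microgrid", "control"],
    ["grid", "economics"],
]
weight = tf_idf("distribution", corpus[0], corpus)
print(round(weight, 4))
```

With the smoothed denominator, a word that occurs in all but one text, or in every text, receives a zero or negative IDF, so such words are effectively discarded in this sketch.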

The technical field relevancy (TFR) of the word $i$ to the technical field $T_{k}$ is computed as:

$\displaystyle\textit{TFR}_{i,k}=\lg\left({\textit{TF}_{i,k}}\right)\times\lg\left({\frac{p\left({i|\textit{Cf}_{k}}\right)}{p\left({i|\textit{Cb}_{k}}\right)}}\right)=\lg\left({\textit{TF}_{i,k}}\right)\times\lg\left({\frac{\frac{\textit{TF}_{i,k}}{\textit{mf}_{k}}}{\frac{\sum\limits_{\textit{Cf}_{l}\in\textit{Cb}_{k}}{\textit{TF}_{i,l}}}{\textit{mb}_{k}}}}\right)$ (4)

in which $\textit{TF}_{i,h}$ can be estimated as:

$\displaystyle\textit{TF}_{i,h}=\sum\limits_{c_{j}\in\textit{Cf}_{h}}{\textit{tf}_{i,j}}$ (5)

Substituting Eq. (5) into Eq. (4), $\textit{TFR}_{i,k}$ can be written as:

$\displaystyle\textit{TFR}_{i,k}=\lg\left({\sum\limits_{c_{j}\in\textit{Cf}_{k}}{\textit{tf}_{i,j}}}\right)\times\lg\left({\frac{\textit{mb}_{k}\sum\limits_{c_{j}\in\textit{Cf}_{k}}{\textit{tf}_{i,j}}}{\textit{mf}_{k}\sum\limits_{\textit{Cf}_{l}\in\textit{Cb}_{k}}{\sum\limits_{c_{j}\in\textit{Cf}_{l}}{\textit{tf}_{i,j}}}}}\right)$ (6)

in which $p(i|\textit{Cf}_{k})$ and $p(i|\textit{Cb}_{k})$ represent the probabilities that the word $i$ appears in the foreground corpus $\textit{Cf}_{k}$ and the background corpus $\textit{Cb}_{k}$, respectively, and $\textit{TF}_{i,h}$ is the frequency with which the word $i$ appears in the foreground corpus $\textit{Cf}_{h}$. In this paper, the foreground corpus of technical field $T_{k}$ is denoted by $\textit{Cf}_{k}$, while the background corpus $\textit{Cb}_{k}$ consists of the foreground corpora $\textit{Cf}_{l}$ of all other fields ($l\neq k$). $\textit{mf}_{k}$ and $\textit{mb}_{k}$ are the numbers of texts in $\textit{Cf}_{k}$ and $\textit{Cb}_{k}$, respectively, and $\textit{tf}_{i,j}$ is the number of times the word $i$ appears in text $c_{j}$.

The TFR computation is an improvement on, and consistent with, the TF-IDF algorithm. It also consists of two parts. The first, $\lg\left({\textit{TF}_{i,k}}\right)$, reflects the number of times the word $i$ appears in the foreground corpus. The second, $\lg\left({\frac{p\left({i|\textit{Cf}_{k}}\right)}{p\left({i|\textit{Cb}_{k}}\right)}}\right)$, compares the two probabilities: when the probability that the word $i$ appears in $\textit{Cf}_{k}$ is higher than the probability that it appears in $\textit{Cb}_{k}$, this term is positive and the word $i$ is said to be positively correlated to the technical field $T_{k}$; otherwise, the word $i$ is said to be irrelevant to $T_{k}$. Irrelevant words are not regarded as technical field features (TFF).
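The TFR computation of Eqs. (4)–(6) can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the toy corpora and the names `field_tf` and `tfr` are hypothetical, and returning `None` when the word is absent from the foreground or the background corpus (where $\lg$ is undefined) is an assumption of the sketch.

```python
import math

def field_tf(word, texts):
    """Eq. (5): total occurrences of `word` over all texts of one field corpus."""
    return sum(tokens.count(word) for tokens in texts)

def tfr(word, foreground, background_fields):
    """Eq. (4)/(6): lg(TF) * lg(p(word | foreground) / p(word | background))."""
    tf_fg = field_tf(word, foreground)
    tf_bg = sum(field_tf(word, texts) for texts in background_fields)
    if tf_fg == 0 or tf_bg == 0:
        return None  # lg is undefined here; this handling is an assumption
    mf = len(foreground)                                 # texts in Cf_k
    mb = sum(len(texts) for texts in background_fields)  # texts in Cb_k
    return math.log10(tf_fg) * math.log10((tf_fg / mf) / (tf_bg / mb))

# Hypothetical foreground corpus (2 texts) and two background fields (2 texts each).
foreground = [["feeder"] * 6 + ["grid"], ["feeder"] * 4 + ["grid"] * 2]
background_fields = [
    [["feeder"], ["grid"] * 5],
    [["grid"] * 4, ["tower"]],
]
print(tfr("feeder", foreground, background_fields))  # positive: field-specific word
print(tfr("grid", foreground, background_fields))    # negative: common elsewhere
```

In this toy setting "feeder" is concentrated in the foreground corpus and receives a positive TFR, whereas "grid" is more frequent per text in the background corpus and receives a negative one, so only the former would remain a candidate TFF.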

Technical field evenness (TFE) reflects how evenly the words positively correlated to the technical field (TFR $>$ 0) are distributed across the technical field corpus. TFE is computed as:

$\displaystyle\textit{TFE}_{i,k}=\sum\limits_{c_{j}\in\textit{Cf}_{k}}{\left({\frac{\textit{tf}_{i,j}}{\sum\limits_{c_{j}\in\textit{Cf}_{k}}{\textit{tf}_{i,j}}}\lg\left({\frac{\sum\limits_{c_{j}\in\textit{Cf}_{k}}{\textit{tf}_{i,j}}}{\textit{tf}_{i,j}}}\right)}\right)}$ (7)

in which $\textit{tf}_{i,j}$ is the number of times the word $i$ appears in the $j^{\text{th}}$ text of $\textit{Cf}_{k}$, so the ratio of $\textit{tf}_{i,j}$ to its sum over $\textit{Cf}_{k}$ is the share of the word's occurrences that fall in that text. It can be seen from the formula that the more texts of $\textit{Cf}_{k}$ the word $i$ appears in and the more evenly it is distributed among them, the larger the TFE value and the more likely the word is a TFF; otherwise, the less likely it is a TFF. If the TFE value is zero, all occurrences of the word fall in a single text of $\textit{Cf}_{k}$, and the word should not be listed as a candidate TFF.
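Eq. (7) has the form of an entropy (base 10) of the distribution of a word's occurrences over the texts of the foreground corpus. The sketch below illustrates this under two assumptions that are not stated in the text: texts in which the word does not occur contribute nothing to the sum, and a word that never occurs gets a TFE of zero. The name `tfe` and the toy corpora are hypothetical.

```python
import math

def tfe(word, foreground):
    """Eq. (7): evenness of the distribution of `word` over the foreground texts."""
    counts = [tokens.count(word) for tokens in foreground]
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: an absent word has zero evenness
    # Texts with zero occurrences are skipped (the 0 * lg(.) term is taken as 0).
    return sum((c / total) * math.log10(total / c) for c in counts if c > 0)

# Hypothetical corpora of four texts each, with eight occurrences in total.
even = [["relay", "relay"] for _ in range(4)]     # 2 occurrences in every text
concentrated = [["relay"] * 8, ["x"], ["y"], ["z"]]  # all 8 in one text

print(tfe("relay", even))          # largest possible value for four texts
print(tfe("relay", concentrated))  # 0.0
```

Under this reading the raw TFE of a word spread uniformly over all $\textit{mf}_{k}$ texts equals $\lg(\textit{mf}_{k})$, which is the largest value the sum can take.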

The impact factor $\lambda$ is the ratio of the probability that the word $i$ appears in $\textit{Cf}_{k}$ to the maximum probability that it appears in any field corpus. It is computed as:

$\displaystyle\lambda_{i,k}=\frac{p\left({i|\textit{Cf}_{k}}\right)}{\mathop{\max}\limits_{\textit{Cf}_{l}\in C}\left({p\left({i|\textit{Cf}_{l}}\right)}\right)}$ (8)

Therein:

$\displaystyle C=\textit{Cf}_{k}\cup\textit{Cb}_{k}$ (9)

It can be seen from the above formula that when $p\left({i|\textit{Cf}_{k}}\right)=\mathop{\max}\limits_{\textit{Cf}_{l}\in C}\left({p\left({i|\textit{Cf}_{l}}\right)}\right)$, the value of $\lambda_{i,k}$ equals 1, indicating that the probability that the word $i$ appears in the $\textit{Cf}_{k}$ of technical field $T_{k}$ is the highest among all technical fields. Therefore, when $\lambda_{i,k}$ equals 1, the word $i$ is probably a technical field feature (TFF) of $T_{k}$; when $\lambda_{i,k}<1$, although the word $i$ may be positively correlated to $T_{k}$, the chance that it is a TFF is not high.

In practical operation, to reduce the time of calculating $\lambda$, the maximum can be taken only over the technical fields whose features have already been calculated and extracted, which gives:

$\displaystyle{\lambda}^{\prime}_{i,k}=\frac{p\left({i|\textit{Cf}_{k}}\right)}{\mathop{\max}\limits_{\textit{Cf}_{l}\in{C}^{\prime}}\left({p\left({i|\textit{Cf}_{l}}\right)}\right)}$ (10)

in which $C^{\prime}$ refers to the technical fields whose TFF have already been extracted. After the TFR and TFE values are calculated, they are combined with the impact factor $\lambda$ to obtain the technical field membership (TFM) of each candidate TFF:

$\displaystyle\textit{TFM}_{i,k}=\lambda_{i,k}\left({\alpha\textit{TFR}^{*}_{i,k}+\left({1-\alpha}\right)\textit{TFE}^{*}_{i,k}}\right)$ (11)

in which $\alpha\in\left({0,1}\right)$. In practical operation, when $\textit{Cb}_{k}$ is large enough, $\alpha$ is set to 0.8; when $\textit{Cb}_{k}$ is small, it is suitable to set $\alpha$ to 0.95. $\textit{TFR}^{*}_{i,k}$ and $\textit{TFE}^{*}_{i,k}$ are the standardized results of $\textit{TFR}_{i,k}$ and $\textit{TFE}_{i,k}$, calculated as:

$\displaystyle\textit{TFR}^{*}_{i,k}=\frac{\textit{TFR}_{i,k}-\min\left({\textit{TFR}_{k}}\right)}{\max\left({\textit{TFR}_{k}}\right)-\min\left({\textit{TFR}_{k}}\right)}$ (12)

$\displaystyle\textit{TFE}^{*}_{i,k}=\frac{\textit{TFE}_{i,k}-\min\left({\textit{TFE}_{k}}\right)}{\max\left({\textit{TFE}_{k}}\right)-\min\left({\textit{TFE}_{k}}\right)}$ (13)

This section improves the TF-IDF algorithm to obtain the TFR of every word, meanwhile introduces such indexes as TFE and impact factor to finally obtain the related data of TFM. Then, the TFR data of each word was ranked to pick out the TFF relevant to this technical field.
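The last stage, Eqs. (8) and (11)–(13), can be sketched as below. The function names are hypothetical, the maximum in Eq. (8) is read as being taken over the per-field probabilities (including the foreground field itself), and returning zeros when all raw values coincide is an assumption of the sketch. The usage line feeds in the "Power grid" row of Table 3, treating the tabulated TFR and TFE values as already standardized, which is also an assumption.

```python
def min_max(values):
    """Eqs. (12)-(13): min-max standardization of a {word: raw value} mapping."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {word: 0.0 for word in values}  # assumption for the degenerate case
    return {word: (v - lo) / (hi - lo) for word, v in values.items()}

def impact_factor(p_foreground, p_all_fields):
    """Eq. (8): p(i | Cf_k) divided by the largest p(i | Cf_l) over all fields."""
    return p_foreground / max(p_all_fields)

def tfm(lam, tfr_star, tfe_star, alpha=0.95):
    """Eq. (11): lambda * (alpha * TFR* + (1 - alpha) * TFE*)."""
    return lam * (alpha * tfr_star + (1 - alpha) * tfe_star)

# "Power grid" row of Table 3: TFR 0.54, TFE 1, lambda' 0.67, alpha = 0.95.
power_grid = tfm(0.67, 0.54, 1.0, alpha=0.95)
print(round(power_grid, 3))
```

The result is about 0.377, close to the 0.38 listed for this word in Table 3; the sketch makes visible how a low impact factor pulls down the membership of a word that is evenly distributed but not specific to the field.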

2.2 Construction of background library of electric power scientific and technological achievements based on BM25F retrieval model

Because technical demand-supply text is described too colloquially and is unstructured, technical demand-supply matching is inefficient and its results are redundant. Therefore, on the one hand, the scientific and technological achievements in the technical field should be screened out to build an index database of electric power scientific and technological achievements, which serves as the supply database; on the other hand, the BM25F-based relevancy retrieval model should be used to retrieve from this index database the set of achievements highly relevant to the electric power TFF, so as to establish a background library of electric power scientific and technological achievements ($p$).

BM25F is an improved algorithm of BM25 model. In BM25 relevancy retrieval model, the text is considered as a whole in calculating text relevancy [26, 27]. However, with the rapid development of retrieval technique, structured data gradually replaces text data, and every text is cut into multiple independent domains, for example, a webpage is cut into such domains as headline, subject term and content. The contents in different domains contribute differently to the subject, so their weights are also different, thus BM25F retrieval model was proposed. This model divides text into individuals according to domain and conducts weighted sum on the score of every word in each domain. The specific process is:

Using the BM25F relevancy retrieval model, the TFF is taken as the query term $Q$ (Query) to search the index database of electric power scientific and technological achievements, and the BM25F relevancy score is calculated; the background library of electric power scientific and technological achievements ($p$) is then obtained according to the score. The calculation proceeds as follows: first, the query $Q$ is parsed into morphemes $q_{i}$; then, the relevancy score of every morpheme $q_{i}$ with a text $d$ in the index database is calculated; finally, the relevancy scores of the $q_{i}$ with $d$ are combined in a weighted sum to obtain the relevancy score of $Q$ with every text $d$ in the index database. The BM25F scoring formula is:

$\displaystyle\textit{Score(Q,d)}=\sum\limits_{i=1}^{n}{W_{i}}\cdot\frac{f_{i}^{u}}{k_{1}+f_{i}^{u}}$ (14)

in which, $W_{i}$ is defined as IDF, with calculation formula as below:

$\displaystyle\textit{IDF}(q_{i})=\log\frac{N-n(q_{i})+0.5}{n(q_{i})+0.5}$ (15)

in which $N$ is the total number of texts in the index database of electric power scientific and technological achievements, and $n(q_{i})$ is the number of texts in the index database that include the morpheme $q_{i}$. The fewer texts in the index database include $q_{i}$, the more discriminative $q_{i}$ is and the higher its weight. The calculation formula of $f_{i}^{u}$ is:

$\displaystyle f_{i}^{u}=\sum\limits_{k=1}^{u}{w_{k}\cdot\frac{f_{ui}}{B_{u}}}$ (16)

in which $u$ is the number of domains into which a text is divided, $w_{k}$ is the weight of each domain, $f_{ui}$ is the word frequency of $q_{i}$ in the domain $u$ of the candidate text $d$, and $B_{u}$ accounts for the length of each domain in the candidate texts. The calculation formula of $B_{u}$ is:

$\displaystyle B_{u}=\left({1-b_{u}}\right)+b_{u}\cdot\frac{l_{u}}{\textit{avgl}_{u}}$ (17)

in which $l_{u}$ is the length of domain $u$, $\textit{avgl}_{u}$ is the average length of that domain, and $b_{u}$ is the regulatory factor of domain length. So the calculation formula of $f_{i}^{u}$ can be written as:

$\displaystyle f_{i}^{u}=\sum\limits_{k=1}^{u}{w_{k}\cdot\frac{f_{ui}}{\left({1-b_{u}}\right)+b_{u}\cdot\frac{l_{u}}{\textit{avgl}_{u}}}}$ (18)

Thus, the BM25F scoring formula is simplified as:

$\displaystyle\textit{Score(Q,d)}=\sum\limits_{i=1}^{n}{W_{i}}\cdot\frac{\sum\limits_{k=1}^{u}{w_{k}f_{ui}}}{k_{1}\left[{\left({1-b_{u}}\right)+b_{u}\cdot\frac{l_{u}}{\textit{avgl}_{u}}}\right]+\sum\limits_{k=1}^{u}{w_{k}f_{ui}}}$ (19)

in which, $k_{1}$ and $b_{u}$ are regulatory factors. Generally, $k_{1}=$ 1.2, $b_{u}=$ 0.75 [28]. For convenience of subsequent calculation, the retrieved BM25F score is denoted as $Y(S_{p})$ .
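The scoring procedure of Eqs. (14)–(18) can be sketched as follows. This is an illustrative reading rather than the implementation used in the paper: the two-domain documents (`title`, `body`), the domain weights and all function names are hypothetical, the natural logarithm is assumed in Eq. (15), and the length normalization of Eq. (17) is applied separately to each domain, which is an assumption about how the index $u$ is meant.

```python
import math

def idf(n_docs, n_containing):
    """Eq. (15): IDF weight W_i of a query morpheme (natural log assumed)."""
    return math.log((n_docs - n_containing + 0.5) / (n_containing + 0.5))

def weighted_frequency(term, doc, weights, avg_len, b=0.75):
    """Eqs. (16)-(18): domain-weighted, length-normalized frequency of `term`."""
    total = 0.0
    for domain, w in weights.items():
        tokens = doc[domain]
        norm = (1 - b) + b * len(tokens) / avg_len[domain]  # Eq. (17), per domain
        total += w * tokens.count(term) / norm
    return total

def bm25f_score(query, doc, corpus, weights, k1=1.2, b=0.75):
    """Eq. (14): sum over query morphemes of W_i * f / (k1 + f)."""
    n_docs = len(corpus)
    avg_len = {d: sum(len(x[d]) for x in corpus) / n_docs for d in weights}
    score = 0.0
    for term in query:
        n_containing = sum(1 for x in corpus if any(term in x[d] for d in weights))
        f = weighted_frequency(term, doc, weights, avg_len, b)
        score += idf(n_docs, n_containing) * f / (k1 + f)
    return score

# Hypothetical index database of three achievements with two domains each.
corpus = [
    {"title": ["microgrid", "control"], "body": ["microgrid", "energy", "storage", "control"]},
    {"title": ["transformer", "monitoring"], "body": ["transformer", "fault", "diagnosis", "method"]},
    {"title": ["line", "inspection"], "body": ["uav", "line", "inspection", "system"]},
]
weights = {"title": 2.0, "body": 1.0}  # assumed domain weights
scores = [bm25f_score(["microgrid"], d, corpus, weights) for d in corpus]
print(scores)
```

Only the first achievement contains the query morpheme, so it is the only one with a non-zero score. Note that with Eq. (15) a morpheme occurring in more than half of the texts receives a negative weight, so very common query words lower the score in this sketch.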

2.3 Word vector-based electric power technical demand-supply similarity matching calculation

To verify the matching efficiency of technical demand-supply matching model based on electric power TFF, the word vector-based text matching degree was adopted to figure out the matching value. The calculation process is:

Step 1:
Determine whether the technical demand $D$ contains the TFF $x_{i}$, and express the TFF and the text of each scientific and technological achievement as word sets. The TFF $x_{i}$ is then written as $x_{i}=\left\{{X_{1}}\right\}$, and the scientific and technological achievement text as $S=\left\{{S_{1},S_{2},\ldots,S_{p}}\right\}$.
Step 2:
Remove the common portion of $x_{i}$ and $S$. Because $x_{i}$ and $S$ have no common portion, they remain unchanged after the intersection is removed, so this step can be ignored.
Step 3:
Build a semantic similarity matrix. Work out the semantic similarity of every pair in the Cartesian product of $x_{i}$ and $S$ to obtain a 1 $\times$ $p$ similarity matrix $M_{1}$:

$\displaystyle M_{1}=\left[{\textit{sim}\left({X_{1},S_{1}}\right)\ \ \textit{sim}\left({X_{1},S_{2}}\right)\ \ \ldots\ \ \textit{sim}\left({X_{1},S_{p}}\right)}\right]$

in which $\textit{sim}\left({X_{1},S_{j}}\right)\left({j=1,2,\ldots,p}\right)$ in $M_{1}$ is the semantic similarity of $X_{1}$ and $S_{j}$. First, the scientific and technological achievement text $S$ is preprocessed. Then, the Word2vec model is trained on this text, and the words in the background library of scientific and technological achievements are mapped into a vector space to obtain word vectors. Finally, cosine similarity is used to calculate word similarity. Taking $\textit{sim}\left({X_{1},S_{1}}\right)$ as an example, let $a_{i}$ and $b_{i}$ $\left({i=1,2,\ldots,k}\right)$ be the components of the word vectors of $X_{1}$ and $S_{1}$, where $k$ is the dimension of the word vectors; then:

$\displaystyle\textit{sim}\left({X_{1},S_{1}}\right)=\frac{\sum\limits_{i=1}^{k}{a_{i}b_{i}}}{\sqrt{\sum\limits_{i=1}^{k}{a_{i}^{2}}}\sqrt{\sum\limits_{i=1}^{k}{b_{i}^{2}}}}$ (20)
Step 4:
Find the maximum of $\textit{sim}\left({X_{1},S_{j}}\right)$ in $M_{1}$, denoted $\textit{sim}_{m}$. The similarity between the word sets $x_{i}$ and $S$ is then standardized as:

$\displaystyle\textit{sim}_{v}\left({X_{1},S_{p}}\right)=\frac{\left({p+1}\right)\times\textit{sim}_{m}}{2p}$ (21)
Step 5:
Work out the electric power technical demand-supply matching degree. The matching degree $Z_{p}(x_{i})$ is defined as the matching degree between the electric power TFF $x_{i}$ included in the technical demand $D$ and the background library $p$ of electric power scientific and technological achievements. The higher the value of $Z_{p}(x_{i})$, the easier it is to use this TFF to find satisfactory scientific and technological achievements in the background library; achievements above the threshold $\gamma$ are selected as effective scientific and technological achievements. The specific calculation formula is:

$\displaystyle Z_{p}\left({x_{i}}\right)=\frac{\sum\limits_{S\in p}{\left[{Y\left({S_{p}}\right)\sum\limits_{S_{j}\in S}{\textit{sim}_{v}\left({X_{1},S_{p}}\right)}}\right]}}{\sum\limits_{S\in p}{Y\left({S_{p}}\right)\left|S\right|}}$ (22)

in which, $|S|$ is the number of words included in $S$ , $Y(S_{p})$ is the BM25F score of scientific and technological achievements $S_{p}$ .
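Steps 3 to 5 can be sketched as follows. The sketch uses small hand-written vectors in place of trained Word2vec embeddings, and all names are hypothetical. It also adopts one particular reading of Eq. (22): since $\textit{sim}_{v}$ is constant over the words of an achievement, the inner sum equals $|S|$ times $\textit{sim}_{v}$, so $Z_{p}(x_{i})$ becomes a mean of $\textit{sim}_{v}$ weighted by $Y(S_{p})\left|S\right|$. This interpretation is an assumption.

```python
import math

def cosine(a, b):
    """Eq. (20): cosine similarity of two word vectors of equal dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def sim_v(feature_vec, word_vecs):
    """Eq. (21): best word-level similarity, rescaled by (p + 1) / (2p)."""
    p = len(word_vecs)  # number of words in the achievement text
    sim_m = max(cosine(feature_vec, v) for v in word_vecs)
    return (p + 1) * sim_m / (2 * p)

def matching_degree(feature_vec, library):
    """Eq. (22), read as a mean of sim_v weighted by Y(S) * |S| (assumption)."""
    numerator = denominator = 0.0
    for bm25f, word_vecs in library:
        weight = bm25f * len(word_vecs)
        numerator += weight * sim_v(feature_vec, word_vecs)
        denominator += weight
    return numerator / denominator

# Hypothetical 2-dimensional vectors; each library entry is (BM25F score, word vectors).
feature = [1.0, 0.0]
library = [
    (2.0, [[1.0, 0.0], [0.0, 1.0]]),  # contains a word identical to the feature
    (1.0, [[0.0, 1.0]]),              # contains only an orthogonal word
]
z = matching_degree(feature, library)
print(z)
```

In practice the vectors would come from a Word2vec model trained on the preprocessed background library, as described in Step 3; the hand-written vectors here only keep the example self-contained.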

3. Application of feature-based electric power technical online demand-supply matching method

3.1 Extraction of electric power TFF

3.1.1 Corpus establishment

A total of 3267 texts of technical demands or scientific and technological achievements from electric power universities and enterprises on Keyi net and the China smart grid technical transaction service platform were taken as the corpus. The corpus covers 15 technical fields, such as ultra-high voltage power transmission and transformation technique, high and extra-high voltage power transmission technique, high voltage direct current power transmission technique, power distribution and utilization, and power grid planning and economics, and is classified by technical field.

3.1.2 Evaluation indexes

In machine learning, the common indexes for evaluating models and algorithms are the recall rate ($R$) and the precision rate ($P$). Because these two indexes alone cannot fully evaluate algorithmic performance, the comprehensive evaluation index $F_{1}$ is introduced, calculated as:

$\displaystyle F_{1}=\frac{2\times P\times R}{P+R}$ (23)
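For illustration, precision, recall and Eq. (23) can be computed for a set of extracted features against a reference set as below. The feature sets and function names are hypothetical, and returning zero when a denominator vanishes is an assumption of the sketch.

```python
def precision_recall(extracted, reference):
    """P = correct / extracted, R = correct / reference, for two sets of features."""
    correct = len(set(extracted) & set(reference))
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall

def f1_score(precision, recall):
    """Eq. (23): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumption for the degenerate case
    return 2 * precision * recall / (precision + recall)

# Hypothetical extracted and reference TFF sets.
extracted = {"power distribution", "microgrid", "system", "electric measurement"}
reference = {"power distribution", "microgrid", "electric measurement",
             "power utilization", "rural power technique"}
p, r = precision_recall(extracted, reference)
print(p, r, f1_score(p, r))
```

Here three of the four extracted words are in the reference set of five, so $P$ = 0.75 and $R$ = 0.6, and $F_{1}$ lies between the two, closer to the smaller value.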

3.1.3 Feature extraction analysis

This paper extracted 128 electric power TFF in total. Owing to space limits, the technical field of power distribution and utilization is taken as an example. A total of 314 technical demand texts in this field were taken as the foreground corpus ($\textit{Cf}_{k}$), and 2953 technical demand texts of the other 14 technical fields were taken as the background corpus ($\textit{Cb}_{k}$). Because the selected background corpus is not large, $\alpha$ is set to 0.95 in calculating TFM. The improved TF-IDF algorithm was used to extract TFF, and the top ten words were taken as the TFF of this field; the extraction results are shown in Tables 2–4.

Table 2
Top ten words with the largest TFR in power distribution and utilization field

Candidate TFF	$\textit{TFR}\downarrow$	TFE	${\lambda}^{\prime}$	TFM
Power distribution	0.99	0.87	0.98	0.97
Electric measurement	0.96	0.63	1	0.95
Energy efficiency technique	0.87	0.52	1	0.86
Operation and maintenance	0.83	0.56	1	0.82
Power utilization	0.82	0.82	1	0.83
Automation technique	0.73	0.51	1	0.72
Rural power technique	0.71	0.52	1	0.70
Electric energy quality	0.71	0.53	1	0.70
Power distribution equipment	0.70	0.78	1	0.71
Microgrid	0.69	0.76	1	0.70

Table 3

Top ten words with the largest TFE in power distribution and utilization field

Candidate TFF	TFR	$\textit{TFE}\downarrow$	${\lambda}^{\prime}$	TFM
Power grid	0.54	1	0.67	0.38
Technique	0.37	0.98	0.58	0.23
Electric power	0.29	0.97	0.49	0.16
Alternating current	0.36	0.95	0.47	0.18
System	0.41	0.95	0.69	0.30
Direct current	0.49	0.93	0.63	0.32
Power transformation	0.69	0.90	0.88	0.62
Power distribution	0.99	0.87	0.98	0.97
Demand	0.53	0.83	0.37	0.20
Power utilization	0.82	0.82	1	0.83

Table 4

Top ten words with the largest TFM in power distribution and utilization field

Candidate TFF	TFR	TFE	${\lambda}^{\prime}$	$\textit{TFM}\downarrow$
Power distribution	0.99	0.87	0.98	0.97
Electric measurement	0.96	0.63	1	0.95
Energy efficiency technique	0.87	0.52	1	0.86
Power utilization	0.82	0.82	1	0.83
Operation and maintenance	0.83	0.56	1	0.82
Automation technique	0.73	0.51	1	0.72
Power distribution equipment	0.70	0.78	1	0.71
Rural power technique	0.71	0.52	1	0.70
Electric energy quality	0.71	0.53	1	0.70
Microgrid	0.69	0.76	1	0.70

From above results, it can be concluded:

(1)

When ranked by TFR value, the words show higher relevancy with the power distribution and utilization technical field; that is, these words appear far more frequently in the foreground corpus than in the background corpus. However, several of these words also show lower TFE values. A TFF is distributed evenly in $Cf_{k}$, with a higher appearance frequency in $Cf_{k}$. Some words, such as “power grid” and “system”, appear frequently not only in $Cf_{k}$ but also in $Cb_{k}$, which lowers their TFR values and in turn their TFM values. The words with higher TFM also have higher TFR and TFE values, of which TFR is the more important. Table 4 therefore provides a basis for the final determination of TFF.

(2)

For the words with higher TFR values and those with higher TFM values, $\lambda$ is basically equal to 1; that is, such words appear much more frequently in $Cf_{k}$ than in $Cb_{k}$, so words with $\lambda=1$ are probably TFFs of the power distribution and utilization field. When TFE is higher, $\lambda$ is relatively lower: a word that is distributed evenly across $Cf_{k}$ and $Cb_{k}$ has a lower $\lambda$ and low discrimination in the entire corpus, so it should not be taken as a TFF of this technical field. In the calculation formula of TFM, both $\lambda$ and TFR are given higher weights and TFE a lower weight, so words with high TFE but low $\lambda$ and TFR often have lower TFM. Before drawing the above conclusion, some words that affect the conclusion should be manually removed according to the actual situation, so as to improve the accuracy of the final TFF. Finally, in Table 4, which is ranked by TFM value in descending order, the values of $\lambda$ and TFR are even more crucial.

(3)

Following the TFM values in Table 4, we obtain the TFF of power distribution and utilization field. See details in Table 5.

Table 5

TFF of power distribution and utilization field

TFF	TFR	TFE	${\lambda}^{\prime}$	TFM
Electric measurement	0.96	0.63	1	0.95
Energy efficiency technique	0.87	0.52	1	0.86
Operation and maintenance	0.83	0.56	1	0.82
Automation technique	0.73	0.51	1	0.72
Power distribution equipment	0.70	0.78	1	0.71
Rural power technique	0.71	0.52	1	0.70
Electric energy quality	0.71	0.53	1	0.70
Microgrid	0.69	0.76	1	0.70
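The step from the candidate words to Table 5 can be sketched as a sort by TFM followed by a manual exclusion. The TFM values below are copied from Table 2. Treating “Power distribution” and “Power utilization” as the manually removed words is an assumption inferred from their absence in Table 5; the text does not name the removed words explicitly.

```python
# (term, TFM) pairs copied from Table 2, in the order listed there.
candidates = [
    ("Power distribution", 0.97),
    ("Electric measurement", 0.95),
    ("Energy efficiency technique", 0.86),
    ("Operation and maintenance", 0.82),
    ("Power utilization", 0.83),
    ("Automation technique", 0.72),
    ("Rural power technique", 0.71 - 0.01),   # 0.70 in Table 2
    ("Electric energy quality", 0.70),
    ("Power distribution equipment", 0.71),
    ("Microgrid", 0.70),
]
# Round to two decimals so the values match the table exactly.
candidates = [(term, round(tfm, 2)) for term, tfm in candidates]

# Assumption: the two Table 4 entries that are absent from Table 5
# are the words removed manually.
manually_removed = {"Power distribution", "Power utilization"}

# sorted() is stable, so words tied at TFM = 0.70 keep their listed order.
ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
final_tff = [term for term, _ in ranked if term not in manually_removed]

for term in final_tff:
    print(term)
```

Under these assumptions the ranked list reproduces the order of Table 4, and the filtered list reproduces the eight TFFs of Table 5.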

The performance of the improved TF-IDF algorithm can be seen by comparing it with the traditional TF-IDF algorithm; see Figs 1 and 2.

Figure 1.

Performance comparison of traditional and improved TF-IDF.

Figure 2.

Comparison of $F_{1}$ of traditional and improved TF-IDF.

(1)

Figure 1 indicates that, as the number of words increases, the precision of the improved TF-IDF algorithm is in most cases higher than that of the traditional TF-IDF algorithm, and the greatest precision difference between the two algorithms is as high as 30%. Figure 2 shows that, as the number of words increases, the $F_{1}$ value of the improved TF-IDF algorithm is in most cases higher than that of the traditional TF-IDF algorithm, and once the number of words reaches about 300, the $F_{1}$ value changes noticeably more slowly and within a smaller range.

(2)

Comparing the precision, recall and $F_{1}$ values in Figs 1 and 2, the three evaluation indexes differ greatly between the two algorithms, and every index of the improved TF-IDF algorithm is significantly higher than that of the traditional TF-IDF algorithm. The performance of the improved TF-IDF algorithm is evidently superior. Although the $F_{1}$ score in Fig. 2 fluctuates, on the whole the improved algorithm is a considerable improvement over the traditional one. From the experimental point of view, the improved TF-IDF algorithm is better at extracting feature words in the technical field.

(3)

Measured by recall, precision and $F_{1}$ value, the improved TF-IDF algorithm clearly outperforms the traditional TF-IDF algorithm. TFF extraction with the improved TF-IDF algorithm is effective and can save much of the time required for manual retrieval of technical demand-supply texts. Taking TFF as the basis for matching technical demands with scientific and technological achievements increases matching efficiency to a certain extent, thereby enhancing the success rate of technology transfer.

3.2 Construction of background library of electric power scientific and technological achievements

A total of 5,864 scientific and technological achievements in the electric power field were selected from Keyi net to build an index database of electric power scientific and technological achievements. The 128 TFFs obtained in the previous section were used as retrieval words to search the index database with the BM25F retrieval model; the retrieved achievements were ranked by relevancy, and the top 60 were returned to build a background library p of electric power scientific and technological achievements. Taking “power distribution equipment” in the power distribution and utilization field as an example, part of the background library is displayed in Table 6.

Table 6
Background libraries of “power distribution equipment” scientific and technological achievements in power distribution and utilization field (segment)

Name of Scientific and Technological Achievements	Relevancy
Booth distribution box with multi-directional outlets	25.311
Outdoor rain-proof distribution box	23.072
Multi-winding magnetic integrated hybrid distribution transformer	21.194
Magnetically-controlled time division multiplexing integrated intelligent distribution transformer	21.269
Hybrid distribution transformer with decoupling integration device	19.219
Sectional fixture of distribution box	16.537
Single pole framework of distribution device in 330kV substation	15.729
Distribution transformer with lightning protector	15.462
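The retrieve-rank-truncate step used to build the background library can be sketched as follows. This is a simplified illustration only: it uses plain single-field Okapi BM25 rather than the field-weighted BM25F model of this paper, the parameters `k1` and `b` and the smoothed IDF form are common defaults assumed here, and the tokenized documents are hypothetical.

```python
import math
from collections import Counter


def bm25_top_k(query_terms, docs, k=60, k1=1.2, b=0.75):
    """Rank tokenized docs with plain Okapi BM25 and return the top-k (index, score) pairs."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n            # average document length
    terms = set(query_terms)
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in terms}
    scored = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for t in terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scored.append((i, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest relevancy first
    return scored[:k]


# Hypothetical tokenized achievement titles.
docs = [
    ["outdoor", "rain-proof", "distribution", "box"],
    ["hybrid", "distribution", "transformer"],
    ["wind", "turbine", "blade"],
]
top = bm25_top_k(["distribution", "box"], docs, k=2)
print(top)
```

In the setting described above, `k` would be 60 and each of the 128 TFFs would serve as a query against the index of achievements.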

3.3 Calculation of electric power technical demand-supply matching degree

The 3,267 technical demands and 5,684 scientific and technological achievements in the technical demand-supply corpus were tested with the TFF-based technical demand-supply matching model. Results with a feature matching degree higher than 0.5 were retained as the matching results. Taking the power distribution and utilization field as an example, the results for one technical demand are shown in Table 7. The matching efficiency $\rho$ before and after using the model was also calculated, as follows:

$\displaystyle\rho=\frac{G_{\alpha}}{G}$ (24)

in which $G_{\alpha}$ is the number of matched scientific and technological achievements that satisfy the technical demand before using the model, or the number of scientific and technological achievements with a matching degree higher than 0.5 after using the model, and $G$ is the number of scientific and technological achievements matched. The matching efficiencies are compared in Fig. 3.

Table 7

The demand-supply matching table for a technical demand based on electric power TFF (segment)

Name of demand	TFF	Scientific and technological achievements	Matching degree
Require novel distribution transformer related technical development	Operation and maintenance	Multi-winding magnetic integrated hybrid distribution transformer	0.852
		Magnetically-controlled time division multiplexing integrated intelligent distribution transformer	0.833
		Hybrid distribution transformer with decoupling integration device	0.751
		Intelligent energy-saving distribution transformer	0.638
		Intelligent distribution transformer with strong heat dissipation	0.609
		Distribution transformer anti-theft system with novel failure alarm and diagnosis software	0.594
		Winding distribution transformer with lightning protector	0.591

Figure 3.

Comparison of matching efficiency before and after using the model.

It can be seen from Fig. 3 that:

(1)

Before using the model, the maximal technical demand-supply matching efficiency is only 53.4%. After all 3,267 technical demands are matched, merely 1,702 scientific and technological achievements meet the technical demand, a matching efficiency of 52.1%. The average technical demand-supply matching efficiency before using the model is only 47.3%, which is too low.

(2)

After using the TFF-based demand-supply matching model, the maximal technical demand-supply matching efficiency increases from 53.4% to 74.1%. After all 3,267 technical demands are matched, 2,247 scientific and technological achievements meet the technical demand, a matching efficiency of 68.8%. The average technical demand-supply matching efficiency increases from 47.3% to 70.7%, a significant improvement.

(3)

The electric power TFF-based online demand-supply matching model improves the electric power technical demand-supply matching efficiency.
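The reported efficiencies can be checked against Eq. (24) with a few lines of arithmetic. In this sketch the denominator is taken to be the 3,267 technical demands, which is an assumption: it is the value that reproduces the reported 52.1% and 68.8%. The function name is hypothetical.

```python
def matching_efficiency(satisfied, total):
    """Eq. (24): share of matches that satisfy the technical demand."""
    return satisfied / total


TOTAL = 3267  # assumed denominator: number of technical demands matched

before = matching_efficiency(1702, TOTAL)   # before using the model
after = matching_efficiency(2247, TOTAL)    # after using the model

# Difference between the reported average efficiencies, in percentage points.
average_gain = 70.7 - 47.3

print(f"before: {before:.1%}, after: {after:.1%}")
print(f"gain in average efficiency: {average_gain:.1f} percentage points")
```

The difference between the two reported averages, 23.4 percentage points, corresponds to the 23.4% improvement stated in the abstract.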

4. Conclusions

To solve the technical demand-supply matching problem on electric power technology transfer platforms, this paper put forward the new idea of taking TFF as the basis for technical demand-supply matching and proposed an online demand-supply matching method based on electric power TFFs. Building on the improved TF-IDF algorithm, this paper combined the BM25F algorithm to measure its accuracy and verified the effectiveness of the demand-supply matching method by application.

In this paper, the text sample data of technical demands and scientific and technological achievements were obtained from open technology transfer platforms, and the data set is not very large, which differs to some extent from the volume of electric power technical demand and supply information in practice. In follow-up studies, more electric power technology transfer demand and supply texts will be collected as the corpus, and the algorithm will be optimized continuously to improve its performance.

References

1. Peng. Study on relation of technology transfer method, independent R&D and technical efficiency of high-tech industry. Science of Science and Management of S&T. 2013; 34(5): 44-52.

2. Jian, Sun. Study of shallow implicit knowledge demand-supply matching based on fuzzy case-based reasoning. Information studies: Theory & Application. 2016; 39(4): 84-88.

3. Dong, Jiang. Empirical study of patent technology transaction chance between S&T subjects based on ERGM. China Soft Science. 2018; 000(3): 184-192.

4. Brownlie, Macbeth. The strategic management of technology: Integrating technology supply and demand perspectives. European Management Journal. 1980; 7(1): 71-83.

5. Liang, Jiang, Kong. Multi-satisfactory stable oriented two-sided matching decision-making method considering preference order. System Engineering Theory and Practice. 2015; 35(6): 1535-1546.

6. Klerkx, Leeuwis. Matching demand and supply in the agricultural knowledge infrastructure: Experiences with innovation intermediaries. Food Policy. 2007; 33(3): 260-276.

7. Two-sided matching decision-making considering matching willingness in intuitionistic fuzzy condition. Chinese Journal of Management Science. 2017; 25(6): 161-168.

8. Yang, Zhang, You. Scientific and technological achievements transformation matching decision model based on sectional intuitionistic fuzzy set. Statistics & Decision. 2018; 34(9): 37-41.

9. Kong, Liu, Fan. Study of two-sided matching decision-making method considering individual preference and reciprocity preference. Fuzzy Systems and Mathematics. 2019; 33(6): 141-150.

10. Zhu, Cui, Huang, Chen. An efficient Wikipedia semantic matching approach to text document classification. Information Sciences. 2017; 393: 15-28.

11. Yang, Xia. Analysis of demand-supply matching efficiency of online technology transfer platform. Journal of Management Science. 2017; 30(6): 104-112.

12. Xie, Shen. Graph processing based thesis reviewing automatic recommendation system. Application Research of Computers. 2016; 33(3): 798-801.

13. Benedetti, Beneventano, Bergamaschi, et al. Computing inter-document similarity with Context Semantic Analysis. Information Systems. 2019; 80: 136-147.

14. Chen, Wang, Guo. Single document keyword extraction via quantifying higher-order structural features of word co-occurrence graph. Computer Speech & Language. 2019; 57: 98-107.

15. Ferreira, George, DCC, Fred, Rafael, Steven, Marcelo. Combining sentence similarities measures to identify paraphrases. Computer Speech & Language. 2018; 47: 59-73.

16. Jiang, Huang. Words relation based word vector model. Journal of Chinese Information Processing. 2017; 31(3): 25-31.

17. Wang. LDA topic model based text similarity calculation. Computer Science. 2013; 40(12): 229-232.

18. Several problems in topic mode-based multi-label text classification and streaming text data modelling. Jilin University, 2015.

19. Liu, Zou, Xing. Topic feature-based keyword extraction. Application Research of Computers. 2012; 29(11): 4224-4227.

20. Yang, Bie. Sentiment Analysis of Weibo Comment Texts Based on Extended Vocabulary and Convolutional Neural Network. Procedia Computer Science. 2019; 147: 361-368.

21. Gao. Word2vec-based Chinese short text classification method. Journal of Shandong University (Engineering Science). 2019; 49(2): 34-41.

22. Xie. Word2vec-based sentence similarity calculation research. Computer Science. 2017; 44(9): 256-260.

23. Yao. Study of bibliography retrieval sort algorithm based on BM25 model and borrow prediction model. Library Journal. 2016; 35(10): 63-68.

24. Niu, Huang. Automatic extraction of Chinese keywords based on TF-IDF and rules. Journal of Chinese Mini-Micro Computer Systems. 2016; 37(4): 711-715.

25. Zhao, Wei, Wang. Text classification method based on TF-IDF and cosine similarity. Journal of Chinese Information Processing. 2017; 31(5): 138-145.

26. Zhang, Jiang. Semantic similarity clustering based technical demand identification model from the perspective of demand-supply matching. System Engineering Theory and Practice. 2019; 39(2): 476-485.

27. Géry, Largeron. BM25t: a BM25 extension for focused information retrieval. Knowledge and Information Systems. 2012; 32(1): 217-241.

28. Shao, Zhang. Personalized recommendation of Web text mining based on BM25F model. Information studies: Theory & Application. 2013; 36(11): 118-122.
