E-business information fuzzy retrieval system based on block chain anti-attack algorithm

Abstract

Current e-business information retrieval systems ignore the threat of distributed denial of service (DDoS) attacks from the computer network, which results in low retrieval recall, accuracy, and efficiency. To address this problem, this paper proposes a design method for a fuzzy retrieval system for e-business information based on block chain technology. The system has three layers: client, application server, and data server. The construction rules of the rule library used by the system are studied, and a keyword expansion method based on an association word list, a compound word list, and a synonymous word list is given. The dynamic knowledge base of the system is built and updated in real time. Security control of the e-business information documents in the system is implemented using the anti-attack capability of block chain technology. Experimental results show that the proposed method improves the recall, accuracy, and average accuracy of the system while keeping retrieval efficiency high.

Keywords

Block chain; anti-attack algorithm; e-business information; fuzzy retrieval system

1 Introduction

According to statistics, the amount of information on the World Wide Web doubles every few months. The Internet plays an ever larger part in people’s work, study, daily life, and leisure. Commercialization of the Internet has advanced rapidly, and e-business applications are growing far faster than other types of network applications. Effectively finding the information users need in this mass of information is crucial to the effective use of network information technology [1].

Information on the Internet is open, distributed, and heterogeneous. This not only leads to the rapid expansion of Internet information resources, but also leaves the information without a regular indexing mechanism or hierarchical directory structure. Web information is stored arbitrarily on different Web servers and therefore lacks unified management and organization, so it is hard to find specific information in such a huge volume of network data. To help users search, reduce search time, and improve the accuracy of the results, a large number of search engine systems have been introduced [2].

However, search engine technology faces great challenges, such as poor understanding of queries, useless information in the search results, and low search accuracy, all of which are major obstacles to obtaining network information. Research on a new generation of intelligent information retrieval systems that combine traditional information retrieval technology with computer networks and data mining therefore has great academic significance and wide application prospects [3].

In [4], a design method for an e-business information fuzzy retrieval system based on matrix decomposition is proposed. Based on the OpenRDF Sesame framework, a distributed hierarchical storage architecture implements the attribute-table storage structure for semantic data. Because the Boolean matrix decomposition algorithm is slow to construct the attribute table for large-scale semantic data, a parallel frequent itemset mining algorithm based on the Spark distributed computing framework is proposed to solve the large-scale matrix decomposition and accelerate construction of the attribute table. In addition, retrieval optimization based on hash conversion is added to the retrieval layer. In [5], a vertical retrieval system for e-business information is designed and developed based on a simple ontology library of the Website. The ontology library of the e-business information network is built from the navigation directory of the Website. Based on the Lucene engine, a technical framework is constructed to segment the objects in the ontology library and the content of the Web pages, and an ontology object index library and a Web index library are built. After the retrieval content is segmented, it is first extended in the ontology object index library. The correlation of the extended results is calculated with the TF-IDF algorithm and the results are sorted. This value is taken as the weight of the extended ontology objects, and weights are assigned to the objects extended by secondary semantic analysis with Jena. Finally, all the weighted keywords are retrieved in the Web index library, and the correlation is calculated and sorted. In [6], a design method for an e-business information retrieval system based on an improved chaotic particle swarm optimization algorithm is proposed. The failure modes of the e-business information system are divided into a single-limit state and a multi-limit state, and the failure probability of the system under the two states is calculated. By combining chaos optimization with particle swarm optimization, a robustness function is obtained and used to search dynamically for the robustness of the e-business information system, which yields the global optimal solution of the robust optimization design. In [7], an open-source-based e-business information search system is designed. Taking the characteristics of the e-business field into account, the index and search modules of the original search engine framework are modified. E-business information is segmented by a dictionary-based two-way maximum matching model, which is integrated into the word segmentation module of the Nutch search engine framework, and a vertical search system for e-business information is then constructed. In [8], an e-business information retrieval system based on reordering fusion is proposed. Pseudo-relevance feedback is used to extend the content of the user’s retrieval, and the retrieval result is taken as the initial ordered result. The initial ordered result is reordered using social information features generated by the user, and the reordered results are fused using the sequential learning method.

With the development of network information, the volume of electronic documents is growing explosively, and the security of secret electronic documents has become an increasingly prominent problem. Secret electronic documents may involve secrets of the state, the army, and institutions, as well as the business secrets of enterprises. To cope with various security threats, party, government, military, and enterprise departments have built various secure networks to prevent document leaks. Defensive measures include firewalls, VPNs, anti-virus software, intrusion detection, waterproofing walls, intranet control, and trusted computing. These measures can control the security of secret electronic documents to a certain extent, but they have apparent weaknesses when facing unknown threats and cannot guarantee the security of confidential documents. Even in a physically isolated, high-security network environment, encrypted e-business information documents are at risk of being stolen. To address this problem, this paper proposes a design method for a fuzzy retrieval system for e-business information based on block chain technology. The paper is organized as follows.

A fuzzy retrieval system for e-business information is designed, with three layers: client, application server, and data server.

An extension method for e-business information keywords based on association words, compound words, and synonymous words is given.

The superiority of the proposed method is demonstrated by experimental tests and analysis of the results.

Conclusions are drawn and directions for further research are given.

2 Material and Methods

2.1 The overall design of the fuzzy retrieval system for electronic commerce information

In the intelligent fuzzy extension search system, fuzzy retrieval is achieved through keyword expansion. The system uses a three-layer structure: client, application server, and data server. The client is responsible for the interaction between the user and the system. The application server consists mainly of four modules, the retrieval extender, searcher, result processor, and document processor, which implement retrieval expansion, information retrieval, retrieval result processing, and information processing, respectively. The data server includes the knowledge base and the document collection, which store the dynamic knowledge base and the set of retrievable document information, respectively.

The framework of the proposed system is shown in Fig. 1. It consists of three modules: the retrieval statement processing module, the information retrieval processing module, and the post-processing module.

Fig. 1

Framework structure of fuzzy retrieval system for e-business information.

2.1.1 Retrieval statement processing module

Because the system retrieves according to a keyword-extension fuzzy algorithm, the retrieval conditions entered by the user must be processed first. The retrieval sentence is analyzed and then structured according to the result of the analysis. It is then extended, and the keywords are recombined into a new keyword group, from which the corresponding word vector is finally generated. The steps of retrieval statement processing are as follows.

A word is the smallest meaningful unit of language that can be used on its own. It is also the most basic operating unit and information carrier in the retrieval system. Chinese word segmentation is the process by which a computer automatically divides a sequence of Chinese characters into a sequence of words. In English, words are separated by spaces. In Chinese, there are obvious delimiters for characters, sentences, and paragraphs, but no clear demarcation between words. Determining the words is therefore the key to understanding Chinese natural language. Current word segmentation methods fall mainly into statistics-based, understanding-based, and string-matching-based segmentation. They can also be divided into methods that only segment and methods that combine segmentation with labeling. Because current methods mainly target open knowledge domains, the retrieval sentence processing module must recognize not only general vocabulary but also the specialized terms of the professional field.

2.1.2 Retrieval statement structuration

Retrieval statements differ, and the words obtained by segmenting a sentence differ in how much they contribute to its meaning. Not all words help in understanding the retrieval sentence. Structuring the retrieval statement eliminates unimportant words that produce noise in natural language processing, which improves the accuracy of the retrieval results, and it also reduces computational complexity and improves retrieval efficiency.

2.1.3 Retrieval statement extension

Retrieval statement extension involves two problems: selecting additional search phrases, and correcting the search statement and its weights. The similarity between phrases in a fuzzy thesaurus serves as the criterion for choosing additional search phrases. The extended phrase vector is added to the retrieval vector, which handles the correction of the retrieval sentence and the weights.

After these three steps are applied to the input retrieval conditions, all the keywords used for computation are obtained. These keywords are called the keyword group. The vector of the keyword group is expressed as $A_{r} = (a_{1}, a_{2}, \dots, a_{l})$ (1) where l is the number of phrases in the e-business information retrieval condition.

Let tf_ie (1 ≤ i ≤ l) be the number of occurrences of the e-business information vocabulary item d_i in the retrieval condition. Setting a_i equal to tf_ie indicates that the retrieval condition contains d_i.
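As a minimal sketch of Eq. (1), the keyword group vector can be built by counting how often each phrase occurs in the processed retrieval condition. The function name `keyword_group_vector` and the sample phrases are hypothetical, and the sketch assumes the retrieval sentence has already been segmented and extended.

```python
from collections import Counter


def keyword_group_vector(phrases):
    """Return the distinct phrases d_1..d_l and the vector A_r = (a_1..a_l),
    where a_i is the number of times d_i occurs in the retrieval condition."""
    counts = Counter(phrases)          # keeps the first-seen order of phrases
    terms = list(counts)
    return terms, [counts[t] for t in terms]


# Hypothetical, already segmented and extended retrieval condition.
terms, a_r = keyword_group_vector(["invoice", "payment", "invoice", "refund"])
```

Here `terms` plays the role of the vocabulary items d_i and `a_r` holds the corresponding counts tf_ie.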

2.1.3.1 Information retrieval processing module

The information retrieval processing module handles two processes: file information and retrieval information. The workflow is as follows. First, the user inputs retrieval conditions, and the retrieval extender searches the knowledge base for terms that match the conditions. The results are returned to the user for selection. After receiving the filtered results, the knowledge base retriever retrieves the information from the document set and returns it to the user. Finally, the knowledge base is optimized according to the results.

2.1.3.2 Generating the document matrix of the word

Because users enter few retrieval conditions and few keywords, the accuracy of semantic computing is inevitably reduced. To address this problem, all documents are divided into two categories: documents with no keywords and documents with keywords. The corresponding matrices are given by $A_{TAD} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1 n} & a_{1 (n_{2} + 1)} \\ a_{21} & a_{22} & \cdots & a_{2 n} & a_{2 (n_{2} + 1)} \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m 1} & a_{m 2} & \cdots & a_{m n} & a_{m (n_{2} + 1)} \\ \vdots & \vdots & & \vdots & \vdots \\ a_{(m + q) 1} & a_{(m + q) 2} & \cdots & a_{(m + q) n} & a_{(m + q) (n_{2} + 1)} \end{pmatrix}$ (2) $A_{TAD(K)} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1 n} & a_{1 (n_{1} + 1)} \\ a_{21} & a_{22} & \cdots & a_{2 n} & a_{2 (n_{1} + 1)} \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m 1} & a_{m 2} & \cdots & a_{m n} & a_{m (n_{1} + 1)} \\ \vdots & \vdots & & \vdots & \vdots \\ a_{(m + k) 1} & a_{(m + k) 2} & \cdots & a_{(m + k) n} & a_{(m + k) (n_{1} + 1)} \end{pmatrix}$ (3) where m is the number of words in the knowledge base of all documents after filtering, n is the number of documents in the knowledge base, n = n_1 + n_2, and a_xy (1 ≤ x ≤ m; 1 ≤ y ≤ n_1 or 1 ≤ y ≤ n_2) is the weight of word x in document y.

Equations (2) and (3) are high-order matrices. Because many documents are retrieved and each text contains many words, a given text does not necessarily contain all m words, and no word appears in every text. A_TAD and A_TAD(K) are therefore sparse matrices. Because noise is present, correlation processing is applied to A_TAD and A_TAD(K).

2.1.3.3 Weight calculation

Because the documents contain interference information, singular value decomposition is used to reduce the dimension of the matrix. In this way, the dimension of the word and text space is reduced, the semantic structure space is simplified, and the interference of semantic noise is suppressed.

For the fuzzy retrieval system of e-business information, the following retrieval methods are to be implemented.

2.1.3.4 Exact retrieval of non-content field

A non-content field is any field other than the content of the file, such as the file name, author, or release time. When the user knows exactly what to retrieve, for example the file holding the summary record of e-commerce information for 2016, the user can enter the search field “file name” and the keyword “2016 summary record” in the search interface and select “exact retrieval” to improve retrieval efficiency and the accuracy of the results.

2.1.3.5 Fuzzy retrieval of non-content field

When the user cannot determine the exact value of a non-content field of the file to be retrieved, fuzzy retrieval can be used. For example, if the user remembers only that the file name contains the word “record”, the user can set the search field to the file name, enter the keyword “record”, and choose “fuzzy retrieval”. All files whose names contain “record” are then returned, and the user selects the required file from the results. This mode suits cases where the exact value of a document attribute cannot be recalled. It may return some useless results, but it raises the probability of finding the file compared with exact retrieval.
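The difference between exact and fuzzy retrieval on a non-content field can be illustrated with a small sketch. The record layout, the `search_field` helper, and the case-insensitive comparison are assumptions made for illustration; the system described here works against database tables rather than in-memory dictionaries.

```python
def search_field(records, field, keyword, fuzzy=False):
    """Exact retrieval matches the whole field value; fuzzy retrieval
    matches any record whose field value contains the keyword."""
    keyword = keyword.lower()          # assumption: case-insensitive matching
    hits = []
    for record in records:
        value = str(record.get(field, "")).lower()
        matched = (keyword in value) if fuzzy else (value == keyword)
        if matched:
            hits.append(record)
    return hits


# Hypothetical file information records.
records = [
    {"filename": "2016 summary record", "author": "Li"},
    {"filename": "2017 meeting record", "author": "Wang"},
    {"filename": "price list", "author": "Li"},
]
exact_hits = search_field(records, "filename", "2016 summary record")
fuzzy_hits = search_field(records, "filename", "record", fuzzy=True)
```

Exact retrieval returns only the one file whose name equals the keyword, while fuzzy retrieval returns both files whose names contain “record”, leaving the final choice to the user.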

2.1.3.6 Fuzzy retrieval of content field

In some cases, the user does not know the name, release time, or author of the file to be retrieved and only hopes to find documents related to certain content. In this case, fuzzy retrieval is carried out on the content field of the file. The content field is stored in the database as text, and file retrieval is based on full-text retrieval technology. A full-text index is set up on the file information table: the primary key ID of the table is used as its unique index key, and the file content column (text type) is selected as the full-text indexed column, which enables fuzzy retrieval of file content [9].

Because all knowledge documents already exist in the retrieval system, the correlations between all documents are calculated before the system is initialized and saved in a data table of the knowledge base. At query time, the system only needs to read the relevant data from the database instead of recalculating it.

The segmented documents after weight calculation are denoted by the matrix A_TAD. Assume there are m′ documents and n′ words, so that A_TAD has size m′ × n′. After singular value decomposition, a new m′ × n′ matrix ALL_TAD is obtained. With documents as rows, the correlation between documents is expressed as $\begin{matrix} {ALL}_{TAD} \times {ALL}_{TAD}^{T} \\ = (D \times S \times T^{T}) \times (D \times S \times T^{T})^{T} = D \times S^{2} \times D^{T} \end{matrix}$ (4) where S is the diagonal matrix and D and T are the two orthogonal matrices obtained from the singular value decomposition of A_TAD.

The correlation of document i and document j is the value in the i-th row and j-th column of this product. The retrieval system only needs to return the documents whose correlation exceeds a predefined threshold, and the product in Eq. (4) is saved in the knowledge data table. The data table is updated whenever the knowledge base is updated, so the system does not need to recalculate the correlations when a user retrieves, which greatly improves retrieval speed.
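The precompute-then-look-up idea can be sketched as follows. The sketch assumes that each row of the reduced matrix is one document, so the correlation table is the matrix of row dot products. It does not perform the singular value decomposition itself, and the vectors, function names, and threshold are hypothetical.

```python
def correlation_table(doc_vectors):
    """Precompute pairwise document correlations as dot products of rows."""
    n = len(doc_vectors)
    return [
        [sum(a * b for a, b in zip(doc_vectors[i], doc_vectors[j]))
         for j in range(n)]
        for i in range(n)
    ]


def related_documents(table, doc_index, threshold):
    """Read the stored table and return documents above the threshold."""
    return [
        j for j, value in enumerate(table[doc_index])
        if j != doc_index and value > threshold
    ]


# Hypothetical reduced document vectors (one row per document).
docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
table = correlation_table(docs)        # computed once and stored
related = related_documents(table, 0, threshold=0.5)
```

Only `correlation_table` needs to be rerun when the knowledge base changes. A query touches a single stored row, which is why no recalculation is needed at retrieval time.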

The retrieval module first retrieves all the texts that satisfy the conditions and obtains the final result set through union and sorting operations. Second, a summary of each text is written into the Web page returned to the user. Finally, after all the retrieval keywords have been looked up in the index library, the pages are sorted by correlation. For each page, every keyword is assigned a weight according to its importance to the content of the page.

The correlation between the user’s retrieval string and a Web page is obtained by adding the weights of the retrieval keywords contained in the page. The greater the value, the higher the page is ranked.
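A minimal sketch of this ranking rule is shown below. The page names, keyword weights, and the `rank_pages` helper are hypothetical. The sketch only adds up the weights of the query keywords found in each page and sorts in descending order.

```python
def rank_pages(page_weights, query_keywords):
    """Score each page by the sum of the weights of the query keywords it
    contains, then sort pages from highest to lowest score."""
    scores = {
        page: sum(weights.get(keyword, 0.0) for keyword in query_keywords)
        for page, weights in page_weights.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-page keyword weights.
page_weights = {
    "page_a": {"refund": 0.5, "invoice": 0.25},
    "page_b": {"invoice": 0.5},
    "page_c": {"shipping": 1.0},
}
ranking = rank_pages(page_weights, ["refund", "invoice"])
```

A page that contains none of the query keywords receives a score of zero and falls to the end of the ranking.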

2.1.3.7 Post processing module

After the retrieval statement processing and information retrieval processing modules, the user input is expressed as a word vector and the document matrices A_TAD and A_TAD(K) are obtained. The keyword expansion method is used to calculate the similarity between documents, between documents and words, and between words. Based on this similarity, the information documents related to the user’s needs are found and the knowledge base is dynamically built and updated, which achieves intelligent fuzzy retrieval.

2.1.3.8 Construction of dynamic knowledge base

The knowledge base is mainly composed of three parts: a dictionary, a rule base, and a corpus. In the proposed system, the dictionary is an existing Chinese word segmentation dictionary. The tables constructed by the rule base are the associated thesaurus, the compound thesaurus, and the synonymous thesaurus in the database. The corpus is built from large-scale real text. As the search system is used, new search keywords are continually added to the corpus. After a certain time, the sentences in the corpus are re-analyzed according to the construction rules of the rule base, and the results are added to the three tables.

The knowledge base contains the relationship information between knowledge. The construction of the dynamic knowledge base mainly includes two parts: the extraction of the retrieval extension words and the determination of the correlation between the extension words.

2.1.3.9 Update of dynamic knowledge base

The dynamic knowledge base is updated by dynamic optimization based on users’ feedback. Experts seldom need to take part in manual updates during this optimization. During retrieval, when the user submits a search term, the term is extended by the retrieval expander, and the dynamic knowledge base is updated and adjusted according to how the user acts on the extended information. After long-term use, the system learns and accumulates user knowledge and experience, so the organization and content of the dynamic knowledge base are optimized continuously [10].

2.2 E-business information fuzzy retrieval system based on extensible key word

The working principle of e-business information fuzzy retrieval system based on extensible key word is as follows.

Receive the user’s request in a natural language, and then effectively understand the user’s request through the technologies of Chinese word segmentation and syntactic analysis.

According to the keyword string obtained by segmentation, the keywords are extended using the keyword list, associated thesaurus, compound thesaurus, and synonymous thesaurus in the knowledge base to generate a new keyword group, which is fed back to the user. The user selects appropriate keywords to extend the initial retrieval. The extended information selected by the user is then sent to the system, which optimizes the knowledge base according to the user’s feedback.

On the one hand, the system retrieves according to the revised retrieval and returns the results to users. On the other hand, it continues to process the retrieval results, and extracts relevant concepts to supplement and update the related knowledge tables of the knowledge base. The knowledge tables constructed by the rule base of the knowledge base are the relevance thesaurus, the compound thesaurus, and the synonymous thesaurus.

The retrieval system sorts the results returned by the index library and then presents the final result to the user [11]. The workflow is shown in Fig. 2.

Fig. 2

Workflow of e-business information fuzzy retrieval system based on extensible key word.
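The keyword extension step of this workflow can be sketched as a lookup in the three thesaurus tables. The table contents and the `extend_keywords` helper are hypothetical. In the system described here the candidates are shown to the user, who chooses which ones to keep, and that selection step is not modelled.

```python
def extend_keywords(keywords, thesauri):
    """Collect candidate extension words for each keyword from every
    thesaurus, keeping first-seen order and dropping duplicates."""
    extended = list(dict.fromkeys(keywords))
    seen = set(extended)
    for keyword in keywords:
        for table in thesauri.values():
            for candidate in table.get(keyword, []):
                if candidate not in seen:
                    seen.add(candidate)
                    extended.append(candidate)
    return extended


# Hypothetical contents of the three knowledge tables.
thesauri = {
    "associated": {"laptop": ["charger"]},
    "compound": {"laptop": ["laptop bag"]},
    "synonymous": {"laptop": ["notebook"], "cheap": ["low-cost"]},
}
keyword_group = extend_keywords(["cheap", "laptop"], thesauri)
```

The resulting keyword group starts with the user’s own keywords, followed by the candidates drawn from the associated, compound, and synonymous tables in that order.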

Retrieval expansion has long been applied in text information retrieval systems. Its main purpose is to solve the problem that users cannot express keywords accurately when retrieving. There is no literature on extension methods for relevance, compound, and synonymous keywords. Therefore, starting from the retrieval sentence, the following three methods are studied to extend these keywords [12, 13].

2.2.1 Relevance keyword extension method

In this system, e-business information is divided into multiple classes, each of which contains different subclasses, which in turn contain further subclasses. The semantic similarity of two semantic codes C₁ and C₂ of e-business information is calculated as $Sim (C_{1}, C_{2}) = \frac{1}{E (C_{1}, C_{2})}$ (5) where E (C₁, C₂) is the shortest path length between C₁ and C₂.
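Equation (5) can be sketched with a breadth-first search over the class hierarchy. The category tree below is hypothetical. Two cases that Eq. (5) leaves open are handled by assumption: identical codes (path length 0) are given similarity 1, and unconnected codes are given similarity 0.

```python
from collections import deque


def shortest_path_length(tree, start, goal):
    """Breadth-first search over the class hierarchy, treated as undirected."""
    neighbours = {}
    for parent, children in tree.items():
        for child in children:
            neighbours.setdefault(parent, set()).add(child)
            neighbours.setdefault(child, set()).add(parent)
    queue, visited = deque([(start, 0)]), {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for nxt in neighbours.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, distance + 1))
    return None                        # the two codes are not connected


def similarity(tree, c1, c2):
    """Sim(C1, C2) = 1 / E(C1, C2), with two assumed edge cases."""
    length = shortest_path_length(tree, c1, c2)
    if length is None:
        return 0.0                     # assumption: unrelated codes
    if length == 0:
        return 1.0                     # assumption: identical codes
    return 1.0 / length


# Hypothetical class hierarchy: class -> subclasses.
tree = {
    "goods": ["electronics", "clothing"],
    "electronics": ["phones", "laptops"],
    "clothing": ["shoes"],
}
sibling_similarity = similarity(tree, "phones", "laptops")
distant_similarity = similarity(tree, "phones", "shoes")
```

Sibling subclasses are two edges apart and therefore more similar than subclasses that meet only at the top-level class.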

2.2.2 Compound keyword expansion method

For compound keywords, the coupling probability between the component words is calculated using the EM algorithm. Assume the compound word lexicon is composed of pairs of word strings b_i and d_i, that is, κ = {(b₁, d₁), (b₂, d₂), ⋯, (b_t, d_t)}. The coupling probability of the combination of W_b and W_d is expressed as $P_{(b_{i}, d_{i})} (W_{b} \mid W_{d}) = \frac{1}{e}, \quad e = | \{ W_{b} \mid W_{d} \in d_{i} \} |$ (6) $Q (W_{b} \mid W_{d}) = \sum_{(b_{i}, d_{i}) \in \kappa} P_{(b_{i}, d_{i})} (W_{b} \mid W_{d})$ (7) $P (W_{b} \mid W_{d}) = \frac{Q (W_{b} \mid W_{d})}{\sum_{(b_{i}, d_{i}) \in \kappa} Q (W_{b} \mid W_{d})}$ (8) where e is the number of elements of the set in Eq. (6). Equation (7) gives the expected number of occurrences of word strings containing W_d in the word string set. Equation (8) re-estimates the coupling probability from this expected value. Equations (7) and (8) are repeated until convergence.

Based on these results, compound words whose coupling probability is higher than the preset threshold are extracted and added to the compound word list. This reduces the granularity of the search terms and greatly improves the search efficiency of the search engine.

2.2.3 Synonymous keyword expansion method

First, the normalized weights of the e-business information are sorted. Then, an expansion threshold is set according to the information entered by the user, and the weights are filtered against it. Finally, the index items of the extended phrases in the retrieval vector are determined.
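The sort-then-filter procedure can be sketched as follows. The candidate synonyms, their weights, and the threshold value are hypothetical, and how the threshold is derived from the user’s input is left open here, so it is simply passed in as a parameter.

```python
def select_synonyms(candidates, threshold):
    """Sort candidate synonyms by normalized weight (highest first) and
    keep those whose weight reaches the expansion threshold."""
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return [word for word, weight in ranked if weight >= threshold]


# Hypothetical normalized weights of candidate synonyms for "laptop".
candidates = {"tablet": 0.3, "notebook": 0.9, "portable computer": 0.6}
selected = select_synonyms(candidates, threshold=0.5)
```

The selected words are the ones that would be added as index items of the extended phrases in the retrieval vector.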

2.3 Design of deployment and technical architecture of e-business information fuzzy retrieval system based on block chain technology

A secret document protection scheme based on block chain technology is generally built on a private chain. A private chain is a block chain under a certain degree of centralized control. It allows stricter permission control: modification and even read permissions can be limited to a small number of users on the internal network. A control node can be a CA center, key center, privilege management node, or approval node. This special control over document permissions comes at the cost of sacrificing part of the decentralization. In return, a private chain can adopt a simpler, more efficient, more flexible, and lower-cost consensus mechanism than a public chain. Because limited decentralization makes consensus easier to reach, consensus is needed only in special cases, when documents need to be identified, arbitrated, or degenerated.

The system deployment is shown in Fig. 3. The server side includes the CA center, the key center, authority management, the approval node, the document server, and the document database. The document client is installed on the client side to create, encrypt, and decrypt e-business information documents and to manage authority. Deployment is flexible: the system can be combined with OA and deployed to the cloud or an intranet [14].

Fig. 3

System deployment structure.

The technical architecture of the e-business information fuzzy retrieval system with anti-attack performance is shown in Fig. 4. It is divided into the system support layer, the document service layer, and the application layer. The system support layer includes block chain technology (chain documents, smart contracts, and the consensus mechanism), OFD technology, and the cryptographic service infrastructure (cryptographic hardware, the PKI system, key management, etc.). The document service layer includes electronic document creation, browsing, encryption and decryption, retrieval, conversion, and transmission. The application layer includes electronic document management, unified user management, and unified authority management.

Fig. 4

Technical architecture of e-business information fuzzy retrieval system.

The secret electronic document system optimizes the process and improves efficiency through consensus technology. Smart contract technology automatically checks the rationality, compliance, and classification of documents and automatically identifies and filters document access instructions. Shared ledger technology shares document-related data among the participants: the document owner, the administrator, the reader, and the regulatory audit institution. Document confidentiality and access control are achieved through encryption and identity authentication. In this way, automated collaboration on electronic documents and full protection of confidential electronic documents under different business strategies are realized [14–17].
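To illustrate why a chained ledger makes tampering with document access records detectable, the following is a minimal hash-chain sketch. It is not the private chain design described above: there is no consensus mechanism, CA center, smart contract, or encryption, and the record fields and function names are hypothetical. It only shows that each block commits to the hash of the previous one.

```python
import copy
import hashlib
import json

GENESIS_HASH = "0" * 64


def block_hash(index, prev_hash, record):
    """Hash the block position, the previous hash, and the access record."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "record": record}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, record):
    """Append a record, linking it to the hash of the last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "record": record,
                  "hash": block_hash(index, prev_hash, record)})


def is_valid(chain):
    """Recompute every hash and check every link in the chain."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["record"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical document access records.
chain = []
append_block(chain, {"doc": "contract-001", "user": "alice", "action": "read"})
append_block(chain, {"doc": "contract-001", "user": "bob", "action": "modify"})

tampered = copy.deepcopy(chain)
tampered[0]["record"]["user"] = "mallory"   # rewrite history in a copy

valid_original = is_valid(chain)
valid_tampered = is_valid(tampered)
```

Changing any earlier record invalidates its stored hash, so the altered copy fails validation while the original chain passes.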

3 Results

3.1 Test environment

The system is implemented in C++. The test document set is the Request for Comments (RFC) database, from which 7000 documents are selected as the test set. The software and hardware configuration of the test environment is shown in Table 1.

Table 1
Software and hardware configuration of test environment

Test environment	Configuration
CPU	Intel Core 2 @ 2.93 GHz
Memory	4 GB
Operating system	Windows 7 Professional 64-bit

3.2 Experimental results and analysis

Creation of e-business information metadata.

The burden on the data owner is the time and storage cost of metadata creation. Creating the metadata for a document involves keyword extraction, correlation calculation, and encryption of the keywords and correlations. The creation time depends mainly on the number of keywords, whereas global retrieval efficiency is related to the total number of files in the file set. Table 2 shows the average storage space and time required to create e-business information metadata. Averages are used mainly to eliminate the impact of differences between e-business information files on the measured performance of the system.

Table 2
Average storage space and time required for the creation of e-business information metadata

File number	Storage space/kB	Creation time/ms
2000	0.28	0.36
4000	0.30	0.40
6000	0.35	0.43

Creation of index and e-business information semantic relation library.

The creation of the index needs to scan the entire metadata set to extract the keyword set and create an inverted index with the correlation value. Figure 5 shows that the time cost of index creation is basically linear with the size of the metadata set.

Fig. 5

Relationship of time cost of index creation and metadata set size.
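The index creation step measured here can be sketched as a single scan over the metadata set that builds an inverted index carrying the correlation values. The metadata layout and the `build_inverted_index` helper are hypothetical, and the encryption of keywords and correlations mentioned earlier is omitted.

```python
from collections import defaultdict


def build_inverted_index(metadata):
    """Map each keyword to a list of (document id, correlation value)."""
    index = defaultdict(list)
    for doc_id, keywords in metadata.items():      # one scan of the metadata set
        for keyword, correlation in keywords.items():
            index[keyword].append((doc_id, correlation))
    return dict(index)


# Hypothetical metadata: document id -> {keyword: correlation value}.
metadata = {
    "doc1": {"invoice": 0.8, "refund": 0.3},
    "doc2": {"invoice": 0.4},
}
index = build_inverted_index(metadata)
```

Each metadata entry is visited exactly once, which is consistent with a creation time that grows roughly linearly with the size of the metadata set.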

Figure 6 shows that the time cost of index creation is basically proportional to the number of documents in the e-business information document set.

Fig. 6

Relationship of time cost of index creation and number of documents.

The retrieval process mainly includes retrieval extension, index retrieval, calculation of the comprehensive correlation values, and return of the top L result documents. Compared with previous ranked retrieval, it introduces keyword extension and the time cost of calculating the comprehensive correlation from the retrieval results of the keyword set. The number of extended keywords is therefore one of the factors that determine retrieval efficiency. The test result for the 1000-document set is shown in Fig. 6. The retrieval time is basically proportional to the number of extended keywords and grows as the extended keyword set grows.

To evaluate retrieval performance, the average retrieval time is measured on document sets of 1000 to 7000 documents. The test result is shown in Fig. 7. The retrieval time is proportional to the size of the document set.

Fig. 7

System retrieval performance.

The analysis of the results shows that retrieval efficiency can be effectively improved by sorting the results and returning only the number of documents the user requires.
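Returning only the L best documents can be sketched with a heap-based selection, which avoids sorting the full result list. The scores and the `top_l` helper are hypothetical, and Python’s standard `heapq.nlargest` is used here as one possible way to do the selection, not as the method of the tested system.

```python
import heapq


def top_l(scores, l):
    """Return the l documents with the highest comprehensive correlation."""
    return heapq.nlargest(l, scores.items(), key=lambda item: item[1])


# Hypothetical comprehensive correlation values per document.
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.1}
best_two = top_l(scores, 2)
```

Only the requested number of documents is ranked and returned, regardless of how many documents matched.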

System anti-attack performance test.

The effectiveness and practical value of a method are generally measured by the recall ratio and the precision ratio. The recall ratio is the ratio of the number of documents correctly classified into a specific category to the number of documents in the whole library that belong to that category. The precision ratio is the ratio of the number of documents correctly classified into a category to the total number of documents assigned to that category by automatic classification. A higher recall ratio tends to lower the precision ratio, and a higher precision ratio tends to lower the recall ratio, especially when the retrieval system is under attack. For a particular classification algorithm, the recall ratio and the precision ratio must be traded off against each other.

The recall ratio R is given by $R = \frac{\text{Number of correctly retrieved documents}}{\text{Number of all correct documents}}$ (9)

The precision ratio P is given by $P = \frac{\text{Number of correctly retrieved documents}}{\text{Number of retrieved documents}}$ (10)
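Equations (9) and (10) can be evaluated for a single query as in the sketch below. The function name `precision_recall` and the document IDs are hypothetical, and returning 0 for an empty set is an assumption made here to keep the ratios defined.

```python
def precision_recall(retrieved, relevant):
    """Return (P, R) as percentages for one query.

    retrieved: iterable of document IDs the system returned
    relevant:  iterable of document IDs that are actually correct
    """
    retrieved, relevant = set(retrieved), set(relevant)
    correct = retrieved & relevant  # correctly retrieved documents
    # Assumption: an empty set yields 0 so that the ratio stays defined.
    p = 100.0 * len(correct) / len(retrieved) if retrieved else 0.0
    r = 100.0 * len(correct) / len(relevant) if relevant else 0.0
    return p, r


# Hypothetical query: 4 documents returned, 5 documents are correct,
# and 3 of the returned documents are among the correct ones.
p, r = precision_recall(["d1", "d2", "d3", "d9"],
                        ["d1", "d2", "d3", "d4", "d5"])
print(f"P = {p:.1f}%, R = {r:.1f}%")  # P = 75.0%, R = 60.0%
```

Returning more documents can only keep or raise R, but every extra incorrect document lowers P, which is the trade-off between the two ratios.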

To verify the anti-attack performance of the proposed system, the literature [4] method, the literature [5] method, the literature [6] method, and the proposed method are tested on sets of 1000 to 7000 e-business information documents, both without and with a DDoS attack. The test results are shown in Table 3.

Table 3

Comparison of test results

Number of documents	DDoS attack	The literature [4] method		The literature [5] method		The literature [6] method		The proposed method
		P (%)	R (%)	P (%)	R (%)	P (%)	R (%)	P (%)	R (%)
1000	No	97.2	98.9	96.4	96.8	96.2	95.4	96.8	98.5
1000	Yes	92.1	93.4	92.5	93.6	91.8	90.6	96.1	97.8
2000	No	98.4	99.7	91.2	95.2	97.5	96.1	99.1	99.6
2000	Yes	92.7	93.2	93.1	90.2	93.4	92.8	96.8	97.4
3000	No	96.5	95.4	93.2	94.3	94.2	93.7	95.9	96.2
3000	Yes	90.8	90.6	89.8	90.1	91.3	88.7	94.5	95.8
4000	No	93.7	84.5	88.6	85.7	82.6	84.3	92.6	91.9
4000	Yes	88.2	80.1	83.4	81.3	87.2	86.9	90.5	89.7
5000	No	86.8	94.3	83.6	85.2	80.5	81.2	90.2	93.0
5000	Yes	80.3	88.4	78.5	79.6	70.3	73.2	89.1	88.5
6000	No	76.5	74.2	79.2	78.1	75.4	76.8	89.9	91.0
6000	Yes	70.1	69.2	73.1	71.2	68.4	70.3	88.7	86.5
7000	No	78.5	71.3	76.2	75.0	76.5	74.8	87.2	88.1
7000	Yes	69.2	65.4	66.3	64.5	66.7	62.8	86.1	85.4
Average	No	89.7	88.3	86.9	87.2	81.9	86.0	93.2	94.0
	Yes	83.3	82.9	82.4	81.5	81.3	80.8	91.2	91.6

Table 3 shows that the recall ratio and the precision ratio of the proposed method are the highest on average. The literature [4] method and the literature [6] method come second, and the literature [5] method is the lowest, although its averages still exceed 80%. The results of this experiment are more accurate than the classification results reported in the literature, mainly because the feature extraction method is different. The test results are noticeably worse for the 6000- and 7000-document sets. The false detection ratio is also higher with the literature [4] method and the literature [5] method. The main reason is that the feature words of these two kinds of e-business information have a high repetition rate, so there is little difference between the categories. The accuracy of retrieval is therefore influenced not only by the retrieval method but also by the characteristics of the documents in the corpus. Even so, the proposed method keeps both ratios above 85% on every document set, with and without a DDoS attack, so it can classify e-business information documents effectively.

Average precision ratio test.

The average precision ratio is introduced to measure the effective performance of different methods. It is denoted as average and given by $\mathrm{average} = \frac{P + R}{2}$ (11)
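As an illustration of equation (11), the sketch below applies it to the average P and R values under DDoS attack from the last row of Table 3. Using those table averages as inputs is an assumption made for this example; the text does not state which inputs the plotted values use. The function name `average_ratio` is hypothetical.

```python
def average_ratio(p, r):
    """Equation (11): arithmetic mean of precision P and recall R (in %)."""
    return (p + r) / 2


# Average (P%, R%) under DDoS attack, copied from the last row of Table 3.
under_attack = {
    "literature [4]": (83.3, 82.9),
    "literature [5]": (82.4, 81.5),
    "literature [6]": (81.3, 80.8),
    "proposed": (91.2, 91.6),
}

for method, (p, r) in under_attack.items():
    print(f"{method}: {average_ratio(p, r):.2f}")

# Method with the highest average precision ratio under attack.
best = max(under_attack, key=lambda m: average_ratio(*under_attack[m]))
print("highest:", best)
```

For the proposed method this gives (91.2 + 91.6) / 2 = 91.4, the highest of the four values computed from Table 3.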

Fig. 8 shows that the proposed method can effectively retrieve e-business information and has strong anti-attack performance. Compared with the literature [7] method, the average precision ratio of the proposed method is higher. This is due to the characteristics of block chain technology (chain structure, tamper resistance, smart contracts, consensus mechanism, asymmetric encryption, and secure data storage), which realize the security control and retrieval of confidential e-business information documents under the private chain.

Fig. 8

Average precision ratio of the retrieval results of the two methods under DDoS attack.

4 Conclusions

The emergence of block chain technology improves the possibility of protecting digital assets and the security of access to confidential electronic documents. Applying the characteristics of block chain, such as tamper resistance, decentralization, distribution, openness, and transparency, to the protection of e-business information documents will greatly reduce the leakage of such documents. On the basis of existing search engine retrieval technology, a design method for a fuzzy retrieval system for e-business information based on block chain technology is proposed in this paper. The three stages of e-business information processing and their working principle are given. Through research on keyword-extended fuzzy query, a keyword expansion method is proposed that extends association keywords, compound keywords, and synonym keywords, respectively. By using block chain technology, the system achieves strong anti-attack performance.

The proposed method achieves the expected design requirements and meets the basic needs of file management and retrieval, but there is still room for further improvement.

The user experience can be further improved. For example, while the system displays the data from historical retrievals to the user, the retrieval over all the data can be carried out directly in the background. After the retrieval finishes, AJAX can be used to show the new results to the user through a partial page update, with no manual operation by the user.

For a file retrieval system, efficiency is one of the most important indexes. As time goes on, there will be more and more files in the library, the number of accesses will increase, and the performance of the database system will degrade. The system must therefore be constantly adjusted and optimized to suit actual use.

Footnotes

Acknowledgments

2017 Ktion plan of Tibet Autonomous Region.

Key technology research and design of e-commerce application platform based on block chain (XZ201703-GC-09).

References

1. Sun and Jafar S.A., The capacity of private information retrieval, IEEE Transactions on Information Theory (2016), 1–1.

2. [...], Chen, Cao et al., Camp: A new bitmap index for data retrieval in traffic archival, IEEE Communications Letters 20(6) (2016), 1128–1131.

3. Rocha, Kon, Cobe et al., A hybrid cloud-P2p architecture for multimedia information retrieval on vod services, Computing 98(1-2) (2016), 73–92.

4. [...], Qiu H.J., Yang W.J. et al., Goldfish: A large scale antic data store and query system based on boolean matrix factorization, Chinese Journal of Computers 40 (2017), 2212–2230.

5. Yuan Z.X., Zhang, Bao et al., Design and implementation of vertical search engine in field of energy-saving and emission-reduction based on nutch, Computer Engineering And Design 37 (2016), 2565–2570.

6. [...] H.L. and Wu, Embedded ship electronic information system of reliability optimization design method, Computer Simulation 33 (2016), 264–267.

7. Zhu Y.F., Lan X.J. and Kang J.F., Study on vertical search engine of tourism geographical information and its application, Science of Surveying And Mapping 41 (2016), 152–156.

8. [...], Zhang B.W., Chen S.L. et al., Integrated system for social book search based on re-ranked combination, Application Research Of Computers 34 (2017), 781–784.

9. Gupta and Berberich, Diversifying search results using time: An information retrieval method for historians, Pattern Recognition 11(2) (2016), 133–142.

10. Koc, Simultaneous approximation by polynomials in Orlicz spaces generated by quasiconvex Young functions, Kuwait Journal of Science 43(4) (2016), 18–31.

11. Xiong, Li and Fang, Performance evaluation of introducing group technology into machining industry with data envelopment analysis, Journal of Interdisciplinary Mathematics 20(1) (2017), 295–305.

12. Krishna A.V.N., Narayana A.H. and Madhura Vani, A novel approach with matrix based public key crypto systems, Journal of Discrete Mathematical Sciences and Cryptography 20(2) (2017), 407–412.

13. Iannizzotto and Papageorgiou N.S., Existence and multiplicity results for resonant fractional boundary value problems, Discrete and Continuous Dynamical Systems-Series S 11(3SI) (2018), 511–532.

14. [...] and Liu, Research on the phenomenon of chinese residents’ spiritual contagion for the reuse of recycled water based on SC-IAT, Water 9(84611) (2017).

15. [...], Han and Gu, Economics of biomass gasification: A review of the current status, Energy Sources Part B Economics Planning & Policy 13(2) (2018), 137–140.

16. Lokesha, Deepika, Ranjini P.S. and Cangul I.N., Operations of nanostructures via SDD, ABC4 and GA5 indices, Applied Mathematics and Nonlinear Sciences 2 (2017), 173–180.

17. Youssef I.K. and El Dewaik M.H., Solving poissons equations with fractional order using haarwavelet, Applied Mathematics and Nonlinear Sciences 2 (2017), 271–284.