Abstract
To address the low retrieval accuracy and long retrieval time of traditional document information retrieval methods, this paper designs an intelligent retrieval method for library document information based on hidden topic mining. First, an LDA model is used to mine the hidden topics of library document information; then, based on the mining results, the similarity between document information is calculated in an inference network model. Finally, a Bayesian model is constructed in the sample space to retrieve library document information under the maximum retrieval space coverage. Experimental results show that, compared with traditional retrieval methods, the proposed method significantly improves retrieval accuracy, with the highest retrieval accuracy reaching 99%, and significantly reduces retrieval time, indicating that it effectively improves both the accuracy and the timeliness of retrieval.
Keywords
Introduction
Libraries are key places for storing large numbers of documents and play an important role in academic research and data retrieval. With the development of computer technology, electronic libraries are gradually emerging [4,8]. Compared with a traditional library, an electronic library removes the time and space limits on document information retrieval: users can directly enter the subject information of the required document to achieve rapid retrieval [10].
With the rapid popularization of the Internet, the volume of electronic document information resources is growing quickly, which makes data retrieval difficult for scientific researchers. Finding the required document information quickly and accurately is a major problem in the application of electronic libraries [6,15]. Therefore, many scholars have studied library document information retrieval methods and have obtained certain research results.
Reference [5] proposed a literature information retrieval method based on representation learning, which uses the representation learning algorithms of a deep belief network and a convolutional neural network to calculate the contribution association model of literature information and to represent the modal association between different information, thereby completing the retrieval of literature information. However, the retrieval results differ greatly from the user’s demand, so the method suffers from low retrieval accuracy. Reference [11] proposed a literature information retrieval method based on stationary distribution. In this method, the stationary distribution of a Markov model is first used to construct an information retrieval function that judges the query and feedback ability of literature information. During retrieval, nodes with strong query and feedback capabilities are selected; when the query node enters the feedback stage, the feedback nodes are selected and the literature information is retrieved. However, the overall retrieval time of this method is long. Reference [13] proposed a literature information retrieval method based on trusted semantic deep learning. In this method, a large amount of literature information retrieval training data is collected with web crawler technology and a deep neural network model is constructed. With the training data as input and the retrieval results as output, the neural network model is trained by supervised learning, and the literature information retrieval results are finally obtained. However, because this method has difficulty mining the topics hidden in different literature information, its retrieval accuracy is insufficient.
In order to solve the problems of low retrieval accuracy and long retrieval time existing in traditional retrieval methods, this paper proposes an intelligent retrieval method of library document information based on hidden topic mining. Its design ideas are as follows:
(1) Each topic is cyclically sampled by Gibbs sampling, so that every item of information is assigned a hidden topic, which improves the accuracy of information retrieval; the LDA model is then used to mine the hidden topics and their related semantic information;
(2) Taking the literature information as sample space information, the similarity between items of literature information is calculated in the inference network model based on the mining results of the hidden topics;
(3) A Bayesian model is constructed in the sample space. Under the maximum retrieval space coverage, the library literature information is retrieved through rapid classification by information similarity, which effectively shortens the retrieval time.
Intelligent retrieval of library document information
Mining hidden topics of library document information based on LDA model
The volume of library document information resources is very large, which puts great pressure on document information retrieval. However, different document information resources have different hidden topic features, and the accuracy of library document information retrieval can be greatly improved by mining these features. Therefore, in this study, the Latent Dirichlet Allocation (LDA) model is used to mine the hidden topics of library literature information.
The LDA model is a fully probabilistic generative model with a clear logical structure, which can improve the mining accuracy of the hidden topics of library document information. Moreover, the size of the parameter space of the LDA model is fixed and is not affected by the amount of training data [14]. When the LDA model is used to mine the hidden topics of library document information, both the hidden topics and the word meanings related to each hidden topic can be obtained. When a word meaning is used to represent the hidden topic of literature information, a word meaning related to the hidden topic is more likely to represent the literature information accurately, while a word meaning unrelated to the hidden topic is more likely to be “noise”. Therefore, mining the hidden topics of library document information makes it possible to remove “noise” word meanings from the hidden topic annotation of document information more effectively.
The LDA model for hidden topics of literature information is shown in Fig. 1.

LDA model for hidden topics of literature information.
In Fig. 1, D represents the library document information; θ and ϕ are hidden variables; Z is the hidden topic; W is the topic word meaning; α and β are the parameters of the Dirichlet prior distributions of the hidden variables θ and ϕ, respectively; and T is the number of hidden topics of the literature information.
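For reference, the generative process that a model of this shape describes is commonly written as below. This is the standard textbook form of LDA rather than a formula taken from this paper; the indices d (document), n (word position) and t (topic) are notation assumed here for illustration.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, |D| \\
\phi_t &\sim \mathrm{Dirichlet}(\beta), & t &= 1, \dots, T \\
Z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d) \\
W_{d,n} \mid Z_{d,n}, \phi &\sim \mathrm{Multinomial}(\phi_{Z_{d,n}})
\end{align*}
```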
In the actual hidden topic mining process, it is complex to directly analyze the probability process between hidden topic Z, document information D and topic word meaning W. Therefore, the LDA model is simplified by Gibbs sampling. It is assumed that for the i-th document information
In the formula,
In each cycle of Gibbs sampling, every item of library literature information is assigned a hidden topic. When enough sampling iterations have been run, the topic probability approaches the Dirichlet distribution [3]. After Gibbs sampling, the hidden variables θ and ϕ of the topic distribution can be calculated as follows:
After the hidden variables of the topics have been calculated, the semantic relevance of the hidden topics can be computed:
In the formula,
According to the word-meaning relevance of the hidden topics obtained above, the hidden topics of library literature information can be mined by maximization. The specific mining formula is as follows:
In the formula, V represents hidden topic word set,
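The sampling-based mining procedure of this subsection can be illustrated with a minimal collapsed Gibbs sampler for LDA. This is a sketch under stated assumptions, not the authors' implementation: the function name `lda_gibbs`, the symmetric priors `alpha` and `beta`, the default values, and the estimators for θ and ϕ are the usual textbook choices and are not taken from the paper's own formulas. Documents are assumed to be lists of integer word ids.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, assumed defaults)."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total words per topic
    z = []                                              # topic of every token

    # Random initial assignment: every token receives a hidden topic.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment of this token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional topic weights given all other assignments.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta) for w in range(vocab_size)]
        for t in range(n_topics)
    ]
    return theta, phi

# Toy corpus (hypothetical word ids); the dominant topic of each document
# is picked by maximizing over theta.
docs = [[0, 1, 0, 1], [2, 3, 3, 2], [0, 0, 1]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=4, n_iter=50)
dominant = [max(range(2), key=row.__getitem__) for row in theta]
```

Each row of `theta` and `phi` is a probability distribution, and `dominant` holds one topic index per document. Because the sampler is stochastic, the topic labels themselves are arbitrary and depend on the seed.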
Based on the results of hidden topic of library literature information mined above, intelligent retrieval of library literature information is carried out. Firstly, document information parameters need to be set: document information node
After the relevant parameters have been set, the inference network model is constructed from the document information nodes and the index items. The inference network model is shown in Fig. 2. In Fig. 2, a relationship is established between each document information node and the index items, so that new index items are obtained as the document information node changes. Then, combined with the evidence conditions of retrieval relevance, the new index items are aggregated according to the posterior probability, so as to derive the relevance ranking function and complete the design of the inference network.

Inference network model.
In the inference network model, the ranking calculation function of correlation degree can be obtained under the condition of retrieving the evidence condition
In order to simplify the calculation process, the nodes
Based on the above node independence conditions, the network structure can be simplified:
1) The merging node vector
2) Each child node also maintains an independent relationship with each other:
Substituting formula (8) and formula (9) into formula (7) and rearranging gives:
In formula (10),
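The role of the independence conditions in an inference network can be illustrated with a small sketch. It assumes a textbook form of the ranking, in which the score of a document is the sum over all binary index-term states k of P(q | k) P(k | d), with P(k | d) factorized over independent index items. The paper's formulas (7) to (10) are not reproduced here, so the function names, the per-term probabilities and the "any query term active" link function are illustrative assumptions only.

```python
from itertools import product

def p_state_given_doc(state, p_term):
    """P(k | d) for one binary index-term state, assuming independent index items."""
    prob = 1.0
    for active, p in zip(state, p_term):
        prob *= p if active else 1.0 - p
    return prob

def p_query_given_doc(p_term, query_terms):
    """Sum of P(q | k) * P(k | d) over all 2^t index-term states k."""
    total = 0.0
    for state in product((0, 1), repeat=len(p_term)):
        # Assumed link function: the query is satisfied when at least one
        # of its index items is active in state k.
        p_q_given_k = 1.0 if any(state[i] for i in query_terms) else 0.0
        total += p_q_given_k * p_state_given_doc(state, p_term)
    return total

# Hypothetical values of P(index item i is active | document).
doc_term_probs = {
    "doc_a": [0.5, 0.2, 0.9],
    "doc_b": [0.1, 0.1, 0.8],
}
query = [0, 1]  # positions of the index items used by the query
ranking = sorted(
    doc_term_probs,
    key=lambda name: p_query_given_doc(doc_term_probs[name], query),
    reverse=True,
)
```

The enumeration costs 2^t evaluations for t index items, which is one reason a closed form obtained from the independence conditions is preferable in practice. For this particular link function the sum collapses to one minus the product of (1 - p_i) over the query's index items.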
With the support of inference network model, Bayesian classification model is used to retrieve library document information. The Bayesian classification model has a solid theoretical foundation and can improve the reliability of literature information retrieval [12]. The collection of all retrieved literature information in library literature information set D is
Taking each document information document as a concept in sample space, we can get
In the formula, U represents the concept set in the sample space S.
In the sample space S, the correlation between the document information document d and the search document q is defined, which can be expressed as the coverage of concept q, and the calculation formula is:
Then, the calculation formula of correlation function (12) can be derived:
Since α in formula (13) represents a constant, in the calculation process, only the calculation of
According to the relationship among the above variables, the library document information Bayesian retrieval model as shown in Fig. 3 can be constructed.

Bayesian retrieval model of library document information.
According to the Library literature information Bayesian retrieval model shown in Fig. 3, the concept u is replaced by vector
It can be seen from the retrieval model shown in Fig. 3 that after the index node is given, the relationship between the document information node and the retrieval node is independent of each other, so we can get:
In order to more accurately simulate the vector space model and improve the retrieval accuracy of library literature information, the parameters
Therefore:
Finally, calculate:
Through the above calculation, the retrieval probability
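A Bayesian retrieval model of this kind, tuned to simulate the vector space model, can be sketched as follows. The sketch assumes the common simplification in which only single-index-term concepts u contribute, with P(d | u) and P(q | u) set to length-normalized term weights, so that the score sum over u of P(d | u) P(q | u) equals the cosine similarity up to a constant factor such as the α of formula (13). The function names and the toy weights are hypothetical, and the paper's own parameter definitions may differ.

```python
import math

def normalize(weights):
    """Scale a weight vector to unit Euclidean length (unchanged if all zero)."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm else list(weights)

def belief_score(doc_weights, query_weights):
    """Sum over single-term concepts u of P(d | u) * P(q | u), constants dropped."""
    p_d_given_u = normalize(doc_weights)
    p_q_given_u = normalize(query_weights)
    return sum(pd * pq for pd, pq in zip(p_d_given_u, p_q_given_u))

def retrieve(docs, query_weights, top_n=3):
    """Return the names of the top_n documents ranked by belief score."""
    ranked = sorted(
        docs,
        key=lambda name: belief_score(docs[name], query_weights),
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical index-term weights for three documents and one query.
docs = {"a": [1, 0, 0], "b": [0, 1, 1], "c": [1, 1, 0]}
top = retrieve(docs, [1, 0, 0], top_n=2)
```

Dropping the constant factor does not change the ranking, which is why only the variable part of the correlation function needs to be evaluated during retrieval.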
In order to verify the practical application effect of the library literature information intelligent retrieval method based on hidden topic mining, the following comparative experiment is designed.
Experimental preparation
Experimental data: The data for this study come from the document information of a university library and comprise 18000 e-books covering six document types: economy, materials, literature, history, management and computer science, with 3000 items of each type. Because the literature information of the university library is readily usable, no preprocessing other than data normalization is required for the literature information used in this experiment.
Experimental environment: The operating system used in this experiment is Windows 10, the simulation software is Matlab 7.2, the CPU is a 10-core Intel Xeon E5-2640 CPU, the memory is 64 GB, the hard disk is HDD 10 TB, and the network card is Broadcom NetXtreme Gigabit Ethernet.
Comparison methods: The method of this paper is compared with the document information retrieval method based on representation learning proposed in reference [5] and the document information retrieval method based on stationary distribution proposed in reference [11].
Experiment indicators
The experiment uses the mining accuracy of hidden topics, the retrieval accuracy and the retrieval time of literature information as evaluation indicators. Among them:
(1) Mining accuracy of hidden topics of literature information: This refers to the consistency between the hidden topics of literature information mined by a method and the actual hidden topics. The higher the mining accuracy of hidden topics, the stronger the performance of the method.
(2) Document information retrieval accuracy: This refers to the similarity between the retrieval results of a method and the actually required results. The higher the retrieval accuracy of document information, the stronger the retrieval performance of the method.
(3) Document information retrieval time: This refers to the time a method takes to complete document information retrieval when the amount of document information data and the retrieval target are the same for all methods.
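The retrieval accuracy and retrieval time indicators can be measured along the following lines. The paper does not give formulas for them, so this sketch assumes that retrieval accuracy is the share of retrieved documents that are among the required ones (precision) and that retrieval time is wall-clock time per query; the function names are hypothetical.

```python
import time

def retrieval_accuracy(retrieved, required):
    """Share of retrieved documents that are actually required (assumed: precision)."""
    if not retrieved:
        return 0.0
    required = set(required)
    return sum(1 for doc in retrieved if doc in required) / len(retrieved)

def timed_retrieval(retrieve_fn, query):
    """Run one retrieval and return (results, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    results = retrieve_fn(query)
    return results, time.perf_counter() - start

# Toy usage with a stand-in retrieval function.
results, seconds = timed_retrieval(lambda q: ["d1", "d2", "d3", "d4"], "topic")
accuracy = retrieval_accuracy(results, {"d1", "d3", "d4", "d9"})
```

For a fair comparison, every method would be timed on the same data volume and the same retrieval target, as indicator (3) requires.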
Analysis of experimental results
Mining accuracy of hidden features
Hidden feature mining has an important influence on the overall document information retrieval accuracy of a method. Therefore, the mining accuracy of hidden features is taken as an experimental comparison index, and the method of this paper is compared with the methods of references [5] and [11]. The comparison results of the hidden feature mining accuracy of the three methods are shown in Fig. 4.

Comparison results of extraction accuracy of implicit features.
As can be seen from the comparison results in Fig. 4, over the 300 iterative verification experiments, the hidden feature extraction accuracy of the method of this paper remains stable above 95%. The extraction accuracy of the method of reference [5] fluctuates greatly, with an overall range of about 40%–80%. The extraction accuracy of the method of reference [11] reaches a maximum of 76% at experiment 210. This shows that the method of this paper improves the extraction accuracy of the hidden features of document information. The reason is that, based on cyclic sampling of each topic, this method uses the LDA model to accurately mine the hidden topics and their related word meaning information.
The retrieval accuracy of document information directly reflects the performance of different retrieval methods. Therefore, the method of this paper and the two comparison methods are evaluated with the retrieval accuracy of document information as the experimental comparison index. The comparison results of the document information retrieval accuracy of the method of this paper, the method of reference [5] and the method of reference [11] are shown in Fig. 5.

Document information retrieval accuracy comparison results.
The comparison results in Fig. 5 show that the retrieval accuracy of the two comparison methods fluctuates greatly and that their minimum retrieval accuracy is below 50%, which cannot meet the requirements of document information retrieval. In contrast, the retrieval accuracy of the method of this paper is basically stable above 90%, and its highest retrieval accuracy reaches 99%. This shows that the method of this paper effectively improves the accuracy of library document information retrieval. The reason is that this method cyclically samples each topic through Gibbs sampling, so that every item of information is assigned a hidden topic, which effectively improves the retrieval accuracy.
Because the amount of document information data in a library is large and its types are complex, higher requirements are placed on the efficiency of document information retrieval methods. To fully verify the retrieval performance of the method of this paper, retrieval time is used to reflect the retrieval efficiency. The comparison results of the document information retrieval time of the three methods are shown in Table 1.
Document information retrieval time comparison results
The time consumption results in Table 1 show that the retrieval time of the method of this paper is much lower than that of the methods of references [5] and [11]. Over the 300 experiments, the average retrieval time is 0.69 s for the method of this paper, 4.84 s for the method of reference [5] and 4.18 s for the method of reference [11]. Therefore, the method of this paper effectively shortens the retrieval process and improves its timeliness. The reason is that this method calculates the similarity between document information in the inference network model based on the hidden topic mining results, and then constructs a Bayesian model to quickly classify and retrieve library document information, which shortens the retrieval time.
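To put the reported averages in relative terms, the following arithmetic is derived only from the three average retrieval times quoted above (0.69 s, 4.84 s and 4.18 s); the rounded ratios are not figures stated in the paper.

```latex
\begin{align*}
\frac{4.84}{0.69} &\approx 7.0, & 1 - \frac{0.69}{4.84} &\approx 85.7\% \quad \text{(relative to reference [5])} \\
\frac{4.18}{0.69} &\approx 6.1, & 1 - \frac{0.69}{4.18} &\approx 83.5\% \quad \text{(relative to reference [11])}
\end{align*}
```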
To improve the accuracy of library document information retrieval and reduce the retrieval time, this paper proposes a library document information retrieval method based on hidden topic mining. After the hidden topics of library document information are mined with the LDA model, the similarity between document information is calculated in the inference network model, and a Bayesian model is then constructed in the sample space to retrieve library document information under the maximum retrieval space coverage.
The performance of the method is verified both theoretically and experimentally, and the method achieves higher retrieval accuracy and shorter retrieval time in library document information retrieval. Specifically, compared with the retrieval method based on representation learning, the retrieval accuracy of the method of this paper is significantly improved, with the highest retrieval accuracy reaching 99%. Compared with the retrieval method based on stationary distribution, the retrieval time of the method of this paper is significantly reduced, with an average retrieval time of 0.69 s. Therefore, the proposed retrieval method based on hidden topic mining can better meet the requirements of library document information retrieval.
