Abstract
Libraries are an important part of the public cultural service system, and library policy is a major force that cannot be ignored. As carriers of cultural inheritance, libraries play an important role in cultivating and practicing core values and promoting civilized practice in the new era. Mining the themes of library policy is the basis for determining the development and service direction of libraries. Library policy currently faces difficulties such as unclear construction direction and inadequate guidance. Against this background, this study examines China’s library policy. Topic mining and quantitative analysis of library policies issued in the eastern, central, and western regions of China from 2011 to 2020 were conducted using the LDA topic model, and the differences between the three regions were investigated. The results demonstrate that: (1) In terms of quantity, policy texts in the eastern region account for more than half of the total; (2) Regarding keywords, high-frequency words shared by the three regions include culture, openness, and information, while numbers, resources, and reading stand out in the eastern region, students, teaching, and learning in the central region, and readers and documents in the western region; (3) The hot topics in the eastern region involve library government affairs, library digital resources, and library service mode, that is, strengthening the construction of library digital resources and improving the service level with an open attitude; (4) The hot topics in the central region are library management, library student training, and library security, suggesting that internal management and system security are emphasized to train students and guide the modernization of the library; (5) The hot topics in the western region are library evaluation management, library procurement, and the construction of “two libraries and one station”, implying that library reform is promoted from the bottom up and that crucial arrangements are made for creating a new situation for grass-roots libraries. On this basis, future policy support of government departments for library development should focus on formulating regulations and rules under the Library Law, establishing a dynamic balance between supply and demand, and enhancing exchanges and cooperation between libraries in the eastern, central, and western regions.
Introduction
The library has a long history. A prototype of the library existed in Babylonian temples as early as 3000 BC. The League House in the Western Zhou Dynasty was the earliest library in China, and the earliest provincial library was the Hubei Provincial Library, established in 1904. As of 2019, 3196 public libraries had been built in China, with a total collection of 1117.81 million volumes, covering more than 2300 counties (cities, districts). The library offers a pleasant environment and provides the public with a good space for learning, entertainment, and communication. It not only preserves and passes on traditional cultural heritage but also disseminates scientific knowledge and guides the public to establish a correct world outlook, view of life, and values, ultimately promoting the progress of human society as a whole. The library carries the historical mission of “both spiritual and material poverty alleviation”, especially in the contemporary transition from material satisfaction to spiritual satisfaction.
Policy and regulation documents are authoritative, legal, universal, and stable. Unlike journal articles and other literature, policy texts usually enforce or guide practice in a certain field, so their influence is far greater. Generally, policy texts express the interests and will of a country or region on behalf of the classes and strata it represents. They are official documents such as laws, regulations, stipulations, opinions, interpretations, and detailed rules promulgated by administrative organs at all levels. In 1955, the Central Ministry of Culture promulgated the “Instructions on Strengthening and Improving the Work of Public Libraries”, which was intended to clarify the positioning of public libraries and their service targets. “The Public Library Law of the People’s Republic of China”, officially implemented in 2018, was the first law promulgated at the national level in the field of public culture. The purpose of its formulation was to protect the basic cultural rights and interests of citizens, improve the scientific and cultural quality of citizens and the level of social civilization, and pass on human civilization. This suggests that the country attaches high importance to the development of libraries. Meanwhile, the importance of developing digital resources is clearly stated in the detailed rules.
The library shoulders the tasks of preserving human cultural heritage, educating the public in scientific knowledge and culture, developing intelligence, and providing cultural entertainment. However, current library construction faces many practical difficulties: the development direction is not unified and does not reflect the mission and value of the library, the direction of library construction in different regions is unclear, and policy implementation is not in place. It is urgent to sort out library policy from a macro perspective and clarify its focus. At the same time, the texts related to library policy are numerous and complex, involving various policy-making departments, diverse themes, and diverse types of policies. Based on the public documents of government departments, the implementation of promulgated policies can be analyzed in terms of both surface features and connotation, so as to improve the effect of governance. Therefore, sorting out and interpreting existing policies, modeling policy texts, and comparing the content of policy formulation in the eastern, central, and western regions of China are of positive significance for clarifying the focus of library policies in China and for studying and optimizing them.
Given the current situation of library policy in China, text mining of library policy in the eastern, central, and western regions is performed using big data technology, building on previous research. The contributions of this paper are as follows: (1) Regarding research methods, Python is used to mine the keywords and topics of library policy texts with the help of big data analysis methods, and LDAvis is employed to visualize the topics; (2) In terms of research objects, unlike previous studies, the policies are divided into eastern, central, and western regions to analyze the differences between regions. This paper combines the LDA topic model with library policy text research, which is an attempt to apply text mining technology to the humanities and social sciences. This not only expands the application scope of computer technology but also uncovers the connotation of policy texts.
Literature review
(1) Related research on library policy
With the continuous increase in the importance of libraries, research on library policy is gradually developing from three aspects: policy text analysis, policy implementation effect, and policy evaluation.
Concerning policy text analysis, Zhang [1] took the development of cultural and creative products in provincial libraries as the research object to analyze the theme and content of policy texts from four perspectives: the main body of cultural and creative product development, marketing system, institutional mechanism, and safeguard measures. Jin and Zhang [2] adopted the NVivo text analysis software to code, explore, and interpret 30 provincial policy texts.
In terms of policy implementation effect, Mi [3] examined the role of the charter in library governance. Liu et al. [4] evaluated the service effects of e-lending policies in university libraries under the guidance of information ecology and stakeholder concepts. Zhu and Zheng [5] used the scale to test the effectiveness of the university library management system. Among them, the construct validity, convergent validity, discriminant validity, additive validity, and rule validity all passed the test. Additionally, Xi [6] analyzed the problems and improvement strategies in the implementation of the library undertaking development policy in the process of urbanization.
With respect to policy evaluation, Wen et al. [7] employed the policy text method to evaluate the document information disposal system of 43 public libraries at the sub-provincial level and above in China. With the Ivy League and the Nine-School Alliance as examples, Fang and Xiao [8] compared and analyzed the book donation policies of Chinese and American university libraries and pointed out the deficiencies in the publicity, content, requirements, methods, and rights and interests of China’s library book donation policy. Wang [9] investigated the successful experience of foreign libraries regarding talent introduction, training, incentives, and other policies, providing a reference for the construction of library talent teams in China. Zhang [10] believed that the library reading promotion management system should be evaluated from aspects such as human resource management, fund management, and safety management, so as to plan and set the various internal elements reasonably. Chen and Wang [11] designed a highly efficient assessment of the general and branch control of public libraries in the new era from the perspectives of system and business. Guided by the CIPP evaluation theory, the whole evaluation analysis framework theory, the management control theory, and the four-generation evaluation development stage theory, Sun and Zheng [12] constructed a “five-in-one” evaluation system of university libraries including library self-evaluation, policy evaluation, education evaluation, certification evaluation, and monitoring evaluation.
(2) Related research on text mining
With the rise of big data technology, text mining has been widely used in various fields. Related research on policy text mining mainly focuses on policy text coding, social network analysis, and topic models.
1) Policy text coding
Zhong and Xu [13] adopted NVivo to code and categorize the evaluation indicators of national unity and progress education demonstration schools in seven provinces including Beijing. Xu et al. [14] analyzed the sports industry policy texts based on the two dimensions of policy tools and innovative value chains. Zhao and Chen [15] coded the policy provisions and divided China’s emergency management policies into mandatory and voluntary policy tools. Based on the basic policy classification tools of McDonnell and Elmore, Fu et al. [16] investigated the selection, use, and changes of China’s science and technology poverty alleviation policy tools.
2) Social network analysis
Yang and Shen [17] employed the social network graph analysis method to explore the changes in the focus of science and technology financial policy topics in different historical stages. Based on the grounded theory, Wang et al. [18] utilized social network analysis to discover the evolution of local government big data governance policy attention from the aspects of space, time, and collaboration network of various departments. Wei [19] conducted a macro-quantitative analysis of the six elements of talent training and international exchanges in discipline construction.
3) Topic model
Li and Li [20] revealed the topic strength, topic area, and topic structure of the Liaoning FTZ policy text using the LDA topic model. Guo et al. [21] performed subject mining and quantitative analysis of innovation policies in the photovoltaic industry based on R language, demonstrating the correlation characteristics of policies at different levels. Wang and Xu [22] analyzed policy release time, layout, and theme intensity through policy text mining.
In summary, current research on library policy focuses on policy text analysis, policy effect, and policy evaluation. Among them, policy text analysis is based on policy coding and policy text representation, while few studies combine big data methods to analyze the topics of library policy. The literature mostly adopts tools such as social network analysis to investigate the policy field, while the LDA topic model is rarely applied to libraries. There are more vertical policy studies and fewer horizontal ones. Policy research mostly measures policy effects, often treating the policy as a quasi-natural experiment, while ignoring the information and value transmitted by the policy itself. Policy text analysis mostly focuses on the surface of the policy and does not conduct in-depth research on the guidance information it contains. The specific differences are shown in Table 1. Given these issues, the LDA model is applied to the field of library policy in this paper, so as to scientifically and accurately mine the topics and keywords of library policy in China and to comprehensively and systematically present its key and popular topics. This is conducive to clarifying the development situation of library construction and has crucial practical significance for the formulation of library policy in China.
Comparison between existing literature and this study
(1) Construction of LDA topic model
Policy documents are a special kind of text: the results of traditional measurement methods lack policy connotation, and the multi-agent character of policy texts makes topic induction based on content structure lose practical value. The topic model provides a new method of semantic dimensionality reduction and topic-structure exploration that addresses both problems, mining the latent semantic relationships of policy texts from a quantitative perspective and thereby enabling multi-topic analysis. Text mining is a branch of big data mining in which valuable information is extracted from text data according to certain rules to achieve text classification and clustering. Policy text mining technology emerged in the 1990s, and its basic idea is to obtain policy connotation information from semi-structured policy texts. Commonly used policy text mining models primarily include the LDA topic model, the Biterm topic model, and the word2vec model. Among them, the LDA (Latent Dirichlet Allocation) topic model, proposed by Blei et al. [23], is a document topic generation model that uses unsupervised learning to discover latent topics in text. Its advantage is that each topic can be described by a set of words. In recent years, the LDA topic model has been broadly applied in fields such as intelligence analysis. The basic idea of the LDA model is that each document is a mixture of latent topics and each topic is a probability distribution over words, so that a high-dimensional word space is compressed into a low-dimensional topic space. LDA can be regarded as the Bayesian version of PLSA: it adds two Dirichlet priors, one for the topic distribution and one for the word distribution.
Although the LSA (latent semantic analysis) and PLSA models avoid some complex operations, their computation becomes more expensive as the number of texts and the size of the vocabulary increase. The LDA topic model integrates the binomial distribution, gamma function, beta distribution, multinomial distribution, and Dirichlet distribution, and handles long, noisy policy texts through dimensionality reduction and information filtering.
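The role of the two Dirichlet priors can be illustrated by simulating the generative process that LDA assumes: each topic draws a word distribution, each document draws a topic distribution, and every word is produced by first picking a topic and then picking a word from that topic. The following is a minimal sketch using only the Python standard library; the function names, the toy vocabulary, and the prior values alpha = beta = 0.5 are illustrative assumptions and are not taken from this study.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, n_topics, n_docs, doc_len,
                    alpha=0.5, beta=0.5, seed=0):
    """Simulate the LDA generative process (illustrative parameter values)."""
    rng = random.Random(seed)
    # phi[k]: word distribution of topic k, drawn from the Dirichlet(beta) prior
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    docs, thetas = [], []
    for _ in range(n_docs):
        # theta: topic distribution of this document, drawn from Dirichlet(alpha)
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]  # pick a topic
            w = rng.choices(vocab, weights=phi[z])[0]           # pick a word
            words.append(w)
        docs.append(words)
        thetas.append(theta)
    return docs, thetas, phi


docs, thetas, phi = generate_corpus(
    ["culture", "reading", "digital", "student"],
    n_topics=2, n_docs=3, doc_len=5)
```

Fitting an LDA model reverses this process: given only the observed words, it estimates the hidden topic distributions and word distributions.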
(2) Analysis process
Unlike quantitative data, policy texts have the typical characteristics of long texts. According to the characteristics of text mining technology and the basic situation of library policy, the policy text mining process is drawn in Fig. 1. The process is divided into four parts: the first part collects the library policy and regulation documents of each region; the second part preprocesses the policy text; the third part uses the Gensim package for text mining to obtain and visualize the word vectors; the fourth part applies the LDA topic model and measures the topic strength.
1) Policy text collection
The PKULaw (Beida Fabao) database of Peking University is used to search for eligible policies with the keyword “library”. The library-related policies issued by the relevant government departments in each region are selected as the research texts. The research period is from January 1, 2011 to December 31, 2020. To better investigate the implementation priorities of policies in different regions, the policies are divided into three regions: eastern, central, and western.
2) Text preprocessing
The policy text is segmented using the Jieba toolkit in Python, with a Chinese stop word lexicon added. A simple Chinese word segmentation using rwordseg is also applied, and the stop word list of Harbin Institute of Technology is loaded to filter the stop words.
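The segmentation and stop word filtering step can be sketched as follows. This is an illustrative outline rather than the code used in this study: the helper names `load_stop_words`, `preprocess`, and `jieba_tokenizer` are hypothetical, and the tokenizer is passed in as a parameter so that the example also runs without Jieba installed. With Jieba available, `jieba.lcut` (which returns a list of tokens) would be supplied as the tokenizer.

```python
def load_stop_words(lines):
    """Build a stop word set from an iterable of lines (one word per line)."""
    return {line.strip() for line in lines if line.strip()}


def preprocess(text, tokenize, stop_words):
    """Segment text, then drop whitespace-only tokens and stop words."""
    tokens = (tok.strip() for tok in tokenize(text))
    return [tok for tok in tokens if tok and tok not in stop_words]


def jieba_tokenizer():
    """Return Jieba's list-producing segmenter (requires the jieba package)."""
    import jieba  # third-party dependency, imported only when requested
    return jieba.lcut


# Toy example with pre-separated text and a one-word stop list (assumed data).
stop_words = load_stop_words(["的", ""])
tokens = preprocess("图书馆 的 数字 资源", str.split, stop_words)
```

In practice the stop word list would be read from a file, for example `load_stop_words(open(path, encoding="utf-8"))`, where `path` points to the chosen stop word lexicon.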
3) Calculation of keywords and core words
Based on the Gensim package, documents are represented in the form of a bag of words. The model converts each policy text into long vectors and words into word vectors. The TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is introduced to discover the semantic structure of documents by examining the statistical co-occurrence patterns of words in the same document of the training corpus. Finally, it is converted into vector mode for further processing.
Briefly, TF describes the importance of a word to a single document, and IDF describes the importance of a word across the entire document collection. The larger the TF-IDF value, the higher the probability that the word becomes a keyword. The calculation formulas are:
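As an illustration of how TF and IDF combine, the sketch below assumes one common formulation: TF is the count of a word in a document divided by the document length, and IDF is the natural logarithm of the number of documents divided by the number of documents containing the word. This choice is an assumption for the example; library implementations such as Gensim's `TfidfModel` may use a different logarithm base and normalization, so the absolute values can differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF per document for a list of tokenized documents.

    Assumed definitions: tf = count / document length,
    idf = ln(number of documents / number of documents containing the word).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Toy corpus (assumed data): "library" occurs in every document.
toy_docs = [["library", "digital", "library"], ["library", "student"]]
toy_scores = tf_idf(toy_docs)
```

A word that appears in every document, such as "library" in the toy corpus, receives a score of zero under this definition, which is why very common words rarely become keywords.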
Policy analysis topic model.
The three-layer structure of the LDA model.
4) LDA model construction
The LDA model is divided into three layers: words, topics, and documents. It involves knowledge such as Bayesian theory, multinomial distribution, and graph models [24], as illustrated in Fig. 2.
The general calculation formula of LDA is:
where
where
The initial value of the hyperparameter α in the LDA topic model is generally determined empirically. In this paper,
where
(1) Distribution of policy documents
The regional distribution of the obtained policy texts is shown in Fig. 3. As observed in Fig. 3, policies are mainly concentrated in the eastern region, which accounts for 53%; the central and western regions account for 29% and 18%, respectively.
Policy distribution area.
(2) Policy text preprocessing
The data is preprocessed with library policy texts as the corpus. The Jieba toolkit in Python is used for word segmentation, and the “Chinese stop word database” is adopted to eliminate stop words. Words with no actual meaning, which contribute nothing to topic mining, are removed in the subsequent process to ensure the validity of the results.
High frequency vocabulary of policy text
(3) Keywords and word cloud
Word frequency statistics are then computed on the vocabulary obtained from the processed policy text. Due to space limitations, only the 20 most frequent words are displayed. The results are presented in Table 2, which shows where library policies concentrate and intuitively displays the focus of policy texts in different regions.
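The word frequency statistics can be sketched with the standard library's `collections.Counter`. The function name `top_words` and the toy documents are assumptions for illustration; the input is a list of already segmented documents.

```python
from collections import Counter


def top_words(tokenized_docs, n=20):
    """Return the n most frequent words across all tokenized documents."""
    counts = Counter()
    for doc in tokenized_docs:
        counts.update(doc)
    return counts.most_common(n)


# Toy example (assumed data): two segmented documents.
toy_top = top_words(
    [["culture", "reading", "culture"], ["culture", "reading", "student"]],
    n=2)
```

Running this separately on the documents of each region gives a per-region list of high-frequency words.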
High-frequency word cloud map.
Then, the high-frequency words are drawn into a word cloud, as displayed in Fig. 4. The closer a word is to the center and the larger its font size, the higher its frequency. Table 2 and Fig. 4 demonstrate that the frequency of words such as “culture”, “opening”, and “information” is relatively high in the policy texts of all three regions. This suggests that policy formulation shares a unified goal and that the overall direction of development is the same, with culture as the starting point, opening up as an opportunity, and information as a tool, although the regions differ slightly in emphasis. In the policy texts of the eastern region, words such as “numbers”, “resources”, and “reading” appear more frequently, reflecting that the eastern region attaches great importance to digital resources and the spiritual food of the public [25]. The combination of digital resources and reading promotion is highly consistent with the national education and service concept. For example, the “Implementation Opinions of the Ningbo Municipal Education Bureau on Further Promoting the Development of the Characteristic Database of Ningbo Digital Library” clearly pointed out the necessity and urgency of establishing a characteristic digital document repository. In the policy texts of the central region, “students”, “teaching”, and “learning” appear more frequently. This suggests that the central region attaches great importance to students’ education and gives full play to the library’s function of supporting students’ learning. In the policy texts of the western region, “readers” and “documents” appear more frequently, implying that the western region attaches great importance to literature resources and readers.
(4) Topic intensity distribution
According to perplexity and the actual situation of the policy texts, the number of topics was set to 20 for each of the eastern, central, and western regions. The topic intensity of policy texts in each region is then calculated, as shown in Table 3.
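The two quantities used in this step can be sketched as follows. The sketch assumes the usual definitions: perplexity is the exponential of the negative average log-likelihood per word on a set of documents (lower is better when comparing candidate topic numbers), and topic intensity is the mean probability of a topic over all documents in a region. Both definitions, the function names, and the toy numbers are assumptions for illustration, not values from this study.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-(sum of document log-likelihoods) / (total number of words))."""
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))


def topic_intensity(doc_topic):
    """Average each topic's probability over all documents.

    doc_topic[d][k] is the probability of topic k in document d.
    """
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    return [sum(row[k] for row in doc_topic) / n_docs for k in range(n_topics)]


def rank_topics(intensity, n=3):
    """Return the indices of the n strongest and the n weakest topics."""
    order = sorted(range(len(intensity)),
                   key=lambda k: intensity[k], reverse=True)
    return order[:n], order[-n:]


# Toy document-topic matrix (assumed data): 2 documents, 3 topics.
toy_intensity = topic_intensity([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]])
toy_hot, toy_cold = rank_topics(toy_intensity, n=1)
```

With a fitted model, `doc_topic` would hold the inferred topic distribution of every policy document, and `rank_topics(intensity, n=3)` would give the three strongest and three weakest topics for a region.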
Theme intensity distribution of each regional level
The twenty topics of each of the eastern, central, and western regions were visualized with LDAvis, and the results are illustrated in Fig. 5, which is divided into two parts. The left half contains 20 circles, each circle represents a topic, the number
Distribution of library policy topics.
According to the topic intensities of the three regions in Table 3, the three topics with the highest intensity are selected as popular topics for analysis. The results demonstrate that the popular and unpopular topics of library policy are (7, 10, 11) and (3, 13, 15) in the eastern region, (6, 14, 19) and (4, 8, 18) in the central region, and (4, 8, 19) and (1, 0, 14) in the western region, respectively. Compared with unpopular topics, popular topics carry more weight in the policy. Taking the popular topics as examples, the feature words and their relevance are presented in Tables 4–6.
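The relevance of a feature word to a topic can be computed in several ways. The sketch below assumes the relevance measure used by the LDAvis tool, which blends the probability of a word within a topic with its lift (the ratio of that probability to the word's overall probability in the corpus) through a weight λ between 0 and 1. Whether Tables 4–6 use exactly this measure and which λ was chosen are assumptions here; the function names and toy probabilities are illustrative.

```python
import math


def relevance(p_word_in_topic, p_word, lam=0.6):
    """LDAvis-style relevance: lam*log(p(w|t)) + (1-lam)*log(p(w|t)/p(w))."""
    return (lam * math.log(p_word_in_topic)
            + (1 - lam) * math.log(p_word_in_topic / p_word))


def top_relevant_terms(topic_word, marginal, lam=0.6, n=3):
    """Rank the words of one topic by relevance, highest first.

    topic_word maps word -> p(word | topic); marginal maps word -> p(word).
    """
    scored = {
        word: relevance(p, marginal[word], lam)
        for word, p in topic_word.items() if p > 0
    }
    return sorted(scored, key=scored.get, reverse=True)[:n]


# Toy distributions (assumed data) for a single topic.
toy_topic = {"digital": 0.4, "library": 0.5, "resource": 0.1}
toy_marginal = {"digital": 0.1, "library": 0.6, "resource": 0.05}
by_lift = top_relevant_terms(toy_topic, toy_marginal, lam=0.0)
by_prob = top_relevant_terms(toy_topic, toy_marginal, lam=1.0)
```

With λ = 1 the ranking follows raw within-topic probability, so a corpus-wide common word such as "library" comes first; with λ = 0 it follows lift, which favors words that are distinctive for the topic.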
Characteristic words and relevance of hot policy topics in eastern China
Table 4 reveals that the popular topics of library policy in the eastern region are Topic 7 “Library Government Affairs”, Topic 10 “Library Digital Resources”, and Topic 11 “Library Service Mode”. The characteristic words “check”, “election”, “approval”, and “committee” of Topic 7 indicate the emphasis on the openness of government affairs in the library policy of the eastern region. This reflects the library’s humanistic care for readers. The characteristic words “numbers”, “resources”, and “documents” of Topic 10 imply the policy’s emphasis on digital resources. The characteristic words “service system”, “main library”, and “cultural center” of Topic 11 point to the service mode of the library.
As demonstrated in Table 5, the hot topics of library policy in the central region are Topic 6 “Library Management”, Topic 14 “Library Student Training”, and Topic 19 “Library Security”. The characteristic words “street”, “document”, and “submission” of Topic 6 reflect the government’s management of grass-roots libraries. The characteristic words “students”, “cultivation”, “experimental teaching”, and “information technology” of Topic 14 reveal the library’s mission of cultivating students’ skills and information technology competence. Words such as “guarantee”, “implementation”, and “key point” in Topic 19 imply the government’s institutional guarantee for the implementation of library policies.
Characteristic words and relevance of hot policy topics in central China
In Table 6, the popular topics of library policy in the western region are Topic 4 “Library Evaluation Management”, Topic 8 “Library Procurement”, and Topic 19 “Two Libraries and One Station Construction”. The characteristic words “demonstration”, “qualified”, and “management work” of Topic 4 reflect the government’s enthusiasm for library evaluation. The characteristic words “procurement”, “assessment”, and “project” of Topic 8 indicate that the government attaches great importance to procurement work. The characteristic words of Topic 19, such as “cultural center”, “cultural station”, “township”, and “mass”, demonstrate the great efforts made by the government to enrich grassroots mass culture.
Characteristic words and relevance of hot policy topics in western China
(5) Comparative analysis of library policies in the eastern, central, and western regions based on topic mining
As a core element of cultural competitiveness, libraries have critical social value and are one of the hotspots of current research. Due to the large cultural differences between regions and the unbalanced development of libraries, each region adopts different policy tools to improve the quality of library development. In this paper, the direction of library governance in the three regions is analyzed through topic mining of the eastern, central, and western regions with policy texts as the research object. The comparison reveals the following:
The eastern region focuses on strengthening the construction of digital resources on the basis of openness and on improving the quality of library services. The distribution of its high-frequency words exhibits a slowly decreasing trend, and its topics are more closely related to culture, information, digital resources, and reading. In the eastern region, where cultural construction is highly developed, the library plays a leading and guiding role, but it also faces a contradiction between the growing demand for knowledge and culture and the supply of services. Thus, reform is urgently needed. The central region stresses library management and student training. Its topics are more closely associated with the internal management of the library, institutional guarantees, and the cultivation of readers’ technical skills. Compared with the eastern region, the policies in the central region are more practical and play a linking role. The western region emphasizes the procurement of library literature and the connection with the grassroots. The distribution of its keywords exhibits a flat trend. Constrained by factors such as funding, geography, and population, the economic and cultural performance of the underdeveloped western region is poor. Therefore, the construction and development of collection resources are insufficient, and policy guidance on procurement work is urgently needed.
Notably, culture runs through the policies of the eastern, central, and western regions and has received extensive attention in China. This reflects the basic function of the library and the need to preserve and pass on culture.
Based on the construction of the LDA topic model, text mining is conducted on the library policies in China from January 2011 to December 2020. The overall distribution and topic intensity of policies in different regions are demonstrated by comparing the eastern, central, and western regions. Besides, three hot topics per region are selected for analysis, and the following conclusions are obtained. (1) The policies of the eastern region focus on strengthening the construction of digital resources in libraries and improving the service level with an open attitude, and are forward-looking; (2) The policies of the central region emphasize library management, student training, and institutional guarantee, guiding the development of library modernization from the inside; (3) The policies of the western region stress library evaluation, two libraries and one station, and procurement, promoting library reform from the bottom up, making crucial arrangements for creating a new situation for grass-roots libraries, and laying a solid foundation for catching up with the eastern and central regions.
Based on the above conclusions, the following suggestions are proposed for library policies and regulations in China:
(1) Consideration of making rules and regulations based on library law
The library problem is not only a reading problem, but also involves comprehensive issues such as social education, cultural inheritance, and quality training. In recent years, China has gradually realized that cultural soft power is the core element in the competition of great powers, and the role of libraries in cultural self-confidence has been fully affirmed. Under the guidance of the “Library Law”, all regions should digest, absorb, and formulate laws and regulations based on local characteristics.
(2) Establishment of a supply-demand dynamic balance
Establishing a dynamic balance between reader resource demand and library resource supply under the guidance of supply-side reform is the fundamental solution to library problems. The policies should thoroughly implement the new development concept of the library; promote the construction of the library with service innovation; shoulder the mission of popularizing social education in more developed areas; accelerate the integration of information and technology in less developed areas; improve the library service guarantee system; strengthen the reform of the library’s internal management mechanism; abandon conservative and negative development concepts under the guidance of social needs in underdeveloped areas; increase supply and expand social influence; and thus drive the steady development of library undertakings.
(3) Enhancement of exchanges and cooperation between libraries in the eastern, central, and western regions
The collection resources and utilization rates of libraries in the central and western regions are much lower than those in the eastern region. Merely establishing a wide-ranging mechanism for sharing library resources among the eastern, central, and western regions is far from enough. The scarcity of resources can be managed by relying on a large amount of capital investment; the top priority is to introduce advanced library development concepts and overcome the scarcity of library talent. Policies should eliminate the functional and conceptual barriers to library personnel training. Between regions, a pattern in which the east leads the west and the west promotes the east will be realized, accelerating the pace of cooperation.
Meanwhile, this study has certain limitations. First, it did not consider the linkages between issuing institutions or the position of actors in the network; it treated all library policy documents equally and therefore neglected the differing strength of documents. In future research, the issuing institution should be incorporated as an element of the analysis. Second, the comparison of the eastern, central, and western regions lacks a temporal dynamic analysis of topic intensity. Thus, the time nodes should be refined in subsequent research to reveal the evolution of policy themes more accurately.
